Using Apply Function in Pandas | Apply Method in Pandas

The .apply() method in Pandas is a powerful tool that allows you to apply a function along an axis of the DataFrame or Series. This method is incredibly versatile, enabling both row-wise and column-wise operations, and can be used to apply both simple and complex functions. Understanding how to use .apply() effectively can significantly enhance your data manipulation and analysis capabilities in Pandas.

Basic Usage of `.apply()`

The basic syntax of .apply(), leaving out less common parameters such as raw and result_type, is:

DataFrame.apply(func, axis=0, args=(), **kwargs)

func: The function to apply to each column or row.
axis: Specifies the axis along which the function is applied:
- axis=0: Apply a function to each column (default).
- axis=1: Apply a function to each row.
args: Positional arguments to pass to the function.
**kwargs: Additional keyword arguments to pass to the function.

Applying a Function to Each Column

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Apply np.sum function to each column
print(df.apply(np.sum))

Applying a Function to Each Row

# Apply np.sum function to each row
print(df.apply(np.sum, axis=1))

Applying Custom Functions

.apply() becomes particularly powerful when you need to apply custom functions to your data.

Example: Subtracting the Minimum from the Maximum in Each Row

def custom_range(x):
    return x.max() - x.min()

print(df.apply(custom_range, axis=1))
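
A custom function can also return several values per row. As a minimal sketch, assuming the same df as above: when the function returns a Series, the row-wise result becomes a DataFrame with one column per label. The helper name row_summary is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

def row_summary(row):
    # Hypothetical helper: return several statistics for one row
    return pd.Series({
        'min': row.min(),
        'max': row.max(),
        'range': row.max() - row.min(),
    })

# Each returned Series becomes one row of the resulting DataFrame
summary = df.apply(row_summary, axis=1)
print(summary)
```

The labels of the returned Series ('min', 'max', 'range') become the column names of the result.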

Using `.apply()` with Additional Arguments

You can pass additional arguments and keyword arguments to the function being applied.

Example: Adding a Constant Value to Each Element

def add_custom_value(x, add_value):
    return x + add_value

print(df.apply(add_custom_value, args=(5,)))
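
Keyword arguments work the same way. The sketch below assumes a hypothetical helper, scale_and_shift, with two made-up parameters (factor and offset). Keyword arguments that are not parameters of .apply() itself are forwarded to the function.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

def scale_and_shift(x, factor=1, offset=0):
    # Hypothetical helper: multiply by `factor`, then add `offset`
    return x * factor + offset

# factor and offset are passed through to scale_and_shift for each column
result = df.apply(scale_and_shift, factor=2, offset=1)
print(result)
```

Naming the arguments at the call site tends to be easier to read than a positional args tuple when the function takes more than one extra parameter.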

`.apply()` on DataFrame vs. Series

While .apply() can be used on both DataFrames and Series, the behavior differs slightly. On a DataFrame, .apply() passes each column (or each row, with axis=1) to the function as a Series. On a Series, it passes the elements to the function one at a time.

Example: Applying a Function to a Series

# Applying a function that increases each element by 10% on Series 'A'
print(df['A'].apply(lambda x: x * 1.1))
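
One way to see the difference is to inspect what the function actually receives in each case. This is a small illustrative sketch on the same df, not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# DataFrame.apply hands each column to the function as a whole Series
column_types = df.apply(lambda col: type(col).__name__)

# Series.apply hands the function one scalar value at a time
element_types = df['A'].apply(lambda value: type(value).__name__)

print(column_types)
print(element_types)
```

This is why reductions such as x.max() work inside DataFrame.apply() but fail inside Series.apply(), where x is a single number.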

Considerations and Alternatives

Performance: While .apply() is very flexible, it may not always be the most performant option, especially for large datasets. Vectorized operations with Pandas or NumPy functions are often faster.
Alternatives: For specific use cases (arithmetic, string manipulation, date handling, etc.), Pandas provides vectorized operations and accessors (such as .str and .dt) that can be more efficient.
Element-wise Operations: To apply a function to every element of a DataFrame, consider .applymap(), or DataFrame.map() in pandas 2.1 and later, where .applymap() is deprecated.
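
To make the performance point concrete, here is a sketch of vectorized equivalents for two of the earlier examples, assuming the same df. It only checks that the results agree. Actual speed differences depend on data size and are not measured here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Row-wise range with .apply(): the function is called once per row
range_apply = df.apply(lambda row: row.max() - row.min(), axis=1)

# Same calculation with built-in reductions over the whole frame
range_vectorized = df.max(axis=1) - df.min(axis=1)

# Adding a constant needs no .apply() at all
plus_five = df + 5

print(range_apply.tolist() == range_vectorized.tolist())
print(plus_five)
```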

Conclusion

The .apply() method is a cornerstone of Pandas' functionality, offering the flexibility to apply both predefined and custom functions across DataFrames and Series. Whether you're performing simple arithmetic operations, complex row-wise or column-wise transformations, or applying conditional logic, .apply() provides the means to execute these tasks in an intuitive and powerful manner.

Case Study: Analyzing Sales Performance with Pandas `.apply()` Method

Scenario

A retail company wants to analyze its sales performance over the past year. The dataset contains sales transactions across different stores, including the date of sale, store ID, product category, and sales amount. The goal is to identify top-performing categories, adjust strategies for underperforming ones, and understand seasonal trends.

Dataset Overview

The dataset, named sales_data.csv, includes the following columns:

Date: The date of the transaction.
StoreID: Identifier for the store.
Category: The category of the product sold (e.g., Electronics, Clothing, Furniture).
Amount: The sales amount in USD.

Objectives

Calculate the total sales for each product category.
Determine the month with the highest sales for each category.
Identify the store with the highest sales in each category.
Analyze seasonal sales trends and identify any outliers.

Analysis

Step 1: Load the Data

import pandas as pd

sales_data = pd.read_csv('sales_data.csv', parse_dates=['Date'])
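
If sales_data.csv is not at hand, the loading step can be tried on an in-memory stand-in. The rows below are hypothetical and only mirror the column layout described in the dataset overview.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for sales_data.csv
csv_text = """Date,StoreID,Category,Amount
2023-01-15,Store1,Electronics,1200
2023-02-03,Store2,Clothing,350
2023-02-20,Store1,Furniture,890
"""

# parse_dates converts the Date column to datetime while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Check that Date was parsed as datetime and Amount as a number
print(sample.dtypes)
```

Checking dtypes right after loading helps catch a Date column that was read as plain text, which would make the .dt accessor used later fail.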

Step 2: Total Sales by Category

total_sales_by_category = sales_data.groupby('Category')['Amount'].sum()
print(total_sales_by_category)

Step 3: Month with Highest Sales for Each Category

First, extract the month from the date and create a new column.

sales_data['Month'] = sales_data['Date'].dt.month

Then, use .apply() to find the month with the highest sales for each category.

def get_top_month(group):
    return group.groupby('Month')['Amount'].sum().idxmax()

top_month_by_category = sales_data.groupby('Category').apply(get_top_month)
print(top_month_by_category)
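
To see what get_top_month does for each group, here is a sketch on a small made-up table (the toy DataFrame is an assumption, not the real dataset). It also selects only the columns the function needs before calling .apply(), which keeps the grouping column out of each group.

```python
import pandas as pd

# Hypothetical toy data, not the real sales_data.csv
toy = pd.DataFrame({
    'Category': ['Clothing', 'Clothing', 'Clothing', 'Electronics', 'Electronics'],
    'Month':    [1, 2, 2, 1, 3],
    'Amount':   [100, 80, 50, 300, 200],
})

def get_top_month(group):
    # Sum sales per month within one category, then return the best month
    return group.groupby('Month')['Amount'].sum().idxmax()

# Each category's rows are passed to get_top_month as a small DataFrame
top_month = toy.groupby('Category')[['Month', 'Amount']].apply(get_top_month)
print(top_month)
```

For Clothing, month 2 totals 130 against 100 for month 1, so month 2 is returned. For Electronics, month 1 wins with 300.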

Step 4: Store with Highest Sales in Each Category

def get_top_store(group):
    return group.groupby('StoreID')['Amount'].sum().idxmax()

top_store_by_category = sales_data.groupby('Category').apply(get_top_store)
print(top_store_by_category)

Step 5: Analyzing Seasonal Sales Trends

First, categorize sales data into seasons.

def categorize_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

sales_data['Season'] = sales_data['Month'].apply(categorize_season)
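
For a fixed lookup like this one, a dictionary passed to Series.map() is a common alternative to a function with .apply(). The sketch below uses the same month-to-season rules as categorize_season. The months Series is a made-up sample.

```python
import pandas as pd

# Month number -> season, following the same rules as categorize_season
season_by_month = {
    12: 'Winter', 1: 'Winter', 2: 'Winter',
    3: 'Spring', 4: 'Spring', 5: 'Spring',
    6: 'Summer', 7: 'Summer', 8: 'Summer',
    9: 'Fall', 10: 'Fall', 11: 'Fall',
}

months = pd.Series([1, 4, 7, 10, 12])  # hypothetical sample of month numbers
seasons = months.map(season_by_month)
print(seasons.tolist())
```

One difference to keep in mind: a value missing from the dictionary maps to NaN, whereas the else branch of categorize_season would label it 'Fall'.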

Analyze sales trends by season for each category.

seasonal_trends = sales_data.groupby(['Category', 'Season'])['Amount'].sum().unstack()
print(seasonal_trends)

Step 6: Identifying Outliers

Use .apply() with a row-wise function to flag sales amounts more than 1.5 times their category average.

def identify_outliers(row):
    category_average = sales_data[sales_data['Category'] == row['Category']]['Amount'].mean()
    return 'Outlier' if row['Amount'] > category_average * 1.5 else 'Normal'

sales_data['SalesType'] = sales_data.apply(identify_outliers, axis=1)
print(sales_data[['Date', 'Category', 'Amount', 'SalesType']])
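
The row-wise version recomputes the category average for every row. A possible alternative, sketched here on made-up data, computes each category's mean once with groupby().transform() and compares whole columns. It is meant to apply the same 1.5x rule, not to replace the approach above.

```python
import pandas as pd

# Hypothetical toy data with one clearly large sale in Clothing
toy = pd.DataFrame({
    'Category': ['Clothing', 'Clothing', 'Clothing',
                 'Electronics', 'Electronics', 'Electronics'],
    'Amount':   [100, 120, 500, 1000, 1100, 900],
})

# Broadcast each category's mean back onto its rows (computed once per category)
category_average = toy.groupby('Category')['Amount'].transform('mean')

# Same 1.5x rule as identify_outliers, applied to the whole column at once
is_outlier = toy['Amount'] > category_average * 1.5
toy['SalesType'] = is_outlier.map({True: 'Outlier', False: 'Normal'})
print(toy)
```

Here the Clothing mean is 240, so only the 500 sale exceeds the 360 threshold. No Electronics sale exceeds its threshold of 1500.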

Conclusion

This analysis provided valuable insights into the sales performance of different product categories, highlighting top-performing months and stores for each category. Additionally, the seasonal trends analysis helped identify key periods for sales, while outlier detection pointed out transactions that significantly deviated from the norm. This information can guide strategic decisions to boost sales and improve inventory management.

The complete analysis, run on randomly generated example data instead of sales_data.csv:

import pandas as pd
import numpy as np

# Creating example data
np.random.seed(0)
dates = pd.date_range('2023-01-01', periods=120, freq='D')
data = {
    'Date': dates,
    'StoreID': np.random.choice(['Store1', 'Store2', 'Store3'], size=120),
    'Category': np.random.choice(['Electronics', 'Clothing', 'Furniture'], size=120),
    'Amount': np.random.randint(100, 2000, size=120)
}
sales_data = pd.DataFrame(data)

# Calculate total sales by category
total_sales_by_category = sales_data.groupby('Category')['Amount'].sum()

# Extract month from date
sales_data['Month'] = sales_data['Date'].dt.month

# Define function to get top month for each category
def get_top_month(group):
    return group.groupby('Month')['Amount'].sum().idxmax()

# Apply function to get top month by category
top_month_by_category = sales_data.groupby('Category').apply(get_top_month)

# Define function to get top store for each category
def get_top_store(group):
    return group.groupby('StoreID')['Amount'].sum().idxmax()

# Apply function to get top store by category
top_store_by_category = sales_data.groupby('Category').apply(get_top_store)

# Categorize sales data into seasons and analyze trends
def categorize_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

sales_data['Season'] = sales_data['Month'].apply(categorize_season)
seasonal_trends = sales_data.groupby(['Category', 'Season'])['Amount'].sum().unstack()

# Identify outliers
def identify_outliers(row):
    category_average = sales_data[sales_data['Category'] == row['Category']]['Amount'].mean()
    return 'Outlier' if row['Amount'] > category_average * 1.5 else 'Normal'

sales_data['SalesType'] = sales_data.apply(identify_outliers, axis=1)

# Display results
total_sales_by_category, top_month_by_category, top_store_by_category, seasonal_trends, sales_data.head()

OUTPUT:

(Category
Clothing       40430
Electronics    39073
Furniture      35911
Name: Amount, dtype: int64,
Category
Clothing       4
Electronics    2
Furniture      4
dtype: int64,
Category
Clothing       Store3
Electronics    Store1
Furniture      Store3
dtype: object,
Season       Spring  Winter
Category                   
Clothing      23599   16831
Electronics   15953   23120
Furniture     19377   16534,
         Date StoreID     Category  Amount  Month  Season SalesType
0  2023-01-01  Store1     Clothing    1212      1  Winter    Normal
1  2023-01-02  Store2    Furniture     133      1  Winter    Normal
2  2023-01-03  Store1     Clothing     745      1  Winter    Normal
3  2023-01-04  Store2  Electronics     332      1  Winter    Normal
4  2023-01-05  Store2    Furniture     867      1  Winter    Normal)

The analysis using the example data produced the following insights:

Total Sales by Category

Clothing: $40,430

Electronics: $39,073

Furniture: $35,911

Top Month by Category

Clothing and Furniture: April (Month 4)

Electronics: February (Month 2)

Top Store by Category

Clothing and Furniture: Store3

Electronics: Store1

Seasonal Trends

Spring saw the highest sales for Clothing ($23,599) and Furniture ($19,377), while Winter was the top season for Electronics ($23,120).
