Pivot Tables in Pandas

Pivot tables

in Pandas provide a way to summarize and analyze your data set in a spreadsheet-like format. They are incredibly versatile for aggregating, sorting, and filtering data. Let's explore how to use pivot tables in Pandas, from basic to advanced concepts.

Advanced: Custom Aggregates with aggfunc

You're not limited to predefined functions. You can define your own aggregation function.

# Custom aggregation function to find the range of salaries
def range_salary(series):
    return series.max() - series.min()

pivot_custom_agg = pd.pivot_table(df,
                                  values='Salary',
                                  index='Department',
                                  aggfunc=range_salary)
print(pivot_custom_agg)

Scenario: Sales Data Analysis

Imagine we have a dataset containing sales transactions over a year for a multi-branch company. This dataset includes the transaction date, branch location, department, salesperson, and sales amount. We want to analyze monthly sales trends per branch, performance by department within each branch, and identify top-performing salespersons.

import pandas as pd
import numpy as np

# Sample data
data = {
    'Date': pd.date_range(start='2023-01-01', periods=120, freq='D'),
    'Branch': np.random.choice(['North', 'South', 'East', 'West'], 120),
    'Department': np.random.choice(['Electronics', 'Clothing', 'Home & Garden'], 120),
    'Salesperson': np.random.choice(['Alice', 'Bob', 'Charlie', 'Diane'], 120),
    'Sales': np.random.randint(100, 1000, size=120)
}

df = pd.DataFrame(data)

# Ensure 'Date' is datetime type
df['Date'] = pd.to_datetime(df['Date'])
df.head()

Analyzing Monthly Sales Trends Per Branch

monthly_sales = pd.pivot_table(df,
                               values='Sales',
                               index=[pd.Grouper(key='Date', freq='M'), 'Branch'],
                               aggfunc=np.sum)
print(monthly_sales)

Performance by Department Within Each Branch

department_performance = pd.pivot_table(df,
                                        values='Sales',
                                        index=['Branch', 'Department'],
                                        columns=pd.Grouper(key='Date', freq='M'),
                                        aggfunc=np.sum,
                                        fill_value=0)
print(department_performance)

Identifying Top-Performing Salespersons

top_salespersons = pd.pivot_table(df,
                                   values='Sales',
                                   index=['Branch', 'Salesperson'],
                                   aggfunc={'Sales': [np.sum, np.mean, 'count']},
                                   fill_value=0).sort_values(by=['sum'], ascending=False)
print(top_salespersons)
# Pivot table to calculate monthly sales
monthly_growth = pd.pivot_table(df,
                                values='Sales',
                                index=pd.Grouper(key='Date', freq='M'),
                                aggfunc=np.sum)

# Calculate month-over-month growth rate
monthly_growth['Growth Rate'] = monthly_growth.pct_change().fillna(0)
print(monthly_growth)
# Sample DataFrames
df_patients = pd.DataFrame({
    'Patient ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'DOB': ['1990-01-01', '1985-05-12', '1978-11-30'],
    'Gender': ['F', 'M', 'M']
})

df_records = pd.DataFrame({
    'Record ID': ['R1', 'R2', 'R3'],
    'Patient ID': [101, 102, 101],
    'Diagnosis': ['Flu', 'Cold', 'Allergy'],
    'Treatment Date': ['2023-03-01', '2023-03-05', '2023-03-08']
})

# Merging DataFrames
df_merged_healthcare = pd.merge(df_patients, df_records, on='Patient ID')

df_merged_healthcare

Conclusion

These examples demonstrate the power of pivot tables in Pandas for advanced data analysis. By manipulating and aggregating data in various ways, we can extract meaningful insights about sales trends, department performance, and individual contributions within a company. The ability to customize aggregation functions and leverage hierarchical indexes allows for in-depth exploration of complex datasets.

Lesson Assignment
Challenge yourself with our lab assignment and put your skills to test.
# Python Program to find the area of triangle

a = 5
b = 6
c = 7

# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))

# calculate the semi-perimeter
s = (a + b + c) / 2

# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)
Sign up to get access to our code lab and run this code.