Pivot Tables in Pandas | Create and Use Panda Pivot Tables

Creating and Using Pivot Tables in Pandas

Pandas provide a way to summarize and analyze your data set in a spreadsheet-like format. They are incredibly versatile for aggregating, sorting, and filtering data. Let's explore how to use pivot tables in Pandas, from basic to advanced concepts.

20 Videos

No Coding Experience Required

45 Assignments

Self Paced

Pivot tables

‍

in Pandas provide a way to summarize and analyze your data set in a spreadsheet-like format. They are incredibly versatile for aggregating, sorting, and filtering data. Let's explore how to use pivot tables in Pandas, from basic to advanced concepts.

‍

Advanced: Custom Aggregates with aggfunc

You're not limited to predefined functions. You can define your own aggregation function.

‍

# Custom aggregation function to find the range of salaries
def range_salary(series):
    return series.max() - series.min()

pivot_custom_agg = pd.pivot_table(df,
                                  values='Salary',
                                  index='Department',
                                  aggfunc=range_salary)
print(pivot_custom_agg)

‍

Scenario: Sales Data Analysis

Imagine we have a dataset containing sales transactions over a year for a multi-branch company. This dataset includes the transaction date, branch location, department, salesperson, and sales amount. We want to analyze monthly sales trends per branch, performance by department within each branch, and identify top-performing salespersons.

import pandas as pd
import numpy as np

# Sample data
data = {
    'Date': pd.date_range(start='2023-01-01', periods=120, freq='D'),
    'Branch': np.random.choice(['North', 'South', 'East', 'West'], 120),
    'Department': np.random.choice(['Electronics', 'Clothing', 'Home & Garden'], 120),
    'Salesperson': np.random.choice(['Alice', 'Bob', 'Charlie', 'Diane'], 120),
    'Sales': np.random.randint(100, 1000, size=120)
}

df = pd.DataFrame(data)

# Ensure 'Date' is datetime type
df['Date'] = pd.to_datetime(df['Date'])
df.head()

‍

Analyzing Monthly Sales Trends Per Branch

monthly_sales = pd.pivot_table(df,
                               values='Sales',
                               index=[pd.Grouper(key='Date', freq='M'), 'Branch'],
                               aggfunc=np.sum)
print(monthly_sales)

‍

Performance by Department Within Each Branch

department_performance = pd.pivot_table(df,
                                        values='Sales',
                                        index=['Branch', 'Department'],
                                        columns=pd.Grouper(key='Date', freq='M'),
                                        aggfunc=np.sum,
                                        fill_value=0)
print(department_performance)

‍

Identifying Top-Performing Salespersons

top_salespersons = pd.pivot_table(df,
                                   values='Sales',
                                   index=['Branch', 'Salesperson'],
                                   aggfunc={'Sales': [np.sum, np.mean, 'count']},
                                   fill_value=0).sort_values(by=['sum'], ascending=False)
print(top_salespersons)

# Pivot table to calculate monthly sales
monthly_growth = pd.pivot_table(df,
                                values='Sales',
                                index=pd.Grouper(key='Date', freq='M'),
                                aggfunc=np.sum)

# Calculate month-over-month growth rate
monthly_growth['Growth Rate'] = monthly_growth.pct_change().fillna(0)
print(monthly_growth)

# Sample DataFrames
df_patients = pd.DataFrame({
    'Patient ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'DOB': ['1990-01-01', '1985-05-12', '1978-11-30'],
    'Gender': ['F', 'M', 'M']
})

df_records = pd.DataFrame({
    'Record ID': ['R1', 'R2', 'R3'],
    'Patient ID': [101, 102, 101],
    'Diagnosis': ['Flu', 'Cold', 'Allergy'],
    'Treatment Date': ['2023-03-01', '2023-03-05', '2023-03-08']
})

# Merging DataFrames
df_merged_healthcare = pd.merge(df_patients, df_records, on='Patient ID')

df_merged_healthcare

‍

Conclusion

These examples demonstrate the power of pivot tables in Pandas for advanced data analysis. By manipulating and aggregating data in various ways, we can extract meaningful insights about sales trends, department performance, and individual contributions within a company. The ability to customize aggregation functions and leverage hierarchical indexes allows for in-depth exploration of complex datasets.

Lesson Assignment

Challenge yourself with our lab assignment and put your skills to test.

# Python Program to find the area of triangle

a = 5
b = 6
c = 7

# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))

# calculate the semi-perimeter
s = (a + b + c) / 2

# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)

Try Now for Free

Creating and Using Pivot Tables in Pandas

Sign Up For Free

Pivot tables

Advanced: Custom Aggregates with aggfunc

Scenario: Sales Data Analysis

Analyzing Monthly Sales Trends Per Branch

Performance by Department Within Each Branch

Identifying Top-Performing Salespersons

Conclusion

Related Courses and Paths

Full Stack Data Analytics Career Path

Full Stack Generative AI Career Path- Beginners

Full Stack AWS Cloud Computing Career Path

AI Assistant For Help

Flexible Mobile Coding

Project Development Support

On-Demand Documentation

Ready to become a Data Scientist that industry loves to hire? Apply Now.