Pivot Tables in Pandas
Pivot tables
in Pandas provide a way to summarize and analyze your data set in a spreadsheet-like format. They are incredibly versatile for aggregating, sorting, and filtering data. Let's explore how to use pivot tables in Pandas, from basic to advanced concepts.
Advanced: Custom Aggregates with aggfunc
You're not limited to predefined functions. You can define your own aggregation function.
# Custom aggregation function to find the range of salaries
def range_salary(series):
return series.max() - series.min()
pivot_custom_agg = pd.pivot_table(df,
values='Salary',
index='Department',
aggfunc=range_salary)
print(pivot_custom_agg)
Scenario: Sales Data Analysis
Imagine we have a dataset containing sales transactions over a year for a multi-branch company. This dataset includes the transaction date, branch location, department, salesperson, and sales amount. We want to analyze monthly sales trends per branch, performance by department within each branch, and identify top-performing salespersons.
import pandas as pd
import numpy as np
# Sample data
data = {
'Date': pd.date_range(start='2023-01-01', periods=120, freq='D'),
'Branch': np.random.choice(['North', 'South', 'East', 'West'], 120),
'Department': np.random.choice(['Electronics', 'Clothing', 'Home & Garden'], 120),
'Salesperson': np.random.choice(['Alice', 'Bob', 'Charlie', 'Diane'], 120),
'Sales': np.random.randint(100, 1000, size=120)
}
df = pd.DataFrame(data)
# Ensure 'Date' is datetime type
df['Date'] = pd.to_datetime(df['Date'])
df.head()
Analyzing Monthly Sales Trends Per Branch
monthly_sales = pd.pivot_table(df,
values='Sales',
index=[pd.Grouper(key='Date', freq='M'), 'Branch'],
aggfunc=np.sum)
print(monthly_sales)
Performance by Department Within Each Branch
department_performance = pd.pivot_table(df,
values='Sales',
index=['Branch', 'Department'],
columns=pd.Grouper(key='Date', freq='M'),
aggfunc=np.sum,
fill_value=0)
print(department_performance)
Identifying Top-Performing Salespersons
top_salespersons = pd.pivot_table(df,
values='Sales',
index=['Branch', 'Salesperson'],
aggfunc={'Sales': [np.sum, np.mean, 'count']},
fill_value=0).sort_values(by=['sum'], ascending=False)
print(top_salespersons)
# Pivot table to calculate monthly sales
monthly_growth = pd.pivot_table(df,
values='Sales',
index=pd.Grouper(key='Date', freq='M'),
aggfunc=np.sum)
# Calculate month-over-month growth rate
monthly_growth['Growth Rate'] = monthly_growth.pct_change().fillna(0)
print(monthly_growth)
# Sample DataFrames
df_patients = pd.DataFrame({
'Patient ID': [101, 102, 103],
'Name': ['Alice', 'Bob', 'Charlie'],
'DOB': ['1990-01-01', '1985-05-12', '1978-11-30'],
'Gender': ['F', 'M', 'M']
})
df_records = pd.DataFrame({
'Record ID': ['R1', 'R2', 'R3'],
'Patient ID': [101, 102, 101],
'Diagnosis': ['Flu', 'Cold', 'Allergy'],
'Treatment Date': ['2023-03-01', '2023-03-05', '2023-03-08']
})
# Merging DataFrames
df_merged_healthcare = pd.merge(df_patients, df_records, on='Patient ID')
df_merged_healthcare
Conclusion
These examples demonstrate the power of pivot tables in Pandas for advanced data analysis. By manipulating and aggregating data in various ways, we can extract meaningful insights about sales trends, department performance, and individual contributions within a company. The ability to customize aggregation functions and leverage hierarchical indexes allows for in-depth exploration of complex datasets.
# Python Program to find the area of triangle
a = 5
b = 6
c = 7
# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)