in Pandas provide a way to summarize and analyze your data set in a spreadsheet-like format. They are incredibly versatile for aggregating, sorting, and filtering data. Let's explore how to use pivot tables in Pandas, from basic to advanced concepts.
You're not limited to predefined functions. You can define your own aggregation function.
# Custom aggregation function to find the range of salaries
def range_salary(series):
return series.max() - series.min()
pivot_custom_agg = pd.pivot_table(df,
values='Salary',
index='Department',
aggfunc=range_salary)
print(pivot_custom_agg)
Imagine we have a dataset containing sales transactions over a year for a multi-branch company. This dataset includes the transaction date, branch location, department, salesperson, and sales amount. We want to analyze monthly sales trends per branch, performance by department within each branch, and identify top-performing salespersons.
import pandas as pd
import numpy as np
# Sample data
data = {
'Date': pd.date_range(start='2023-01-01', periods=120, freq='D'),
'Branch': np.random.choice(['North', 'South', 'East', 'West'], 120),
'Department': np.random.choice(['Electronics', 'Clothing', 'Home & Garden'], 120),
'Salesperson': np.random.choice(['Alice', 'Bob', 'Charlie', 'Diane'], 120),
'Sales': np.random.randint(100, 1000, size=120)
}
df = pd.DataFrame(data)
# Ensure 'Date' is datetime type
df['Date'] = pd.to_datetime(df['Date'])
df.head()
monthly_sales = pd.pivot_table(df,
values='Sales',
index=[pd.Grouper(key='Date', freq='M'), 'Branch'],
aggfunc=np.sum)
print(monthly_sales)
department_performance = pd.pivot_table(df,
values='Sales',
index=['Branch', 'Department'],
columns=pd.Grouper(key='Date', freq='M'),
aggfunc=np.sum,
fill_value=0)
print(department_performance)
top_salespersons = pd.pivot_table(df,
values='Sales',
index=['Branch', 'Salesperson'],
aggfunc={'Sales': [np.sum, np.mean, 'count']},
fill_value=0).sort_values(by=['sum'], ascending=False)
print(top_salespersons)
# Pivot table to calculate monthly sales
monthly_growth = pd.pivot_table(df,
values='Sales',
index=pd.Grouper(key='Date', freq='M'),
aggfunc=np.sum)
# Calculate month-over-month growth rate
monthly_growth['Growth Rate'] = monthly_growth.pct_change().fillna(0)
print(monthly_growth)
# Sample DataFrames
df_patients = pd.DataFrame({
'Patient ID': [101, 102, 103],
'Name': ['Alice', 'Bob', 'Charlie'],
'DOB': ['1990-01-01', '1985-05-12', '1978-11-30'],
'Gender': ['F', 'M', 'M']
})
df_records = pd.DataFrame({
'Record ID': ['R1', 'R2', 'R3'],
'Patient ID': [101, 102, 101],
'Diagnosis': ['Flu', 'Cold', 'Allergy'],
'Treatment Date': ['2023-03-01', '2023-03-05', '2023-03-08']
})
# Merging DataFrames
df_merged_healthcare = pd.merge(df_patients, df_records, on='Patient ID')
df_merged_healthcare
These examples demonstrate the power of pivot tables in Pandas for advanced data analysis. By manipulating and aggregating data in various ways, we can extract meaningful insights about sales trends, department performance, and individual contributions within a company. The ability to customize aggregation functions and leverage hierarchical indexes allows for in-depth exploration of complex datasets.
# Python Program to find the area of triangle
a = 5
b = 6
c = 7
# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)