Seaborn is a Python library designed for creating statistical graphics. It is built on top of Matplotlib and integrates closely with pandas data structures, which makes it well suited to data analysis and exploration. Here's an introduction to Seaborn and its role in the Python visualization ecosystem.
Seaborn simplifies the process of creating beautiful and informative statistical graphics. Here are some key reasons why it has become a go-to library for data scientists and analysts:
Understanding Seaborn's data structures is key to effectively using the library for data visualization. Seaborn is designed to work well with pandas DataFrames, which are the most common data structure used for storing and manipulating tabular data in Python. By leveraging pandas DataFrames, Seaborn allows for efficient and intuitive plotting of data.
A pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). DataFrames are ideal for representing real data in Python, especially when dealing with complex datasets that include various data types.
Seaborn's functions are optimized to work with DataFrames, providing a seamless experience for data visualization:
Seaborn distinguishes between two types of data structures for plotting: long-form (or "tidy") data and wide-form data.
sns.relplot, sns.catplot, and sns.lmplot are designed to work intuitively with tidy data, allowing you to easily map variables to different aspects of a plot (like the x and y axes, hues, sizes, and styles).
When working with Seaborn, the general workflow involves:
By understanding and effectively utilizing pandas DataFrames in conjunction with Seaborn's plotting capabilities, you can create insightful and beautiful statistical visualizations with relatively little code.
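The long-form versus wide-form distinction described above can be sketched with a small reshape. The column names and values below are hypothetical and exist only for illustration; `DataFrame.melt` is the pandas method that turns wide-form data into long-form data.

```python
import pandas as pd

# Hypothetical wide-form data: one column per product, one row per year.
wide = pd.DataFrame({
    "Year": [2021, 2022, 2023],
    "Product A": [10, 14, 19],
    "Product B": [7, 9, 12],
})

# Reshape to long-form ("tidy"): one row per (Year, Product) observation.
long = wide.melt(id_vars="Year", var_name="Product", value_name="Sales")
print(long)

# With long-form data, each plot role maps to a named column, for example:
# sns.lineplot(data=long, x="Year", y="Sales", hue="Product")
```

In the long-form table every variable has its own column, which is what lets arguments such as x, y, and hue refer to columns by name.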
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Set the aesthetic style of the plots
sns.set_theme()
# Sample DataFrame
data = pd.DataFrame({
'Year': [2010, 2011, 2012, 2013, 2014],
'Sales': [12, 17, 22, 29, 37]
})
# Create a line plot
sns.lineplot(data=data, x='Year', y='Sales')
plt.show()
# Sample DataFrame
data = pd.DataFrame({
'Product': ['A', 'B', 'C', 'D'],
'Sales': [23, 17, 35, 29]
})
# Create a bar chart
sns.barplot(data=data, x='Product', y='Sales')
plt.show()
# Sample data
data = pd.DataFrame({
'Age': [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95, 85, 55, 110, 120]
})
# Create a histogram
sns.histplot(data=data, x='Age', bins=10, kde=True)
plt.show()
Seaborn excels at creating beautiful, informative statistical graphics in Python with minimal code. However, the true power of Seaborn lies in its extensive customization capabilities, allowing you to tailor your plots for various contexts and audiences. This lesson will guide you through customizing Seaborn plots, focusing on aesthetics, labels, and themes.
Seaborn provides several built-in themes and styles to quickly change the appearance of plots. These styles can be applied globally using the sns.set_style() function.
Example: Changing the Plot Style
sns.set_style("whitegrid") # Options include: "dark", "white", "darkgrid", "whitegrid", "ticks"
Adding informative labels and titles is crucial for making your plots understandable. Seaborn integrates with Matplotlib, allowing you to use Matplotlib's labeling functions to add and customize labels and titles.
Example: Adding Titles and Labels
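tips = sns.load_dataset("tips") # Seaborn's example dataset, which has "day" and "total_bill" columns
ax = sns.barplot(x="day", y="total_bill", data=tips)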
ax.set_title("Total Bill by Day")
ax.set_xlabel("Day of the Week")
ax.set_ylabel("Average Total Bill")
For more advanced customizations, you can directly use Matplotlib's functions. This is particularly useful for adjusting figure sizes, adding text, or fine-tuning the layout.
Example: Adjusting Figure Size and Adding Annotations
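# Illustrative sample data (assumed for this example): a signal that peaks at time 5
data = pd.DataFrame({"time": range(10), "signal": [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]})
plt.figure(figsize=(10, 6)) # Adjust the figure size
ax = sns.lineplot(x="time", y="signal", data=data)
ax.set_title("Signal over Time")
# xy is the point being annotated; xytext is where the label is placed (kept inside the axes)
ax.annotate("Peak", xy=(5, 10), xytext=(1, 9),
            arrowprops=dict(facecolor='black', shrink=0.05))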
Seaborn's set_theme() function allows you to customize the appearance of your plots globally. This includes setting the color palette, font scale, and the aforementioned styles for a consistent look across all your plots.
Example: Setting a Theme
sns.set_theme(style="darkgrid", palette="muted", font_scale=1.2)
Choosing the right color palette can enhance the readability and aesthetic appeal of your plot. Seaborn offers a variety of color palettes, which can be set globally or specified for individual plots.
Example: Using Color Palettes
sns.set_palette("pastel")
# Specify palette for a single plot
sns.barplot(x="day", y="total_bill", data=tips, palette="Blues_d")
Customizing your Seaborn plots is straightforward yet powerful, with a range of options from simple style changes to detailed aesthetic adjustments. By combining Seaborn's statistical plotting capabilities with Matplotlib's customization features, you can create visually appealing, informative visualizations that effectively communicate your data's story. Experiment with different styles, themes, and customizations to discover the best ways to present your unique data insights.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Sample data
data = pd.DataFrame({
'Month': ['January', 'February', 'March', 'April', 'May', 'June'],
'Sales': [200, 220, 250, 275, 300, 320]
})
# Set the aesthetic style
sns.set_theme(style="darkgrid")
# Create the line chart
plt.figure(figsize=(10, 6))
line_chart = sns.lineplot(data=data, x='Month', y='Sales', marker='o', color='blue', linewidth=2.5)
# Customizing the plot
line_chart.set_title('Monthly Sales', fontsize=16)
line_chart.set_xlabel('Month', fontsize=12)
line_chart.set_ylabel('Sales', fontsize=12)
line_chart.tick_params(axis='x', labelrotation=45) # Rotate the existing tick labels without resetting them
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Sample data
np.random.seed(10)
data = pd.DataFrame({
'City': ['City A', 'City B', 'City C', 'City D', 'City E', 'City F', 'City G', 'City H'],
'Temperature': np.random.randint(20, 35, size=8),
'Humidity': np.random.randint(40, 80, size=8),
'Pollution Index': np.random.randint(1, 100, size=8)
})
highlight = 'City C' # City to highlight
# Set the theme
sns.set_theme(style="whitegrid", palette="muted")
# Create the scatter plot
plt.figure(figsize=(10, 6))
scatter = sns.scatterplot(data=data, x='Temperature', y='Humidity',
size='Pollution Index', sizes=(50, 200),
hue='Pollution Index', style='City',
palette='coolwarm', legend="full")
# Customizing the legend
scatter.legend(title='Pollution Index',bbox_to_anchor=(1.05, 1), loc=2)
# Adding annotations
for i in range(data.shape[0]):
    if data.iloc[i]['City'] == highlight:
        plt.text(x=data.iloc[i]['Temperature']+0.5,
                 y=data.iloc[i]['Humidity'],
                 s=highlight,
                 fontdict=dict(color='red',size=10),
                 bbox=dict(facecolor='yellow',alpha=0.5))
# Adding titles and labels
plt.title('City Climate Characteristics', fontsize=16)
plt.xlabel('Average Temperature (°C)', fontsize=12)
plt.ylabel('Average Humidity (%)', fontsize=12)
# Show the plot
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Sample data
data = pd.DataFrame({
'Product': ['Product A', 'Product B', 'Product C', 'Product D'] * 2,
'Region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South'],
'Sales': [123, 432, 234, 321, 143, 423, 223, 312]
})
# Set the aesthetic style of the plots
sns.set_theme(style="whitegrid")
# Create the bar chart
plt.figure(figsize=(10, 6))
bar_chart = sns.barplot(data=data, x='Product', y='Sales', hue='Region', palette='viridis')
# Customizing the legend
bar_chart.legend(title='Region', bbox_to_anchor=(1.05, 1), loc=2)
# Adding annotations for each bar
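for p in bar_chart.patches:
    if p.get_height() == 0:
        continue # Skip empty placeholder patches that some seaborn versions add for the legend
    bar_chart.annotate(format(p.get_height(), '.1f'),
                       (p.get_x() + p.get_width() / 2., p.get_height()),
                       ha='center', va='center',
                       xytext=(0, 9),
                       textcoords='offset points')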
# Adding titles and labels
plt.title('Sales by Product and Region', fontsize=16)
plt.xlabel('Product', fontsize=12)
plt.ylabel('Sales', fontsize=12)
# Show the plot
plt.tight_layout() # Adjusts the plot to ensure everything fits without overlap
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Sample data
data = pd.DataFrame({
'Math': [88, 92, 80, 89, 100, 80, 60, 70, 55, 77, 88, 82],
'Science': [72, 95, 78, 76, 88, 82, 92, 89, 95, 80, 82, 85],
'English': [90, 85, 88, 95, 60, 78, 82, 80, 80, 85, 90, 92]
})
# Melting the DataFrame to work well with sns.boxplot
data_melted = data.melt(var_name='Subject', value_name='Score')
# Set the aesthetic style of the plots
sns.set_style("whitegrid")
# Create the box plot
plt.figure(figsize=(10, 6))
box_plot = sns.boxplot(x='Subject', y='Score', data=data_melted, palette='Set2')
# Adding titles and labels
plt.title('Distribution of Student Scores by Subject', fontsize=16)
plt.xlabel('Subject', fontsize=12)
plt.ylabel('Scores', fontsize=12)
# Making boxes translucent to highlight outliers
# (depending on the Matplotlib/seaborn version, the boxes are stored in ax.patches or ax.artists)
for patch in list(box_plot.patches) + list(box_plot.artists):
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
# Identifying outliers with the 1.5 * IQR rule and annotating them
for position, subject in enumerate(data.columns):
    scores = data[subject]
    q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
    iqr = q3 - q1
    for score in scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]:
        # Boxes are drawn at integer positions 0, 1, 2, ... in column order
        plt.text(x=position, y=score, s=f"{score}", color='red')
plt.show()
Violin plots are particularly useful for comparing the distribution of data across categories, as they show both the summary statistics (similar to box plots) and the probability density (similar to KDE plots). This makes them an excellent choice for visualizing and comparing distributions that may have multiple peaks or varying spreads.
# Sample data
data = pd.DataFrame({
'Score': [88, 72, 90, 95, 78, 85, 82, 92, 88, 76, 88, 82, 100, 60, 70, 95, 80, 60, 78, 82],
'Subject': ['Math', 'Science', 'English', 'Math', 'Science', 'English', 'Math', 'Science', 'English', 'Math', 'Math', 'Science', 'English', 'Math', 'Science', 'English', 'Math', 'Science', 'English', 'Math'],
'Level': ['Advanced', 'Advanced', 'Advanced', 'Intermediate', 'Intermediate', 'Intermediate', 'Beginner', 'Beginner', 'Beginner', 'Advanced', 'Intermediate', 'Beginner', 'Advanced', 'Intermediate', 'Advanced', 'Beginner', 'Beginner', 'Intermediate', 'Advanced', 'Intermediate']
})
# Set the aesthetic style of the plots
sns.set_style("whitegrid")
# Create the violin plot
plt.figure(figsize=(12, 6))
violin = sns.violinplot(x='Subject', y='Score', data=data, palette='muted')
# Customizing the plot
plt.title('Score Distribution by Subject', fontsize=16)
plt.xlabel('Subject', fontsize=12)
plt.ylabel('Score', fontsize=12)
#plt.legend(title='Student Level', loc='upper left')
# Show the plot
plt.show()
sns.set_style("whitegrid")
# Create the violin plot
plt.figure(figsize=(12, 6))
violin = sns.violinplot(x='Subject', y='Score', data=data, palette='muted',split=True)
# Customizing the plot
plt.title('Score Distribution by Subject', fontsize=16)
plt.xlabel('Subject', fontsize=12)
plt.ylabel('Score', fontsize=12)
#plt.legend(title='Student Level', loc='upper left')
# Show the plot
plt.show()
The sns.regplot function in Seaborn is used to plot data and a linear regression model fit. It's a great tool for exploring the relationship between two variables, providing a quick and easy way to visualize whether there's a linear relationship and how strong that relationship might be. Here's a basic example to demonstrate how to use sns.regplot to create a scatter plot with a linear regression line.
# Generating sample data
np.random.seed(0)
hours_studied = np.random.rand(100) * 5 # Random data for hours studied
exam_scores = 50 + (hours_studied * 10) + np.random.normal(0, 5, 100) # Scores with added noise
# Creating a DataFrame
data = pd.DataFrame({'Hours Studied': hours_studied, 'Exam Score': exam_scores})
# Set the aesthetic style of the plots
sns.set_theme(style="whitegrid")
# Create the regression plot
plt.figure(figsize=(10, 6))
sns.regplot(x='Hours Studied', y='Exam Score', data=data, scatter_kws={'color': 'blue'}, line_kws={'color': 'red'})
# Customizing the plot
plt.title('Relationship Between Hours Studied and Exam Score', fontsize=16)
plt.xlabel('Hours Studied', fontsize=12)
plt.ylabel('Exam Score', fontsize=12)
# Show the plot
plt.show()
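sns.regplot draws the fitted line but returns only the Axes, so the fit's coefficients are not reported. As a rough companion to the plot, the sketch below regenerates the same synthetic data (same seed) and estimates the line with np.polyfit and the correlation with np.corrcoef. This is a separate calculation for illustration, not a readout of what regplot computed internally.

```python
import numpy as np

# Same synthetic data as the regplot example above (seed 0).
np.random.seed(0)
hours_studied = np.random.rand(100) * 5
exam_scores = 50 + (hours_studied * 10) + np.random.normal(0, 5, 100)

# Degree-1 least-squares fit: np.polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(hours_studied, exam_scores, 1)

# Pearson correlation as a rough measure of how strong the linear relationship is.
r = np.corrcoef(hours_studied, exam_scores)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

Because the data were generated as 50 + 10 * hours plus noise, the estimates should land near a slope of 10 and an intercept of 50.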
The sns.catplot function in Seaborn is a versatile tool for visualizing the distribution of values within categorical data. It can create several types of plots, including box plots, violin plots, bar plots, and more, making it highly useful for comparing levels within a categorical variable or across several categories. Here's a basic example to demonstrate how to use sns.catplot to create a categorical plot that can show, for instance, the distribution of exam scores across different exam subjects.
sns.catplot is powerful for its versatility and ability to visualize complex relationships within categorical data. By adjusting its parameters and the plot type (kind), you can tailor the visualization to your specific analysis needs, whether you're exploring distributions, comparing groups, or highlighting trends within categories.
# Sample data
data = pd.DataFrame({
'Subject': ['Math', 'Science', 'English', 'Math', 'Science', 'English', 'Math', 'Science', 'English'],
'Score': [88, 72, 90, 95, 78, 85, 82, 92, 88],
'Gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female']
})
# Set the aesthetic style of the plots
sns.set_theme(style="whitegrid")
# Create the categorical plot
g = sns.catplot(x='Subject', y='Score', hue='Gender', data=data, kind='box', height=5, aspect=1.5, palette='pastel')
# Customizing the plot
g.fig.suptitle('Exam Scores by Subject and Gender', fontsize=16)
g.set_axis_labels("Subject", "Score")
g.legend.set_title("Gender")
# Adjust the title position
plt.subplots_adjust(top=0.92)
# Show the plot
plt.show()
g = sns.catplot(x='Subject', y='Score', hue='Gender', data=data, kind='violin', height=5, aspect=1.5, palette='pastel')
# Customizing the plot
g.fig.suptitle('Exam Scores by Subject and Gender', fontsize=16)
g.set_axis_labels("Subject", "Score")
g.legend.set_title("Gender")
# Adjust the title position
plt.subplots_adjust(top=0.92)
# Show the plot
plt.show()
# Generating sample data
data = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
rows = ['Row 1', 'Row 2', 'Row 3']
columns = ['Column 1', 'Column 2', 'Column 3']
# Creating a DataFrame to label rows and columns for the heatmap
df = pd.DataFrame(data, index=rows, columns=columns)
# Creating the heatmap
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(df, annot=True, cmap='coolwarm', linewidths=.5, cbar_kws={'shrink': .5})
# Customizing the plot
plt.title('Sample Heatmap', fontsize=20)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.show()
# Generating sample data
np.random.seed(10)
data1 = np.random.normal(loc=0, scale=1, size=400)
data2 = np.random.normal(loc=5, scale=2, size=400)
df = pd.DataFrame({'Data1': data1, 'Data2': data2})
# Creating the KDE plot
plt.figure(figsize=(10, 6))
sns.kdeplot(data=df, fill=True, common_norm=False, alpha=.5, linewidth=0)
# Customizing the plot
plt.title('KDE Plot of Data1 and Data2', fontsize=16)
plt.xlabel('Value', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.show()
A boxen plot, also known as a letter value plot, is an enhanced version of a box plot introduced for better visualization of more complex distributions. It is particularly useful for larger datasets, as it provides a deeper insight into the shape of the distribution, especially in the tails. The boxen plot represents data distribution through a series of quantiles that reveal more information about the structure of the data, compared to traditional box plots that typically summarize data with medians, quartiles, and outliers.
Boxen plots are particularly useful when you need to understand more about the data beyond the central tendency and variability. They are a powerful tool for exploratory data analysis, especially when dealing with large datasets or when interested in the finer details of the data distribution. Seaborn and other data visualization libraries provide functions to easily create boxen plots, making them accessible for data analysts and scientists to incorporate into their analytical workflows.
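The "series of quantiles" behind a letter value plot can be illustrated numerically. The sketch below uses a hypothetical normally distributed sample and computes the median followed by quantile pairs at tail areas of 1/4, 1/8, 1/16, and 1/32. It shows the idea only; how many levels Seaborn actually draws is decided by its own rules, which are not reproduced here.

```python
import numpy as np

np.random.seed(0)
values = np.random.randn(10_000)  # hypothetical large sample

# Letter values: the median, then quantile pairs that reach further into each tail.
median = np.quantile(values, 0.5)
print(f"median: {median:.2f}")

letter_values = []
for k in range(2, 6):
    tail = 0.5 ** k  # 0.25, 0.125, 0.0625, 0.03125
    lower, upper = np.quantile(values, [tail, 1 - tail])
    letter_values.append((tail, lower, upper))
    print(f"tail {tail:.5f}: [{lower:.2f}, {upper:.2f}]")
```

Each successive pair encloses the previous one, which corresponds to the nested boxes a boxen plot draws, with the outer boxes describing the tails.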
import seaborn as sns
sns.set_theme(style="whitegrid")
diamonds = sns.load_dataset("diamonds")
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
sns.boxenplot(
diamonds, x="clarity", y="carat",
color="b", order=clarity_ranking, width_method="linear",
)
# Sample data
np.random.seed(0)
data = pd.DataFrame({
'Group': np.repeat(['A', 'B', 'C'], 100),
'Value': (np.random.randn(300) * 100).astype(int)
})
# Creating the boxen plot
plt.figure(figsize=(10, 6))
sns.boxenplot(x='Group', y='Value', data=data, palette='coolwarm')
# Customizing the plot
plt.title('Boxen Plot of Values by Group', fontsize=16)
plt.xlabel('Group', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.show()
# Generating sample data
np.random.seed(0)
data = pd.DataFrame({
'X': np.random.rand(100) * 100,
'Y': np.random.rand(100) * 50 + 50 * np.random.rand(100),
'Category': np.random.choice(['Category 1', 'Category 2', 'Category 3'], 100)
})
# Creating the lmplot
sns.lmplot(x='X', y='Y', data=data, hue='Category', palette='Set1', aspect=1.5, height=7)
# Customizing the plot
plt.title('Linear Regression with lmplot by Category', fontsize=16)
plt.xlabel('X Value', fontsize=12)
plt.ylabel('Y Value', fontsize=12)
plt.show()
# Generating sample data
np.random.seed(10)
x = np.random.rand(150) * 50
y = 2 * x + np.random.normal(0, 8, 150)
category = np.random.choice(['Group A', 'Group B', 'Group C'], 150)
# Combining into a DataFrame
data = pd.DataFrame({'X': x, 'Y': y, 'Category': category})
# Set the aesthetic style of the plots
sns.set_style("whitegrid")
# Creating the lmplot with more details
# Axis-sharing options are passed to the underlying FacetGrid through facet_kws
g = sns.lmplot(x='X', y='Y', col='Category', hue='Category', data=data,
               aspect=1.2, height=5, col_wrap=2, palette='Set1',
               scatter_kws={'s': 50, 'alpha': 0.5}, line_kws={'lw': 2},
               markers='o', facet_kws={'sharex': False, 'sharey': False})
# Customizing the plot
g.fig.suptitle('Linear Regression Analysis by Category', fontsize=20, y=1.05)
g.set_axis_labels("X Variable", "Y Variable")
g.add_legend(title="Category")
g.set(xlim=(0, 50), ylim=(0, 120))
# Setting each subplot's title from its own category, so the titles always match the data shown
g.set_titles(col_template="{col_name} - Linear Fit")
plt.tight_layout()
plt.show()
sns.relplot is a figure-level function in Seaborn designed for visualizing statistical relationships between data. It can create both scatter plots and line plots, making it versatile for exploring how two or more quantitative variables relate across a dataset. This function is especially powerful due to its ability to facet the data across multiple subplots with the row and col parameters, allowing for the examination of complex interactions and trends within subsets of the data.
A common use case for sns.relplot is when you want to understand the relationship between two variables while also considering the impact of one or two additional categorical variables. For instance, in a dataset containing information about cars, you might use sns.relplot to explore how the weight of cars affects their fuel efficiency (miles per gallon), with the plot points colored by the number of cylinders and faceted by the country of origin. This kind of visualization can quickly reveal patterns and anomalies, such as whether heavier cars consistently have lower fuel efficiency or if this trend varies significantly by the number of cylinders or the country of origin.
sns.relplot shines in its flexibility and ease of use for creating complex, multi-faceted visualizations that can accommodate various aspects of a dataset, making it an invaluable tool for exploratory data analysis.
import seaborn as sns
sns.set_theme(style="white")
# Load the example mpg dataset
mpg = sns.load_dataset("mpg")
# Plot miles per gallon against horsepower with other semantics
sns.relplot(x="horsepower", y="mpg", hue="origin", size="weight",
sizes=(40, 400), alpha=.5, palette="muted",
height=6, data=mpg)
# Set the aesthetic style of the plots
sns.set_style("whitegrid")
# Creating the relational plot
g = sns.relplot(x="weight", y="mpg", hue="cylinders", data=mpg, kind="scatter", size="cylinders",
palette="viridis", sizes=(40, 400), aspect=1.5, height=5, alpha=.7)
# Customizing the plot
g.fig.suptitle('Fuel Efficiency vs. Car Weight by Number of Cylinders', fontsize=16)
g.set_axis_labels("Weight", "Miles Per Gallon (MPG)")
g.legend.set_title("Cylinders")
plt.show()
A swarmplot is a categorical scatterplot in Seaborn, designed to show all data points without overlapping. This plot type is particularly useful for visualizing the distribution of data across categories while maintaining each point's individual value, making it easy to assess the density and distribution of data points within categories.
A common use case for a swarmplot is when you need to compare the distribution of a variable across different categories, especially when the dataset is not too large. For example, in the mpg dataset, a swarmplot could be used to visualize the distribution of fuel efficiency (measured in miles per gallon) across cars with different numbers of cylinders. This can help identify not only the central tendency and variability within each category but also how individual vehicles compare across different cylinder counts.
swarmplot is particularly valuable in exploratory data analysis when you're interested in understanding the spread of your data across categories and looking for patterns or outliers that warrant further investigation.
# Sample data
data = pd.DataFrame({
'Group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'Value': [23, 45, 56, 78, 12, 30, 52, 44, 33, 67, 89, 36]
})
# Creating the swarm plot
plt.figure(figsize=(8, 6))
swarm_plot = sns.swarmplot(x='Group', y='Value', data=data, palette='viridis')
# Customizing the plot
plt.title('Swarm Plot of Values by Group', fontsize=16)
plt.xlabel('Group', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.show()
mpg = sns.load_dataset('mpg')
# Creating the swarm plot
plt.figure(figsize=(10, 8))
swarm_plot = sns.swarmplot(x='cylinders', y='mpg', data=mpg, palette='Set2')
# Customizing the plot
plt.title('Swarm Plot of MPG by Number of Cylinders', fontsize=16)
plt.xlabel('Number of Cylinders', fontsize=12)
plt.ylabel('Miles Per Gallon (MPG)', fontsize=12)
plt.show()
A pairplot is a matrix of scatter plots that enables quick visualization of relationships between multiple pairwise combinations of variables in a dataset. It's particularly useful for exploring correlations, distributions, and trends among several quantitative variables. Each plot in the matrix represents the relationship between two variables, allowing for the detection of patterns, outliers, and insights that might not be apparent from looking at single variables in isolation.
In practical terms, a pairplot might be used in a dataset like mpg (miles per gallon) to explore how different vehicle characteristics (like weight, horsepower, and engine displacement) relate to fuel efficiency and each other. This can help automotive researchers, for instance, to identify trends and factors influencing fuel efficiency, or data scientists to select features for machine learning models predicting vehicle performance.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the mpg dataset
mpg = sns.load_dataset('mpg')
# Creating the pairplot
pairplot = sns.pairplot(mpg, vars=['mpg', 'displacement', 'horsepower', 'weight'], hue='cylinders', palette='viridis')
# Customizing the plot
pairplot.fig.suptitle('Pairplot of MPG, Displacement, Horsepower, and Weight by Number of Cylinders', y=1.02)
plt.show()
A jointplot in Seaborn is a versatile tool for visualizing the relationship between two variables by displaying their bivariate (joint) distribution and their univariate (marginal) distributions simultaneously. Essentially, it combines a scatter plot or a hexbin plot of two variables with histograms or Kernel Density Estimates (KDEs) of each variable on the axes. This allows for a detailed exploration of the relationship between the two variables, including their individual distributions.
jointplot is particularly useful in the early stages of data analysis to understand the relationships between pairs of variables in a dataset and to identify patterns, trends, and potential outliers.
jointplot can help in examining the relationships between two measured phenomena, such as the effect of a drug on health outcomes or the relationship between economic indicators.
jointplot can be used to identify relationships between features and target variables or among features themselves, assisting in feature selection and engineering.
For instance, using the mpg dataset, a jointplot can reveal how a car's horsepower relates to its fuel efficiency (mpg). This visualization could provide insights into how engine power influences fuel consumption, useful for automotive engineers, policymakers, and consumers interested in vehicle performance and environmental impact.
# Load the mpg dataset
mpg = sns.load_dataset('mpg')
# Creating a detailed joint plot of 'mpg' vs 'horsepower' as a scatter plot with histogram margins
jointplot = sns.jointplot(x='horsepower', y='mpg', data=mpg, kind='scatter', color='blue', space=0, marginal_kws=dict(bins=30, fill=True))
# Customizing the plot with titles and labels
jointplot.set_axis_labels('Horsepower', 'MPG', fontsize=12)
jointplot.fig.suptitle('MPG vs Horsepower', fontsize=15, y=1.03)
plt.show()
from scipy.stats import pearsonr
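# Drop rows with missing values instead of filling them with 0, which would distort the correlation
valid = mpg[['horsepower', 'mpg']].dropna()
corr_coeff, p_value = pearsonr(valid['horsepower'], valid['mpg'])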
# Creating the joint plot
jointplot = sns.jointplot(x='horsepower', y='mpg', data=mpg, kind='reg', color='blue')
# Annotating the plot with the correlation coefficient and p-value
jointplot.ax_joint.text(x=0.5, y=0.9, s=f'Corr: {corr_coeff:.2f}, p-value: {p_value:.2e}',
                        ha='center', va='center', transform=jointplot.ax_joint.transAxes)
# Show the plot
plt.show()
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="flipper_length_mm", y="bill_length_mm", hue="species")
# Creating a displot for the distribution of mpg values
sns.displot(mpg['mpg'], kde=True,bins=20)
plt.title('Distribution of MPG in Cars')
plt.xlabel('Miles Per Gallon (MPG)')
plt.ylabel('Count') # The histogram's default statistic is a count, not a density
plt.show()
#import seaborn as sns
sns.set_theme(style="whitegrid")
# Load the diamonds dataset
diamonds = sns.load_dataset("diamonds")
# Plot the distribution of cut grades, conditional on carat
sns.displot(
data=diamonds,
x="carat", hue="cut",
kind="kde", height=6,
multiple="fill", clip=(0, None),
palette="ch:rot=-.25,hue=1,light=.75",
)
# Python Program to find the area of triangle
a = 5
b = 6
c = 7
# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)