Seaborn Facet Grid: Create Multi-Plot Layouts


Faceting and grids in data visualization are powerful concepts for creating multi-panel plots, allowing for the exploration of complex datasets by dividing the data into subsets and plotting these subsets side by side. This approach makes it easier to compare different subsets of data and identify patterns, trends, and outliers.

Faceting

Faceting involves creating a matrix of plots that share the same variable mappings across different subsets of the data. Each "facet" represents a slice or aspect of the dataset, enabling detailed comparisons across multiple categories simultaneously. It's particularly useful when you want to compare the distribution or relationship of variables within different subgroups of your data.

Seaborn's FacetGrid is a prime example of faceting. It lets you create a grid of subplots based on the values of one or more categorical variables. You can then map a plotting function onto each facet, repeating the same kind of plot across the different categories.

Grids

Grids, like faceting, allow for the creation of multiple plots, but they offer more flexibility in terms of plot types and layouts. Seaborn provides two main types of grids: FacetGrid and PairGrid.

  • FacetGrid: As mentioned, FacetGrid is used for faceting a dataset by one or two variables. It can work with any plot type.
  • PairGrid: PairGrid is another type of grid, used for plotting pairwise relationships in a dataset. By drawing the same plot types across an entire dataframe, it enables detailed analysis of how every variable relates to all the others; a minimal sketch follows this list.
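As a quick illustration, seaborn's pairplot function is a high-level wrapper around PairGrid. This minimal sketch, assuming the built-in iris dataset, draws every pairwise relationship in one call:

import seaborn as sns
import matplotlib.pyplot as plt

# pairplot builds a PairGrid under the hood: scatter plots off the
# diagonal, distributions on the diagonal, one color per species
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")
plt.show()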

Use Cases

  • Comparative Analysis: Faceting and grids are ideal for comparing patterns across different levels of a categorical variable. For example, you might use faceting to compare the distribution of product sales across different regions.
  • Pattern Recognition: They help in recognizing patterns that are specific to subgroups within the data. For instance, using a PairGrid, you could explore how the relationships between pairs of variables differ across species in a dataset of flowers.
  • Data Exploration: Both faceting and grids facilitate extensive data exploration. They can help identify which variables influence others and how those relationships are manifested across different subgroups of the data.

Faceting and grids enhance the visualization of high-dimensional data by breaking it down into understandable and comparable chunks. They leverage the power of visual comparison to reveal insights that might not be apparent from looking at a single plot or statistic, making them invaluable tools in data analysis and storytelling.
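The example below puts FacetGrid to work on seaborn's built-in penguins dataset, drawing one scatter plot of flipper length against body mass for each combination of species (rows) and island (columns):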

import seaborn as sns
import matplotlib.pyplot as plt

# Load the penguins dataset
penguins = sns.load_dataset('penguins')

# Initialize a FacetGrid
g = sns.FacetGrid(penguins, row='species', col='island', margin_titles=True, height=3)

# Map a scatter plot to each facet
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')

# Customize the FacetGrid (use g.figure; the older g.fig attribute is deprecated)
g.figure.suptitle('Flipper Length vs. Body Mass for Penguins by Species and Island', y=1.03)
g.set_axis_labels('Flipper Length (mm)', 'Body Mass (g)')
# (no hue variable is mapped here, so add_legend() would draw an empty legend)

plt.show()
The next example produces a grid of scatterplots comparing miles per gallon (MPG) versus weight, segmented by the number of cylinders and origin of the car:

# Load the mpg dataset
mpg = sns.load_dataset('mpg')

# Initialize a FacetGrid
g = sns.FacetGrid(mpg, col='cylinders', row='origin', margin_titles=True, height=3)

# Map a scatter plot to each facet
g.map(sns.scatterplot, 'weight', 'mpg')

# Customize the FacetGrid
g.figure.suptitle('MPG vs. Weight by Number of Cylinders and Origin', y=1.03)
g.set_axis_labels('Weight', 'MPG')

plt.show()

FacetGrid with the tips dataset


# Load the tips dataset and initialize an empty grid: one column per value of "time"
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")
Figure: A histogram comparing the distribution of tips for lunch and dinner, with separate bars for each time category.

# Inspect the columns and dtypes of the tips dataframe
tips.info()
Figure: A scatterplot comparing total bills and tips, split by gender and smoking status, with points colored by smoking status.
# Re-create the grid and map a histogram of tip amounts onto each facet
g = sns.FacetGrid(tips, col="time")
g.map(sns.histplot, "tip")

Figure: A bar chart displaying average total bills by gender across different days of the week, with error bars for variability.

# Facet by sex and color the points by smoking status
g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(sns.scatterplot, "total_bill", "tip", alpha=.7)
g.add_legend()

Figure: A dual-panel scatterplot showing total bills against tips, split by gender, with regression-like patterns visible in the data.

# One narrow facet per day; bar height shows the mean total bill for each sex
g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill", order=["Male", "Female"])

Figure: A dual-panel scatterplot showing total bills versus normalized values for males and females.

Using custom functions

You’re not limited to existing matplotlib and seaborn functions when using FacetGrid. However, to work properly, any function you use must follow a few rules:

  • It must plot onto the “currently active” matplotlib Axes. This will be true of functions in the matplotlib.pyplot namespace, and you can call matplotlib.pyplot.gca() to get a reference to the current Axes if you want to work directly with its methods.
  • It must accept the data that it plots in positional arguments. Internally, FacetGrid will pass a Series of data for each of the named positional arguments passed to FacetGrid.map().
  • It must be able to accept color and label keyword arguments, and, ideally, it will do something useful with them. In most cases, it’s easiest to catch a generic dictionary of **kwargs and pass it along to the underlying plotting function.
from scipy import stats

# A custom univariate function: plot each observation against its normal quantile
def quantile_plot(x, **kwargs):
    quantiles, xr = stats.probplot(x, fit=False)
    plt.scatter(xr, quantiles, **kwargs)

g = sns.FacetGrid(tips, col="sex", height=4)
g.map(quantile_plot, "total_bill")

Figure: A pairplot grid of scatterplots and histograms showing relationships between petal and sepal dimensions for a dataset.

# A custom bivariate function: quantile-quantile plot of two variables
def qqplot(x, y, **kwargs):
    _, xr = stats.probplot(x, fit=False)
    _, yr = stats.probplot(y, fit=False)
    plt.scatter(xr, yr, **kwargs)

g = sns.FacetGrid(tips, col="smoker", height=4)
g.map(qqplot, "total_bill", "tip")
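Custom functions also compose with hue semantics: since qqplot forwards **kwargs to plt.scatter, the color assigned to each hue level passes straight through. A minimal sketch reusing the helper above:

g = sns.FacetGrid(tips, hue="time", col="smoker", height=4)
g.map(qqplot, "total_bill", "tip")
g.add_legend()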

Figure: A pairplot grid visualizing relationships between variables in a dataset, including petal and sepal lengths and widths.

PairGrid

iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map(sns.scatterplot)
Figure: A pairplot comparing sepal length and sepal width, with diagonal panels showing direct relationships between variables.
g = sns.PairGrid(iris)
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)
Figure: A pairplot with scatter plots, histograms, and density curves showing relationships between Iris dataset features.

g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"], hue="species")
g.map(sns.scatterplot)
Figure: A scatter plot of penguin bill length versus depth, categorized by gender with point sizes representing body mass.

g = sns.PairGrid(iris)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot, lw=3, legend=False)

Figure: A scatter plot of penguin bill length versus depth without any grouping or categorization.

The seaborn.objects interface

The seaborn.objects namespace was introduced in version 0.12 as a completely new interface for making seaborn plots. It offers a more consistent and flexible API, comprising a collection of composable classes for transforming and plotting data. In contrast to the existing seaborn functions, the new interface aims to support end-to-end plot specification and customization without dropping down to matplotlib (although it will remain possible to do so if necessary).

import seaborn.objects as so
(
    so.Plot(
        penguins, x="bill_length_mm", y="bill_depth_mm",
        edgecolor="sex", edgewidth="body_mass_g",
    )
    .add(so.Dot(color=".8"))
)

Figure: A scatter plot of penguin bill length versus depth, color-coded by species with body mass represented by point size.
(
    so.Plot(penguins, x="bill_length_mm", y="bill_depth_mm")
    .add(so.Dot(color="g", pointsize=4))
)

Figure: A bar chart comparing penguin body mass by species and gender (male and female).
(
    so.Plot(
        penguins, x="bill_length_mm", y="bill_depth_mm",
        color="species", pointsize="body_mass_g",
    )
    .add(so.Dot())
)
Figure: A horizontal bar chart comparing total bills by group size for lunch and dinner.
(
    so.Plot(penguins, x="species", y="body_mass_g", color="sex")
    .add(so.Bar(), so.Agg(), so.Dodge())
)
Figure: A scatter plot showing a linear regression line comparing tips and total bills during lunch and dinner.
(
    so.Plot(tips, x="total_bill", y="size", color="time")
    .add(so.Bar(), so.Agg(), so.Dodge(), orient="y")
)

Figure: A grid plot showing only the Euclidean distance as a straight black line between two points.
(
    so.Plot(tips, x="total_bill", y="tip", color="time")
    .add(so.Dots())
    .add(so.Line(), so.PolyFit())
)
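To tie the objects interface back to this lesson's topic: so.Plot has a facet method that mirrors FacetGrid. A minimal sketch, reusing the penguins dataframe loaded earlier:

(
    so.Plot(penguins, x="flipper_length_mm", y="body_mass_g")
    .facet(col="species")  # one panel per species, analogous to FacetGrid's col=
    .add(so.Dots())
)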

Figure: A grid plot showing the Euclidean distance between two points with a straight red diagonal line.

Distance Metrics

import pandas as pd

# Define a DataFrame to hold the distance metrics and their formulas
distance_metrics = pd.DataFrame({
    "Distance Metric": ["Euclidean Distance", "Manhattan Distance", "Hamming Distance", "Cosine Similarity", "Minkowski Distance"],
    "Formula": [
        "sqrt((x1 - x2)^2 + (y1 - y2)^2)",
        "|x1 - x2| + |y1 - y2|",
        "The number of positions at which the corresponding symbols are different.",
        "1 - (A . B) / (||A|| ||B||)",
        "(|x1 - x2|^p + |y1 - y2|^p)^(1/p)"
    ],
    "Description": [
        "Straight-line distance between two points in Euclidean space.",
        "Sum of the absolute differences of their Cartesian coordinates.",
        "Used for categorical data. Counts the number of positions where the corresponding symbols are different.",
        "Measures the cosine of the angle between two vectors projected in a multi-dimensional space.",
        "Generalized form of Euclidean and Manhattan distances. 'p' determines the metric to use."
    ]
})

distance_metrics
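To make the table concrete, here is a small sketch that evaluates each formula numerically; the points and strings below are illustrative values, not from the lesson:

import numpy as np

a, b = np.array([2, 3]), np.array([5, 7])
print(np.sqrt(np.sum((a - b) ** 2)))          # Euclidean: 5.0
print(np.sum(np.abs(a - b)))                  # Manhattan: 7
print(np.sum(np.abs(a - b) ** 3) ** (1 / 3))  # Minkowski with p=3
print(1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine distance

# Hamming distance between two equal-length strings
s1, s2 = "karolin", "kathrin"
print(sum(c1 != c2 for c1, c2 in zip(s1, s2)))  # 3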

Figure: A grid plot showing the Manhattan distance between two points with a stepwise green path.

import matplotlib.pyplot as plt
import numpy as np

# Points coordinates
x1, y1 = (2, 3)
x2, y2 = (5, 7)

# Calculate Euclidean distance
distance = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Create plot
fig, ax = plt.subplots()
ax.plot([x1, x2], [y1, y2], 'ro-')  # Line between points
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')
ax.annotate('', xy=(x1, y1), xytext=(x2, y2),
            arrowprops=dict(arrowstyle='<->', lw=2))
ax.text((x1 + x2) / 2, (y1 + y2) / 2, f'd = {distance:.2f}', fontsize=14, ha='left')

# Grid and labels
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Euclidean Distance\n$d = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$')
plt.axis('equal')

plt.show()
Figure: Visualization of the Cosine Distance formula with arrows for Vector A and Vector B on a 2D Cartesian plane.

# Create plot with horizontal and vertical distances
fig, ax = plt.subplots()

# Plot points
ax.plot([x1, x2], [y1, y2], 'ro')  # Points A and B
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')

# Draw horizontal and vertical lines
ax.plot([x1, x2], [y1, y1], 'g--')  # Horizontal line
ax.plot([x2, x2], [y1, y2], 'b--')  # Vertical line

# Annotations for distances
ax.text((x1 + x2) / 2, y1 -0.8, f'Horizontal (x2-x1)= {x2 - x1}', va='top', ha='center', color='green')
ax.text(x2 + 0.3, (y1 + y2) /2, f'Vertical (y2-y1)= {y2 - y1}', ha='left', va='center', rotation='vertical', color='blue')

# Euclidean distance line and annotation
ax.annotate('', xy=(x1, y1), xytext=(x2, y2),
            arrowprops=dict(arrowstyle='<->', lw=1.5, color='red'))
ax.text((x1 + x2) / 3, (y1 + y2) / 2, f'd = {distance:.2f}', fontsize=14, ha='left', color='red')

# Setting
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Euclidean Distance\n$d = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$')
plt.axis('equal')

plt.show()

Figure: Confusion matrix heatmap visualizing the classification model's performance across three flower classes: setosa, versicolor, and virginica.

# Create plot for Manhattan Distance
fig, ax = plt.subplots()

# Plot points
ax.plot([x1, x2], [y1, y2], 'ro')  # Points A and B
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')

# Draw path representing Manhattan Distance
ax.plot([x1, x1], [y1, y2], 'g--')  # Vertical part
ax.plot([x1, x2], [y2, y2], 'g--')  # Horizontal part

# Calculate Manhattan distance
manhattan_distance = abs(x2 - x1) + abs(y2 - y1)

# Annotations for Manhattan distance
ax.text(x1 - 0.5, (y1 + y2) / 2, f'|Y2-Y1|= {abs(y2 - y1)}', va='center', ha='right', color='red')
ax.text((x1 + x2) / 2, y2-0.4, f'|X2-X1|= {abs(x2 - x1)}', ha='center', va='bottom', color='red')
ax.text((x1 + x2) / 2, y1 - 1, f'Manhattan d = {manhattan_distance}', fontsize=14, ha='center', color='green')

# Setting
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Manhattan Distance = |X2-X1|+|Y2-Y1|')
plt.axis('equal')

plt.show()
Figure: Pair plot illustrating the relationship between Flipper Length and Body Mass for Adelie, Chinstrap, and Gentoo penguins, segmented by islands.

# Define vectors for demonstration
A = np.array([0, 2])  # Vector A
B = np.array([2, 2])  # Vector B

# Calculate cosine similarity and cosine distance
dot_product = np.dot(A, B)
norm_a = np.linalg.norm(A)
norm_b = np.linalg.norm(B)
cosine_similarity = dot_product / (norm_a * norm_b)
cosine_distance = 1 - cosine_similarity

# Create the plot
fig, ax = plt.subplots()

# Draw vectors
ax.quiver(0, 0, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.01, label='Vector A')
ax.quiver(0, 0, B[0], B[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.01, label='Vector B')

# Annotations and labels
ax.text(A[0]/2 - 0.1, A[1]/2, 'A', color='blue', fontsize=12)
ax.text(B[0]/2, B[1]/1.5, 'B', color='red', fontsize=12)
ax.text(1.5, 1, f'Cosine Distance = {cosine_distance:.2f}', fontsize=14, color='green')

# Setting
plt.grid(True)
plt.xlim(-0.5, 3)
plt.ylim(-0.5, 3)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Cosine Distance\n$1 - \\frac{\\mathbf{A} \\cdot \\mathbf{B}}{\\|\\mathbf{A}\\| \\|\\mathbf{B}\\|}$')
plt.legend()
plt.gca().set_aspect('equal', adjustable='box')

plt.show()

Figure: Summary of a Pandas DataFrame with seven columns, showing non-null counts and data types for attributes like total_bill, tip, sex, and more.

Implementing KNN from scratch

# Importing necessary libraries
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Step 1: Distance Metric - Euclidean Distance
def euclidean_distance(x1, x2):
    """
    Calculate the Euclidean distance between two vectors.
    """
    return np.sqrt(np.sum((x1 - x2) ** 2))
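# Sanity check (illustrative): euclidean_distance(np.array([2, 3]), np.array([5, 7])) -> 5.0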

# Step 2: KNN Classifier
class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        """
        Fit the model using the training data.
        """
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        """
        Predict the class labels for the provided data.
        """
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        """
        Predict the class label for a single sample.
        """
        # Compute distances between x and all examples in the training set
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Return the most common class label
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

    def calculate_accuracy(self, y_true, y_pred):
        """
        Calculate the accuracy of predictions against the true labels.
        """
        return np.sum(y_true == y_pred) / len(y_true)

    def generate_classification_report(self, y_true, y_pred):
        """
        Generate a classification report including precision, recall, and F1-score.
        """
        print(classification_report(y_true, y_pred))

if __name__ == "__main__":
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Load dataset
    iris = load_iris()
    X, y = iris.data, iris.target

    # Split dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

    # Initialize and train classifier
    classifier = KNN(k=4)
    classifier.fit(X_train, y_train)

    # Make predictions
    predictions = classifier.predict(X_test)

    # Calculate accuracy and print a classification report
    accuracy = classifier.calculate_accuracy(y_test, predictions)
    print(f"Accuracy: {accuracy}")
    classifier.generate_classification_report(y_test, predictions)

OUTPUT:

Figure: Table comparing distance metrics such as Euclidean, Manhattan, Hamming, Cosine Similarity, and Minkowski with corresponding formulas and brief descriptions.
classifier.calculate_accuracy(y_test, predictions)

Output: 0.9666666666666667

classifier.generate_classification_report(y_test, predictions)

OUTPUT:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.83      0.91         6
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.94      0.96        30
weighted avg       0.97      0.97      0.97        30

# The KNN class above does not define a plot_confusion_matrix method; scikit-learn's
# ConfusionMatrixDisplay draws the equivalent heatmap:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(y_test, predictions, display_labels=iris.target_names)
plt.show()

OUTPUT:

Figure: Classification report displaying accuracy, precision, recall, F1-score, and support for a three-class model.
# Enhanced KNN Classifier with different distance metrics and weighted voting

# Step 1: Generalize the distance calculation
def minkowski_distance(x1, x2, p=2):
    """
    Calculate the Minkowski distance between two vectors.
    """
    return np.sum(np.abs(x1 - x2) ** p) ** (1/p)
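# Sanity check (illustrative): minkowski_distance(np.array([2, 3]), np.array([5, 7]), p=1) -> 7.0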

# Step 2: Enhanced KNN Class
class KNN_Enhanced:
    def __init__(self, k=3, distance_metric='euclidean', weights='uniform'):
        self.k = k
        self.weights = weights
        # Assign the appropriate distance function
        if distance_metric == 'euclidean':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=2)
        elif distance_metric == 'manhattan':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=1)
        elif distance_metric == 'minkowski':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=3)  # p fixed at 3 for this demo
        else:
            raise ValueError("Unsupported distance metric. Use 'euclidean', 'manhattan', or 'minkowski'.")

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        # Compute distances between x and all examples in the training set
        distances = [self.distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]

        if self.weights == 'uniform':
            # Uniform weights: Use majority vote
            most_common = Counter(k_nearest_labels).most_common(1)
            return most_common[0][0]
        elif self.weights == 'distance':
            # Weighted by distance: Closer neighbors have higher weight
            k_nearest_distances = [distances[i] for i in k_indices]
            # Avoid division by zero
            k_nearest_distances = [1e-5 if d == 0 else d for d in k_nearest_distances]
            weight_sum = sum((1/d) for d in k_nearest_distances)
            weighted_votes = Counter()
            for i, label in enumerate(k_nearest_labels):
                weighted_votes[label] += (1 / k_nearest_distances[i]) / weight_sum
            return weighted_votes.most_common(1)[0][0]
        else:
            raise ValueError("Unsupported weights type. Use 'uniform' or 'distance'.")
    def calculate_accuracy(self,y_true, y_pred):
        """
        Calculate the accuracy of predictions against the true labels.
        """
        return np.sum(y_true == y_pred) / len(y_true)

    def generate_classification_report(self,y_true, y_pred):
        """
        Generate a classification report including precision, recall, and F1-score.
        """
        return print(classification_report(y_true, y_pred))



# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

knn_uniform_euclidean = KNN_Enhanced(k=2, distance_metric='euclidean', weights='uniform')
knn_distance_manhattan = KNN_Enhanced(k=5, distance_metric='manhattan', weights='distance')

# Fitting the model
knn_uniform_euclidean.fit(X_train, y_train)
knn_distance_manhattan.fit(X_train, y_train)

# Making predictions
predictions_uniform_euclidean = knn_uniform_euclidean.predict(X_test)
predictions_distance_manhattan = knn_distance_manhattan.predict(X_test)

accuracy_uniform_euclidean = knn_uniform_euclidean.calculate_accuracy(y_test, predictions_uniform_euclidean)
print(f"Accuracy_uniform_euclidean: {accuracy_uniform_euclidean}")
knn_uniform_euclidean.generate_classification_report(y_test, predictions_uniform_euclidean)

accuracy_distance_manhattan = knn_distance_manhattan.calculate_accuracy(y_test, predictions_distance_manhattan)
print(f"Accuracy_distance_manhattan: {accuracy_distance_manhattan}")
knn_distance_manhattan.generate_classification_report(y_test, predictions_distance_manhattan)

OUTPUT:

Accuracy_uniform_euclidean: 0.9666666666666667

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.71      0.83      0.77         6
           2       0.90      0.82      0.86        11

    accuracy                           0.90        30
   macro avg       0.87      0.88      0.88        30
weighted avg       0.91      0.90      0.90        30

Accuracy_distance_manhattan: 0.9333333333333333

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.83      0.83      0.83         6
           2       0.91      0.91      0.91        11

    accuracy                           0.93        30
   macro avg       0.91      0.91      0.91        30
weighted avg       0.93      0.93      0.93        30

accuracy_uniform_euclidean = np.sum(predictions_uniform_euclidean == y_test) / len(y_test)
print(f"Accuracy: {accuracy_uniform_euclidean}")

OUTPUT:

Accuracy: 0.9385964912280702

from sklearn.datasets import load_breast_cancer
cancer_data = load_breast_cancer()
X, y = cancer_data.data, cancer_data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

knn_uniform_euclidean = KNN_Enhanced(k=3, distance_metric='euclidean', weights='uniform')
knn_distance_manhattan = KNN_Enhanced(k=5, distance_metric='manhattan', weights='distance')

# Fitting the model
knn_uniform_euclidean.fit(X_train, y_train)
knn_distance_manhattan.fit(X_train, y_train)

# Making predictions
predictions_uniform_euclidean = knn_uniform_euclidean.predict(X_test)
predictions_distance_manhattan = knn_distance_manhattan.predict(X_test)

predictions_uniform_euclidean, predictions_distance_manhattan

OUTPUT:

(array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0]),
 array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1,
        0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0,
        1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0]))

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train)

# Make predictions
predictions = knn.predict(X_test_scaled)

# Evaluate accuracy
print(f"Accuracy: {accuracy_score(y_test, predictions)}")

OUTPUT:

Accuracy: 1.0

Lesson Assignment
Challenge yourself with our lab assignment and put your skills to the test.
# Python program to find the area of a triangle

a = 5
b = 6
c = 7

# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))

# calculate the semi-perimeter
s = (a + b + c) / 2

# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)
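With the sample sides a=5, b=6, c=7, Heron's formula gives s = (5 + 6 + 7) / 2 = 9 and area = √(9 · 4 · 3 · 2) = √216 ≈ 14.70, which matches the program's output.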