Data Visualization with Python

Seaborn - Facet Grid

Faceting and grids in data visualization are powerful concepts for creating multi-panel plots, allowing for the exploration of complex datasets by dividing the data into subsets and plotting these subsets side by side. This approach makes it easier to compare different subsets of data and identify patterns, trends, and outliers.

Faceting

Faceting involves creating a matrix of plots that share the same variable mappings across different subsets of the data. Each "facet" represents a slice or aspect of the dataset, enabling detailed comparisons across multiple categories simultaneously. It's particularly useful when you want to compare the distribution or relationship of variables within different subgroups of your data.

Seaborn's FacetGrid is a prime example of faceting. It lets you create a grid of subplots based on the values of one or more categorical variables. You can map dataset-wide plots onto each facet, allowing for the same plot type to be repeated across different categories.
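
The basic pattern is two steps: construct the grid from a tidy dataframe and one or two categorical variables, then map a plotting function onto every facet. A minimal sketch using seaborn's built-in tips dataset:

import seaborn as sns

# One subplot per smoker category, with the same histogram repeated in each
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="smoker")
g.map(sns.histplot, "total_bill")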

Grids

Grids, like faceting, allow for the creation of multiple plots, but they offer more flexibility in terms of plot types and layouts. Seaborn provides two main types of grids: FacetGrid and PairGrid.

  • FacetGrid: As mentioned, FacetGrid is used for faceting a dataset by one or two variables. It can work with any plot type.
  • PairGrid: PairGrid is another type of grid for plotting pairwise relationships in a dataset. By drawing the same plot types across an entire dataframe, it enables detailed analysis of how every variable relates to all others.

Use Cases

  • Comparative Analysis: Faceting and grids are ideal for comparing patterns across different levels of a categorical variable. For example, you might use faceting to compare the distribution of product sales across different regions.
  • Pattern Recognition: They help in recognizing patterns that are specific to subgroups within the data. For instance, using a PairGrid, you could explore how the relationships between pairs of variables differ across species in a dataset of flowers.
  • Data Exploration: Both faceting and grids facilitate extensive data exploration. They can help identify which variables influence others and how those relationships are manifested across different subgroups of the data.

Faceting and grids enhance the visualization of high-dimensional data by breaking it down into understandable and comparable chunks. They leverage the power of visual comparison to reveal insights that might not be apparent from looking at a single plot or statistic, making them invaluable tools in data analysis and storytelling.

import seaborn as sns
import matplotlib.pyplot as plt

# Load the penguins dataset
penguins = sns.load_dataset('penguins')

# Initialize a FacetGrid
g = sns.FacetGrid(penguins, row='species', col='island', margin_titles=True, height=3)

# Map a scatter plot to each facet
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')

# Customize the FacetGrid
g.fig.suptitle('Flipper Length vs. Body Mass for Penguins by Species and Island', y=1.03)
g.set_axis_labels('Flipper Length (mm)', 'Body Mass (g)')
g.add_legend()

plt.show()

# Load the mpg dataset
mpg = sns.load_dataset('mpg')

# Initialize a FacetGrid
g = sns.FacetGrid(mpg, col='cylinders', row='origin', margin_titles=True, height=3)

# Map a scatter plot to each facet
g.map(sns.scatterplot, 'weight', 'mpg')

# Customize the FacetGrid
g.fig.suptitle('MPG vs. Weight by Number of Cylinders and Origin', y=1.03)
g.set_axis_labels('Weight', 'MPG')
g.add_legend()

plt.show()

FacetGrid basics

tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")

tips.info()
g = sns.FacetGrid(tips, col="time")
g.map(sns.histplot, "tip")

g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(sns.scatterplot, "total_bill", "tip", alpha=.7)
g.add_legend()

g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill", order=["Male", "Female"])
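
When a faceting variable has many levels, the col_wrap parameter wraps the facets across multiple rows instead of one long row; a minimal sketch with the same tips data:

# Wrap the four day facets onto rows of two columns each
g = sns.FacetGrid(tips, col="day", col_wrap=2, height=3)
g.map(sns.histplot, "total_bill")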

Using custom functions

You’re not limited to existing matplotlib and seaborn functions when using FacetGrid. However, to work properly, any function you use must follow a few rules:

  • It must plot onto the "currently active" matplotlib Axes. This is true of functions in the matplotlib.pyplot namespace, and you can call matplotlib.pyplot.gca() to get a reference to the current Axes if you want to work directly with its methods.
  • It must accept the data that it plots in positional arguments. Internally, FacetGrid will pass a Series of data for each of the named positional arguments passed to FacetGrid.map().
  • It must be able to accept color and label keyword arguments and, ideally, do something useful with them. In most cases, it's easiest to catch a generic dictionary of **kwargs and pass it along to the underlying plotting function.
from scipy import stats

def quantile_plot(x, **kwargs):
    # Plot the sample quantiles of a single variable
    quantiles, xr = stats.probplot(x, fit=False)
    plt.scatter(xr, quantiles, **kwargs)

g = sns.FacetGrid(tips, col="sex", height=4)
g.map(quantile_plot, "total_bill")

def qqplot(x, y, **kwargs):
    # Plot the quantiles of two variables against each other
    _, xr = stats.probplot(x, fit=False)
    _, yr = stats.probplot(y, fit=False)
    plt.scatter(xr, yr, **kwargs)

g = sns.FacetGrid(tips, col="smoker", height=4)
g.map(qqplot, "total_bill", "tip")
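
Custom functions also respect hue semantics; a minimal sketch reusing the qqplot helper defined above:

# Color the quantile pairs by meal time within each facet
g = sns.FacetGrid(tips, hue="time", col="sex", height=4)
g.map(qqplot, "total_bill", "tip")
g.add_legend()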

PairGrid

# Pairwise relationships: the same plot in every cell of the grid
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map(sns.scatterplot)

# Use a univariate plot on the diagonal cells
g = sns.PairGrid(iris)
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)

# Restrict the grid to a subset of variables and color by species
g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"], hue="species")
g.map(sns.scatterplot)

# Different plots above, below, and on the diagonal
g = sns.PairGrid(iris)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot, lw=3, legend=False)
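
For the common case, seaborn's figure-level pairplot() function builds and populates a PairGrid for you in one call:

# One-line shortcut that builds the grid and colors all panels by species
sns.pairplot(iris, hue="species")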

The seaborn.objects interface

The seaborn.objects namespace was introduced in version 0.12 as a completely new interface for making seaborn plots. It offers a more consistent and flexible API, comprising a collection of composable classes for transforming and plotting data. In contrast to the existing seaborn functions, the new interface aims to support end-to-end plot specification and customization without dropping down to matplotlib (although it will remain possible to do so if necessary).

import seaborn.objects as so

# Map sex to marker edge color and body mass to edge width
(
    so.Plot(
        penguins, x="bill_length_mm", y="bill_depth_mm",
        edgecolor="sex", edgewidth="body_mass_g",
    )
    .add(so.Dot(color=".8"))
)

# Set properties directly instead of mapping them to data
(
    so.Plot(penguins, x="bill_length_mm", y="bill_depth_mm")
    .add(so.Dot(color="g", pointsize=4))
)

# Map species to color and body mass to point size
(
    so.Plot(
        penguins, x="bill_length_mm", y="bill_depth_mm",
        color="species", pointsize="body_mass_g",
    )
    .add(so.Dot())
)

# Aggregate to a bar chart, dodging the bars by sex
(
    so.Plot(penguins, x="species", y="body_mass_g", color="sex")
    .add(so.Bar(), so.Agg(), so.Dodge())
)

# The same idea with a horizontal orientation
(
    so.Plot(tips, x="total_bill", y="size", color="time")
    .add(so.Bar(), so.Agg(), so.Dodge(), orient="y")
)

# Layer raw dots with a polynomial fit line
(
    so.Plot(tips, x="total_bill", y="tip", color="time")
    .add(so.Dots())
    .add(so.Line(), so.PolyFit())
)

import pandas as pd

# Define a DataFrame to hold the distance metrics and their formulas
distance_metrics = pd.DataFrame({
    "Distance Metric": ["Euclidean Distance", "Manhattan Distance", "Hamming Distance", "Cosine Similarity", "Minkowski Distance"],
    "Formula": [
        "sqrt((x1 - x2)^2 + (y1 - y2)^2)",
        "|x1 - x2| + |y1 - y2|",
        "The number of positions at which the corresponding symbols are different.",
        "1 - (A . B) / (||A|| ||B||)",
        "(|x1 - x2|^p + |y1 - y2|^p)^(1/p)"
    ],
    "Description": [
        "Straight-line distance between two points in Euclidean space.",
        "Sum of the absolute differences of their Cartesian coordinates.",
        "Used for categorical data. Counts the number of positions where the corresponding symbols are different.",
        "Measures the cosine of the angle between two vectors projected in a multi-dimensional space.",
        "Generalized form of Euclidean and Manhattan distances. 'p' determines the metric to use."
    ]
})

distance_metrics
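
These formulas are easy to verify numerically with scipy.spatial.distance; a minimal sketch (the vectors here are arbitrary illustration values):

import numpy as np
from scipy.spatial import distance

u = np.array([2, 3])
v = np.array([5, 7])

print(distance.euclidean(u, v))        # sqrt(3^2 + 4^2) = 5.0
print(distance.cityblock(u, v))        # |3| + |4| = 7 (Manhattan)
print(distance.minkowski(u, v, p=3))   # (3^3 + 4^3)^(1/3) ≈ 4.50
print(distance.cosine(u, v))           # 1 - cosine similarity
print(distance.hamming([1, 0, 1], [1, 1, 1]))  # fraction of differing positions: 1/3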

import matplotlib.pyplot as plt
import numpy as np

# Points coordinates
x1, y1 = (2, 3)
x2, y2 = (5, 7)

# Calculate Euclidean distance
distance = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Create plot
fig, ax = plt.subplots()
ax.plot([x1, x2], [y1, y2], 'ro-')  # Line between points
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')
ax.annotate('', xy=(x1, y1), xytext=(x2, y2),
            arrowprops=dict(arrowstyle='<->', lw=2))
ax.text((x1 + x2) / 2, (y1 + y2) / 2, f'd = {distance:.2f}', fontsize=14, ha='left')

# Grid and labels
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Euclidean Distance\n$d = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$')
plt.axis('equal')

plt.show()

# Create plot with horizontal and vertical distances
fig, ax = plt.subplots()

# Plot points
ax.plot([x1, x2], [y1, y2], 'ro')  # Points A and B
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')

# Draw horizontal and vertical lines
ax.plot([x1, x2], [y1, y1], 'g--')  # Horizontal line
ax.plot([x2, x2], [y1, y2], 'b--')  # Vertical line

# Annotations for distances
ax.text((x1 + x2) / 2, y1 - 0.8, f'Horizontal (x2-x1) = {x2 - x1}', va='top', ha='center', color='green')
ax.text(x2 + 0.3, (y1 + y2) / 2, f'Vertical (y2-y1) = {y2 - y1}', ha='left', va='center', rotation='vertical', color='blue')

# Euclidean distance line and annotation
ax.annotate('', xy=(x1, y1), xytext=(x2, y2),
            arrowprops=dict(arrowstyle='<->', lw=1.5, color='red'))
ax.text((x1 + x2) / 3, (y1 + y2) / 2, f'd = {distance:.2f}', fontsize=14, ha='left', color='red')

# Setting
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Euclidean Distance\n$d = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$')
plt.axis('equal')

plt.show()

# Create plot for Manhattan Distance
fig, ax = plt.subplots()

# Plot points
ax.plot([x1, x2], [y1, y2], 'ro')  # Points A and B
ax.text(x1, y1, 'A (2, 3)', fontsize=12, ha='right')
ax.text(x2, y2, 'B (5, 7)', fontsize=12, ha='right')

# Draw path representing Manhattan Distance
ax.plot([x1, x1], [y1, y2], 'g--')  # Vertical part
ax.plot([x1, x2], [y2, y2], 'g--')  # Horizontal part

# Calculate Manhattan distance
manhattan_distance = abs(x2 - x1) + abs(y2 - y1)

# Annotations for Manhattan distance
ax.text(x1 - 0.5, (y1 + y2) / 2, f'|Y2-Y1| = {abs(y2 - y1)}', va='center', ha='right', color='red')
ax.text((x1 + x2) / 2, y2 - 0.4, f'|X2-X1| = {abs(x2 - x1)}', ha='center', va='bottom', color='red')
ax.text((x1 + x2) / 2, y1 - 1, f'Manhattan d = {manhattan_distance}', fontsize=14, ha='center', color='green')

# Setting
plt.grid(True)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Manhattan Distance = |X2-X1|+|Y2-Y1|')
plt.axis('equal')

plt.show()

# Define vectors for demonstration
A = np.array([0, 2])  # Vector A
B = np.array([2, 2])  # Vector B

# Calculate cosine similarity and cosine distance
dot_product = np.dot(A, B)
norm_a = np.linalg.norm(A)
norm_b = np.linalg.norm(B)
cosine_similarity = dot_product / (norm_a * norm_b)
cosine_distance = 1 - cosine_similarity

# Create the plot
fig, ax = plt.subplots()

# Draw vectors
ax.quiver(0, 0, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.01, label='Vector A')
ax.quiver(0, 0, B[0], B[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.01, label='Vector B')

# Annotations and labels
ax.text(A[0]/2 - 0.1, A[1]/2, 'A', color='blue', fontsize=12)
ax.text(B[0]/2, B[1]/1.5, 'B', color='red', fontsize=12)
ax.text(1.5, 1, f'Cosine Distance = {cosine_distance:.2f}', fontsize=14, color='green')

# Setting
plt.grid(True)
plt.xlim(-0.5, 3)
plt.ylim(-0.5, 3)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Cosine Distance\n$1 - \\frac{\\mathbf{A} \\cdot \\mathbf{B}}{\\|\\mathbf{A}\\| \\|\\mathbf{B}\\|}$')
plt.legend()
plt.gca().set_aspect('equal', adjustable='box')

plt.show()

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Step 1: Distance Metric - Euclidean Distance
def euclidean_distance(x1, x2):
    """
    Calculate the Euclidean distance between two vectors.
    """
    return np.sqrt(np.sum((x1 - x2) ** 2))

# Step 2: KNN Classifier
class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        """
        Fit the model using the training data.
        """
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        """
        Predict the class labels for the provided data.
        """
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        """
        Predict the class label for a single sample.
        """
        # Compute distances between x and all examples in the training set
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Return the most common class label
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

    def calculate_accuracy(self, y_true, y_pred):
        """
        Calculate the accuracy of predictions against the true labels.
        """
        return np.sum(y_true == y_pred) / len(y_true)

    def generate_classification_report(self, y_true, y_pred):
        """
        Generate a classification report including precision, recall, and F1-score.
        """
        print(classification_report(y_true, y_pred))

if __name__ == "__main__":
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Load dataset
    iris = load_iris()
    X, y = iris.data, iris.target

    # Split dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

    # Initialize and train classifier
    classifier = KNN(k=4)
    classifier.fit(X_train, y_train)

    # Make predictions
    predictions = classifier.predict(X_test)

    # Calculate accuracy
    accuracy = classifier.calculate_accuracy(y_test, predictions)
    print(f"Accuracy: {accuracy}")
    classifier.generate_classification_report(y_test, predictions)

OUTPUT:

classifier.calculate_accuracy(y_test, predictions)
Output: 0.9666666666666667

classifier.generate_classification_report(y_test, predictions)
Output:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.83      0.91         6
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.94      0.96        30
weighted avg       0.97      0.97      0.97        30


# Enhanced KNN Classifier with different distance metrics and weighted voting

# Step 1: Generalize the distance calculation
def minkowski_distance(x1, x2, p=2):
    """
    Calculate the Minkowski distance between two vectors.
    """
    return np.sum(np.abs(x1 - x2) ** p) ** (1/p)
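
A quick sanity check of the generalized metric (the vectors are arbitrary illustration values):

# p=2 reproduces Euclidean distance, p=1 reproduces Manhattan distance
a, b = np.array([2, 3]), np.array([5, 7])
print(minkowski_distance(a, b, p=2))  # 5.0
print(minkowski_distance(a, b, p=1))  # 7.0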

# Step 2: Enhanced KNN Class
class KNN_Enhanced:
    def __init__(self, k=3, distance_metric='euclidean', weights='uniform'):
        self.k = k
        self.weights = weights
        # Assign the appropriate distance function
        if distance_metric == 'euclidean':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=2)
        elif distance_metric == 'manhattan':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=1)
        elif distance_metric == 'minkowski':
            self.distance = lambda x1, x2: minkowski_distance(x1, x2, p=3)
        else:
            raise ValueError("Unsupported distance metric. Use 'euclidean', 'manhattan', or 'minkowski'.")

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        # Compute distances between x and all examples in the training set
        distances = [self.distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]

        if self.weights == 'uniform':
            # Uniform weights: Use majority vote
            most_common = Counter(k_nearest_labels).most_common(1)
            return most_common[0][0]
        elif self.weights == 'distance':
            # Weighted by distance: Closer neighbors have higher weight
            k_nearest_distances = [distances[i] for i in k_indices]
            # Avoid division by zero
            k_nearest_distances = [1e-5 if d == 0 else d for d in k_nearest_distances]
            weight_sum = sum((1/d) for d in k_nearest_distances)
            weighted_votes = Counter()
            for i, label in enumerate(k_nearest_labels):
                weighted_votes[label] += (1 / k_nearest_distances[i]) / weight_sum
            return weighted_votes.most_common(1)[0][0]
        else:
            raise ValueError("Unsupported weights type. Use 'uniform' or 'distance'.")

    def calculate_accuracy(self, y_true, y_pred):
        """
        Calculate the accuracy of predictions against the true labels.
        """
        return np.sum(y_true == y_pred) / len(y_true)

    def generate_classification_report(self, y_true, y_pred):
        """
        Generate a classification report including precision, recall, and F1-score.
        """
        print(classification_report(y_true, y_pred))



# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

knn_uniform_euclidean = KNN_Enhanced(k=2, distance_metric='euclidean', weights='uniform')
knn_distance_manhattan = KNN_Enhanced(k=5, distance_metric='manhattan', weights='distance')

# Fitting the model
knn_uniform_euclidean.fit(X_train, y_train)
knn_distance_manhattan.fit(X_train, y_train)

# Making predictions
predictions_uniform_euclidean = knn_uniform_euclidean.predict(X_test)
predictions_distance_manhattan = knn_distance_manhattan.predict(X_test)

# Evaluate each model with its own predictions
accuracy_uniform_euclidean = knn_uniform_euclidean.calculate_accuracy(y_test, predictions_uniform_euclidean)
print(f"Accuracy_uniform_euclidean: {accuracy_uniform_euclidean}")
knn_uniform_euclidean.generate_classification_report(y_test, predictions_uniform_euclidean)

accuracy_distance_manhattan = knn_distance_manhattan.calculate_accuracy(y_test, predictions_distance_manhattan)
print(f"Accuracy_distance_manhattan: {accuracy_distance_manhattan}")
knn_distance_manhattan.generate_classification_report(y_test, predictions_distance_manhattan)

OUTPUT:

Accuracy_uniform_euclidean: 0.9

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.71      0.83      0.77         6
           2       0.90      0.82      0.86        11

    accuracy                           0.90        30
   macro avg       0.87      0.88      0.88        30
weighted avg       0.91      0.90      0.90        30

Accuracy_distance_manhattan: 0.9333333333333333

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.83      0.83      0.83         6
           2       0.91      0.91      0.91        11

    accuracy                           0.93        30
   macro avg       0.91      0.91      0.91        30
weighted avg       0.93      0.93      0.93        30


from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset
cancer_data = load_breast_cancer()
X, y = cancer_data.data, cancer_data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

knn_uniform_euclidean = KNN_Enhanced(k=3, distance_metric='euclidean', weights='uniform')
knn_distance_manhattan = KNN_Enhanced(k=5, distance_metric='manhattan', weights='distance')

# Fitting the model
knn_uniform_euclidean.fit(X_train, y_train)
knn_distance_manhattan.fit(X_train, y_train)

# Making predictions
predictions_uniform_euclidean = knn_uniform_euclidean.predict(X_test)
predictions_distance_manhattan = knn_distance_manhattan.predict(X_test)

predictions_uniform_euclidean, predictions_distance_manhattan

OUTPUT:

(array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0,
        1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0]),
 array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1,
        0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0,
        1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0]))

# Manually verify the accuracy of the uniform-weight Euclidean model on this test set
accuracy_uniform_euclidean = np.sum(predictions_uniform_euclidean == y_test) / len(y_test)
print(f"Accuracy: {accuracy_uniform_euclidean}")

OUTPUT:

Accuracy: 0.9385964912280702

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train)

# Make predictions
predictions = knn.predict(X_test_scaled)

# Evaluate accuracy
print(f"Accuracy: {accuracy_score(y_test, predictions)}")

OUTPUT:

Accuracy: 1.0
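
The value of n_neighbors above was chosen by hand; a common refinement (a sketch, not part of the original lesson) is to compare several k values with cross-validation:

from sklearn.model_selection import cross_val_score

# Compare a few k values by 5-fold cross-validation on the scaled training data
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train_scaled, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")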

Lesson Assignment

Challenge yourself with our lab assignment and put your skills to the test.

# Python Program to find the area of triangle

a = 5
b = 6
c = 7

# Uncomment below to take inputs from the user
# a = float(input('Enter first side: '))
# b = float(input('Enter second side: '))
# c = float(input('Enter third side: '))

# calculate the semi-perimeter
s = (a + b + c) / 2

# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' %area)