
Comprehensive Guide to Logistic Regression in Machine Learning – The Fundamentals

Logistic Regression is a powerful tool for making yes/no predictions, transforming raw inputs into probabilities with its S-shaped curve. This beginner-friendly guide covers the difference between Linear and Logistic Regression, key assumptions, and practical data preprocessing steps for accurate classification.
Mar 18, 2025
12 min read

What is Logistic Regression?

Imagine you’re a teacher, and you want to predict whether a student will pass or fail an exam. You have some data, like how many hours they studied. Logistic regression is like a smart tool that helps you make this prediction.

It’s a way to answer yes-or-no questions (like pass/fail, spam/not spam, cat/dog) based on some input data.

Logistic Regression vs Linear Regression

Alright, let’s compare logistic regression and linear regression in the simplest way possible. Think of them as two siblings—both are smart, but they’re good at solving different kinds of problems. Let’s break it down!

What is Linear Regression?

Linear regression is like the sibling who loves straight lines and numbers. It’s great at predicting continuous values (like height, weight, or price).

  • Example: You want to predict the price of a house based on its size.
    • Input: House size (e.g., 1,000 square feet).
    • Output: A number (e.g., $200,000).

Linear regression draws a straight line through your data points and uses it to predict values.

How is Logistic Regression Different?

Logistic regression is the sibling who’s all about yes-or-no questions. It doesn’t care about predicting numbers—it’s all about probabilities and making decisions.

  • Example: You want to predict if a house will sell quickly (yes or no) based on its size.
    • Input: House size (e.g., 1,000 square feet).
    • Output: A probability (e.g., 0.8 = 80% chance it’ll sell quickly).

Instead of a straight line, logistic regression uses an S-shaped curve to predict probabilities between 0 and 1.

Figure: the S-shaped curve of a logistic model.

When to Use Each

  • Linear Regression: Use it when you’re predicting a number.
    • Example: “How much will this car cost based on its mileage?”
  • Logistic Regression: Use it when you’re answering a yes-or-no question.
    • Example: “Will this car sell in the next 30 days (yes or no)?”

Why Not Use Linear Regression for Yes/No Questions?

Good question! Here’s why:

  • Linear regression doesn’t know how to handle probabilities. It might predict something like -2 or 1.5, which doesn’t make sense for yes/no questions.
  • Logistic regression fixes this by squishing the output into a range between 0 and 1 using its S-shaped curve.
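
If you want to see this for yourself, here is a minimal sketch (on made-up study-hours data) comparing the outputs of the two models:

```python
# A minimal sketch (hypothetical data) showing why linear regression is a poor
# fit for yes/no targets: its predictions can fall outside the 0-1 range.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hours studied (feature) and pass/fail outcome (0 or 1)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

# Linear regression can output values below 0 or above 1...
print(linear.predict([[0.5], [10]]))
# ...while logistic regression always returns a probability between 0 and 1.
print(logistic.predict_proba([[0.5], [10]])[:, 1])
```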

What is the Sigmoid Function?

The sigmoid curve is the shape you get when you graph the logistic function. It looks like a smooth, stretched-out S. Here’s how it works:

  • If you give it a really big number (like 100), it’ll squish it close to 1.
  • If you give it a really small number (like -100), it’ll squish it close to 0.
  • If you give it 0, it lands right in the middle at 0.5.

So, the sigmoid curve is like a referee that says:

  • “Big positive numbers? You’re close to 1.”
  • “Big negative numbers? You’re close to 0.”
  • “Neutral numbers? You’re somewhere in the middle.”

Why Does It Look Like an S?

The S-shape happens because the logistic function grows quickly at first, then slows down as it gets closer to 0 or 1. It’s like a car speeding up on a straight road but hitting the brakes as it approaches a stop sign.

The Formula (Don’t Worry, It’s Simple!)

Here’s the logistic function formula:

f(x) = 1 / (1 + e^(-x))

Let’s break it down:

  • x is the input (any number you want to squish).
  • e is just a math constant (about 2.718—it’s like π but for exponential stuff).
  • The formula takes your input, flips it around a bit, and spits out a number between 0 and 1.
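
If you like seeing things in code, here is a tiny sketch of the sigmoid function (written with NumPy, just as an illustration):

```python
# A quick sketch of the sigmoid (logistic) function described above.
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(100))   # ~1.0  (big positive input)
print(sigmoid(-100))  # ~0.0  (big negative input)
print(sigmoid(0))     # 0.5   (neutral input lands right in the middle)
```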

Mathematical Foundations of Logistic Regression

Logistic regression is like a math-powered crystal ball. It helps you predict whether something will happen or not—like "Will this email be spam?" or "Will this customer buy my product?" It’s all about making yes/no predictions (also called binary classification).

But here’s the twist: instead of just saying "yes" or "no," logistic regression gives you a probability. For example:

  • "There’s an 80% chance this email is spam."
  • "There’s a 20% chance this customer will buy.

The Big Idea

Logistic regression takes some input data (like how much time someone spends on your website) and spits out a probability between 0 and 1. If the probability is close to 1, it’s a "yes." If it’s close to 0, it’s a "no."

Step 1: Start with Linear Regression

Logistic regression starts with something you might already know: linear regression. That’s the one where you draw a straight line through data points to predict something. The formula looks like this:

y = b0 + b1·x

where b0 is the intercept and b1 is the slope.

But here’s the problem: linear regression can give you predictions that are way too big or too small. For example, it might predict a probability of 1.5 or -0.3, which doesn’t make sense because probabilities must be between 0 and 1.

Step 2: Enter the Logistic Function

To fix this, we use the logistic function (a.k.a. the sigmoid function). This magical formula squishes any number into a range between 0 and 1. The formula is:

f(x) = 1 / (1 + e^(-x))

Here’s what it means:

  • x is the input value (it could be any number, positive or negative).
  • e is the mathematical constant (approximately 2.718), which is the base of natural logarithms.
  • The function f(x) takes any input x and transforms it into a value between 0 and 1. If the input is very large, the output gets closer to 1; if it is very large and negative, it gets closer to 0.

Step 3: Log-Odds and Probabilities

Logistic regression doesn’t predict probabilities directly. Instead, it predicts something called the log-odds (don’t worry, it’s not as scary as it sounds). Here’s the formula:

log(odds) = b0 + b1·x

  • The odds are just the ratio of "yes" to "no" outcomes. For example, if there’s a 3:1 chance of something happening, the odds are 3.
  • The log-odds are just the natural logarithm of the odds. It’s a way to stretch the odds into a straight line.

The formula for odds is:

odds = p / (1 - p)

  • p is the probability of success (the event occurring).
  • 1 - p is the probability of failure (the event not occurring).

From these two equations, we get:

log(p / (1 - p)) = b0 + b1·x

Solving for p:

Exponentiate both sides to remove the log:

p / (1 - p) = e^(b0 + b1·x)

And therefore, we get: 

p = e^(b0 + b1·x) / (1 + e^(b0 + b1·x)), which is the same as p = 1 / (1 + e^(-(b0 + b1·x)))
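
To make the algebra concrete, here is a tiny sketch with made-up values for b0 and b1 that walks a log-odds value back to a probability:

```python
# A minimal sketch with hypothetical coefficients b0 and b1, showing how a
# log-odds value maps back to a probability.
import math

b0, b1 = -4.0, 1.5   # hypothetical intercept and slope
x = 3.0              # hypothetical input (e.g., hours studied)

log_odds = b0 + b1 * x       # -4.0 + 1.5 * 3.0 = 0.5
odds = math.exp(log_odds)    # e^0.5 is roughly 1.65
p = odds / (1 + odds)        # roughly 0.62, same as 1 / (1 + e^(-0.5))

print(f"log-odds = {log_odds:.2f}, odds = {odds:.2f}, probability = {p:.2f}")
```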

Step 4: Training the Model

When you train a logistic regression model, you’re basically finding the best values for b0 and b1 (the intercept and slope) so that the predicted probabilities match the actual outcomes as closely as possible. This is done using a method called maximum likelihood estimation (MLE), which is a fancy way of saying, "Let’s find the parameters that make the data most likely."
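
In practice, libraries do the heavy lifting for you. Here is a minimal sketch (on made-up study-hours data) of fitting a logistic regression model with scikit-learn, which performs the maximum likelihood estimation under the hood:

```python
# A minimal sketch of training a logistic regression model with scikit-learn
# on hypothetical study-hours data; fit() finds b0 and b1 for you.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # feature
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])                  # outcome (0/1)

model = LogisticRegression()
model.fit(hours, passed)

print("intercept (b0):", model.intercept_)
print("slope (b1):", model.coef_)
print("P(pass | 4.5 hours):", model.predict_proba([[4.5]])[:, 1])
```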

Cross-Entropy Loss

Cross-Entropy Loss is a way to measure how bad your predictions are when you're trying to classify something (like predicting if an email is spam or not). It’s like a "badness score" for your model. The smaller the score, the better your model is doing.

Here’s a simplified explanation:

  1. Guessing Game: Your model is like a player in a guessing game, trying to predict the right answers with probabilities.
  2. Reality Check: Cross-Entropy Loss compares the model's predictions (probabilities) against the actual answers.
  3. Penalty Points: It gives a score based on how far off the predictions are. More wrong = more penalty points.
  4. Goal: Minimize the Cross-Entropy Loss score to get your model to make better predictions.

The formula for Cross-Entropy Loss (for binary classification) is:

Loss = -(1/N) · Σ [ y_i · log(p_i) + (1 - y_i) · log(1 - p_i) ]

where N is the number of examples, y_i is the actual label (0 or 1), and p_i is the predicted probability that example i belongs to class 1.
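
Here is a small sketch (with made-up labels and predictions) of how that "badness score" behaves:

```python
# A small sketch of binary cross-entropy on made-up labels and predictions.
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-15):
    """Average binary cross-entropy; eps avoids taking log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])           # actual answers
good = np.array([0.9, 0.1, 0.8, 0.7])     # confident, mostly-correct guesses
bad = np.array([0.4, 0.6, 0.3, 0.5])      # hesitant or wrong guesses

print(cross_entropy_loss(y_true, good))   # small "badness score"
print(cross_entropy_loss(y_true, bad))    # larger "badness score"
```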

Types of Logistic Regression

Although Logistic Regression is fundamentally binary, there are several types of logistic regression, all of which use the logistic function to model probabilities.

Imagine you’re throwing a party and you want to plan it. Depending on the situation, there are different types of logistic regression you can apply:

  1. Binary Logistic Regression = Yes or No?

This is the simplest type. It’s like asking a straightforward question with only two possible answers: Yes or No.

For example, you want to predict:
“Will Trisha be coming to the party?”
The answer is either Yes (1) or No (0).

It’s like flipping a coin, only two outcomes are possible.

  2. Multinomial Logistic Regression = Multiple Choices

Now, imagine you’re asking a question with more than two possible answers. This is where multinomial logistic regression comes in.

For example, you’re asking your friends, "What type of drink will you bring to the party?"

Possible answers: Soda, Juice, or Water.

It’s like choosing from a menu—there are multiple options, and you’re predicting which one someone will pick.
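
As a minimal sketch (with a made-up version of the drink example), scikit-learn's LogisticRegression can handle more than two classes out of the box:

```python
# A minimal sketch of multinomial logistic regression on a hypothetical
# "which drink will they bring?" dataset (features and labels are made up).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [age, likes_sweet_drinks (0/1)]
X = np.array([[18, 1], [22, 1], [25, 0], [30, 0],
              [35, 0], [19, 1], [40, 0], [28, 1]])
y = np.array(["Soda", "Soda", "Water", "Juice",
              "Water", "Soda", "Water", "Juice"])

model = LogisticRegression(max_iter=1000)   # handles 3+ classes out of the box
model.fit(X, y)

print(model.classes_)                       # ['Juice' 'Soda' 'Water']
print(model.predict([[21, 1]]))             # predicted drink for a new guest
print(model.predict_proba([[21, 1]]))       # probability for each class
```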

  3. Ordinal Logistic Regression = Ranked Choices

This one is for situations where the answers are ranked or have an order. It’s not just about picking an option—it’s about picking one that has a specific position in a hierarchy.

For example, you’re asking your friends, "How excited are you about the party?"

Possible answers: Not excited, Somewhat excited, Very excited.

It’s like rating something on a scale—there’s a clear order to the choices.

Assumptions of Logistic Regression

Think of logistic regression as a tool that helps you make predictions (like "Will it rain tomorrow? Yes or No?"). But for it to work properly, it has a few "rules" or "assumptions" it needs to follow. If these assumptions aren’t met, the predictions might not be reliable. Let’s dive in:

  1. The Outcome is Categorical: The thing you’re trying to predict (the dependent variable) must be a category, not a number (unless the number is a category)
  2. Independence of Observation: Each data point (observation) should be independent of the others. In other words, one person’s outcome shouldn’t influence another person’s outcome.
  3. No Perfect Collinearity: The independent variables (the things you’re using to make predictions) shouldn’t be too closely related to each other. For example: Good: Using "Age" and "Income" to predict if someone will buy a product (they’re related, but not identical). Bad: Using "Age" and "Age in Months" (they’re basically the same thing).
  4. Linearity of the Logit: The relationship between the independent variables and the logit (a fancy term for the log of the odds) should be linear. Don’t worry about the math—just know that logistic regression assumes a certain kind of relationship between the variables.
  5. Sufficient Sample Size: You need enough data to make reliable predictions. If your dataset is too small, the results might not be trustworthy.
  6. No Outliers with Too Much Influence: Outliers (extreme values) shouldn’t have too much influence on the results.
  7. The Independent Variables are Meaningful: The variables you use to make predictions should actually be related to the outcome. For example: Good: Using "Age" and "Income" to predict if someone will buy a product. Bad: Using "Favourite Ice Cream Flavour" to predict if someone will buy a product (unless you’re selling ice cream!).
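
As a small illustration of assumption 3, here is a sketch (with made-up columns) of spotting near-perfect collinearity using a correlation matrix:

```python
# A minimal sketch of checking "no perfect collinearity" with a correlation
# matrix; column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age_years": [25, 32, 47, 51, 38],
    "age_months": [300, 384, 564, 612, 456],   # same information as age_years
    "income": [40, 55, 80, 62, 58],
})

print(df.corr())   # age_years vs age_months is 1.0, a red flag for collinearity
```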

Preprocessing for Logistic Regression

Think of building a predictive model like baking a cake. If you skip steps, use the wrong ingredients, or don’t measure properly, the cake won’t turn out well. The same goes for logistic regression—if you don’t preprocess your data correctly, your predictions might flop! This guide will walk you through the basics of preparing your data to get the best results, even if you're just starting out.

Why is Preprocessing Important?

Imagine trying to get directions from a GPS that has your location wrong and no map scale. You'd end up confused and frustrated! Logistic regression is a "gentle" algorithm that can handle some bumps (like unscaled data), but preprocessing helps ensure it works accurately and efficiently.

Here’s why preprocessing matters:

  • Clean and consistent data helps the model make better predictions.
  • Scaling and handling extreme values prevent the model from focusing too much on certain features.
  • Encoding makes the data readable and usable for the algorithm.

Preprocessing is all about helping your algorithm do its job—kind of like tidying your room before inviting guests over. Now, let's get started!

Handling Missing Values

Step 1: Find the Missing Values

Check columns for empty and null values.

Step 2: Fill in the Gaps

Use data imputation to replace missing and null values. For a quick fix, use the mean or median for numeric columns, and the most common category (mode) for categorical columns.

Step 3: Drop Rows

If you have way too many missing values, it might be better to delete the row altogether. But be careful—don’t throw away too much data!
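
Here is a minimal sketch of these three steps with pandas (the columns and values are made up for illustration):

```python
# A minimal sketch of handling missing values with pandas (hypothetical data).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 33, np.nan],
    "income": [50000, 62000, np.nan, 45000, 58000],
    "city": ["London", "Paris", None, "London", "Paris"],
})

print(df.isnull().sum())                               # Step 1: count missing values per column

df["age"] = df["age"].fillna(df["age"].median())       # Step 2: median for numbers
df["income"] = df["income"].fillna(df["income"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])   # mode for categories

df = df.dropna()                                       # Step 3: drop any rows still missing data
```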

Encode Categorical Variables

Logistic regression doesn’t speak "words"—it works with numbers only. If your data includes categories (like "red, blue, green"), you’ll need to convert them into numbers. There are two ways to do this:

One-Hot Encoding:

Create a separate column for each category and fill it with 1s (yes) and 0s (no).
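
A quick sketch with pandas (the "colour" column is just an example):

```python
# A minimal sketch of one-hot encoding with pandas (hypothetical column).
import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", "green", "blue"]})

# One column per category, filled with 1s and 0s
one_hot = pd.get_dummies(df["colour"], prefix="colour", dtype=int)
print(one_hot)
#    colour_blue  colour_green  colour_red
# 0            0             0           1
# 1            1             0           0
# ...
```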


Label Encoding

Assign each category a unique number.
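
A quick sketch with scikit-learn's LabelEncoder (same example column):

```python
# A minimal sketch of label encoding with scikit-learn (hypothetical colours).
from sklearn.preprocessing import LabelEncoder

colours = ["red", "blue", "green", "blue"]
encoder = LabelEncoder()
print(encoder.fit_transform(colours))   # [2 0 1 0]
print(list(encoder.classes_))           # ['blue', 'green', 'red']
```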


One-hot encoding is usually better because it prevents the model from thinking one category (like 3) is "bigger" than another (like 1).

Scale Your Features

Imagine trying to add apples and elephants. Weird, right? Now imagine measuring something in meters and feet side-by-side—just as awkward! Scaling puts all your features on the same ruler so one doesn’t overpower the others.

What to do:

  1. Standardize Data: Subtract the mean and divide by the standard deviation. This makes the values centre around 0 and have similar spreads.
  2. Normalize Data: Squish all the values into a range of 0 to 1.
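
Here is a minimal sketch of both options with scikit-learn (the ages are made up):

```python
# A small sketch of standardizing and normalizing one feature (hypothetical ages).
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

ages = np.array([[18], [25], [40], [60]])

standardized = StandardScaler().fit_transform(ages)   # mean 0, similar spread
normalized = MinMaxScaler().fit_transform(ages)       # squished into [0, 1]

print(standardized.ravel())
print(normalized.ravel())
```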

Handle Outliers

Outliers are sneaky! These extreme values can skew your model's performance. It’s like trying to take an average height using basketball players—they’ll tilt the results.

What to do:

  1. Find Outliers: Use visuals like box plots or scatter plots to spot data points that don’t fit.
  2. Trim or Transform: You can remove them if they don’t make sense (e.g., a height of 10 meters), or cap them at a maximum value to reduce their effect.
  3. Log Transformation: If the data is highly skewed (data piled up on one end, like salaries), take the log to "squash" extreme values.
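
Here is a small sketch of these three ideas (the salary values are made up):

```python
# A minimal sketch of spotting and taming outliers (hypothetical salaries).
import numpy as np
import pandas as pd

salaries = pd.Series([30000, 35000, 40000, 42000, 45000, 50000, 1000000])

# 1. Find outliers with the IQR rule
q1, q3 = salaries.quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
print(salaries[salaries > upper])        # the 1,000,000 salary stands out

# 2. Cap extreme values at the upper bound
capped = salaries.clip(upper=upper)

# 3. Or log-transform to squash the skew
logged = np.log1p(salaries)
```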

Do check out our comprehensive guide for handling outliers.

Feature Selection

Not every feature in your data is useful. Including too many can clutter your model like junk mail in an inbox.

What to Do:

  1. Check for Correlation:

If two features are similar (e.g., height in cm and height in inches), keep only one.

  2. Focus on High-Impact Features:

For example, if you’re predicting whether someone will buy a product, "Income" might matter more than "Hair Colour."

  3. Use Feature Selection Tools:

Python libraries like SelectKBest can find features that really make a difference.
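
Here is a minimal sketch using SelectKBest (the features and target are made up; f_classif scores each feature against the yes/no outcome):

```python
# A minimal sketch of keeping only the top features with SelectKBest
# (hypothetical features and target).
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

X = pd.DataFrame({
    "income": [30, 60, 45, 80, 20, 75, 50, 90],
    "age": [22, 35, 28, 45, 19, 40, 30, 50],
    "favourite_flavour_code": [1, 3, 2, 1, 3, 2, 1, 2],   # probably just noise
})
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])                    # bought the product?

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(dict(zip(X.columns, selector.scores_)))   # higher score = more impact
print(list(X.columns[selector.get_support()]))  # the 2 features that were kept
```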

Fun Fact: More features don’t always mean better predictions. Sometimes, less is more!
