Logistic Regression: A Practical Guide for Classification

 ๐Ÿ” What is Logistic Regression?


Logistic Regression is a supervised learning algorithm used for binary classification tasks — i.e., predicting whether something is True/False, Yes/No, or 1/0.


Despite its name, it is not a regression algorithm. It’s used to classify outcomes into discrete categories.


When to Use Logistic Regression


Use Logistic Regression when:


Your target variable is binary (e.g., spam vs. not spam, churn vs. no churn)


You want a fast and interpretable baseline model


You need probabilities (not just labels)


How It Works


Unlike linear regression, which predicts a real number, logistic regression predicts a probability using the sigmoid (logistic) function:


๐Ÿ” Sigmoid Function:

๐œŽ

(

๐‘ง

)

=

1

1

+

๐‘’

๐‘ง

ฯƒ(z)=

1+e

−z

1



Where:

z = w·x + b

Here w is the weight vector, x is the feature vector, and b is the bias term.


The output σ(z) is always between 0 and 1, and is interpreted as the probability that the input belongs to the positive class.
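To make this concrete, here is a minimal NumPy sketch of the sigmoid (the function name and sample inputs are ours, purely for illustration):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]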


๐Ÿ” Decision Rule


If:

P(y = 1 | x) = σ(z) ≥ 0.5 → predict 1 (positive class)

Otherwise:

predict 0 (negative class)
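In code, the rule is a single comparison against the 0.5 threshold (a small sketch; the probabilities below are made-up sigmoid outputs):

import numpy as np

probs = np.array([0.12, 0.57, 0.82])   # hypothetical σ(z) values
labels = (probs >= 0.5).astype(int)    # apply the 0.5 decision threshold
print(labels)                          # [0 1 1]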

Cost Function: Binary Cross-Entropy


Instead of Mean Squared Error, we use log loss or binary cross-entropy:


J(w, b) = −(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]

where the sum runs over the n training examples and ŷᵢ = σ(w·xᵢ + b) is the predicted probability for example i.
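As a sanity check, this loss is easy to compute directly in NumPy (a minimal sketch; the epsilon clipping guards against log(0) and is not part of the formula itself):

import numpy as np

def binary_cross_entropy(y_true, y_hat, eps=1e-12):
    # Clip predictions so log() never sees exactly 0 or 1
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

y_true = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y_true, y_hat))  # ~0.28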

Step-by-Step Example with scikit-learn


We’ll classify whether a person has diabetes using a public dataset.


Libraries Needed

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt


Load Data

# Load the Pima Indians diabetes dataset (the raw CSV has no header row)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
        'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
df = pd.read_csv(url, names=cols)

print(df.head())


Prepare the Data

X = df.drop("Outcome", axis=1)  # the eight predictor columns
y = df["Outcome"]               # 1 = diabetic, 0 = not diabetic

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


Train the Model

model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges on unscaled features
model.fit(X_train, y_train)


Evaluate the Model

y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Optional: visualize the confusion matrix
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()


Predict Probabilities

y_probs = model.predict_proba(X_test)[:, 1]  # probability of class 1 for each test row
print(y_probs[:10])                          # show the first 10 probabilities
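Having probabilities rather than just labels means the 0.5 cutoff is not fixed. In a screening setting you might lower it to catch more positives at the cost of more false alarms; the 0.3 below is an arbitrary value for illustration, not a recommendation:

# Lower the threshold to trade precision for recall
y_pred_low = (y_probs >= 0.3).astype(int)
print(classification_report(y_test, y_pred_low))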


Key Terms in the Output

Precision: of the predicted positives, how many were correct?

Recall: of the actual positives, how many did we identify?

F1-Score: the harmonic mean of precision and recall

Support: the number of true instances of each class
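In terms of confusion-matrix counts (TP = true positives, FP = false positives, FN = false negatives):

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 · Precision · Recall / (Precision + Recall)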

Logistic Regression Insights


Fast to train on large datasets


Works best with linearly separable data


Sensitive to outliers and feature scaling


Can be extended to multi-class with multinomial or one-vs-rest strategies (the sketch below touches on both of these last two points)
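As a sketch of those last two points, scikit-learn can chain standardization and a multi-class logistic regression in one pipeline; the iris dataset is used here only as a convenient three-class example:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # three classes, four numeric features

# StandardScaler addresses the scaling sensitivity; LogisticRegression
# fits the three classes multinomially under recent scikit-learn defaults
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict_proba(X[:2]))  # one column per class; each row sums to 1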


Pros and Cons

Pros:

Simple, fast, and interpretable

Outputs class probabilities

Easy to regularize (L1/L2)


Cons:

Struggles with complex, non-linear relationships

Assumes a linear relationship between features and the log-odds

Sensitive to irrelevant features

Summary

Model Type: Classification

Function: Sigmoid (logistic)

Loss Function: Binary Cross-Entropy

Output: Probability between 0 and 1

Prediction Rule: σ(z) ≥ 0.5 → 1, else 0

Next Steps


From here, you might explore:


Implementing logistic regression from scratch, without scikit-learn


Multi-class logistic regression in more depth


How feature scaling affects results
