Logistic Regression: A Practical Guide for Classification
What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used for binary classification tasks — i.e., predicting whether something is True/False, Yes/No, or 1/0.
Despite its name, it is used for classification rather than for predicting a continuous value. It estimates the probability of a class, and that probability is then mapped to a discrete category.
When to Use Logistic Regression
Use Logistic Regression when:
Your target variable is binary (e.g., spam vs. not spam, churn vs. no churn)
You want a fast and interpretable baseline model
You need probabilities (not just labels)
How It Works
Unlike linear regression, which predicts a real number, logistic regression predicts a probability using the sigmoid (logistic) function:
Sigmoid Function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Where:
$$z = w \cdot x + b$$
The output $\sigma(z)$ is always between 0 and 1 and is interpreted as the probability that the input belongs to the positive class.
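As a minimal sketch of the two formulas above, the snippet below computes the linear score z and passes it through the sigmoid. The weights, bias, and input values are hypothetical and chosen only for illustration; they do not come from any trained model.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one input row (illustration only).
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b   # linear score: 0.8*2.0 - 0.5*1.0 + 0.1 = 1.2
p = sigmoid(z)         # probability of the positive class, about 0.7685

print(z, p)
print(sigmoid(0.0))    # 0.5: a score of zero sits exactly on the boundary
```

For very large negative scores, np.exp can overflow and emit a warning; scipy.special.expit is a numerically stable alternative if SciPy is available.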
Decision Rule
If:
$$P(y = 1 \mid x) = \sigma(z) \geq 0.5 \;\Rightarrow\; \text{predict 1 (positive class)}$$
Otherwise:
predict 0 (negative class)
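The decision rule can be sketched in a couple of lines. The probabilities and scores below are made-up values for illustration. The second half relies on the fact that the sigmoid is at least 0.5 exactly when z is at least 0, so thresholding the raw score at zero gives the same labels.

```python
import numpy as np

# Hypothetical predicted probabilities P(y=1|x) for five inputs.
probs = np.array([0.10, 0.49, 0.50, 0.73, 0.95])

threshold = 0.5
labels = (probs >= threshold).astype(int)
print(labels)  # [0 0 1 1 1]

# sigma(z) >= 0.5 exactly when z >= 0, so raw scores can be thresholded at zero.
z = np.array([-2.0, -0.1, 0.0, 1.0, 3.0])
labels_from_scores = (z >= 0).astype(int)
print(labels_from_scores)  # [0 0 1 1 1]
```

The 0.5 cutoff is a convention, not a requirement. Lowering `threshold` trades precision for recall, which can be reasonable when missing a positive case is costly.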
Cost Function: Binary Cross-Entropy
Instead of Mean Squared Error, we use log loss or binary cross-entropy:
$$J(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
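To make the formula concrete, here is a small NumPy sketch of binary cross-entropy. The function name, the `eps` clipping constant, and the example labels and probabilities are assumptions made for this illustration. Clipping is a common guard against taking log(0).

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss over n examples, following J(w, b) above."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical labels and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = binary_cross_entropy(y_true, confident_right)
loss_wrong = binary_cross_entropy(y_true, confident_wrong)
print(loss_right)  # about 0.164
print(loss_wrong)  # about 1.956
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is the behavior that makes log loss a better fit than Mean Squared Error for probabilities.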
✅ Step-by-Step Example with scikit-learn
We’ll classify whether a person has diabetes using the public Pima Indians Diabetes dataset.
Libraries Needed
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
Load Data
# Load dataset (from CSV or online source)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
# Column names are passed explicitly through names=cols
cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
        'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
df = pd.read_csv(url, names=cols)
print(df.head())  # Quick look at the first five rows
Prepare the Data
X = df.drop("Outcome", axis=1)
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Train the Model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
Evaluate the Model
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
# Optional: visualize confusion matrix
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
Predict Probabilities
y_probs = model.predict_proba(X_test)[:, 1] # Probabilities of class 1
print(y_probs[:10]) # Show first 10 probabilities
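The link between these probabilities and the sigmoid can be checked directly. The sketch below uses synthetic data from make_classification as a stand-in so that it runs without downloading anything. For a binary problem, applying the sigmoid by hand to the raw scores from decision_function should reproduce the class-1 column of predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the diabetes dataset) so the sketch is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)

z = clf.decision_function(X)         # raw scores w.x + b
manual = 1.0 / (1.0 + np.exp(-z))    # sigmoid applied by hand
probs = clf.predict_proba(X)[:, 1]   # scikit-learn's probabilities for class 1

print(np.allclose(manual, probs))    # True for a binary problem
print(probs[:5])
```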
Key Terms in the Output
| Metric | Meaning |
| --- | --- |
| Precision | Of the predicted positives, how many were correct? |
| Recall | Of the actual positives, how many did we identify? |
| F1-Score | Balance between precision and recall |
| Support | Number of true instances for each class |
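These metrics can be recomputed by hand from the four cells of a confusion matrix. The labels and predictions below are hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels and predictions (illustration only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # 3 / 4: of the predicted positives, how many were correct
recall = tp / (tp + fn)      # 3 / 4: of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
support_pos = tp + fn        # 4 actual positives in y_true

print(precision, recall, f1, support_pos)
```

The hand-computed values should match precision_score, recall_score, and f1_score from scikit-learn, which is a quick way to confirm how classification_report arrives at its numbers.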
Logistic Regression Insights
Fast to train on large datasets
Works best when the classes can be separated reasonably well by a linear decision boundary
Sensitive to outliers and feature scaling
Can be extended to multi-class with multinomial or one-vs-rest strategies
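Because the model is sensitive to feature scaling, one common approach is to standardize features inside a pipeline so the scaler is fit only on the training split. The sketch below uses synthetic data as a stand-in; the sample sizes and random seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler learns its mean and variance from X_train only,
# then the same transformation is reused on X_test at prediction time.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, and the same `pipe` object can be passed to cross-validation utilities unchanged.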
Pros and Cons
| Pros | Cons |
| --- | --- |
| Simple, fast, interpretable | Doesn't work well with complex relationships |
| Outputs class probabilities | Assumes linear relationship between features and log-odds |
| Easy to regularize (L1/L2) | Sensitive to irrelevant features |
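As a rough illustration of regularization, the sketch below varies scikit-learn's `C` parameter, which is the inverse of the regularization strength for the default L2 penalty. The data is synthetic and the three `C` values are arbitrary choices for the demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

norms = {}
for C in (0.01, 1.0, 100.0):
    # Smaller C means stronger regularization, which shrinks the weights.
    clf = LogisticRegression(C=C, max_iter=2000).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)
    print(C, norms[C])
```

The weight norm should grow as `C` increases. Choosing `C` is usually done with cross-validation rather than by inspection.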
Summary
| Component | Description |
| --- | --- |
| Model Type | Classification |
| Function | Sigmoid (logistic) |
| Loss Function | Binary Cross-Entropy |
| Output | Probability between 0 and 1 |
| Prediction Rule | ≥ 0.5 → 1, else 0 |
Next Steps
From here, you could explore:
An implementation from scratch without using scikit-learn
A guide to multi-class logistic regression
How feature scaling affects results