Deep dive into specific algorithms with clear explanations and code.
🔍 Deep Dive into Specific ML Algorithms
We'll cover the following algorithms:
Linear Regression (Regression)
Logistic Regression (Classification)
Decision Tree (Classification/Regression)
Random Forest (Ensemble)
Support Vector Machine (SVM) (Classification)
K-Means Clustering (Unsupervised)
Principal Component Analysis (PCA) (Dimensionality Reduction)
1. Linear Regression
📌 Use Case: Predicting a continuous value (e.g., house prices)
⚙️ How it works:
Fits a straight line (or hyperplane in higher dimensions) that minimizes the sum of squared errors between predicted and actual values (ordinary least squares).
🧪 Code Example:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
# Generate dummy data
X, y = make_regression(n_samples=100, n_features=1, noise=10)
# Model
model = LinearRegression()
model.fit(X, y)
# Predict
y_pred = model.predict(X)
# Plot
plt.scatter(X, y, color='blue')
plt.plot(X, y_pred, color='red')
plt.title("Linear Regression")
plt.show()
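Since the fitted line is just a slope and an intercept, it's worth inspecting them directly. A minimal sketch, continuing from the model fitted above:
# Inspect the learned parameters and fit quality
print(f"Slope: {model.coef_[0]:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
print(f"R^2 score: {model.score(X, y):.2f}")  # fraction of variance explained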
2. Logistic Regression
📌 Use Case: Binary classification (e.g., spam or not spam)
⚙️ How it works:
Applies the sigmoid function to a linear combination of the features to produce a probability, then maps that probability to a class (0 or 1), typically by thresholding at 0.5.
🧪 Code Example:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model (max_iter raised so the solver converges on these unscaled features)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
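Because logistic regression outputs probabilities, you can look at them directly rather than just the thresholded labels. A minimal sketch, continuing from the fitted model above:
# Predicted probabilities for the first five test samples
proba = model.predict_proba(X_test[:5])
print(proba)                      # each row: [P(class 0), P(class 1)]
print(model.predict(X_test[:5]))  # hard labels, thresholded at 0.5 by default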
3. Decision Tree
📌 Use Case: Both classification and regression
⚙️ How it works:
Builds a tree by recursively splitting the data on the feature and threshold that yield the purest child nodes, as measured by Gini impurity or entropy (information gain).
🧪 Code Example:
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Model
tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X, y)
# Plot tree
plt.figure(figsize=(12, 8))
plot_tree(tree, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
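One of the tree's main selling points is interpretability: the learned splits can also be printed as plain if/then rules. A minimal sketch using scikit-learn's export_text, continuing from the tree fitted above:
from sklearn.tree import export_text
# Print the learned splits as human-readable rules
print(export_text(tree, feature_names=iris.feature_names))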
4. Random Forest
📌 Use Case: Classification or regression when you need better accuracy and generalization than a single decision tree
⚙️ How it works:
An ensemble of decision trees, each trained on a bootstrapped sample with a random subset of features considered at each split; averaging their predictions reduces overfitting and improves generalization.
🧪 Code Example:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load data
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
# Predict
y_pred = rf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
5. Support Vector Machine (SVM)
📌 Use Case: Classification with clear margin separation
⚙️ How it works:
Finds the hyperplane that maximizes the margin between classes. Kernels (e.g., RBF, polynomial) allow non-linear decision boundaries.
🧪 Code Example:
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Generate data
X, y = make_classification(n_samples=200, n_features=2, n_classes=2, n_redundant=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
# Evaluate
print(classification_report(y_test, svm.predict(X_test)))
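For data that isn't linearly separable, the same estimator accepts non-linear kernels. A minimal sketch swapping in the RBF kernel on the same split (the kernel choice here is illustrative, not tuned):
# RBF kernel for a non-linear decision boundary
svm_rbf = SVC(kernel='rbf', gamma='scale')  # 'scale' is the library default
svm_rbf.fit(X_train, y_train)
print(f"Support vectors per class: {svm_rbf.n_support_}")
print(classification_report(y_test, svm_rbf.predict(X_test)))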
6. K-Means Clustering
📌 Use Case: Grouping unlabeled data (e.g., customer segmentation)
⚙️ How it works:
Iteratively assigns each point to the nearest of k centroids and recomputes the centroids, minimizing the total within-cluster squared distance (inertia).
🧪 Code Example:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Generate data
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6)
# Model (n_init set explicitly for consistent behavior across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Plot
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='x')
plt.title("K-Means Clustering")
plt.show()
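K-Means needs k up front. A common heuristic is the elbow method: fit several values of k and look for the point where inertia stops dropping sharply. A minimal sketch, reusing X from above:
# Elbow method: plot inertia against k and look for the bend
inertias = []
ks = range(1, 8)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    inertias.append(km.inertia_)
plt.plot(ks, inertias, marker='o')
plt.xlabel("k")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()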
7. Principal Component Analysis (PCA)
📌 Use Case: Reduce dimensionality, visualize high-dimensional data
⚙️ How it works:
Projects the data onto a set of orthogonal (linearly uncorrelated) components, ordered by the amount of variance each one explains.
🧪 Code Example:
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
# Load data
X, y = load_digits(return_X_y=True)
# PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Plot
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='Spectral', s=10)
plt.title("PCA of Digits Dataset")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.colorbar()
plt.show()
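It's worth checking how much information the 2-D projection actually retains. A minimal sketch, continuing from the PCA fitted above:
# Variance retained by each component and in total
print(pca.explained_variance_ratio_)  # per-component ratios
print(f"Total variance retained: {pca.explained_variance_ratio_.sum():.1%}")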
✅ Summary Table
Algorithm           | Type         | Key Use
--------------------|--------------|------------------------------
Linear Regression   | Supervised   | Predict a continuous value
Logistic Regression | Supervised   | Binary classification
Decision Tree       | Supervised   | Interpretability, speed
Random Forest       | Supervised   | High accuracy via ensembling
SVM                 | Supervised   | Clear-margin classification
K-Means             | Unsupervised | Grouping unlabeled data
PCA                 | Unsupervised | Dimensionality reduction