✅ Step 1: Understand Sentiment Analysis
Sentiment Analysis is the process of identifying the emotional tone (positive, negative, or neutral) of a piece of text, such as:
“I love this product!” → Positive
“This is the worst movie ever.” → Negative
“It’s okay, not great.” → Neutral
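Before building anything, you can get a quick feel for sentiment scoring with NLTK's built-in VADER analyzer. This is only an illustrative sketch and is not part of the pipeline built in the following steps; it assumes NLTK is installed and the vader_lexicon resource has been downloaded.
# Quick illustrative sketch using NLTK's VADER lexicon (not used in the rest of this tutorial)
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for sentence in ["I love this product!", "This is the worst movie ever.", "It's okay, not great."]:
    scores = sia.polarity_scores(sentence)  # returns 'neg', 'neu', 'pos' and a 'compound' score
    print(sentence, "->", scores['compound'])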
✅ Step 2: Tools & Libraries You'll Need
Make sure you have these installed:
pip install pandas numpy scikit-learn nltk
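A quick sanity check that the libraries imported correctly (the printed versions will vary on your machine):
# Verify the installation by importing each library and printing its version
import pandas, numpy, sklearn, nltk
print(pandas.__version__, numpy.__version__, sklearn.__version__, nltk.__version__)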
✅ Step 3: Load Your Dataset
You need a labeled dataset with text and corresponding sentiment labels.
Example: Load sample data
import pandas as pd
# Example data
data = {
    'text': [
        'I love this product!',
        'This is terrible and awful.',
        'Amazing experience, would recommend.',
        'Not bad, but not great either.',
        'Worst purchase ever.'
    ],
    'sentiment': ['positive', 'negative', 'positive', 'neutral', 'negative']
}
df = pd.DataFrame(data)
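In practice you would load the data from a file rather than an inline dictionary. A minimal sketch, assuming a hypothetical reviews.csv with 'text' and 'sentiment' columns:
# Hypothetical alternative: load a labeled CSV ('reviews.csv' is a placeholder file name)
df = pd.read_csv('reviews.csv')
print(df['sentiment'].value_counts())  # check how balanced the classes are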
✅ Step 4: Preprocess the Text
Text needs to be cleaned and converted into numerical features.
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer  # plain Bag-of-Words option (TF-IDF is used below)
from sklearn.preprocessing import LabelEncoder

nltk.download('punkt')
nltk.download('stopwords')
# Note: recent NLTK releases may also need nltk.download('punkt_tab') for word_tokenize

# Function to clean text: lowercase, tokenize, drop punctuation and stopwords
def preprocess_text(text):
    tokens = nltk.word_tokenize(text.lower())
    tokens = [word for word in tokens if word.isalpha()]  # keep alphabetic tokens only (drops punctuation)
    tokens = [word for word in tokens if word not in stopwords.words('english')]  # remove stopwords
    return " ".join(tokens)
df['cleaned_text'] = df['text'].apply(preprocess_text)
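It is worth printing the result to confirm the cleaning behaves as expected:
# Compare the original and cleaned text side by side
print(df[['text', 'cleaned_text']])
# e.g. 'I love this product!' becomes 'love product'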
✅ Step 5: Convert Text to Features
Use Bag of Words (BoW) or TF-IDF to turn text into numbers. This example uses TF-IDF, which weights words by how informative they are across documents.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['cleaned_text'])
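You can inspect the resulting matrix and the vocabulary behind it:
# Each row is a document, each column a vocabulary term
print(X.shape)
print(vectorizer.get_feature_names_out())  # the term for each column (scikit-learn 1.0+)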
✅ Step 6: Encode the Labels
Convert string labels (like “positive”) to numerical form.
le = LabelEncoder()
y = le.fit_transform(df['sentiment']) # positive=2, negative=0, neutral=1
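To confirm which integer each class received, print the mapping explicitly (LabelEncoder assigns them in alphabetical order):
# Show the label-to-integer mapping learned by the encoder
print(dict(zip(le.classes_, le.transform(le.classes_))))
# {'negative': 0, 'neutral': 1, 'positive': 2}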
✅ Step 7: Train a Machine Learning Model
You can use classifiers like Logistic Regression, SVM, or Naive Bayes. This example trains a Multinomial Naive Bayes model; a Logistic Regression variant is sketched after the evaluation code.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score
# Split the data (with this 5-row toy dataset the test set is only a single example)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = MultinomialNB()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate (pass labels= so the report covers all three classes even if the tiny test split misses some)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(
    y_test, y_pred, labels=le.transform(le.classes_), target_names=le.classes_, zero_division=0))
✅ Step 8: Test on New Data
def predict_sentiment(text):
    cleaned = preprocess_text(text)
    vect = vectorizer.transform([cleaned])
    pred = model.predict(vect)
    return le.inverse_transform(pred)[0]
# Example
print(predict_sentiment("I absolutely love it!"))
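The same helper works for any number of new texts:
# Predict several new reviews at once
new_reviews = ["Fantastic quality, very happy.", "Meh, it does the job.", "Completely useless."]
for review in new_reviews:
    print(review, "->", predict_sentiment(review))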
Summary of the Workflow
Step | Task | Tools Used
1 | Load dataset | pandas
2 | Clean & tokenize text | nltk
3 | Convert text to numbers | TfidfVectorizer
4 | Encode labels | LabelEncoder
5 | Train/test split | train_test_split
6 | Train model | MultinomialNB / LogisticRegression
7 | Evaluate model | accuracy_score, classification_report
8 | Make predictions | model.predict()
Want to Try with a Real Dataset?
You can use:
IMDb movie reviews (positive/negative)
Twitter sentiment datasets
Amazon product reviews
The workflow is identical; only the loading step changes. A minimal sketch is shown below.
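A minimal sketch of retraining the same pipeline on a downloaded dataset; the file name 'imdb_reviews.csv' and its 'review'/'sentiment' columns are placeholders to adjust to whichever dataset you actually use:
# Hypothetical sketch: retrain the pipeline above on a downloaded review dataset
real_df = pd.read_csv('imdb_reviews.csv')  # placeholder file name
real_df['cleaned_text'] = real_df['review'].apply(preprocess_text)
X_real = vectorizer.fit_transform(real_df['cleaned_text'])
y_real = le.fit_transform(real_df['sentiment'])
X_tr, X_te, y_tr, y_te = train_test_split(X_real, y_real, test_size=0.2, random_state=42)
model.fit(X_tr, y_tr)
print("Accuracy on the real dataset:", accuracy_score(y_te, model.predict(X_te)))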