Monday, September 29, 2025

Creating a Sentiment Analysis Model with Machine Learning

✅ Step 1: Understand Sentiment Analysis


Sentiment Analysis is the process of identifying the emotional tone (positive, negative, or neutral) of a piece of text, such as:


“I love this product!” → Positive


“This is the worst movie ever.” → Negative


“It’s okay, not great.” → Neutral


✅ Step 2: Tools & Libraries You'll Need


Make sure you have these installed:


pip install pandas numpy scikit-learn nltk


✅ Step 3: Load Your Dataset


You need a labeled dataset with text and corresponding sentiment labels.


Example: Load sample data

import pandas as pd


# Example data

data = {

    'text': [

        'I love this product!',

        'This is terrible and awful.',

        'Amazing experience, would recommend.',

        'Not bad, but not great either.',

        'Worst purchase ever.'

    ],

    'sentiment': ['positive', 'negative', 'positive', 'neutral', 'negative']

}


df = pd.DataFrame(data)
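In practice the labeled data usually comes from a file rather than a hand-written dictionary. The sketch below shows one way that could look, assuming a CSV with `text` and `sentiment` columns. The CSV content, the `reviews` variable, and the column names are assumptions made for illustration; an in-memory string stands in for a real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """text,sentiment
I love this product!,positive
Worst purchase ever.,negative
,neutral
"It's okay, not great.",neutral
"""

# Read the CSV; pass a file path instead of StringIO for a real dataset.
reviews = pd.read_csv(io.StringIO(csv_text))

# Drop rows where the text or the label is missing.
reviews = reviews.dropna(subset=["text", "sentiment"]).reset_index(drop=True)

# Check how many examples each class has before training.
print(reviews["sentiment"].value_counts())
```

Checking the class counts early is worthwhile, because a heavily imbalanced dataset can make accuracy look better than the model really is.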


✅ Step 4: Preprocess the Text


Text needs to be cleaned and converted into numerical features.


import nltk

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.preprocessing import LabelEncoder


nltk.download('punkt')

nltk.download('punkt_tab')  # Required by word_tokenize in newer NLTK releases

nltk.download('stopwords')


from nltk.corpus import stopwords

import string


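# Build the stopword set once; calling stopwords.words() for every token is slow

STOP_WORDS = set(stopwords.words('english'))


# Function to clean text

def preprocess_text(text):

    tokens = nltk.word_tokenize(text.lower())

    tokens = [word for word in tokens if word.isalpha()]  # Keep alphabetic tokens only (drops punctuation and numbers)

    tokens = [word for word in tokens if word not in STOP_WORDS]  # Remove stopwords

    return " ".join(tokens)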


df['cleaned_text'] = df['text'].apply(preprocess_text)
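One thing to watch with stopword removal in sentiment tasks: common stopword lists, including NLTK's English list, contain negation words such as "not", so "not good" can be reduced to "good". The sketch below illustrates the effect using only the standard library. The small `SAMPLE_STOPWORDS` set, the `NEGATIONS` set, and the `clean` function are assumptions made for this example, not NLTK's actual list or the function used above.

```python
import re

# Illustrative stopword subset (an assumption for this sketch, not NLTK's full list).
SAMPLE_STOPWORDS = {"i", "this", "is", "it", "a", "the", "but", "and", "not", "no"}

# Negation words we may want to keep because they can flip the sentiment.
NEGATIONS = {"not", "no", "never"}


def clean(text, keep_negations=True):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stop = SAMPLE_STOPWORDS - NEGATIONS if keep_negations else SAMPLE_STOPWORDS
    return " ".join(t for t in tokens if t not in stop)


without_negation = clean("This is not good.", keep_negations=False)
with_negation = clean("This is not good.", keep_negations=True)

print(without_negation)  # good
print(with_negation)     # not good
```

Whether keeping negations helps depends on the dataset and the model, so it is worth comparing both variants on held-out data.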


✅ Step 5: Convert Text to Features


Use Bag of Words (BoW) or TF-IDF to turn text into numbers.


from sklearn.feature_extraction.text import TfidfVectorizer


vectorizer = TfidfVectorizer()

X = vectorizer.fit_transform(df['cleaned_text'])
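To make the difference between Bag of Words and TF-IDF concrete, here is a small side-by-side sketch. The three toy documents and the variable names are made up for illustration. `CountVectorizer` stores raw word counts, while `TfidfVectorizer` down-weights words that occur in many documents and, with its default settings, scales each row to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy documents (already cleaned), assumed for this example.
docs = ["love product", "terrible awful", "love love experience"]

# Bag of Words: each column holds the raw count of one vocabulary word.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)

# TF-IDF: counts are reweighted by how rare each word is across documents.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

# Columns follow the vocabulary in alphabetical order.
vocab = sorted(bow.vocabulary_)
print(vocab)
print(X_bow.toarray())
print(np.round(X_tfidf.toarray(), 2))
```

Both matrices have the same shape (documents by vocabulary words); only the values differ.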


✅ Step 6: Encode the Labels


Convert string labels (like “positive”) to numerical form.


le = LabelEncoder()

y = le.fit_transform(df['sentiment'])  # Classes are sorted alphabetically: negative=0, neutral=1, positive=2
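If you want to check the mapping rather than rely on a comment, the encoder exposes it through `classes_`. A minimal sketch, using a standalone `encoder` and label list that mirror the sample data (the names are chosen for this example):

```python
from sklearn.preprocessing import LabelEncoder

labels = ["positive", "negative", "positive", "neutral", "negative"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, and each class's position is its integer code.
mapping = {str(name): int(code) for code, name in enumerate(encoder.classes_)}

print(mapping)           # {'negative': 0, 'neutral': 1, 'positive': 2}
print(encoded.tolist())  # [2, 0, 2, 1, 0]
```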


✅ Step 7: Train a Machine Learning Model


You can use classifiers like Logistic Regression, SVM, or Naive Bayes.


from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import MultinomialNB

from sklearn.metrics import classification_report, accuracy_score


# Split the data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Train the model

model = MultinomialNB()

model.fit(X_train, y_train)


# Predict

y_pred = model.predict(X_test)


# Evaluate

print("Accuracy:", accuracy_score(y_test, y_pred))

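# Pass labels explicitly: with a test set this small, some classes may be missing from it

print("\nClassification Report:\n", classification_report(y_test, y_pred, labels=list(range(len(le.classes_))), target_names=le.classes_, zero_division=0))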
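Swapping in one of the other classifiers mentioned above takes only a few lines. The sketch below uses Logistic Regression and wraps the vectorizer and the model in a scikit-learn pipeline, so the same transformation is applied at training and prediction time. This is an alternative arrangement, not the setup used in the steps above, and the tiny training set is invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; a real model needs far more data.
texts = [
    "I love this product",
    "Amazing experience, would recommend",
    "Absolutely wonderful, love it",
    "Great quality and fast delivery",
    "This is terrible and awful",
    "Worst purchase ever",
    "Horrible, do not buy",
    "Awful quality, very disappointed",
]
labels = ["positive"] * 4 + ["negative"] * 4

# The pipeline vectorizes raw text and then fits the classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Raw strings go straight in; the pipeline handles the vectorizing.
prediction = clf.predict(["I love the quality"])[0]
print(prediction)
```

Replacing `LogisticRegression` with `MultinomialNB` or `LinearSVC` in the pipeline is enough to compare models on the same features.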


✅ Step 8: Test on New Data

def predict_sentiment(text):

    cleaned = preprocess_text(text)

    vect = vectorizer.transform([cleaned])

    pred = model.predict(vect)

    return le.inverse_transform(pred)[0]


# Example

print(predict_sentiment("I absolutely love it!"))
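A single label hides how sure the model is. Naive Bayes in scikit-learn also offers `predict_proba`, which returns one probability per class. The sketch below is self-contained: the toy training texts, the `vec` and `nb` objects, and the `predict_with_confidence` helper are assumptions for this example, not part of the code above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (already cleaned), invented for illustration.
train_texts = ["love great amazing", "terrible awful worst", "love wonderful", "awful bad"]
train_labels = ["positive", "negative", "positive", "negative"]

vec = TfidfVectorizer()
nb = MultinomialNB()
nb.fit(vec.fit_transform(train_texts), train_labels)


def predict_with_confidence(text):
    """Return a dict mapping each class label to its predicted probability."""
    probs = nb.predict_proba(vec.transform([text]))[0]
    return {str(label): float(p) for label, p in zip(nb.classes_, probs)}


scores = predict_with_confidence("love amazing")
print(scores)
```

If none of the words in the input were seen during training, the feature vector is all zeros and the prediction falls back on the class priors, so a low-confidence result is a useful signal to treat the output with caution.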


🧠 Summary of the Workflow

Step | Task | Tools Used
1 | Load dataset | pandas
2 | Clean & tokenize text | nltk
3 | Convert text to numbers | TfidfVectorizer
4 | Encode labels | LabelEncoder
5 | Train/test split | train_test_split
6 | Train model | MultinomialNB / LogisticRegression
7 | Evaluate model | accuracy_score, classification_report
8 | Make predictions | model.predict()

🚀 Want to Try with a Real Dataset?


You can use:


IMDb movie reviews (positive/negative)


Twitter sentiment datasets


Amazon product reviews


Let me know and I’ll help you load and train with a real dataset.
