✅ Step 1: Understand Sentiment Analysis
Sentiment Analysis is the process of identifying the emotional tone (positive, negative, or neutral) of a piece of text, such as:
“I love this product!” → Positive
“This is the worst movie ever.” → Negative
“It’s okay, not great.” → Neutral
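Before building anything, you can get a quick feel for sentiment scoring with NLTK's built-in VADER analyzer. This is only an illustrative sketch and is not part of the pipeline built in the following steps; it assumes NLTK is installed and the vader_lexicon resource has been downloaded.
# Quick illustrative sketch using NLTK's VADER lexicon (not used in the rest of this tutorial)
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for sentence in ["I love this product!", "This is the worst movie ever.", "It's okay, not great."]:
    scores = sia.polarity_scores(sentence)  # returns 'neg', 'neu', 'pos' and a 'compound' score
    print(sentence, "->", scores['compound'])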
✅ Step 2: Tools & Libraries You'll Need
Make sure you have these installed:
pip install pandas numpy scikit-learn nltk
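A quick sanity check that the libraries imported correctly (the printed versions will vary on your machine):
# Verify the installation by importing each library and printing its version
import pandas, numpy, sklearn, nltk
print(pandas.__version__, numpy.__version__, sklearn.__version__, nltk.__version__)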
✅ Step 3: Load Your Dataset
You need a labeled dataset with text and corresponding sentiment labels.
Example: Load sample data
import pandas as pd
# Example data
data = {
    'text': [
        'I love this product!',
        'This is terrible and awful.',
        'Amazing experience, would recommend.',
        'Not bad, but not great either.',
        'Worst purchase ever.'
    ],
    'sentiment': ['positive', 'negative', 'positive', 'neutral', 'negative']
}
df = pd.DataFrame(data)
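In practice you would load the data from a file rather than an inline dictionary. A minimal sketch, assuming a hypothetical reviews.csv with 'text' and 'sentiment' columns:
# Hypothetical alternative: load a labeled CSV ('reviews.csv' is a placeholder file name)
df = pd.read_csv('reviews.csv')
print(df['sentiment'].value_counts())  # check how balanced the classes are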
✅ Step 4: Preprocess the Text
Text needs to be cleaned and converted into numerical features.
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer  # plain Bag-of-Words option (TF-IDF is used below)
from sklearn.preprocessing import LabelEncoder

nltk.download('punkt')
nltk.download('stopwords')
# Note: recent NLTK releases may also need nltk.download('punkt_tab') for word_tokenize

# Function to clean text: lowercase, tokenize, drop punctuation and stopwords
def preprocess_text(text):
    tokens = nltk.word_tokenize(text.lower())
    tokens = [word for word in tokens if word.isalpha()]  # keep alphabetic tokens only (drops punctuation)
    tokens = [word for word in tokens if word not in stopwords.words('english')]  # remove stopwords
    return " ".join(tokens)
df['cleaned_text'] = df['text'].apply(preprocess_text)
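It is worth printing the result to confirm the cleaning behaves as expected:
# Compare the original and cleaned text side by side
print(df[['text', 'cleaned_text']])
# e.g. 'I love this product!' becomes 'love product'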
✅ Step 5: Convert Text to Features
Use Bag of Words (BoW) or TF-IDF to turn text into numbers. This example uses TF-IDF, which weights words by how informative they are across documents.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['cleaned_text'])
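You can inspect the resulting matrix and the vocabulary behind it:
# Each row is a document, each column a vocabulary term
print(X.shape)
print(vectorizer.get_feature_names_out())  # the term for each column (scikit-learn 1.0+)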
✅ Step 6: Encode the Labels
Convert string labels (like “positive”) to numerical form.
le = LabelEncoder()
y = le.fit_transform(df['sentiment']) # positive=2, negative=0, neutral=1
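To confirm which integer each class received, print the mapping explicitly (LabelEncoder assigns them in alphabetical order):
# Show the label-to-integer mapping learned by the encoder
print(dict(zip(le.classes_, le.transform(le.classes_))))
# {'negative': 0, 'neutral': 1, 'positive': 2}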
✅ Step 7: Train a Machine Learning Model
You can use classifiers like Logistic Regression, SVM, or Naive Bayes. This example trains a Multinomial Naive Bayes model; a Logistic Regression variant is sketched after the evaluation code.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score
# Split the data (with this 5-row toy dataset the test set is only a single example)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = MultinomialNB()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate (pass labels= so the report covers all three classes even if the tiny test split misses some)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(
    y_test, y_pred, labels=le.transform(le.classes_), target_names=le.classes_, zero_division=0))
✅ Step 8: Test on New Data
def predict_sentiment(text):
    cleaned = preprocess_text(text)
    vect = vectorizer.transform([cleaned])
    pred = model.predict(vect)
    return le.inverse_transform(pred)[0]
# Example
print(predict_sentiment("I absolutely love it!"))
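The same helper works for any number of new texts:
# Predict several new reviews at once
new_reviews = ["Fantastic quality, very happy.", "Meh, it does the job.", "Completely useless."]
for review in new_reviews:
    print(review, "->", predict_sentiment(review))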
Summary of the Workflow
Step | Task | Tools Used
1 | Load dataset | pandas
2 | Clean & tokenize text | nltk
3 | Convert text to numbers | TfidfVectorizer
4 | Encode labels | LabelEncoder
5 | Train/test split | train_test_split
6 | Train model | MultinomialNB / LogisticRegression
7 | Evaluate model | accuracy_score, classification_report
8 | Make predictions | model.predict()
Want to Try with a Real Dataset?
You can use:
IMDb movie reviews (positive/negative)
Twitter sentiment datasets
Amazon product reviews
The workflow is identical; only the loading step changes. A minimal sketch is shown below.
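A minimal sketch of retraining the same pipeline on a downloaded dataset; the file name 'imdb_reviews.csv' and its 'review'/'sentiment' columns are placeholders to adjust to whichever dataset you actually use:
# Hypothetical sketch: retrain the pipeline above on a downloaded review dataset
real_df = pd.read_csv('imdb_reviews.csv')  # placeholder file name
real_df['cleaned_text'] = real_df['review'].apply(preprocess_text)
X_real = vectorizer.fit_transform(real_df['cleaned_text'])
y_real = le.fit_transform(real_df['sentiment'])
X_tr, X_te, y_tr, y_te = train_test_split(X_real, y_real, test_size=0.2, random_state=42)
model.fit(X_tr, y_tr)
print("Accuracy on the real dataset:", accuracy_score(y_te, model.predict(X_te)))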