🤗 Using Hugging Face for NLP Projects

Build real-world NLP applications with pre-trained models in minutes.


📌 What is Hugging Face?


Hugging Face is a company and open-source platform known for:


Transformers library (for NLP, vision, audio, and more)


Thousands of pretrained models (BERT, GPT, T5, etc.)


Datasets for ML/NLP tasks


Model inference, training, and deployment tools


🧰 Key Libraries You'll Use

Library        Use
transformers   Load, train, and use pre-trained models
datasets       Load and process popular NLP datasets
tokenizers     Fast tokenization
accelerate     Speed up training on CPUs/GPUs
gradio         Easily build and demo NLP apps

🔧 Installation

pip install transformers datasets



Optionally, add:


pip install gradio accelerate
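
Note that transformers needs a deep-learning backend for inference and training; the examples below assume PyTorch is installed:

pip install torch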


🧠 Common NLP Tasks with Hugging Face

Task                            Example
Text Classification             Sentiment analysis, topic detection
Named Entity Recognition (NER)  Extract names, locations, dates
Question Answering              Answer questions based on context
Text Generation                 Generate creative or informative text
Translation                     English → French, etc.
Summarization                   Convert long text to short summaries
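
All of these tasks share the same pipeline() interface; only the task name (and optionally the model) changes. A quick sketch for two of them (the default model per task can vary between transformers releases, and models download on first use):

from transformers import pipeline

# Named entity recognition; aggregation_strategy merges word pieces into whole entities
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))

# Text generation with GPT-2
generator = pipeline("text-generation", model="gpt2")
print(generator("Hugging Face makes it easy to", max_new_tokens=20)[0]["generated_text"])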

🚀 Quick Start: Sentiment Analysis

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love using Hugging Face for NLP!")[0]
print(f"Label: {result['label']}, Confidence: {result['score']:.2f}")



Output:


Label: POSITIVE, Confidence: 0.99
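
The pipeline also accepts a list of strings, which is convenient for scoring many texts in one call (reusing the classifier above):

texts = ["I love this!", "This was a waste of time."]
for text, result in zip(texts, classifier(texts)):
    print(f"{text} -> {result['label']} ({result['score']:.2f})")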


🔤 Tokenization and Model Inference (Manual)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Transformers are amazing!", return_tensors="pt")
outputs = model(**inputs)

probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(probs)
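
The two probabilities correspond to the model's label ids; the model config maps them back to readable labels:

pred_id = probs.argmax(dim=-1).item()
print(model.config.id2label[pred_id])  # "POSITIVE" for this input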


📚 Using NLP Datasets

from datasets import load_dataset

dataset = load_dataset("imdb")  # Sentiment dataset
print(dataset["train"][0])  # First review



Other popular datasets:


ag_news – News classification


squad – Question answering


conll2003 – NER


common_voice – Speech-to-text
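
Each dataset comes pre-split (train/test, sometimes validation), and slicing out a small random subset is an easy way to keep experiments fast, as in this sketch with the imdb dataset loaded above:

print(dataset)  # shows the splits and their row counts

# Small random subset for quick experiments
sample = dataset["train"].shuffle(seed=42).select(range(100))
print(sample[0]["text"][:200])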


๐Ÿ—️ Fine-Tuning a Pretrained Model (Text Classification)

Step 1: Load Dataset

from datasets import load_dataset

dataset = load_dataset("ag_news")


Step 2: Preprocess and Tokenize

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

tokenized_dataset = dataset.map(tokenize, batched=True)


Step 3: Train Model

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=4)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"]
)

trainer.train()
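
By default the Trainer reports only loss. To also track accuracy on the evaluation set, you can pass a metrics function; a minimal sketch (hand it to the Trainer via compute_metrics=compute_metrics):

import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}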


🧪 Demo Your Model with Gradio

import gradio as gr

# Reuses the sentiment-analysis pipeline ("classifier") from the quick start above
def predict_sentiment(text):
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.2f})"

gr.Interface(fn=predict_sentiment, inputs="text", outputs="text").launch()
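
launch() serves the demo locally; passing share=True additionally creates a temporary public URL, which is handy for sharing from a notebook:

gr.Interface(fn=predict_sentiment, inputs="text", outputs="text").launch(share=True)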


🧠 Top Hugging Face Models by Task

Task                Model
Sentiment Analysis  distilbert-base-uncased-finetuned-sst-2-english
NER                 dslim/bert-base-NER
QA                  deepset/roberta-base-squad2
Text Generation     gpt2, tiiuae/falcon-7b-instruct
Summarization       facebook/bart-large-cnn
Translation         Helsinki-NLP/opus-mt-en-fr


Find more here: https://huggingface.co/models
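
Any of these model names can be passed straight to pipeline(). For example, question answering with deepset/roberta-base-squad2 (the model downloads on first use):

from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
answer = qa(question="Where is Hugging Face based?",
            context="Hugging Face is a company based in New York City.")
print(answer["answer"])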


📦 Bonus: Hugging Face Hub


Upload and share your models/datasets:


pip install huggingface_hub

huggingface-cli login  # use your HF token



Upload a model:


from huggingface_hub import notebook_login

notebook_login()  # in notebooks; use huggingface-cli login elsewhere

model.push_to_hub("your-model-name")
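
If you fine-tuned with a tokenizer, push it to the same repository so others can load both pieces with from_pretrained:

tokenizer.push_to_hub("your-model-name")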


🧭 Final Tips


Use pipeline() for rapid prototyping


Fine-tune when pre-trained accuracy isn’t enough


Use the datasets library to load and preprocess real-world NLP data


Use gradio or streamlit for demos


Use accelerate to simplify training across multiple GPUs or with mixed precision
