Web Scraping with BeautifulSoup and Scrapy
What Is Web Scraping?
Web scraping is the process of extracting data from websites using automated tools or code.
Tools You’ll Learn:

| Tool | Description | Use Case |
| --- | --- | --- |
| BeautifulSoup | Lightweight library for parsing HTML/XML | Great for small, simple scraping tasks |
| Scrapy | Powerful web scraping framework | Ideal for large-scale, complex scraping projects |
Installation
Install both using pip:
pip install beautifulsoup4 requests lxml scrapy
PART 1: Web Scraping with BeautifulSoup

Example: Scrape Titles from a Blog Page
import requests
from bs4 import BeautifulSoup
# Target URL
url = 'https://example-blog.com'
# Send request
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml') # or 'html.parser'
# Extract article titles
titles = soup.find_all('h2', class_='post-title')
# Print results
for title in titles:
    print(title.text.strip())
✅ Key BeautifulSoup Functions
| Function | What It Does |
| --- | --- |
| soup.find() | Finds the first matching tag |
| soup.find_all() | Finds all matching tags |
| tag.text | Extracts inner text |
| tag['href'] | Extracts an attribute value |
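A minimal sketch of these four functions on an inline HTML snippet (the markup and class names here are invented for illustration; the built-in 'html.parser' is used so no extra parser is required):

```python
from bs4 import BeautifulSoup

# Invented HTML snippet for illustration
html = """
<div class="post">
  <h2 class="post-title"><a href="/first-post">First Post</a></h2>
  <h2 class="post-title"><a href="/second-post">Second Post</a></h2>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('h2', class_='post-title')           # first matching tag
all_titles = soup.find_all('h2', class_='post-title')  # all matching tags

print(first.text.strip())       # inner text → First Post
print(first.find('a')['href'])  # attribute value → /first-post
print(len(all_titles))          # number of matches → 2
```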
PART 2: Web Scraping with Scrapy
Scrapy is more structured and powerful — perfect for crawling multiple pages or websites.
Step 1: Create a Scrapy Project
scrapy startproject my_scraper
cd my_scraper
Step 2: Generate a Spider
scrapy genspider blog_spider example-blog.com
Step 3: Sample Spider Code
# my_scraper/spiders/blog_spider.py
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blog_spider'
    start_urls = ['https://example-blog.com']

    def parse(self, response):
        for post in response.css('h2.post-title'):
            yield {
                'title': post.css('a::text').get(),
                'link': post.css('a::attr(href)').get(),
            }
Step 4: Run the Spider
scrapy crawl blog_spider
Or to save the results:
scrapy crawl blog_spider -o posts.json
When to Use What?

| Task | Use BeautifulSoup | Use Scrapy |
| --- | --- | --- |
| Small scripts | ✅ | ❌ |
| Quick one-time jobs | ✅ | ❌ |
| Need to follow many links | ❌ | ✅ |
| High performance & large data | ❌ | ✅ |
| Exporting to CSV/JSON | ✅ (manually) | ✅ (built-in) |
| Handling AJAX or JavaScript | ❌ | ❌ (use Selenium or Playwright) |
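For the "manual" export column above: with BeautifulSoup you write the output file yourself, for example with the standard csv module. The rows here are a stand-in for scraped results:

```python
import csv

# Stand-in for rows scraped with BeautifulSoup
posts = [
    {'title': 'First Post', 'link': '/first-post'},
    {'title': 'Second Post', 'link': '/second-post'},
]

with open('posts.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'link'])
    writer.writeheader()     # column header row
    writer.writerows(posts)  # one row per post
```

Scrapy makes the same export a command-line flag instead: scrapy crawl blog_spider -o posts.csv.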
Notes on Ethics & Legality
Always check the site's robots.txt (e.g., example.com/robots.txt)
Avoid overloading servers — use delays or polite crawling
Some sites may ban or block bots
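In Scrapy, polite crawling is configured in my_scraper/settings.py. A sketch — the values are illustrative and should be tuned per site:

```python
# my_scraper/settings.py (excerpt)
ROBOTSTXT_OBEY = True               # respect the site's robots.txt
DOWNLOAD_DELAY = 1.0                # wait ~1 second between requests
CONCURRENT_REQUESTS_PER_DOMAIN = 4  # limit parallel hits per domain
AUTOTHROTTLE_ENABLED = True         # back off automatically under load
```

With requests + BeautifulSoup there is no built-in equivalent, so add a time.sleep() between requests yourself.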
Summary

| Tool | Pros | Cons |
| --- | --- | --- |
| BeautifulSoup | Easy to use, simple syntax | Not great for large projects |
| Scrapy | Fast, scalable, powerful | Steeper learning curve |