Introduction
Web scraping lets you use Python to fetch web pages and extract data from them. With libraries like requests and BeautifulSoup, you can automate data collection, price tracking, content extraction, and more.
1. Installing Required Libraries
pip install requests beautifulsoup4
2. Basic GET Request
import requests
page = requests.get("https://example.com")
print(page.text)
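If the request fails, page.text may hold an error page rather than the content you want. A minimal sketch of checking the response first, using the standard requests attributes raise_for_status() and status_code:

import requests

page = requests.get("https://example.com", timeout=10)
page.raise_for_status()  # raises an exception on 4xx/5xx responses
print(page.status_code)  # 200 on success
print(page.text)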
3. Parsing HTML with BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.text, "html.parser")
title = soup.find("h1").text
print("Page title:", title)
4. Finding Multiple Elements
items = soup.find_all("p")
for p in items:
    print(p.text)
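find_all() also takes filters, so you can narrow the search instead of looping over every tag. A short sketch; the class name "intro" is hypothetical:

# Only <p> tags with class="intro" (class_ avoids clashing with the keyword)
intro_paragraphs = soup.find_all("p", class_="intro")

# Cap the number of results with limit
for p in soup.find_all("p", limit=3):
    print(p.get_text(strip=True))  # strip=True trims surrounding whitespace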
5. Scraping Attributes
img = soup.find("img")
print(img["src"])
6. Using CSS Selectors
links = soup.select("a[href]")
for a in links:
    print(a["href"])
7. Scraping a Table
rows = soup.select("table tr")
for row in rows:
    cells = [c.text for c in row.select("td")]
    print(cells)
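If the table has a header row, you can pair each cell with its column name. A sketch assuming the first row contains <th> headers:

rows = soup.select("table tr")
column_names = [th.get_text(strip=True) for th in rows[0].select("th")]

records = []
for row in rows[1:]:
    cells = [td.get_text(strip=True) for td in row.select("td")]
    if len(cells) == len(column_names):  # skip malformed rows
        records.append(dict(zip(column_names, cells)))
print(records)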
8. Handling Headers (Simulating a Browser)
headers = {
    "User-Agent": "Mozilla/5.0"
}
page = requests.get("https://example.com", headers=headers)
9. Preventing Blocks / Rate Limits
- Use delays between requests
- Rotate user-agents
- Do not hammer servers
import time
time.sleep(1)
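A minimal sketch that combines these points, pausing between requests and picking a random User-Agent each time (the URLs and agent strings are placeholders):

import random
import time
import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
urls = ["https://example.com/page1", "https://example.com/page2"]

for url in urls:
    headers = {"User-Agent": random.choice(user_agents)}
    res = requests.get(url, headers=headers, timeout=10)
    print(url, res.status_code)
    time.sleep(1)  # be polite: wait between requests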
10. Scraping JSON APIs
res = requests.get("https://jsonplaceholder.typicode.com/users")
data = res.json()
for user in data:
    print(user["name"])
11. Saving Scraped Data
with open("output.txt", "w", encoding="utf-8") as f:
    for p in items:
        f.write(p.text + "\n")
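For structured data, CSV is usually more useful than plain text. A sketch with the standard csv module, assuming the non-empty records list of dicts built in section 7:

import csv

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()   # column names from the first record
    writer.writerows(records)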
12. Warning: Legal & Ethical Rules
- Always check a website’s robots.txt (see the sketch after this list)
- Never scrape login-restricted content
- Do not overload servers
- Use scraping responsibly
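The standard library can do the robots.txt check for you. A minimal sketch using urllib.robotparser:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()
# can_fetch() reports whether this user agent may request the path
if rp.can_fetch("*", "https://example.com/some/page"):
    print("Allowed to scrape")
else:
    print("Disallowed by robots.txt")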
Summary
- requests fetches HTML
- BeautifulSoup parses and extracts data
- CSS selectors make scraping easier
- You can scrape tables, lists, images, links, and APIs
- Always scrape ethically