Python Web Scraping

Introduction

Web scraping lets a Python program fetch web pages and extract data from them. With libraries like requests and BeautifulSoup, you can automate data collection, price tracking, content extraction, and more.

1. Installing Required Libraries

pip install requests beautifulsoup4

2. Basic GET Request

import requests

# A timeout keeps the request from hanging if the server never responds
page = requests.get("https://example.com", timeout=10)
page.raise_for_status()  # raise an exception for 4xx/5xx responses
print(page.text)

3. Parsing HTML with BeautifulSoup

from bs4 import BeautifulSoup

soup = BeautifulSoup(page.text, "html.parser")

# find() returns None when no matching tag exists, so guard before .text
heading = soup.find("h1")
title = heading.text if heading is not None else ""

print("Page title:", title)
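To experiment without fetching a live page, BeautifulSoup can parse an inline HTML string directly. A runnable sketch (the markup below is made up for illustration) showing the common ways to locate a tag:

```python
from bs4 import BeautifulSoup

# An inline HTML snippet stands in for page.text, so this runs offline
html = """
<html><body>
  <h1 id="main-title">Scraping 101</h1>
  <p class="intro">A first paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find("h1").text)                 # by tag name
print(soup.find("p", class_="intro").text)  # by tag name and CSS class
print(soup.find(id="main-title").text)      # by id attribute
```

Note that class_ has a trailing underscore because class is a reserved word in Python.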

4. Finding Multiple Elements

items = soup.find_all("p")
for p in items:
    print(p.text)
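Unlike find(), find_all() always returns a list, which is simply empty when nothing matches. A self-contained sketch (the HTML string is made up for illustration):

```python
from bs4 import BeautifulSoup

html = "<div><p>one</p><p>two</p><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all() always returns a list: possibly empty, never None
paragraphs = soup.find_all("p")
print(len(paragraphs))               # 3
print([p.text for p in paragraphs])  # ['one', 'two', 'three']

print(soup.find_all("table"))        # [] -- no match, empty list
```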

5. Scraping Attributes

img = soup.find("img")
if img is not None:  # find() returns None when the page has no <img>
    print(img["src"])
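A dictionary-style lookup like img["src"] raises KeyError when the attribute is missing; tag.get() is the forgiving alternative. A runnable sketch with made-up markup:

```python
from bs4 import BeautifulSoup

html = '<img src="logo.png" alt="Logo"><img alt="decorative, no src">'
soup = BeautifulSoup(html, "html.parser")

for img in soup.find_all("img"):
    # .get() returns a default instead of raising KeyError
    print(img.get("src", "(no src)"))
```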

6. Using CSS Selectors

links = soup.select("a[href]")
for a in links:
    print(a["href"])
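select() accepts any CSS selector. A self-contained sketch (inline markup made up for illustration) shows why the [href] attribute filter matters:

```python
from bs4 import BeautifulSoup

html = '<a href="/home">Home</a><a>anchor without href</a><a href="/about">About</a>'
soup = BeautifulSoup(html, "html.parser")

# "a[href]" matches only anchors that actually carry an href attribute,
# so the middle <a> is skipped and a["href"] cannot raise KeyError here
for a in soup.select("a[href]"):
    print(a["href"])
```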

7. Scraping a Table

rows = soup.select("table tr")
for row in rows:
    cells = [c.text for c in row.select("td")]  # header rows use <th>, so this gives []
    print(cells)
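A self-contained version (the table below is made up for illustration) that skips the header row, since header cells are <th> rather than <td>:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ada</td><td>36</td></tr>
  <tr><td>Alan</td><td>41</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

for row in soup.select("table tr"):
    # Header rows contain <th>, so selecting "td" yields an empty list there
    cells = [c.text for c in row.select("td")]
    if cells:
        print(cells)
```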

8. Handling Headers (Simulate Browser)

headers = {
    "User-Agent": "Mozilla/5.0"
}

page = requests.get("https://example.com", headers=headers)

9. Preventing Blocks / Rate Limits

import time

time.sleep(1)  # wait one second between requests to stay polite
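One common pattern is a fixed delay inside the scraping loop. The sketch below simulates the fetch step with a print, so it runs without network access; the urls list is hypothetical:

```python
import time

# Hypothetical list of pages to visit; in a real scraper each iteration
# would call requests.get(url, headers=headers)
urls = ["https://example.com/page1", "https://example.com/page2"]

DELAY_SECONDS = 1.0  # tune to the site's tolerance; err on the slow side

for url in urls:
    print("fetching", url)  # placeholder for the actual request
    time.sleep(DELAY_SECONDS)
```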

10. Scraping JSON APIs

res = requests.get("https://jsonplaceholder.typicode.com/users", timeout=10)
data = res.json()  # parse the JSON body into Python lists and dicts

for user in data:
    print(user["name"])
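res.json() simply deserializes the response body, so the same parsing can be tried offline with the standard json module. The payload below mirrors the shape (and first two names) of the jsonplaceholder /users response:

```python
import json

# An inline stand-in for res.text from the /users endpoint
payload = '[{"name": "Leanne Graham"}, {"name": "Ervin Howell"}]'

users = json.loads(payload)  # equivalent to what res.json() does internally
for user in users:
    print(user["name"])
```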

11. Saving Scraped Data

with open("output.txt", "w", encoding="utf-8") as f:
    for p in items:  # items is the find_all("p") result from section 4
        f.write(p.text + "\n")
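For tabular results, the csv module usually beats a plain text file. A minimal sketch, where the rows list is example data standing in for scraped table cells:

```python
import csv

rows = [["Ada", "36"], ["Alan", "41"]]  # example data, not real scraped output

# newline="" is required by the csv module to avoid blank lines on Windows
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])  # header row
    writer.writerows(rows)
```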

12. Warning: Legal & Ethical Rules

Before scraping any site, read its robots.txt file and terms of service, and respect both. Keep request rates low, avoid collecting personal data, and prefer an official API when one is available. Aggressive scraping can get your IP blocked and, in some jurisdictions, carry legal consequences.
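The standard library's urllib.robotparser can check robots.txt rules programmatically. In this sketch the rules are supplied inline for illustration; a real scraper would call rp.set_url("https://example.com/robots.txt") followed by rp.read():

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Inline rules for illustration; normally loaded from the site's /robots.txt
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```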

Summary

With requests to fetch pages and BeautifulSoup to parse them, a typical scraper follows the same pipeline every time: send a request, parse the HTML, select the elements you need, and save the results. Add polite delays, realistic headers, and respect for each site's rules, and you have a solid foundation for real-world scraping.