Introduction
Web scraping lets you use Python to fetch web pages and extract data from them. With libraries like requests and BeautifulSoup, you can automate data collection, price tracking, content extraction, and more.
1. Installing Required Libraries
pip install requests beautifulsoup4
2. Basic GET Request
import requests
page = requests.get("https://example.com")
print(page.text)
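If the request fails, page.text may hold an error page rather than the content you want. A minimal sketch of checking the response first, using the standard requests attributes raise_for_status() and status_code:

import requests

page = requests.get("https://example.com", timeout=10)
page.raise_for_status()  # raises an exception on 4xx/5xx responses
print(page.status_code)  # 200 on success
print(page.text)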
3. Parsing HTML with BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.text, "html.parser")
title = soup.find("h1").text
print("Page title:", title)
4. Finding Multiple Elements
items = soup.find_all("p")
for p in items:
    print(p.text)
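find_all() also takes filters, so you can narrow the search instead of looping over every tag. A short sketch; the class name "intro" is hypothetical:

# Only <p> tags with class="intro" (class_ avoids clashing with the keyword)
intro_paragraphs = soup.find_all("p", class_="intro")

# Cap the number of results with limit
for p in soup.find_all("p", limit=3):
    print(p.get_text(strip=True))  # strip=True trims surrounding whitespace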
5. Scraping Attributes
img = soup.find("img")
print(img["src"])
6. Using CSS Selectors
links = soup.select("a[href]")
for a in links:
    print(a["href"])
7. Scraping a Table
rows = soup.select("table tr")
for row in rows:
    cells = [c.text for c in row.select("td")]
    print(cells)
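If the table has a header row, you can pair each cell with its column name. A sketch assuming the first row contains <th> headers:

rows = soup.select("table tr")
column_names = [th.get_text(strip=True) for th in rows[0].select("th")]

records = []
for row in rows[1:]:
    cells = [td.get_text(strip=True) for td in row.select("td")]
    if len(cells) == len(column_names):  # skip malformed rows
        records.append(dict(zip(column_names, cells)))
print(records)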
8. Handling Headers (Simulating a Browser)
headers = {
    "User-Agent": "Mozilla/5.0"
}
page = requests.get("https://example.com", headers=headers)
9. Preventing Blocks / Rate Limits
- Use delays between requests
- Rotate user-agents
- Do not hammer servers
import time
time.sleep(1)
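A minimal sketch that combines these points, pausing between requests and picking a random User-Agent each time (the URLs and agent strings are placeholders):

import random
import time
import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
urls = ["https://example.com/page1", "https://example.com/page2"]

for url in urls:
    headers = {"User-Agent": random.choice(user_agents)}
    res = requests.get(url, headers=headers, timeout=10)
    print(url, res.status_code)
    time.sleep(1)  # be polite: wait between requests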
10. Scraping JSON APIs
res = requests.get("https://jsonplaceholder.typicode.com/users")
data = res.json()
for user in data:
    print(user["name"])
11. Saving Scraped Data
with open("output.txt", "w", encoding="utf-8") as f:
    for p in items:
        f.write(p.text + "\n")
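For structured data, CSV is usually more useful than plain text. A sketch with the standard csv module, assuming the non-empty records list of dicts built in section 7:

import csv

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()   # column names from the first record
    writer.writerows(records)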
12. Warning: Legal & Ethical Rules
- Always check a website’s robots.txt (see the sketch after this list)
- Never scrape login-restricted content
- Do not overload servers
- Use scraping responsibly
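The standard library can do the robots.txt check for you. A minimal sketch using urllib.robotparser:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()
# can_fetch() reports whether this user agent may request the path
if rp.can_fetch("*", "https://example.com/some/page"):
    print("Allowed to scrape")
else:
    print("Disallowed by robots.txt")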
Summary
- requests fetches HTML
- BeautifulSoup parses and extracts data
- CSS selectors make scraping easier
- You can scrape tables, lists, images, links, and APIs
- Always scrape ethically