Introduction
Python is one of the most widely used languages for data analysis, largely because of NumPy, pandas, and Matplotlib. You’ll learn how to load, clean, transform, analyze, visualize, and export data using these three libraries.
1. Installing Required Libraries
pip install numpy pandas matplotlib
2. Importing the Core Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
3. Loading a CSV File
df = pd.read_csv("data.csv")
print(df.head())
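To try the loading step without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration and stand in for whatever data.csv really contains.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """name,age,salary,status
Ana,34,4200,active
Ben,17,,inactive
Cleo,52,6100,active
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows (up to five)
print(df.dtypes)   # salary is read as float64 because one value is missing
```

An empty field becomes NaN, which is why a column of whole numbers can end up with a floating-point dtype.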
4. Basic Data Inspection
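df.info()             # column dtypes and non-null counts (prints directly)
print(df.describe())  # summary statistics for numeric columns
print(df.columns)     # column labels
print(df.shape)       # (rows, columns)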
5. Selecting Columns
ages = df["age"]
subset = df[["name", "age", "email"]]
6. Filtering Rows
adults = df[df["age"] >= 18]
active_users = df[df["status"] == "active"]
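Filters can be combined. The sketch below, on a small made-up DataFrame, uses `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "age": [34, 17, 52, 16],
    "status": ["active", "inactive", "active", "active"],
})

# Rows where both conditions hold
active_adults = df[(df["age"] >= 18) & (df["status"] == "active")]

# Rows where at least one condition holds
minors_or_inactive = df[(df["age"] < 18) | (df["status"] != "active")]

print(active_adults["name"].tolist())       # ['Ana', 'Cleo']
print(minors_or_inactive["name"].tolist())  # ['Ben', 'Dev']
```

Python's `and` / `or` keywords do not work here: they try to reduce a whole Series to a single True or False, which raises a ValueError.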
7. Adding & Modifying Columns
df["yearly_income"] = df["salary"] * 12
df["is_adult"] = df["age"] >= 18
8. Handling Missing Data
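# Pick one strategy. After fillna(0) no NaN remains, so a following dropna() would do nothing.
df_filled = df.fillna(0)    # option A: replace every missing value with 0
df_dropped = df.dropna()    # option B: drop rows that contain any missing value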
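Filling and dropping are alternative strategies, and the right one often differs per column. The following sketch uses invented data: it counts the gaps first, then fills each column with its own default or drops only the rows missing a required field.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "age": [34, np.nan, 52],
    "salary": [4200.0, 3900.0, np.nan],
})

print(df.isna().sum())  # number of missing values per column

# Option A: fill each column with its own default
filled = df.fillna({"age": df["age"].median(), "salary": 0})

# Option B: drop only rows that lack a salary
dropped = df.dropna(subset=["salary"])

print(filled)
print(dropped)
```

Both calls return new DataFrames and leave `df` untouched, which makes it easy to compare the two outcomes before committing to one.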
9. Sorting Data
df_sorted = df.sort_values("age", ascending=False)  # returns a new DataFrame; df itself is unchanged
10. Grouping & Aggregation
df.groupby("country")["salary"].mean()  # average salary per country
df.groupby("status").size()             # number of rows per status
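When several statistics per group are needed at once, named aggregation keeps the result in one table. This is a small sketch on made-up data. The output column names (`mean_salary`, `max_salary`, `headcount`) are chosen for illustration only.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "country": ["DE", "DE", "FR", "FR", "FR"],
    "salary": [4000, 5000, 3000, 3500, 4000],
})

# Each keyword defines one output column as (input column, function)
summary = df.groupby("country").agg(
    mean_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    headcount=("salary", "count"),
)

print(summary)
```

Note that `"count"` counts non-missing values only, so it can differ from `size()` when the column contains NaN.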
11. Merging DataFrames
# users and purchases are two DataFrames that share a user_id column
merged = pd.merge(users, purchases, on="user_id")  # inner join by default
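The `how` parameter decides which rows survive a merge. The sketch below builds two small, invented tables named `users` and `purchases` and compares the default inner join with a left join.

```python
import pandas as pd

# Hypothetical tables for illustration
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cleo"],
})
purchases = pd.DataFrame({
    "user_id": [1, 1, 3, 4],
    "amount": [20.0, 35.5, 12.0, 99.0],
})

# Inner join (default): only user_ids present in both tables
inner = pd.merge(users, purchases, on="user_id")

# Left join: every user is kept; users without purchases get NaN
left = pd.merge(users, purchases, on="user_id", how="left")

print(len(inner))  # 3
print(len(left))   # 4
print(left)
```

User 2 has no purchases and user 4 has no profile, so the inner join drops both, while the left join keeps user 2 with a missing `amount`.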
12. NumPy Arrays for Fast Math
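arr = np.array([1, 2, 3, 4, 5])
arr * 10      # elementwise: array([10, 20, 30, 40, 50])
np.mean(arr)  # 3.0
np.std(arr)   # population standard deviation (ddof=0), about 1.414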
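One detail worth checking when mixing the two libraries: `np.std` defaults to the population standard deviation (`ddof=0`), while the pandas `std()` method defaults to the sample standard deviation (`ddof=1`). A short sketch of the difference:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])

scaled = arr * 10                   # elementwise, no Python loop needed
pop_std = np.std(arr)               # divides by n (ddof=0)
sample_std = np.std(arr, ddof=1)    # divides by n - 1
pandas_std = pd.Series(arr).std()   # pandas default is ddof=1

print(scaled)
print(pop_std, sample_std, pandas_std)
```

Passing `ddof` explicitly in both libraries avoids small, hard-to-trace discrepancies between results.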
13. Visualization with Matplotlib
Line Chart
plt.plot(df["age"])
plt.show()
Histogram
plt.hist(df["salary"], bins=20)
plt.show()
Scatter Plot
plt.scatter(df["age"], df["salary"])
plt.show()
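Charts are easier to read with axis labels and a title. The sketch below uses invented numbers and assumes `salary` is a monthly figure. It selects the non-interactive "Agg" backend and renders the figure into an in-memory buffer so that it also runs on machines without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "age": [23, 31, 38, 45, 52],
    "salary": [2800, 3600, 4100, 4700, 5200],
})

fig, ax = plt.subplots()
ax.scatter(df["age"], df["salary"])
ax.set_xlabel("Age")
ax.set_ylabel("Monthly salary")
ax.set_title("Salary vs. age")

# Render to PNG in memory; pass a filename such as "chart.png" to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)  # release the figure's memory
```

Working with the `fig, ax` pair instead of bare `plt.` calls keeps each chart's settings separate when a script draws more than one.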
14. Exporting Results
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)  # needs an Excel engine such as openpyxl (pip install openpyxl)
15. Example: Full Mini Analysis
df = pd.read_csv("employees.csv")
# cleanup
df.dropna(subset=["age", "salary"], inplace=True)
# calculate
df["salary_yearly"] = df["salary"] * 12
# group
avg_salary = df.groupby("department")["salary_yearly"].mean()
print(avg_salary)
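For a version of this mini analysis that runs without employees.csv, the sketch below feeds the same steps from an in-memory CSV. The rows are invented, and `salary` is assumed to be a monthly amount, as the multiplication by 12 implies.

```python
import io

import pandas as pd

# Hypothetical rows standing in for employees.csv
csv_text = """name,department,age,salary
Ana,Engineering,34,5000
Ben,Engineering,29,
Cleo,Sales,41,4000
Dev,Sales,,3500
Eli,Sales,38,3000
"""

df = pd.read_csv(io.StringIO(csv_text))

# cleanup: drop rows missing age or salary (Ben and Dev)
df = df.dropna(subset=["age", "salary"])

# calculate: yearly salary from the assumed monthly figure
df = df.assign(salary_yearly=df["salary"] * 12)

# group: average yearly salary per department, highest first
avg_salary = (
    df.groupby("department")["salary_yearly"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_salary)
```

With this data the averages come out at 60000 for Engineering and 42000 for Sales.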
Summary
- pandas = loading, cleaning, transforming, grouping
- NumPy = fast numeric operations
- Matplotlib = charts & visualizations
- Data analysis = load → clean → transform → analyze → visualize → export