Python Data Analysis

Introduction

Python is a leading choice for data analysis, largely because of NumPy, pandas, and Matplotlib. This guide shows how to load, clean, transform, analyze, and visualize data with these standard tools.

1. Installing Required Libraries

pip install numpy pandas matplotlib
    

2. Importing the Core Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
    

3. Loading a CSV File

df = pd.read_csv("data.csv")
print(df.head())
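
To try the loading step without a file on disk, the same call can read from an in-memory buffer. This is a minimal sketch: the column names and values are made up for illustration and do not come from a real data.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """name,age,salary,status
Ana,34,4200,active
Ben,17,,inactive
Chloe,45,5100,active
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())   # first rows (up to 5)
print(sample_df.dtypes)   # salary is float64 because of the missing value
```

The empty salary field is read as NaN, which is why that column ends up as a float rather than an integer.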
    

4. Basic Data Inspection

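df.info()             # column names, dtypes, and non-null counts
print(df.describe())  # summary statistics for numeric columns
print(df.columns)     # column labels
print(df.shape)       # (number of rows, number of columns)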
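
A per-column count of missing values is another quick check that often helps before cleaning. The sketch below uses a made-up DataFrame; the name `demo` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few gaps
demo = pd.DataFrame({
    "age": [34, 17, np.nan, 45],
    "salary": [4200.0, np.nan, 3900.0, 5100.0],
    "status": ["active", "inactive", "active", None],
})

missing_per_column = demo.isna().sum()      # missing values per column
unique_statuses = demo["status"].nunique()  # distinct non-missing values

print(missing_per_column)
print(unique_statuses)
```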
    

5. Selecting Columns

ages = df["age"]                        # one column: returns a Series
subset = df[["name", "age", "email"]]   # list of columns: returns a DataFrame
    

6. Filtering Rows

adults = df[df["age"] >= 18]
active_users = df[df["status"] == "active"]
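
Filters can also be combined. The following sketch, on made-up data, shows two conditions joined with `&` and a membership test with `isin()`; the `people` DataFrame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data reusing the column names from the examples above
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "age": [34, 17, 45, 22],
    "status": ["active", "active", "inactive", "active"],
})

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
active_adults = people[(people["age"] >= 18) & (people["status"] == "active")]

# isin() keeps rows whose value appears in the given list
selected = people[people["name"].isin(["Ben", "Chloe"])]

print(active_adults["name"].tolist())
print(selected["name"].tolist())
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.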
    

7. Adding & Modifying Columns

df["yearly_income"] = df["salary"] * 12   # assumes "salary" is a monthly figure
df["is_adult"] = df["age"] >= 18          # boolean column
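
When the original DataFrame should stay untouched, `assign()` is one alternative that returns a new DataFrame. A small sketch with made-up values (the `staff` name and the "adult"/"minor" labels are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data; salary is assumed to be monthly
staff = pd.DataFrame({"age": [34, 17, 45], "salary": [4200, 0, 5100]})

# assign() returns a new DataFrame and leaves `staff` unchanged
enriched = staff.assign(
    yearly_income=staff["salary"] * 12,
    # np.where picks the first label where the condition holds, else the second
    age_group=np.where(staff["age"] >= 18, "adult", "minor"),
)

print(enriched)
```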
    

8. Handling Missing Data

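# Pick one strategy: after fillna() there is nothing left for dropna() to remove.
df.fillna(0, inplace=True)    # replace every missing value with 0
# df.dropna(inplace=True)     # alternative: drop rows that contain missing values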
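
Filling and dropping are two different strategies, and it can help to compare them on the same data. The sketch below uses a made-up DataFrame; filling age with the median and salary with 0 is only an illustrative choice, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column
raw = pd.DataFrame({
    "age": [34, np.nan, 45],
    "salary": [4200.0, 3900.0, np.nan],
})

# Option A: fill gaps, here with a different value per column
filled = raw.fillna({"age": raw["age"].median(), "salary": 0})

# Option B: drop every row that has at least one missing value
dropped = raw.dropna()

print(filled)
print(dropped)
```

Neither call uses `inplace=True`, so `raw` keeps its missing values and both results can be inspected side by side.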
    

9. Sorting Data

sorted_df = df.sort_values("age", ascending=False)  # returns a new DataFrame; df itself is unchanged
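
Sorting by more than one column works the same way, with one direction per key. A minimal sketch on made-up data (the `team` DataFrame is an assumption):

```python
import pandas as pd

# Hypothetical data
team = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "country": ["PT", "DE", "PT", "DE"],
    "age": [34, 17, 45, 22],
})

# Sort by country A-Z, then by age from oldest to youngest within each country
ordered = team.sort_values(["country", "age"], ascending=[True, False])

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
ordered = ordered.reset_index(drop=True)

print(ordered)
```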
    

10. Grouping & Aggregation

mean_salary_by_country = df.groupby("country")["salary"].mean()  # average salary per country
rows_per_status = df.groupby("status").size()                    # row count per status
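
Several statistics can be computed in one pass with named aggregation. The sketch below uses made-up data; the output column names (`mean_salary`, `max_salary`, `headcount`) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data
employees = pd.DataFrame({
    "country": ["PT", "DE", "PT", "DE"],
    "salary": [4000, 5000, 6000, 3000],
})

# Each keyword is an output column: (input column, aggregation)
summary = employees.groupby("country").agg(
    mean_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    headcount=("salary", "count"),  # counts non-missing salaries
)

print(summary)
```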
    

11. Merging DataFrames

# users and purchases are DataFrames that share a "user_id" column
merged = pd.merge(users, purchases, on="user_id")  # inner join by default
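
The merge call assumes two DataFrames named `users` and `purchases` already exist. A self-contained sketch with made-up rows shows how the default inner join differs from a left join:

```python
import pandas as pd

# Hypothetical tables sharing a user_id column
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ana", "Ben", "Chloe"]})
purchases = pd.DataFrame({
    "user_id": [1, 1, 3, 4],
    "amount": [20.0, 35.5, 12.0, 99.0],
})

# Inner join (default): only user_ids present in both tables
inner = pd.merge(users, purchases, on="user_id")

# Left join: every user is kept; users without purchases get NaN in amount
left = pd.merge(users, purchases, on="user_id", how="left")

print(inner)
print(left)
```

Here user 2 has no purchases and user 4 has no user record, so the inner join drops both, while the left join keeps user 2 with a missing amount.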
    

12. NumPy Arrays for Fast Math

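arr = np.array([1, 2, 3, 4, 5])
scaled = arr * 10    # element-wise multiplication, no loop needed
mean = np.mean(arr)  # 3.0
std = np.std(arr)    # population standard deviation (ddof=0)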
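
Two details are easy to miss with NumPy: boolean masks for selecting elements, and the `ddof` argument of `np.std`. A short sketch using the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

scaled = arr * 10   # element-wise: [10 20 30 40 50]
mask = arr > 2      # boolean array, True where the condition holds
large = arr[mask]   # [3 4 5]

pop_std = np.std(arr)             # population standard deviation (ddof=0)
sample_std = np.std(arr, ddof=1)  # sample standard deviation

print(scaled, large, pop_std, sample_std)
```

NumPy defaults to `ddof=0`, while the pandas `Series.std()` method defaults to `ddof=1`, so the two can give different results on the same data.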
    

13. Visualization with Matplotlib

Line Chart

plt.plot(df["age"])  # the x-axis is the DataFrame index
plt.show()
    

Histogram

plt.hist(df["salary"], bins=20)
plt.show()
    

Scatter Plot

plt.scatter(df["age"], df["salary"])
plt.show()
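
Plots are easier to read with axis labels and a title. The sketch below is one possible way to do that with Matplotlib's object-oriented interface; the data is made up, and the non-interactive "Agg" backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
plot_df = pd.DataFrame({
    "age": [22, 34, 45, 51],
    "salary": [3000, 4200, 5100, 5600],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(plot_df["age"], plot_df["salary"])
ax.set_xlabel("Age")
ax.set_ylabel("Salary")
ax.set_title("Salary vs. age")

# Save to an in-memory buffer; pass a filename instead to write a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```

In an interactive session, `plt.show()` can replace the `savefig` call.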
    

14. Exporting Results

df.to_csv("output.csv", index=False)     # index=False omits the row index column
df.to_excel("output.xlsx", index=False)  # needs an Excel writer such as openpyxl installed
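
To see what `index=False` changes, the CSV text can be produced in memory instead of being written to disk. A minimal sketch with a made-up `results` DataFrame:

```python
import io

import pandas as pd

# Hypothetical results to export
results = pd.DataFrame({"name": ["Ana", "Ben"], "salary": [4200, 3900]})

# With no path given, to_csv returns the CSV text as a string
with_index = results.to_csv()                # header starts with an empty index column
without_index = results.to_csv(index=False)  # header is just the column names

# Round trip: reading the text back gives the same columns and values
restored = pd.read_csv(io.StringIO(without_index))

print(with_index)
print(without_index)
```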
    

15. Example: Full Mini Analysis

df = pd.read_csv("employees.csv")

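# cleanup: drop rows with a missing age or salary
df.dropna(subset=["age", "salary"], inplace=True)

# calculate: yearly salary, assuming "salary" is a monthly figure
df["salary_yearly"] = df["salary"] * 12

# group: average yearly salary per department
avg_salary = df.groupby("department")["salary_yearly"].mean()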

print(avg_salary)
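
The same analysis can be run end to end without an employees.csv file by reading from an in-memory buffer. This is a sketch under stated assumptions: the rows are made up, salary is taken to be monthly, and `average_yearly_salary` is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical stand-in for employees.csv; salary is assumed to be monthly
employees_csv = """name,department,age,salary
Ana,Engineering,34,4000
Ben,Engineering,29,5000
Chloe,Sales,45,
Dev,Sales,38,3000
"""


def average_yearly_salary(source):
    """Return the mean yearly salary per department (hypothetical helper)."""
    data = pd.read_csv(source)
    # cleanup: drop rows with a missing age or salary
    data = data.dropna(subset=["age", "salary"])
    # calculate: yearly salary from the monthly figure
    data = data.assign(salary_yearly=data["salary"] * 12)
    # group: average per department
    return data.groupby("department")["salary_yearly"].mean()


avg_salary = average_yearly_salary(io.StringIO(employees_csv))
print(avg_salary)
```

Chloe's row has no salary, so it is dropped before the average is taken and the Sales figure reflects Dev alone.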
    

Summary