Introduction
Pandas is one of the most widely used Python libraries for data analysis and data manipulation. Whether you work in data science, machine learning, or backend development, knowing Pandas will save you time whenever you handle tabular data.
In this tutorial, you will learn Pandas from scratch, including installation, DataFrames, operations, and real examples.
What is Pandas
Pandas is an open-source Python library used for handling structured data efficiently. It provides easy-to-use data structures and tools for data analysis.
Key Features
- Fast and efficient data handling
- Easy data cleaning and transformation
- Supports CSV, Excel, SQL data
- Powerful grouping and filtering
Installing Pandas
You can install Pandas using pip:
pip install pandas
Import Pandas in Python:
import pandas as pd
Pandas Data Structures
Pandas mainly provides two data structures:
1. Series
A Series is a one-dimensional labeled array: every value is paired with an index label (0, 1, 2, ... by default).
data = [10, 20, 30]
series = pd.Series(data)
print(series)
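The index is what separates a Series from a plain list. A minimal sketch, assuming the labels "a", "b", and "c" (they are illustrative and not part of the example above):

```python
import pandas as pd

# The labels "a", "b", "c" are illustrative assumptions
series = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(series["b"])     # access a value by its label
print(series.iloc[0])  # access a value by its position
print(series.sum())    # aggregate over all values
```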
2. DataFrame
A DataFrame is a two-dimensional table (rows and columns).
data = {
    "Name": ["A", "B", "C"],
    "Marks": [85, 90, 88]
}
df = pd.DataFrame(data)
print(df)
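Each column of a DataFrame is itself a Series with its own data type. A short sketch, using the same sample data, of how you might inspect the shape and column types:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, 90, 88]})

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # data type of each column
print(list(df.columns))  # column names
```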
Reading Data in Pandas
Pandas can read data from different sources.
Read CSV File
df = pd.read_csv("data.csv")
print(df)
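To try read_csv without a file on disk, you can pass a file-like object instead of a path. A sketch, assuming made-up CSV text in place of data.csv:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "Name,Marks\nA,85\nB,90\nC,88\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```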
Read Excel File
df = pd.read_excel("data.xlsx")  # .xlsx files need an Excel engine such as openpyxl installed
Viewing Data
First 5 Rows
df.head()
Last 5 Rows
df.tail()
Data Information
df.info()
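head and tail both accept a row count, and describe complements info with summary statistics. A small sketch on sample data (the four-row table is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Marks": [80, 90, 85, 95]})

print(df.head(2))     # first 2 rows instead of the default 5
print(df.tail(1))     # last row only
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
```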
Selecting Data
Select Column
df["Name"]
Select Multiple Columns
df[["Name", "Marks"]]
Select Rows
df.iloc[0]  # first row, selected by position
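Rows can be selected by position with iloc or by index label with loc. The two differ in how they treat the end of a slice, which this sketch on sample data tries to show:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, 90, 88]})

first_row = df.iloc[0]           # row at position 0, returned as a Series
first_two = df.iloc[0:2]         # positions 0 and 1 (end is exclusive)
by_label = df.loc[0:1, "Marks"]  # index labels 0 through 1 (end is inclusive)

print(first_row)
print(first_two)
print(by_label)
```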
Filtering Data
df[df["Marks"] > 85]
This returns rows where marks are greater than 85.
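Several conditions can be combined in one filter. A sketch, assuming thresholds picked only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, 90, 88]})

# Wrap each condition in parentheses; use & for AND and | for OR
between = df[(df["Marks"] > 85) & (df["Marks"] < 90)]
either = df[(df["Marks"] < 86) | (df["Name"] == "B")]

print(between)
print(either)
```

Python's `and` and `or` keywords do not work here, because each condition is a whole column of booleans rather than a single value.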
Adding New Column
df["Grade"] = ["A", "A+", "A"]
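A new column can also be computed from existing ones instead of typed in by hand. A sketch, assuming a hypothetical pass mark of 88:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, 90, 88]})

# The comparison runs on the whole column at once;
# the pass mark of 88 is a hypothetical value
df["Passed"] = df["Marks"] >= 88

print(df)
```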
Updating Data
df.loc[0, "Marks"] = 95
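loc also accepts a condition in place of a row label, which lets you update every matching row in one step. A sketch, assuming an illustrative rule that adds 5 marks to scores below 88:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, 90, 88]})

# Add 5 marks wherever Marks is below 88 (the rule is illustrative)
df.loc[df["Marks"] < 88, "Marks"] += 5

print(df)
```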
Deleting Column
df.drop("Grade", axis=1, inplace=True)
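If you would rather keep the original DataFrame unchanged, drop can return a new one instead of modifying in place. A minimal sketch on sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Marks": [85, 90, 88],
    "Grade": ["A", "A+", "A"],
})

# Without inplace=True, drop returns a new DataFrame and leaves df as it was
trimmed = df.drop(columns=["Grade"])

print(list(trimmed.columns))
print(list(df.columns))
```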
Handling Missing Values
Check Missing Values
df.isnull()
Fill Missing Values
df.fillna(0, inplace=True)
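Filling with 0 is not always appropriate, since it can distort averages. A sketch of two common alternatives, assuming a sample table with one missing mark:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [85, None, 88]})

print(df.isnull().sum())  # number of missing values per column

# Fill the missing mark with the column mean instead of 0
filled = df.fillna({"Marks": df["Marks"].mean()})

# Or drop every row that contains a missing value
dropped = df.dropna()

print(filled)
print(dropped)
```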
Sorting Data
df.sort_values("Marks", ascending=False)
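sort_values returns a new DataFrame and can sort by more than one column. A sketch, assuming sample data with a tie in Marks so the second sort key matters:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Marks": [85, 90, 85, 95]})

# Highest marks first; ties are broken by Name in ascending order
ranked = df.sort_values(["Marks", "Name"], ascending=[False, True])

# Renumber the rows 0..n-1 after sorting
ranked = ranked.reset_index(drop=True)

print(ranked)
```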
Grouping Data
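# Average marks per grade (assumes df still has a Grade column).
# Selecting "Marks" keeps the mean away from the text column "Name",
# which recent pandas versions refuse to average.
df.groupby("Grade")["Marks"].mean()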
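Grouping can produce several statistics at once. A sketch, assuming hypothetical grades assigned to four students:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Marks": [80, 90, 85, 95],
    "Grade": ["B", "A", "B", "A"],  # hypothetical grades for illustration
})

# One row per grade, one column per statistic
summary = df.groupby("Grade")["Marks"].agg(["mean", "max", "count"])

print(summary)
```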
Working with Real Example
data = {
    "Name": ["A", "B", "C", "D"],
    "Marks": [80, 90, 85, 95]
}
df = pd.DataFrame(data)

# Filter students with marks > 85
result = df[df["Marks"] > 85]
print(result)
Exporting Data
Save to CSV
df.to_csv("output.csv", index=False)
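One way to check an export is to read it straight back. A sketch that keeps everything in memory, relying on to_csv returning the CSV text when no path is given:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Marks": [85, 90]})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back should reproduce the original values
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

index=False leaves out the row numbers, so they do not come back as an extra unnamed column when the file is read again.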
Advantages of Pandas
- Easy to learn and use
- Handles large in-memory datasets efficiently
- Integrates with NumPy and Matplotlib
- Widely used in industry
Common Mistakes
- Not handling missing values
- Confusing label-based (loc) and position-based (iloc) indexing
- Forgetting the axis argument in operations such as drop
- Not checking data types
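The data type and missing value mistakes often show up together, for example when a numeric column is loaded as text. A sketch, assuming a sample column where one mark was recorded as "n/a":

```python
import pandas as pd

# Marks stored as text, as can happen after loading a raw file
df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": ["85", "90", "n/a"]})

print(df.dtypes)  # Marks is not a numeric column here

# errors="coerce" turns values that cannot be parsed into NaN
df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")

print(df["Marks"].isnull().sum())  # how many values failed to convert
print(df["Marks"].mean())          # NaN values are skipped by default
```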
Tips to Learn Pandas Faster
- Practice daily with datasets
- Try real-world projects
- Use Jupyter Notebook
- Focus on DataFrame operations
Conclusion
Pandas is an essential tool for anyone working with data in Python. In this tutorial, you learned the basics of Pandas, including DataFrames, data selection, filtering, and data cleaning.
With regular practice, you can master Pandas and use it in data science, machine learning, and real-world applications.