10 Pandas Functions Every Data Scientist Should Know
1. read_csv()
Load Data from a CSV File
import pandas as pd
df = pd.read_csv('data.csv')
✅ Why it's useful:
The most common starting point—reads CSV files into a DataFrame.
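read_csv() also takes many optional arguments. A minimal sketch of a few common ones, where the file name and the column names (name, signup_date, salary) are assumed for illustration:
import pandas as pd

df = pd.read_csv(
    'data.csv',
    usecols=['name', 'signup_date', 'salary'],  # assumed column names: load only these
    parse_dates=['signup_date'],                # parse this column as datetime on load
    nrows=1000,                                 # read only the first 1,000 rows for a quick look
)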
2. info()
Quick Summary of a DataFrame
df.info()
✅ Why it's useful:
Displays column names, non-null counts, and data types. Great for checking data integrity.
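info() accepts a few options of its own. A small sketch, assuming df is the DataFrame loaded above; memory_usage='deep' reports the true memory cost of string columns:
df.info(memory_usage='deep')  # full summary with accurate memory usage
df.dtypes                     # just the column-to-dtype mapping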
3. describe()
Summary Statistics
df.describe()
✅ Why it's useful:
Gives mean, std, min, max, and percentiles for numerical columns.
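By default describe() covers only numeric columns. A short sketch of the common variations, assuming df also contains text columns:
df.describe()                  # numeric columns: count, mean, std, min, quartiles, max
df.describe(include='object')  # text columns: count, unique, top, freq
df.describe(include='all')     # all columns in one table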
4. value_counts()
Count Unique Values
df['gender'].value_counts()
✅ Why it's useful:
Used for categorical analysis—quickly see distribution of unique values in a column.
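A self-contained sketch (the gender values here are made up for illustration):
import pandas as pd

df = pd.DataFrame({'gender': ['F', 'M', 'F', None, 'F']})
df['gender'].value_counts()                # raw counts, missing values excluded
df['gender'].value_counts(normalize=True)  # proportions instead of counts
df['gender'].value_counts(dropna=False)    # include missing values in the tally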
5. groupby()
Aggregate Data by Groups
df.groupby('department')['salary'].mean()
✅ Why it's useful:
Performs aggregation (like sum, mean, count) over groups. Essential for EDA and business logic.
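groupby() pairs well with .agg() when you need several statistics at once. A minimal sketch with made-up department and salary data:
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'HR', 'HR', 'HR'],
    'salary': [70000, 85000, 50000, 55000, 52000],
})
df.groupby('department')['salary'].mean()                            # one statistic per group
df.groupby('department')['salary'].agg(['mean', 'median', 'count'])  # several at once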
6. isnull() + sum()
Detect Missing Values
df.isnull().sum()
✅ Why it's useful:
Helps identify which columns have missing data and how many nulls there are.
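It often helps to see missing data as a percentage, sorted by the worst columns. A self-contained sketch with made-up values:
import pandas as pd

df = pd.DataFrame({'salary': [50000, None, 60000, None], 'department': ['IT', 'HR', None, 'IT']})
df.isnull().sum()                                        # missing count per column
(df.isnull().mean() * 100).sort_values(ascending=False)  # percentage missing, worst first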
7. apply()
Apply a Function to Rows or Columns
df['salary_tax'] = df['salary'].apply(lambda x: x * 0.3)
✅ Why it's useful:
Powerful for row-wise or column-wise transformations.
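With axis=1, apply() works row by row across several columns. A small sketch with made-up salary and bonus values; note that for plain arithmetic a vectorized expression is usually faster:
import pandas as pd

df = pd.DataFrame({'salary': [50000, 70000], 'bonus': [5000, 8000]})
df['total_comp'] = df.apply(lambda row: row['salary'] + row['bonus'], axis=1)  # row-wise
df['salary_tax'] = df['salary'] * 0.3  # vectorized alternative for simple element-wise math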
8. loc[] and iloc[]
Access Rows and Columns by Label or Position
df.loc[0, 'name']   # label-based: row label 0, column 'name'
df.iloc[0, 0]       # position-based: first row, first column
✅ Why it's useful:
Used for precise row/column slicing and filtering.
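loc[] also accepts boolean masks and column lists, which is how most filtering is written in practice. A self-contained sketch with made-up employee data:
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meena'],
    'department': ['IT', 'HR', 'IT'],
    'salary': [70000, 48000, 65000],
})
df.loc[df['salary'] > 50000, ['name', 'department']]  # label-based: boolean mask + column list
df.iloc[:2, :2]                                       # position-based: first two rows, first two columns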
9. merge()
Combine DataFrames (SQL Join Style)
pd.merge(df1, df2, on='customer_id', how='left')
✅ Why it's useful:
Joins multiple tables—very similar to SQL joins (left, right, inner, outer).
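A self-contained sketch with two toy tables (customer_id values made up), so the join behaviour is easy to see:
import pandas as pd

orders = pd.DataFrame({'customer_id': [1, 2, 3], 'amount': [250, 120, 80]})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Asha', 'Ravi']})
# Left join keeps every order; customer 3 has no match and gets NaN for name
pd.merge(orders, customers, on='customer_id', how='left')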
10. pivot_table()
Create Summary Tables (like Excel Pivot Tables)
df.pivot_table(index='department', values='salary', aggfunc='mean')
✅ Why it's useful:
Great for multi-level aggregations and quick summaries.
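pivot_table() can also spread a second category across the columns or apply several aggregations at once. A minimal sketch with made-up data:
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'HR', 'HR'],
    'gender': ['F', 'M', 'F', 'M'],
    'salary': [80000, 75000, 52000, 50000],
})
df.pivot_table(index='department', columns='gender', values='salary', aggfunc='mean')  # rows x columns
df.pivot_table(index='department', values='salary', aggfunc=['mean', 'count'])         # multiple aggregations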
Bonus: A Few More Handy Ones
Function         Purpose
dropna()         Remove rows with missing values
fillna()         Fill missing values
duplicated()     Find duplicate rows
sort_values()    Sort by one or more columns
astype()         Change data types
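A quick sketch showing these together on a small made-up frame:
import pandas as pd

df = pd.DataFrame({'age': ['25', '30', None, '30'], 'city': ['Hyd', 'Pune', 'Hyd', 'Pune']})
df[df.duplicated()]                          # flags the repeated ('30', 'Pune') row
df = df.dropna(subset=['age'])               # drop rows missing 'age'
df['city'] = df['city'].fillna('Unknown')    # fill missing values (none in 'city' here)
df['age'] = df['age'].astype(int)            # convert 'age' from string to integer
df = df.sort_values('age', ascending=False)  # sort by 'age', largest first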
Mini Practice Task
Try using all 10 of these in a single workflow:
import pandas as pd
# Load data
df = pd.read_csv('employees.csv')
# Clean data
df.dropna(subset=['salary'], inplace=True)
# Summary
df.info()
print(df.describe())
print(df.isnull().sum())
print(df['department'].value_counts())
# Add calculated column
df['tax'] = df['salary'].apply(lambda x: x * 0.2)
# Precise access: first row, first column
print(df.iloc[0, 0])
# Group and analyze
dept_avg = df.groupby('department')['salary'].mean()
dept_summary = df.pivot_table(index='department', values='salary', aggfunc='mean')
# Merge with another dataset
df2 = pd.read_csv('departments.csv')
merged_df = pd.merge(df, df2, on='department', how='left')
✅ Final Tip
Learn to chain functions together:
df[df['salary'] > 50000].groupby('department')['salary'].mean().sort_values(ascending=False)
This kind of chained one-liner keeps your analysis concise, readable, and idiomatic pandas.