10 Pandas Functions Every Data Scientist Should Know
1. read_csv()
Load Data from a CSV File
import pandas as pd
df = pd.read_csv('data.csv')
✅ Why it's useful:
The most common starting point—reads CSV files into a DataFrame.
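read_csv() also takes many optional arguments. A minimal sketch of a few common ones, where the file name and the column names (name, signup_date, salary) are assumed for illustration:
import pandas as pd

df = pd.read_csv(
    'data.csv',
    usecols=['name', 'signup_date', 'salary'],  # assumed column names: load only these
    parse_dates=['signup_date'],                # parse this column as datetime on load
    nrows=1000,                                 # read only the first 1,000 rows for a quick look
)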
2. info()
Quick Summary of a DataFrame
df.info()
✅ Why it's useful:
Displays column names, non-null counts, and data types. Great for checking data integrity.
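info() accepts a few options of its own. A small sketch, assuming df is the DataFrame loaded above; memory_usage='deep' reports the true memory cost of string columns:
df.info(memory_usage='deep')  # full summary with accurate memory usage
df.dtypes                     # just the column-to-dtype mapping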
3. describe()
Summary Statistics
df.describe()
✅ Why it's useful:
Gives mean, std, min, max, and percentiles for numerical columns.
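By default describe() covers only numeric columns. A short sketch of the common variations, assuming df also contains text columns:
df.describe()                  # numeric columns: count, mean, std, min, quartiles, max
df.describe(include='object')  # text columns: count, unique, top, freq
df.describe(include='all')     # all columns in one table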
4. value_counts()
Count Unique Values
df['gender'].value_counts()
✅ Why it's useful:
Used for categorical analysis—quickly see distribution of unique values in a column.
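A self-contained sketch (the gender values here are made up for illustration):
import pandas as pd

df = pd.DataFrame({'gender': ['F', 'M', 'F', None, 'F']})
df['gender'].value_counts()                # raw counts, missing values excluded
df['gender'].value_counts(normalize=True)  # proportions instead of counts
df['gender'].value_counts(dropna=False)    # include missing values in the tally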
5. groupby()
Aggregate Data by Groups
df.groupby('department')['salary'].mean()
✅ Why it's useful:
Performs aggregation (like sum, mean, count) over groups. Essential for EDA and business logic.
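groupby() pairs well with .agg() when you need several statistics at once. A minimal sketch with made-up department and salary data:
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'HR', 'HR', 'HR'],
    'salary': [70000, 85000, 50000, 55000, 52000],
})
df.groupby('department')['salary'].mean()                            # one statistic per group
df.groupby('department')['salary'].agg(['mean', 'median', 'count'])  # several at once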
6. isnull() + sum()
Detect Missing Values
df.isnull().sum()
✅ Why it's useful:
Helps identify which columns have missing data and how many nulls there are.
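It often helps to see missing data as a percentage, sorted by the worst columns. A self-contained sketch with made-up values:
import pandas as pd

df = pd.DataFrame({'salary': [50000, None, 60000, None], 'department': ['IT', 'HR', None, 'IT']})
df.isnull().sum()                                        # missing count per column
(df.isnull().mean() * 100).sort_values(ascending=False)  # percentage missing, worst first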
7. apply()
Apply a Function to Rows or Columns
df['salary_tax'] = df['salary'].apply(lambda x: x * 0.3)
✅ Why it's useful:
Powerful for row-wise or column-wise transformations.
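With axis=1, apply() works row by row across several columns. A small sketch with made-up salary and bonus values; note that for plain arithmetic a vectorized expression is usually faster:
import pandas as pd

df = pd.DataFrame({'salary': [50000, 70000], 'bonus': [5000, 8000]})
df['total_comp'] = df.apply(lambda row: row['salary'] + row['bonus'], axis=1)  # row-wise
df['salary_tax'] = df['salary'] * 0.3  # vectorized alternative for simple element-wise math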
8. loc[] and iloc[]
Access Rows and Columns by Label or Position
df.loc[0, 'name']   # label-based: row label 0, column 'name'
df.iloc[0, 0]       # position-based: first row, first column
✅ Why it's useful:
Used for precise row/column slicing and filtering.
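loc[] also accepts boolean masks and column lists, which is how most filtering is written in practice. A self-contained sketch with made-up employee data:
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meena'],
    'department': ['IT', 'HR', 'IT'],
    'salary': [70000, 48000, 65000],
})
df.loc[df['salary'] > 50000, ['name', 'department']]  # label-based: boolean mask + column list
df.iloc[:2, :2]                                       # position-based: first two rows, first two columns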
9. merge()
Combine DataFrames (SQL Join Style)
pd.merge(df1, df2, on='customer_id', how='left')
✅ Why it's useful:
Joins multiple tables—very similar to SQL joins (left, right, inner, outer).
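A self-contained sketch with two toy tables (customer_id values made up), so the join behaviour is easy to see:
import pandas as pd

orders = pd.DataFrame({'customer_id': [1, 2, 3], 'amount': [250, 120, 80]})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Asha', 'Ravi']})
# Left join keeps every order; customer 3 has no match and gets NaN for name
pd.merge(orders, customers, on='customer_id', how='left')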
10. pivot_table()
Create Summary Tables (like Excel Pivot Tables)
df.pivot_table(index='department', values='salary', aggfunc='mean')
✅ Why it's useful:
Great for multi-level aggregations and quick summaries.
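pivot_table() can also spread a second category across the columns or apply several aggregations at once. A minimal sketch with made-up data:
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'HR', 'HR'],
    'gender': ['F', 'M', 'F', 'M'],
    'salary': [80000, 75000, 52000, 50000],
})
df.pivot_table(index='department', columns='gender', values='salary', aggfunc='mean')  # rows x columns
df.pivot_table(index='department', values='salary', aggfunc=['mean', 'count'])         # multiple aggregations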
Bonus: A Few More Handy Ones
Function         Purpose
dropna()         Remove rows with missing values
fillna()         Fill missing values
duplicated()     Find duplicate rows
sort_values()    Sort by one or more columns
astype()         Change data types
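A quick sketch showing these together on a small made-up frame:
import pandas as pd

df = pd.DataFrame({'age': ['25', '30', None, '30'], 'city': ['Hyd', 'Pune', 'Hyd', 'Pune']})
df[df.duplicated()]                          # flags the repeated ('30', 'Pune') row
df = df.dropna(subset=['age'])               # drop rows missing 'age'
df['city'] = df['city'].fillna('Unknown')    # fill missing values (none in 'city' here)
df['age'] = df['age'].astype(int)            # convert 'age' from string to integer
df = df.sort_values('age', ascending=False)  # sort by 'age', largest first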
Mini Practice Task
Try using all 10 of these in a single workflow:
import pandas as pd
# Load data
df = pd.read_csv('employees.csv')
# Clean data
df.dropna(subset=['salary'], inplace=True)
# Summary
df.info()
print(df.describe())
print(df.isnull().sum())
print(df['department'].value_counts())
# Add calculated column
df['tax'] = df['salary'].apply(lambda x: x * 0.2)
# Precise access: first row, first column
print(df.iloc[0, 0])
# Group and analyze
dept_avg = df.groupby('department')['salary'].mean()
dept_summary = df.pivot_table(index='department', values='salary', aggfunc='mean')
# Merge with another dataset
df2 = pd.read_csv('departments.csv')
merged_df = pd.merge(df, df2, on='department', how='left')
✅ Final Tip
Learn to chain functions together:
df[df['salary'] > 50000].groupby('department')['salary'].mean().sort_values(ascending=False)
This kind of chained one-liner keeps your analysis concise, readable, and idiomatic pandas.