🐼 10 Pandas Functions Every Data Scientist Should Know

1. read_csv()

🔍 Load Data from a CSV File

import pandas as pd


df = pd.read_csv('data.csv')



✅ Why it's useful:

The most common starting point—reads CSV files into a DataFrame.
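
If the file isn't a plain comma-separated dump, a few optional arguments handle the common cases. A minimal sketch (the semicolon delimiter and the hire_date column are hypothetical):

df = pd.read_csv(
    'data.csv',
    sep=';',                                  # hypothetical: semicolon-delimited file
    usecols=['name', 'salary', 'hire_date'],  # load only the columns you need
    parse_dates=['hire_date'],                # parse this column as datetime on load
)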


2. info()

🔍 Quick Summary of DataFrame

df.info()



✅ Why it's useful:

Displays column names, non-null counts, and data types. Great for checking data integrity.
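
Two quick companions to info() when you only need the dimensions or the column types:

df.shape    # (number of rows, number of columns)
df.dtypes   # data type of each column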


3. describe()

🔍 Summary Statistics

df.describe()



✅ Why it's useful:

Gives mean, std, min, max, and percentiles for numerical columns.
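
By default only numeric columns are summarized; passing include lets you profile text columns too (count, number of unique values, most frequent value):

df.describe(include='object')   # summarize string/categorical columns
df.describe(include='all')      # numeric and non-numeric columns together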


4. value_counts()

🔍 Count Unique Values

df['gender'].value_counts()



✅ Why it's useful:

Used for categorical analysis—quickly see distribution of unique values in a column.
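
Two options worth remembering: normalize=True returns proportions instead of raw counts, and dropna=False counts missing values as their own category:

df['gender'].value_counts(normalize=True)    # share of each category
df['gender'].value_counts(dropna=False)      # include NaN in the tally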


5. groupby()

🔍 Aggregate Data by Groups

df.groupby('department')['salary'].mean()



✅ Why it's useful:

Performs aggregation (like sum, mean, count) over groups. Essential for EDA and business logic.
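
To compute several statistics per group in one call, combine it with agg(). A short sketch using the same hypothetical columns:

df.groupby('department')['salary'].agg(['mean', 'median', 'count'])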


6. isnull() + sum()

🔍 Detect Missing Values

df.isnull().sum()



✅ Why it's useful:

Helps identify which columns have missing data and how many nulls there are.
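
Because booleans average to a fraction, swapping sum() for mean() gives the share of missing values per column:

(df.isnull().mean() * 100).round(1)   # percent of missing values per column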


7. apply()

🔍 Apply a Function to Rows or Columns

df['salary_tax'] = df['salary'].apply(lambda x: x * 0.3)



✅ Why it's useful:

Applies a custom function element-wise to a column, or row-wise across the whole DataFrame with axis=1.
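
For logic that needs several columns at once, pass axis=1 to apply the function row by row. Note that simple arithmetic like the tax example is faster as plain vectorized math (df['salary'] * 0.3). A sketch with a hypothetical bonus column:

df['total_comp'] = df.apply(lambda row: row['salary'] + row['bonus'], axis=1)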


8. loc[] and iloc[]

🔍 Access Rows and Columns by Label or Position

df.loc[0, 'name']     # by label  

df.iloc[0, 0]         # by index position



✅ Why it's useful:

Used for precise row/column slicing and filtering.
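
loc also accepts boolean masks, which makes filtered selection a one-liner. A sketch with the same hypothetical columns:

high_earners = df.loc[df['salary'] > 50000, ['name', 'salary']]   # filter rows, pick columns
df.iloc[:5, :3]                                                   # first five rows, first three columns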


9. merge()

🔍 Combine DataFrames (SQL Join Style)

pd.merge(df1, df2, on='customer_id', how='left')



✅ Why it's useful:

Joins multiple tables—very similar to SQL joins (left, right, inner, outer).
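
When the key column is named differently in each table, use left_on and right_on instead of on (the cust_id name here is hypothetical):

pd.merge(df1, df2, left_on='customer_id', right_on='cust_id', how='inner')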


10. pivot_table()

🔍 Create Summary Tables (like Excel Pivot Tables)

df.pivot_table(index='department', values='salary', aggfunc='mean')



✅ Why it's useful:

Great for multi-level aggregations and quick summaries.
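
Adding a columns argument or several aggregation functions turns it into a full cross-tab. A sketch assuming a hypothetical location column:

df.pivot_table(index='department', columns='location', values='salary', aggfunc=['mean', 'count'])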


🚀 Bonus: A Few More Handy Ones

Function        Purpose
dropna()        Remove rows with missing values
fillna()        Fill missing values
duplicated()    Find duplicate rows
sort_values()   Sort by one or more columns
astype()        Change data types
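
A short sketch of how these can fit into one cleanup pass (the age column is hypothetical):

df[df.duplicated()]                                         # inspect exact duplicate rows
df = df.dropna(subset=['department'])                       # drop rows missing a department
df['salary'] = df['salary'].fillna(df['salary'].median())   # fill missing salaries
df['age'] = df['age'].astype(int)                           # fix the data type
df = df.sort_values('salary', ascending=False)              # highest salaries first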

🧪 Mini Practice Task


Try using all 10 of these in a single workflow:


# Load data

df = pd.read_csv('employees.csv')


# Clean data

df.dropna(subset=['salary'], inplace=True)


# Summary

df.info()   # prints its summary directly, so no print() needed

print(df['department'].value_counts())


# Add calculated column

df['tax'] = df['salary'].apply(lambda x: x * 0.2)


# Group and analyze

dept_avg = df.groupby('department')['salary'].mean()


# Merge with another dataset

df2 = pd.read_csv('departments.csv')

merged_df = pd.merge(df, df2, on='department', how='left')
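
The functions not used above (describe(), loc/iloc, pivot_table()) can round out the same workflow; a sketch continuing with the merged result and the same hypothetical columns:

# Inspect and summarize the merged result
print(merged_df.describe())
print(merged_df.loc[merged_df['salary'] > 50000, ['department', 'salary']].head())
print(merged_df.pivot_table(index='department', values='salary', aggfunc='mean'))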


✅ Final Tip


Learn to chain functions together:


df[df['salary'] > 50000].groupby('department')['salary'].mean().sort_values(ascending=False)



This kind of chained one-liner is idiomatic pandas: it keeps the analysis concise, readable, and free of throwaway intermediate variables.
