Friday, September 5, 2025


 ๐Ÿผ 10 Pandas Functions Every Data Scientist Should Know

1. read_csv()

๐Ÿ” Load Data from a CSV File

import pandas as pd


df = pd.read_csv('data.csv')



✅ Why it's useful:

Usually the first step in any analysis: it reads a CSV file into a DataFrame.
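If you only need part of a large file, read_csv can also select columns and parse dates at load time. A small sketch, assuming a hypothetical 'hired_on' date column alongside the 'name' and 'salary' columns used later in this post:

# usecols limits loading to the listed columns; parse_dates converts 'hired_on' to datetime
df = pd.read_csv('data.csv', usecols=['name', 'salary', 'hired_on'], parse_dates=['hired_on'])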


2. info()

๐Ÿ” Quick Summary of DataFrame

df.info()



✅ Why it's useful:

Displays column names, non-null counts, and data types. Great for checking data integrity.
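To see what the summary looks like, here is a minimal sketch with a tiny made-up DataFrame containing one missing value:

# the None in 'name' shows up as a lower non-null count in the info() output
sample = pd.DataFrame({'name': ['Ana', 'Bo', None], 'salary': [50000, 60000, 55000]})
sample.info()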


3. describe()

๐Ÿ” Summary Statistics

df.describe()



✅ Why it's useful:

Gives mean, std, min, max, and percentiles for numerical columns.
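By default only numeric columns are summarized; passing include='all' also covers categorical columns (count, unique, top, freq). A quick sketch:

# include='all' adds non-numeric columns to the summary
df.describe(include='all')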


4. value_counts()

๐Ÿ” Count Unique Values

df['gender'].value_counts()



✅ Why it's useful:

Essential for categorical analysis: it quickly shows the distribution of unique values in a column.
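If you want proportions rather than raw counts, value_counts supports normalize=True, and dropna=False also counts missing values. For example:

# proportions instead of counts, with missing values included as their own category
df['gender'].value_counts(normalize=True, dropna=False)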


5. groupby()

๐Ÿ” Aggregate Data by Groups

df.groupby('department')['salary'].mean()



✅ Why it's useful:

Performs aggregation (like sum, mean, count) over groups. Essential for EDA and business logic.
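To compute several aggregations in one pass, combine groupby with .agg(). A short sketch using the same columns:

# one row per department with the mean, count, and sum of salaries
df.groupby('department')['salary'].agg(['mean', 'count', 'sum'])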


6. isnull() + sum()

๐Ÿ” Detect Missing Values

df.isnull().sum()



✅ Why it's useful:

Helps identify which columns have missing data and how many nulls there are.
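A related trick: taking the mean of the boolean mask gives the share of missing values per column, which is often easier to read on large datasets. For instance:

# fraction of missing values per column, expressed as a percentage
df.isnull().mean() * 100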


7. apply()

๐Ÿ” Apply a Function to Rows or Columns

df['salary_tax'] = df['salary'].apply(lambda x: x * 0.3)



✅ Why it's useful:

Applies any custom function element-wise to a Series, or row-wise/column-wise to a DataFrame. Powerful for transformations that built-in methods don't cover.
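With axis=1, apply receives one row at a time, which lets a function combine several columns. A sketch assuming a hypothetical 'bonus' column alongside 'salary':

# each row arrives as a Series; .get() returns 0 if 'bonus' is absent
df['total_pay'] = df.apply(lambda row: row['salary'] + row.get('bonus', 0), axis=1)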


8. loc[] and iloc[]

๐Ÿ” Access Rows and Columns by Label or Position

df.loc[0, 'name']     # by label  

df.iloc[0, 0]         # by index position



✅ Why it's useful:

Used for precise row/column slicing and filtering.
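loc also accepts boolean masks, which makes it handy for filtering and selecting columns in one step. For example:

# rows where salary exceeds 50000, keeping only the 'name' and 'salary' columns
df.loc[df['salary'] > 50000, ['name', 'salary']]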


9. merge()

๐Ÿ” Combine DataFrames (SQL Join Style)

pd.merge(df1, df2, on='customer_id', how='left')



✅ Why it's useful:

Joins multiple tables—very similar to SQL joins (left, right, inner, outer).
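When debugging a join, indicator=True adds a _merge column that tells you which side each row came from. A small sketch:

# '_merge' is 'left_only', 'right_only', or 'both' for every row of the result
pd.merge(df1, df2, on='customer_id', how='outer', indicator=True)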


10. pivot_table()

๐Ÿ” Create Summary Tables (like Excel Pivot Tables)

df.pivot_table(index='department', values='salary', aggfunc='mean')



✅ Why it's useful:

Great for multi-level aggregations and quick summaries.
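Adding columns= and a list of aggregation functions turns this into a true multi-level summary. A sketch reusing the 'gender' column from earlier:

# departments down the rows, genders across the columns, with two aggregations per cell
df.pivot_table(index='department', columns='gender', values='salary', aggfunc=['mean', 'count'])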


🚀 Bonus: A Few More Handy Ones

dropna(): Remove rows with missing values

fillna(): Fill missing values

duplicated(): Find duplicate rows

sort_values(): Sort by one or more columns

astype(): Change data types
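A quick sketch combining these helpers (the 'bonus' and 'employee_id' columns are hypothetical):

df = df.dropna(subset=['salary'])                  # drop rows with a missing salary
df['bonus'] = df['bonus'].fillna(0)                # replace missing bonuses with 0
df = df[~df.duplicated()]                          # keep only unique rows
df = df.sort_values('salary', ascending=False)     # highest salary first
df['employee_id'] = df['employee_id'].astype(str)  # store IDs as strings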

🧪 Mini Practice Task


Try using all 10 of these in a single workflow:


import pandas as pd

# Load data (read_csv)
df = pd.read_csv('employees.csv')

# Inspect and clean (info, isnull, dropna)
df.info()
print(df.isnull().sum())
df = df.dropna(subset=['salary'])

# Summary statistics (describe, value_counts)
print(df.describe())
print(df['department'].value_counts())

# Add a calculated column (apply)
df['tax'] = df['salary'].apply(lambda x: x * 0.2)

# Precise selection (loc)
high_earners = df.loc[df['salary'] > 50000, ['name', 'salary']]

# Group and analyze (groupby)
dept_avg = df.groupby('department')['salary'].mean()

# Merge with another dataset (merge)
df2 = pd.read_csv('departments.csv')
merged_df = pd.merge(df, df2, on='department', how='left')

# Summary table (pivot_table)
print(merged_df.pivot_table(index='department', values='salary', aggfunc='mean'))


✅ Final Tip


Learn to chain functions together:


df[df['salary'] > 50000].groupby('department')['salary'].mean().sort_values(ascending=False)



This kind of one-liner is Pythonic and keeps your analysis concise and readable.
