Friday, September 5, 2025

Data Manipulation with dplyr in R

📦 Loading the Package

install.packages("dplyr")   # Run once

library(dplyr)


📊 Core Functions ("Verbs") of dplyr


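Each function (verb) performs one specific transformation: it takes a data frame as its first argument and returns a new, modified data frame. Because the output of one verb can feed straight into the next, verbs are chained with the pipe operator (%>%), which passes the result on its left into the function on its right. The result is code that reads top to bottom, one step per line.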
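To make the chaining idea concrete, here is a small sketch showing that x %>% f(y) is just another way of writing f(x, y). It uses the built-in mtcars data; the names nested and piped are only illustrative.

```r
library(dplyr)

# Nested form: has to be read from the inside out
nested <- arrange(filter(mtcars, cyl == 4), mpg)

# Piped form: each step's result becomes the first argument of the next step
piped <- mtcars %>%
  filter(cyl == 4) %>%
  arrange(mpg)

identical(nested, piped)  # TRUE
```

Both forms do the same work; the piped version simply lists the steps in the order they happen.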


Function      Description
select()      Pick columns
filter()      Pick rows based on conditions
arrange()     Sort rows
mutate()      Add or modify columns
summarise()   Aggregate values
group_by()    Group data for summarizing
rename()      Rename columns
distinct()    Remove duplicate rows

🧪 Example Dataset: mtcars


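Let's explore these functions using mtcars, a dataset built into R that describes 32 cars with 11 numeric variables. Its car names are stored as row names rather than as a regular column, so they cannot be selected or filtered like other variables. tibble::rownames_to_column() (tibble is installed along with dplyr) moves them into a column called car:

data("mtcars")
# Move the row names (car models) into a regular column named "car"
mtcars <- tibble::rownames_to_column(mtcars, var = "car")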


🔹 1. select(): Choose Columns

mtcars %>%
  select(car, mpg, hp)

✅ Pick only the columns car, mpg, and hp.
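select() also accepts selection helpers and negative selection. A minimal sketch: mt is just a local name used here, and the block rebuilds the car column from datasets::mtcars so it runs on its own; starts_with() and everything() are tidyselect helpers that dplyr makes available.

```r
library(dplyr)

# Rebuild the car column so this snippet runs independently
mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# car plus every column whose name starts with "d" (disp and drat)
d_cols <- mt %>% select(car, starts_with("d"))

# A minus sign drops columns instead of keeping them
no_flags <- mt %>% select(-car, -vs, -am)

# Move mpg to the front and keep all remaining columns after it
reordered <- mt %>% select(mpg, everything())
```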


🔹 2. filter(): Filter Rows by Condition

mtcars %>%
  filter(mpg > 25)

✅ Show cars with miles per gallon greater than 25.
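filter() can combine several conditions. A minimal sketch (mt is a local name; the block rebuilds the car column so it runs on its own): separate conditions with commas for AND, use | for OR, and %in% to match any of several values.

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Comma-separated conditions must all be TRUE (logical AND)
efficient_4cyl <- mt %>% filter(mpg > 25, cyl == 4)

# | keeps rows where either condition is TRUE (logical OR)
light_or_fast <- mt %>% filter(wt < 2 | hp > 250)

# %in% keeps rows whose value matches any element of the vector
six_or_eight <- mt %>% filter(cyl %in% c(6, 8))
```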


🔹 3. arrange(): Sort Data

mtcars %>%
  arrange(desc(mpg))

✅ Sort cars by descending fuel efficiency (highest mpg first). Without desc(), arrange() sorts in ascending order.
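arrange() also takes several sort keys: ties in the first column are broken by the second, and so on. A short sketch under the same assumptions as before (mt is a local name, and the car column is rebuilt so the snippet is self-contained):

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Sort by cylinders (ascending), then by mpg (descending) within each cylinder count
sorted <- mt %>%
  arrange(cyl, desc(mpg)) %>%
  select(car, cyl, mpg)

head(sorted, 3)
```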


🔹 4. mutate(): Create or Modify Columns

mtcars %>%
  mutate(power_to_weight = hp / wt)

✅ Add a new column, power_to_weight: horsepower per 1,000 lbs of vehicle weight (wt is recorded in units of 1,000 lbs).
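A single mutate() call can create several columns, use a column it has just created, and overwrite an existing one. A minimal sketch: wt_kg and hp_per_tonne are illustrative column names, and mt is a local copy with the car column rebuilt.

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

converted <- mt %>%
  mutate(
    wt_kg = wt * 453.59237,              # wt is in 1,000 lbs; 1 lb = 0.45359237 kg
    hp_per_tonne = hp / (wt_kg / 1000),  # reuses wt_kg from the line above
    mpg = round(mpg)                     # overwrites the existing mpg column
  ) %>%
  select(car, mpg, wt_kg, hp_per_tonne)
```

Columns are computed in order, which is why hp_per_tonne can refer to wt_kg inside the same call.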


🔹 5. summarise(): Aggregate Values

mtcars %>%
  summarise(avg_mpg = mean(mpg), max_hp = max(hp))

✅ Collapse all 32 cars into a single row containing the average mpg and the maximum horsepower.


🔹 6. group_by() + summarise(): Grouped Summary

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), count = n())

✅ Calculate the average mpg and the number of cars for each cylinder count (4, 6, and 8).
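Grouping also works with several columns, and it affects mutate() as well as summarise(). A sketch under the same assumptions (mt is a local copy with the car column rebuilt; mpg_vs_cyl_avg is an illustrative name). The .groups = "drop" argument, available in dplyr 1.0 and later, returns an ungrouped result.

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# One row per cyl/gear combination; .groups = "drop" removes the grouping afterwards
by_cyl_gear <- mt %>%
  group_by(cyl, gear) %>%
  summarise(avg_mpg = mean(mpg), count = n(), .groups = "drop")

# With group_by(), mean(mpg) inside mutate() is the mean of each cylinder group
vs_group <- mt %>%
  group_by(cyl) %>%
  mutate(mpg_vs_cyl_avg = mpg - mean(mpg)) %>%
  ungroup()
```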


🔹 7. rename(): Rename Columns

mtcars %>%
  rename(MilesPerGallon = mpg)

✅ Rename the mpg column to MilesPerGallon. The syntax is new_name = old_name.
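rename() can change several names at once, and rename_with() applies a function to column names. A minimal sketch (mt is a local copy with the car column rebuilt; toupper is just one example of a naming function):

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Several renames in one call, each written as new_name = old_name
renamed <- mt %>% rename(MilesPerGallon = mpg, Horsepower = hp)

# rename_with() applies a function to the names; here every name is upper-cased
upper <- mt %>% rename_with(toupper)
```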


🔹 8. distinct(): Remove Duplicate Rows

mtcars %>%
  select(cyl, gear) %>%
  distinct()

✅ Show unique combinations of cyl and gear.
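distinct() can also take the columns directly, which returns just those columns, and .keep_all = TRUE keeps every column from the first row of each combination. A sketch assuming the same local copy mt:

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Unique cyl/gear combinations, without a separate select() step
combos <- mt %>% distinct(cyl, gear)

# Same combinations, but keep all columns from the first matching row
first_per_combo <- mt %>% distinct(cyl, gear, .keep_all = TRUE)
```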


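🔹 9. Pipe (%>%) for Chaining Operations

You can chain multiple steps into one clean workflow. This pipeline keeps cars with mpg above 20, adds an efficiency column (mpg per horsepower), sorts by it from highest to lowest, and keeps four columns:

mtcars %>%
  filter(mpg > 20) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  select(car, mpg, hp, efficiency)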
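Since R 4.1, base R also has a native pipe, |>, which behaves the same way for simple chains like this one. A sketch of the same pipeline, assuming R 4.1 or later; mt is a local copy with the car column rebuilt, and the final check confirms both pipes give the same result.

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Same steps as above, written with the native pipe (requires R >= 4.1)
efficient <- mt |>
  filter(mpg > 20) |>
  mutate(efficiency = mpg / hp) |>
  arrange(desc(efficiency)) |>
  select(car, mpg, hp, efficiency)

# The magrittr pipe version produces the same data frame
identical(
  efficient,
  mt %>% filter(mpg > 20) %>% mutate(efficiency = mpg / hp) %>%
    arrange(desc(efficiency)) %>% select(car, mpg, hp, efficiency)
)
```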


📌 Bonus Tips

Use n() inside summarise() to count the rows in each group (or in the whole data frame when it is not grouped).

Use across() to apply a function to several columns at once:

mtcars %>%
  summarise(across(c(mpg, hp), mean))
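Both tips have handy extensions: count() is a shortcut for grouping and counting with n(), and across() accepts a named list of functions. A sketch assuming the same local copy mt; with a named list, across() names the output columns "{column}_{function}" by default.

```r
library(dplyr)

mt <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Shortcut for group_by(cyl) %>% summarise(n = n())
cyl_counts <- mt %>% count(cyl)

# Several functions per column: produces mpg_mean, mpg_sd, hp_mean, hp_sd
stats <- mt %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))
```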


✅ Summary: dplyr Workflow


1. Load your data (e.g., with read.csv() or readr::read_csv()).

2. Transform it with verbs such as select(), filter(), mutate(), group_by(), and summarise().

3. Chain the steps with %>% for clean, readable code.
