Data Manipulation with dplyr in R
Loading the Package
install.packages("dplyr") # Run once
library(dplyr)
Core Functions ("Verbs") of dplyr
Each function (verb) performs one specific transformation on a data frame. Verbs are chained with the pipe operator (%>%), which passes the result of one step as the first argument of the next, so a pipeline reads top to bottom in the order the steps run.
Function      Description
select()      Pick columns
filter()      Pick rows based on conditions
arrange()     Sort rows
mutate()      Add or modify columns
summarise()   Aggregate values
group_by()    Group data for summarizing
rename()      Rename columns
distinct()    Remove duplicate rows
Example Dataset: mtcars
Let's explore these functions using the built-in mtcars dataset.
data("mtcars")
mtcars <- tibble::rownames_to_column(mtcars, var = "car")
1. select(): Choose Columns
mtcars %>%
  select(car, mpg, hp)
✅ Pick only the columns car, mpg, and hp.
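Beyond listing columns one by one, select() also accepts ranges, negation, and name-matching helpers. The sketch below is a minimal illustration; `cars_df` is a name introduced here (rebuilt from the built-in dataset so the block runs on its own), not something from the original walkthrough.

```r
library(dplyr)

# Rebuild the example data under an illustrative name so this block is standalone
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Select a contiguous range of columns by name (mpg through hp)
range_cols <- cars_df %>%
  select(car, mpg:hp)

# Drop columns with a minus sign and keep everything else
without_vs_am <- cars_df %>%
  select(-vs, -am)

# Select columns whose names start with "d" (disp and drat)
d_cols <- cars_df %>%
  select(car, starts_with("d"))

names(range_cols)
names(d_cols)
```

Helpers such as starts_with() tend to be more robust than long column lists when a dataset has many similarly named columns.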
2. filter(): Filter Rows by Condition
mtcars %>%
  filter(mpg > 25)
✅ Show cars with miles per gallon greater than 25.
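filter() can also combine several conditions. As a rough sketch (the thresholds and the `cars_df` name are chosen here purely for illustration), comma-separated conditions are combined with AND, `|` expresses OR, and `%in%` tests membership in a set:

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Comma-separated conditions must all be true (logical AND)
efficient_4cyl <- cars_df %>%
  filter(mpg > 25, cyl == 4)

# Use | when either condition is enough (logical OR)
powerful_or_light <- cars_df %>%
  filter(hp > 200 | wt < 2)

# Use %in% to match any value from a set
four_or_five_gears <- cars_df %>%
  filter(gear %in% c(4, 5))

nrow(efficient_4cyl)
nrow(powerful_or_light)
nrow(four_or_five_gears)
```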
3. arrange(): Sort Data
mtcars %>%
  arrange(desc(mpg))
✅ Sort cars by descending fuel efficiency.
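arrange() accepts more than one sort key; later keys break ties in earlier ones. A minimal sketch, assuming you want cars ordered by cylinder count and then by fuel efficiency within each count (`cars_df` is an illustrative name):

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Sort by cyl ascending, then by mpg descending within each cyl value
sorted <- cars_df %>%
  arrange(cyl, desc(mpg)) %>%
  select(car, cyl, mpg)

head(sorted)
```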
4. mutate(): Create or Modify Columns
mtcars %>%
  mutate(power_to_weight = hp / wt)
✅ Add a new column called power_to_weight.
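The heading mentions modifying columns as well as creating them. The sketch below shows both in a single mutate() call, and also that a column created earlier in the call can be used later in the same call. The column names `wt_kg` and `hp_per_tonne` and the data name `cars_df` are invented for this example; the unit conversion relies on mtcars recording `wt` in thousands of pounds.

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

converted <- cars_df %>%
  mutate(
    wt_kg = wt * 1000 * 0.45359237,       # new column: weight in kilograms
    hp_per_tonne = hp / (wt_kg / 1000),   # uses wt_kg defined on the line above
    cyl = as.factor(cyl)                  # overwrites the existing cyl column
  )

converted %>%
  select(car, wt, wt_kg, hp_per_tonne, cyl) %>%
  head()
```

Assigning to an existing name (here `cyl`) replaces that column in the result; the original data frame is left unchanged unless you assign the result back.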
5. summarise(): Aggregate Values
mtcars %>%
  summarise(avg_mpg = mean(mpg), max_hp = max(hp))
✅ Get average mpg and max horsepower.
6. group_by() + summarise(): Grouped Summary
mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), count = n())
✅ Calculate the average mpg and the number of cars for each cylinder count.
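Grouping works with more than one variable too. As a hedged sketch (not part of the original walkthrough), the example below groups by both `cyl` and `am` and passes `.groups = "drop"` so the result comes back ungrouped; this argument is available in dplyr 1.0.0 and later. `cars_df` is an illustrative name.

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# One output row per (cyl, am) combination that occurs in the data
by_cyl_am <- cars_df %>%
  group_by(cyl, am) %>%
  summarise(
    avg_mpg = mean(mpg),
    count = n(),
    .groups = "drop"   # return an ungrouped result
  )

by_cyl_am
```

Dropping the grouping explicitly avoids surprises when later verbs such as mutate() or arrange() would otherwise operate within leftover groups.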
7. rename(): Rename Columns
mtcars %>%
  rename(MilesPerGallon = mpg)
✅ Rename the mpg column to MilesPerGallon. The syntax is new_name = old_name.
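rename() can change several columns at once, and rename_with() applies a function to column names. A small sketch, with the new names chosen only for illustration (rename_with() requires dplyr 1.0.0 or later; `cars_df` is an illustrative name):

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Rename several columns in one call, each as new_name = old_name
renamed <- cars_df %>%
  rename(miles_per_gallon = mpg, horsepower = hp)

# Apply a function to every column name
upper_names <- cars_df %>%
  rename_with(toupper)

names(renamed)
names(upper_names)
```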
8. distinct(): Remove Duplicate Rows
mtcars %>%
  select(cyl, gear) %>%
  distinct()
✅ Show unique combinations of cyl and gear.
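distinct() can also take the columns directly, which saves the separate select() step, and `.keep_all = TRUE` keeps the remaining columns from the first row of each unique combination. A minimal sketch (`cars_df` is an illustrative name):

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Unique (cyl, gear) pairs without a separate select() step
combos <- cars_df %>%
  distinct(cyl, gear)

# First row encountered for each cyl value, with all other columns kept
first_per_cyl <- cars_df %>%
  distinct(cyl, .keep_all = TRUE)

combos
first_per_cyl
```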
9. Pipe (%>%) for Chaining Operations
You can chain multiple steps in a clean workflow:
mtcars %>%
  filter(mpg > 20) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  select(car, mpg, hp, efficiency)
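To see what the pipe is doing, it can help to compare a piped expression with the equivalent nested function calls: `x %>% f(y)` is evaluated as `f(x, y)`. The sketch below uses a shortened pipeline and an illustrative data name, `cars_df`.

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Piped version: reads in the order the steps run
piped <- cars_df %>%
  filter(mpg > 20) %>%
  select(car, mpg)

# Nested version: same steps, but read from the inside out
nested <- select(filter(cars_df, mpg > 20), car, mpg)

identical(piped, nested)
```

R 4.1 and later also include a native pipe, `|>`, which covers simple chains like this one without loading any package.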
Bonus Tips
Use n() inside summarise() to count rows.
Use across() to apply functions to multiple columns.
mtcars %>%
  summarise(across(c(mpg, hp), mean))
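across() also accepts a named list of functions, in which case each output column is named `<column>_<function>` by default. The following is a sketch of that pattern; the choice of `mean` and `sd` and the name `cars_df` are only for illustration.

```r
library(dplyr)

# Illustrative standalone copy of the example data
cars_df <- tibble::rownames_to_column(datasets::mtcars, var = "car")

# Apply two functions to each of two columns
stats <- cars_df %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

# Combine across() with group_by() for per-group summaries
stats_by_cyl <- cars_df %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), mean))

names(stats)
stats_by_cyl
```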
✅ Summary: dplyr Workflow
Load data (e.g., read.csv() or readr::read_csv())
Use verbs like select(), filter(), mutate(), group_by(), summarise()
Chain with %>% for clean, readable code
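The three steps above can be sketched end to end. Since no data file accompanies this post, the example first writes the built-in dataset to a temporary CSV and then reads it back with read.csv(); the file path, the `hp > 100` cutoff, and the names `cars_df` and `summary_tbl` are all assumptions made for illustration.

```r
library(dplyr)

# Create a CSV to load (stand-in for a real data file)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  tibble::rownames_to_column(datasets::mtcars, var = "car"),
  csv_path,
  row.names = FALSE
)

# Step 1: load data
cars_df <- read.csv(csv_path)

# Steps 2 and 3: apply verbs, chained with %>%
summary_tbl <- cars_df %>%
  select(car, cyl, mpg, hp) %>%
  filter(hp > 100) %>%
  mutate(efficiency = mpg / hp) %>%
  group_by(cyl) %>%
  summarise(avg_efficiency = mean(efficiency), cars = n())

summary_tbl

# Clean up the temporary file
unlink(csv_path)
```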