Data Manipulation with dplyr in R

📦 Loading the Package

install.packages("dplyr") # Run once

library(dplyr)

📊 Core Functions ("Verbs") of dplyr

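Each function (verb) performs one specific transformation. It takes a data frame as its first argument and returns a new data frame, leaving the input unchanged. Because every verb takes and returns a data frame, verbs can be chained with the pipe operator (%>%). The pipe passes the result of one step as the first argument of the next, so the code reads top to bottom as a sequence of steps.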
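Here is a minimal sketch of what the pipe does, using the built-in mtcars data as-is. x %>% f(y) is another way of writing f(x, y), so the nested and piped versions below compute the same result. The variable names nested and piped are illustrative only.

```r
library(dplyr)

# Nested style: read from the inside out
nested <- arrange(filter(mtcars, mpg > 25), desc(mpg))

# Piped style: read top to bottom; each result feeds the next verb
piped <- mtcars %>%
  filter(mpg > 25) %>%
  arrange(desc(mpg))

# Both calls run the same functions on the same data
identical(nested, piped)   # TRUE
```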

Function       Description

select()       Pick columns
filter()       Pick rows that meet a condition
arrange()      Sort rows
mutate()       Add or modify columns
summarise()    Aggregate values into a summary
group_by()     Group rows so that later verbs work per group
rename()       Rename columns
distinct()     Remove duplicate rows

🧪 Example Dataset: mtcars

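Let's explore these functions using the built-in mtcars dataset (32 cars, 11 numeric variables). mtcars stores the car names as row names rather than in a column. The second line below moves them into a regular column called car, which the verbs can select, filter and sort on.

data("mtcars")
mtcars <- tibble::rownames_to_column(mtcars, var = "car")   # tibble is installed along with dplyr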

🔹 1. select(): Choose Columns

mtcars %>%
  select(car, mpg, hp)

✅ Pick only the columns car, mpg, and hp.
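select() also accepts ranges, helper functions and negation, not just a list of names. The sketch below creates the car column with rownames_to_column() as above. The names mt, by_range, by_prefix and without_car are illustrative.

```r
library(dplyr)

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# A range of adjacent columns, from mpg through hp
by_range <- mt %>% select(car, mpg:hp)

# Columns whose names start with "d" (disp and drat)
by_prefix <- mt %>% select(car, starts_with("d"))

# A minus sign drops a column instead of keeping it
without_car <- mt %>% select(-car)

names(by_range)   # "car" "mpg" "cyl" "disp" "hp"
```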

🔹 2. filter(): Filter Rows by Condition

mtcars %>%
  filter(mpg > 25)

✅ Show cars with miles per gallon greater than 25.
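Several conditions can be combined in one filter() call. This minimal sketch shows AND using commas, OR using |, and set membership using %in%. Variable names are illustrative. Note that filter() drops any row where a condition evaluates to NA.

```r
library(dplyr)

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# Comma-separated conditions are combined with AND
efficient_manual <- mt %>% filter(mpg > 25, am == 1)

# Use | for OR
six_cyl_or_thrifty <- mt %>% filter(cyl == 6 | mpg > 30)

# %in% keeps rows whose value is in the given set
four_or_five_gears <- mt %>% filter(gear %in% c(4, 5))
```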

🔹 3. arrange(): Sort Data

mtcars %>%
  arrange(desc(mpg))

✅ Sort cars by descending fuel efficiency.
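arrange() accepts several columns, and later columns break ties in earlier ones. The sketch below sorts by cylinder count first and then by fuel efficiency within each cylinder group. Variable names are illustrative.

```r
library(dplyr)

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# Sort by cyl (ascending), then by mpg (descending) within each cyl value
sorted <- mt %>%
  arrange(cyl, desc(mpg)) %>%
  select(car, cyl, mpg)

head(sorted, 3)
```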

🔹 4. mutate(): Create or Modify Columns

mtcars %>%
  mutate(power_to_weight = hp / wt)

✅ Add a new column, power_to_weight: horsepower divided by weight (wt is recorded in units of 1000 lbs).
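mutate() can do more than add one column. It can create several columns in one call, use a column defined earlier in the same call, and overwrite an existing column by reusing its name. In this sketch the columns wt_lbs, hp_per_lb and size_class are hypothetical, chosen only for illustration. if_else() is dplyr's type-strict version of base ifelse().

```r
library(dplyr)

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# Hypothetical column names, for illustration only
with_new_cols <- mt %>%
  mutate(
    wt_lbs     = wt * 1000,                          # wt is recorded in 1000 lbs
    hp_per_lb  = hp / wt_lbs,                        # uses the column created on the line above
    mpg        = round(mpg),                         # overwrites the existing mpg column
    size_class = if_else(cyl >= 6, "large", "small")
  )
```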

🔹 5. summarise(): Aggregate Values

mtcars %>%
  summarise(avg_mpg = mean(mpg), max_hp = max(hp))

✅ Return a single row with the average mpg and the maximum horsepower across all cars.
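Any function that reduces a vector to a single value can be used inside summarise(). Here is a minimal sketch with a few common ones. The output column names are illustrative. na.rm = TRUE is only a safeguard here, because mtcars has no missing values.

```r
library(dplyr)

# Output column names are illustrative
stats <- mtcars %>%
  summarise(
    median_mpg = median(mpg),
    sd_mpg     = sd(mpg),
    n_models   = n(),                      # number of rows
    n_cyl      = n_distinct(cyl),          # number of distinct cylinder counts
    avg_hp     = mean(hp, na.rm = TRUE)    # na.rm = TRUE would skip missing values
  )
```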

🔹 6. group_by() + summarise(): Grouped Summary

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), count = n())

✅ Return one row per cylinder count (4, 6 and 8) with the average mpg and the number of cars (count) in each group.
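Grouping by two variables gives one row per combination. Without the .groups argument, summarise() drops only the last grouping level. The result is still grouped by cyl, and dplyr prints a message saying so. Setting .groups = "drop" returns an ungrouped result instead. group_by() also makes mutate() compute per group. This sketch assumes dplyr 1.0.0 or later, where .groups was introduced. Variable names are illustrative.

```r
library(dplyr)   # assumes dplyr >= 1.0.0 for the .groups argument

# One row per (cyl, am) combination, returned ungrouped
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), count = n(), .groups = "drop")

# With grouping, mean(mpg) inside mutate() is the mean of each cyl group
relative <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_vs_group = mpg - mean(mpg)) %>%
  ungroup()
```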

🔹 7. rename(): Rename Columns

mtcars %>%
  rename(MilesPerGallon = mpg)

✅ Rename the mpg column to MilesPerGallon (the syntax is new_name = old_name).
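rename() accepts several new_name = old_name pairs at once. rename_with() instead applies a function to a selection of column names. This sketch assumes dplyr 1.0.0 or later, where rename_with() was added. Variable names are illustrative.

```r
library(dplyr)   # assumes dplyr >= 1.0.0 for rename_with()

# Several pairs in one call: new_name = old_name
renamed <- mtcars %>%
  rename(MilesPerGallon = mpg, Horsepower = hp)

# Apply toupper() to the names of the selected columns only
upper <- mtcars %>%
  rename_with(toupper, c(mpg, hp))
```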

🔹 8. distinct(): Remove Duplicate Rows

mtcars %>%
  select(cyl, gear) %>%
  distinct()

✅ Show unique combinations of cyl and gear.
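The select() step is optional, because distinct() can take the columns directly. Adding .keep_all = TRUE keeps all other columns as well, taken from the first row of each combination. Here is a minimal sketch; variable names are illustrative.

```r
library(dplyr)

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# Unique (cyl, gear) combinations; only those two columns are returned
combos <- mt %>% distinct(cyl, gear)

# Same combinations, keeping every column from the first matching row
first_of_each <- mt %>% distinct(cyl, gear, .keep_all = TRUE)
```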

🔹 9. Pipe (%>%) for Chaining Operations

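You can chain multiple steps into one readable workflow. The pipeline below keeps cars with mpg above 20 and adds an efficiency column (mpg per horsepower). It then sorts by that column from highest to lowest and keeps four columns: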

mtcars %>%
  filter(mpg > 20) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  select(car, mpg, hp, efficiency)
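Since R 4.1.0, base R also has a native pipe, |>, which works the same way for simple chains like this one. The sketch below assumes R 4.1.0 or later. It runs the pipeline both ways and checks that the results match. Variable names are illustrative.

```r
library(dplyr)   # the native pipe |> below assumes R >= 4.1.0

# Illustrative copy of mtcars with car names moved into a column
mt <- tibble::rownames_to_column(mtcars, var = "car")

# The same pipeline written with the native pipe
native <- mt |>
  filter(mpg > 20) |>
  mutate(efficiency = mpg / hp) |>
  arrange(desc(efficiency)) |>
  select(car, mpg, hp, efficiency)

# The same pipeline written with the magrittr pipe
magrittr_version <- mt %>%
  filter(mpg > 20) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  select(car, mpg, hp, efficiency)

identical(native, magrittr_version)   # TRUE
```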

📌 Bonus Tips

Use n() inside summarise() to count rows.

Use across() to apply functions to multiple columns.

mtcars %>%
  summarise(across(c(mpg, hp), mean))
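across() can also apply several functions at once when you pass it a named list. By default, the output columns are named {column}_{function}. Combined with group_by(), this gives per-group statistics for several columns in one call. This sketch assumes dplyr 1.0.0 or later; cyl_stats is an illustrative name.

```r
library(dplyr)   # assumes dplyr >= 1.0.0 for across()

# Mean and standard deviation of mpg and hp for each cylinder count
cyl_stats <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

names(cyl_stats)   # "cyl" "mpg_mean" "mpg_sd" "hp_mean" "hp_sd"
```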

✅ Summary: dplyr Workflow

Load data (e.g., read.csv() or readr::read_csv())

Use verbs like select(), filter(), mutate(), group_by(), summarise()

Chain with %>% for clean, readable code
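Here is an end-to-end sketch of that workflow. To stay self-contained, it first writes mtcars to a temporary CSV file, which stands in for your own data file. It then loads the file with base read.csv(), transforms it with chained verbs, and deletes the file afterwards. The names path, raw and report are illustrative.

```r
library(dplyr)

# Stand-in data file: write mtcars (with a car column) to a temporary CSV
path <- tempfile(fileext = ".csv")
write.csv(tibble::rownames_to_column(mtcars, var = "car"), path, row.names = FALSE)

# 1. Load the data
raw <- read.csv(path)

# 2. Transform it with verbs chained by %>%
report <- raw %>%
  filter(hp > 100) %>%
  mutate(power_to_weight = hp / wt) %>%
  group_by(cyl) %>%
  summarise(avg_ptw = mean(power_to_weight), count = n(), .groups = "drop") %>%
  arrange(desc(avg_ptw))

# Clean up the temporary file
unlink(path)
```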


September 05, 2025
