Create and modify columns in R with the mutate() function from dplyr (2024)

HOME DATA MANIPULATION IN R MUTATE DPLYR R

Data Manipulation in R dplyr

Create and modify columns in R with the mutate() function from dplyr (1)

The mutate function from dplyr package is used to create new columns or modify existing columns in a data frame, while retaining the original structure. It enables users to apply functions or operations to data within a data frame and store the results as new variables.

Syntax

The syntax of the mutate function is the following:

# Basic usagemutate(.data, new_column_name = expression)mutate( .data, # Data set ..., # New columns (new_column_name = expression) .by = NULL, # Grouping variables .keep = c("all", "used", "unused", "none"), # Which columns to keep .before = NULL, # New columns will appear before this .after = NULL # New columns will appear after this)

Create columns

To create a new column you can specify the new column name (Var3) and a expression to calculate the values of the new column. In the example below the expression is the sum of two other columns (Var1 + Var2).

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column with the sum of 'Var1' and 'Var2'df_2 <- df %>% mutate(Var3 = Var1 + Var2)df_2
 Var1 Var2 Var31 32 39 712 34 1 353 15 29 444 12 3 155 42 35 77

Create and modify columns in R with the mutate() function from dplyr (2)

You can also apply a function to a column to create a new one. The following example illustrates how to create a new column with the square root of another.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column with the square root of 'Var1'df_2 <- df %>% mutate(Sqrt_Var1 = sqrt(Var1))df_2
 Var1 Var2 Sqrt_Var11 32 39 5.6568542 34 1 5.8309523 15 29 3.8729834 12 3 3.4641025 42 35 6.480741

Create and modify columns in R with the mutate() function from dplyr (3)

Add multiple columns at once

Notice that you can add several columns simultaneously by passing more expressions separated by commas to mutate.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# Create three new columns named 'Var3', 'Var4' and 'Var5'df_2 <- df %>% mutate(Var3 = Var1 + Var2, Var4 = c*msum(Var1), Var5 = ifelse(Var1 > Var2, TRUE, FALSE))df_2
 Var1 Var2 Var3 Var4 Var51 32 39 71 32 FALSE2 34 1 35 66 TRUE3 15 29 44 81 FALSE4 12 3 15 93 TRUE5 42 35 77 135 TRUE

Create and modify columns in R with the mutate() function from dplyr (4)

Using across

The across function allows to select specific columns with helper functions while using mutate.

In the following example, the sqrt function is applied to all columns containing "1" in their names. Subsequently, the new columns are named using the previous column name ("{.col}") followed by a suffix ("_sqrt").

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# Create new columns with all columns that contain "1" in their namesdf_2 <- df %>% mutate(across(contains("1"), sqrt, .names = "{.col}_sqrt"))df_2
 Var1 Var2 Var1_sqrt1 32 39 5.6568542 34 1 5.8309523 15 29 3.8729834 12 3 3.4641025 42 35 6.480741

Create and modify columns in R with the mutate() function from dplyr (5)

Modify existing columns

The mutate function is also capable of modifying existing columns. To achieve this, simply use the syntax: old_column_name = expression

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# Modify 'Var1' as its value divided by the mean of the columndf_2 <- df %>% mutate(Var1 = Var1 / mean(Var1, na.rm = TRUE)) df_2
 Var1 Var21 1.1851852 392 1.2592593 13 0.5555556 294 0.4444444 35 1.5555556 35

Modify multiple columns

With across, you can select specific columns and apply a custom function to them without creating new columns. The example below demonstrates how to apply the log function to all columns except for Var1.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# Apply log to all columns but 'Var1'df_2 <- df %>% mutate(across(!Var1, log)) df_2
 Var1 Var21 32 3.6635622 34 0.0000003 15 3.3672964 12 1.0986125 42 3.555348

Create and modify columns in R with the mutate() function from dplyr (6)

In such cases where the desired function takes additional arguments, you will need to use ~ before the function and reference . to represent the column values.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# Modify all columns but 'Var1' with paste0df_2 <- df %>% mutate(across(!Var1, ~paste0("Value: ", .))) df_2
 Var1 Var21 32 Value: 392 34 Value: 13 15 Value: 294 12 Value: 35 42 Value: 35

Create and modify columns in R with the mutate() function from dplyr (7)

Position of the new columns

By default, when you use mutate to create a new column, it appends that column to the end of the data frame. Nevertheless, the mutate function allows for the use of .before or .after arguments to control the position of the new columns relative to other columns.

In the example below, a new column is added before Var2:

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column before 'Var2'df_2 <- df %>% mutate(Var3 = Var1 / Var2, .before = Var2)df_2
 Var1 Var3 Var21 32 0.8205128 392 34 34.0000000 13 15 0.5172414 294 12 4.0000000 35 42 1.2000000 35

Create and modify columns in R with the mutate() function from dplyr (8)

Additionally, you can specify positions based on the index of the column. For instance, setting .before = 1 will add the new column as the first column.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column on the first position (before first column)df_2 <- df %>% mutate(Var3 = Var1 / Var2, .before = 1)df_2
 Var3 Var1 Var21 0.8205128 32 392 34.0000000 34 13 0.5172414 15 294 4.0000000 12 35 1.2000000 42 35

Create and modify columns in R with the mutate() function from dplyr (9)

In the example below, setting .after = "Var2" will place the new column after Var2.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column after 'Var2'df_2 <- df %>% mutate(Var3 = Var1 * Var2, .after = "Var2")df_2
 Var1 Var2 Var31 32 39 12482 34 1 343 15 29 4354 12 3 365 42 35 1470

Create and modify columns in R with the mutate() function from dplyr (10)

Keep or drop columns

When you add new columns to a data frame all other columns are preserved by default. However, the .keep argument, which defaults to "all", can also be set to "used" to only keep columns used inside mutate, to "unused" to keep unused columns and to "none" to delete all the old columns.

The following example demonstrates how to use .keep by retaining only used columns.

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column and keep the new column ('Var3') and the column used ('Var1')df_2 <- df %>% mutate(Var3 = Var1 * 2, .keep = "used")df_2
 Var1 Var31 32 642 34 683 15 304 12 245 42 84

Create and modify columns in R with the mutate() function from dplyr (11)

The opposite is to keep only the new column and the columns that haven’t been used setting .keep = "unused".

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column and keep the new column ('Var3') and the column unused ('Var2')df_2 <- df %>% mutate(Var3 = Var1 * 2, .keep = "unused")df_2
 Var2 Var31 39 642 1 683 29 304 3 245 35 84

Create and modify columns in R with the mutate() function from dplyr (12)

Finally, to remove all columns from the original data frame, use .keep = "none".

library(dplyr)set.seed(8)df <- data.frame(Var1 = sample(1:50, 5), Var2 = sample(1:50, 5))# New column and only keep the new column ('Var3')df_2 <- df %>% mutate(Var3 = Var1 * 2, .keep = "none")df_2
 Var31 642 683 304 245 84

Create and modify columns in R with the mutate() function from dplyr (13)

Create and modify columns in R with the mutate() function from dplyr (2024)

References

Top Articles
Latest Posts
Article information

Author: Kelle Weber

Last Updated:

Views: 6101

Rating: 4.2 / 5 (53 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Kelle Weber

Birthday: 2000-08-05

Address: 6796 Juan Square, Markfort, MN 58988

Phone: +8215934114615

Job: Hospitality Director

Hobby: tabletop games, Foreign language learning, Leather crafting, Horseback riding, Swimming, Knapping, Handball

Introduction: My name is Kelle Weber, I am a magnificent, enchanting, fair, joyous, light, determined, joyous person who loves writing and wants to share my knowledge and understanding with you.