I have a data frame where one column contains a string of numbers. Each row contains a set of numbers that vary in length. For example:
Month Count
Jan "[1.2445, 23.888883, 16.11208347]"
Feb "[2.6473, 400.6256]"
March "[6723.1838282, 187.1212, 90.111, 75.1342899]"
The goal is to convert the list into numeric format, and sum across each row. Resulting in something like (rounded decimals for ease):
Month Count
Jan 41.245
Feb 403.2729
March 7075.500
I have used this code which works for a single row, but I am unable to abstract it over the entire data frame by row.
sum(as.numeric(strsplit(substr(Data$Count, 2, nchar(Data$Count) - 1), ',')[[1]]))
Recognized by R Language Collective
1
You were almost there – in base R you can use lapply
with your strsplit
and gsub
(instead of substr
):
unlist(lapply(strsplit(gsub("\[|\]", "", df$Count), ","), (x) sum(as.numeric(x))))
# [1] 41.24547 403.27290 7075.55032
Data:
df <- read.table(text = 'Month Count
Jan "[1.2445, 23.888883, 16.11208347]"
Feb "[2.6473, 400.6256]"
March "[6723.1838282, 187.1212, 90.111, 75.1342899]" ', h = TRUE)
Data <- read.table(text = 'Month Count
Jan "[1.2445, 23.888883, 16.11208347]"
Feb "[2.6473, 400.6256]"
March "[6723.1838282, 187.1212, 90.111, 75.1342899]"',
header = T, stringsAsFactors = F)
library(dplyr, warn.conflicts = F)
Data %>%
mutate(Count = strsplit(gsub("\[|\]", "", Count), ", ")) %>%
tidyr::unnest_longer(Count) %>%
summarise(Count = sum(as.numeric(Count)), .by = Month) %>%
as.data.frame()
#> Month Count
#> 1 Jan 41.24547
#> 2 Feb 403.27290
#> 3 March 7075.55032
Created on 2024-04-29 with reprex v2.0.2
Recognized by R Language Collective