I want to transform strings based on their character length.
I have this table; let’s call it sample.
RiskCode |
---|
A01 |
A02.999 |
I want to transform the RiskCode column in sample by creating a new column, like this:
RiskCode | RiskCode2 |
---|---|
A01 | A01.00 |
A02.999 | A02.99 |
I have created a function to transform the string:
icd_transform <- function(x) {
  if (nchar(x) > 6) {
    return(substr(x, 1, 6))
  } else if (nchar(x) == 5) {
    return(paste(x, "0", sep = ""))
  } else if (nchar(x) == 3) {
    return(paste(x, ".00", sep = ""))
  } else {
    return(x)
  }
}
I first tried the function above with apply to see the results.
apply(sample$RiskCode,2,icd_transform)
But I got the error below:
Error in apply(sample$RiskCode, 2, icd_transform) :
dim(X) must have a positive length
Could you help me solve this problem? Thank you.
You can directly change your variable without creating a function:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
sample <- data.frame(RiskCode = c("A01", "A02.999"))
sample <- mutate(sample,
  RiskCode2 = case_when(
    nchar(RiskCode) > 6 ~ substr(RiskCode, 1, 6),
    nchar(RiskCode) == 5 ~ paste0(RiskCode, "0"),
    nchar(RiskCode) == 3 ~ paste0(RiskCode, ".00")
  ))
print(sample)
#> RiskCode RiskCode2
#> 1 A01 A01.00
#> 2 A02.999 A02.99
Created on 2024-07-19 with reprex v2.1.0
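One thing to watch with the case_when version: it has no fallback branch, so codes whose length is not 3, 5, or more than 6 come back as NA, whereas the question's icd_transform returns them unchanged. A minimal sketch of the difference, assuming some extra codes of length 4 and 6 that are not in the original data (the codes vector is made up for illustration):

```r
library(dplyr)

# Hypothetical codes; "A012" (4 chars) and "A02.99" (6 chars) are NOT in the
# question's data and are only here to exercise the unmatched lengths.
codes <- c("A01", "A02.9", "A02.999", "A012", "A02.99")

# Same three conditions as the answer above: unmatched lengths become NA.
without_default <- case_when(
  nchar(codes) > 6  ~ substr(codes, 1, 6),
  nchar(codes) == 5 ~ paste0(codes, "0"),
  nchar(codes) == 3 ~ paste0(codes, ".00")
)

# Adding a catch-all branch keeps unmatched codes as they are, which mirrors
# the final else branch of icd_transform in the question.
with_default <- case_when(
  nchar(codes) > 6  ~ substr(codes, 1, 6),
  nchar(codes) == 5 ~ paste0(codes, "0"),
  nchar(codes) == 3 ~ paste0(codes, ".00"),
  TRUE              ~ codes
)

print(without_default)
print(with_default)
```

The TRUE ~ codes branch works in older and newer dplyr releases alike, which is why it is used here.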
An approach using Vectorize
sample$RiskCode2 <- Vectorize(\(x) icd_transform(x))(sample$RiskCode)
sample
RiskCode RiskCode2
1 A01 A01.00
2 A02.999 A02.99
Another tidyverse take:
# Pkgs (dplyr, stringr) ---------------------------------------------------
library(tidyverse)
# Sample data -------------------------------------------------------------
my_df <- structure(
list(risk_code = c("A01", "A02.999")),
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -2L))
# Code (just paste0 and str_sub) ------------------------------------------
my_df <- mutate(
my_df,
new_risk_code = str_sub(
paste0(
risk_code, # "A00.00" is the
str_sub("A00.00", str_length(risk_code) + 1, 6)), # default pattern
1, 6)) # with length 6
# Output ------------------------------------------------------------------
print(my_df)
#> # A tibble: 2 × 2
#> risk_code new_risk_code
#> <chr> <chr>
#> 1 A01 A01.00
#> 2 A02.999 A02.99
Created on 2024-07-19 with reprex v2.1.0
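The padding trick above may be easier to follow with the intermediate values spelled out. The following is only a sketch that swaps in the base R functions nchar, substring, and substr for the stringr calls; the names template, tail_part, and padded are made up for illustration, and the 5-character code "A02.9" is an extra test value that is not in the original data.

```r
# "A02.9" is a hypothetical 5-character code added for illustration.
risk_code <- c("A01", "A02.9", "A02.999")

# Default pattern of length 6; only its tail is ever used.
template <- "A00.00"

# Take the part of the template that the code does not cover yet.
# For codes longer than 6 characters the start is past the end, so this is "".
tail_part <- substring(template, nchar(risk_code) + 1, 6)

# Append the missing tail, then cut everything to 6 characters.
padded <- substr(paste0(risk_code, tail_part), 1, 6)

print(tail_part)
print(padded)
```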
This has already been addressed in the comments but I thought I would expand on those comments.
apply(X, 2, ...) needs X to be a matrix or other object with at least 2 dimensions (in which case it calls the function once per column), but in the question’s code X is a plain vector (which has no dimensions at all).
dim(sample$RiskCode)
## NULL
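To make the point about dimensions concrete, here is a small sketch that reproduces the failure on a plain vector and then shows apply running once the same values are stored in a one-column matrix. The names risk_codes, apply_error, and code_matrix are made up for illustration, and nchar stands in for icd_transform to keep the example short.

```r
# Same values as the RiskCode column in the question.
risk_codes <- c("A01", "A02.999")

# A plain character vector has no dim attribute.
print(is.null(dim(risk_codes)))

# apply() on the vector stops with an error; capture its message instead of
# letting the script abort.
apply_error <- tryCatch(
  apply(risk_codes, 2, nchar),
  error = function(e) conditionMessage(e)
)
print(apply_error)

# A one-column matrix does have two dimensions, so apply() can work per column.
code_matrix <- matrix(risk_codes, ncol = 1)
column_lengths <- apply(code_matrix, 2, nchar)
print(column_lengths)
```

Note that apply hands the whole column to the function at once, so it would still not suit the if/else version of icd_transform, which expects a single string.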
1) We can instead use sapply, which iterates over the components of a vector (or list). That will work with icd_transform as given in the question.
sapply(sample$RiskCode, icd_transform)
The Vectorize function mentioned in another answer would also work.
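A side note on the sapply route: when the input is a character vector, sapply uses the input strings as names on the result. If that is not wanted, vapply with USE.NAMES = FALSE is one option, and it also checks that every call returns a single string. A minimal sketch, with icd_transform copied from the question and codes, named_result, and plain_result as illustrative names:

```r
# icd_transform as given in the question (handles one string at a time).
icd_transform <- function(x) {
  if (nchar(x) > 6) {
    return(substr(x, 1, 6))
  } else if (nchar(x) == 5) {
    return(paste(x, "0", sep = ""))
  } else if (nchar(x) == 3) {
    return(paste(x, ".00", sep = ""))
  } else {
    return(x)
  }
}

codes <- c("A01", "A02.999")

# sapply keeps the input strings as names of the result.
named_result <- sapply(codes, icd_transform)
print(names(named_result))

# vapply with USE.NAMES = FALSE returns a plain character vector and errors
# if any call returns something other than one string.
plain_result <- vapply(codes, icd_transform, character(1), USE.NAMES = FALSE)
print(plain_result)
```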
2) Or we could rewrite icd_transform to accept vectors by using ifelse rather than if...else..., in which case we don’t even need sapply. This creates a new data frame, sample2, with the result. Here x can be a vector. Note that strtrim(x, 6) returns x unchanged if x has 6 or fewer characters and otherwise truncates it to 6 characters.
icd_transform2 <- function(x) {
  suffix <- ifelse(nchar(x) == 5, "0",
    ifelse(nchar(x) == 3, ".00", ""))
  x |>
    strtrim(6) |>
    paste0(suffix)
}
sample2 <- sample |> transform(RiskCode2 = icd_transform2(RiskCode))
3) The dplyr package has a useful case_match function, which provides a vectorized multi-way switch.
library(dplyr)
icd_transform3 <- function(x) {
  case_match(nchar(x),
    3 ~ paste0(x, ".00"),
    5 ~ paste0(x, "0"),
    .default = strtrim(x, 6))
}
sample2 <- sample |> mutate(RiskCode2 = icd_transform3(RiskCode))
4) One comment referred to using sprintf. Depending on what the general case is, you may be able to use something like this:
library(dplyr)
icd_transform4 <- function(x) {
  sprintf("%s%05.2f", strtrim(x, 1), as.numeric(substring(strtrim(x, 6), 2)))
}
sample |>
  mutate(RiskCode2 = icd_transform4(RiskCode))
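To see what the format string in 4) is doing, it may help to run the numeric step on its own. This sketch assumes, as that approach does, that every code is one letter followed by something as.numeric can parse; the names numeric_part, padded, and rebuilt are made up for illustration.

```r
# What is left of "A01" and "A02.99" after dropping the leading letter.
numeric_part <- as.numeric(c("01", "02.99"))

# %05.2f: at least 5 characters wide, zero-padded, 2 digits after the point.
padded <- sprintf("%05.2f", numeric_part)
print(padded)

# Putting the leading letter back gives the final codes.
rebuilt <- paste0("A", padded)
print(rebuilt)
```

Because the text after the first letter goes through as.numeric, a code with a non-numeric remainder would turn into NA at that step, so this variant fits only if the codes really follow that pattern.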
Note
Input in reproducible form:
sample <- data.frame(RiskCode = c("A01", "A02.999"))