Objective: Remove columns whose names start with a given prefix and whose values meet a criterion. For example, in the dataset below, remove all “fillerX” columns that contain only zeros.
Data:
library(dplyr)
df <- iris %>%
  tibble() %>%
  slice(1:5) %>%
  mutate(
    fillerQ = rep(0, 5),
    fillerW = rep(0, 5),
    fillerE = rep(0, 5),
    fillerR = c(0, 0, 1, 0, 0),
    fillerT = rep(0, 5),
    fillerY = rep(0, 5),
    fillerU = rep(0, 5),
    fillerI = c(0, 0, 0, 0, 1),
    fillerO = rep(0, 5),
    fillerP = rep(0, 5)
  )
df
# A tibble: 5 × 15
Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerQ fillerW fillerE fillerR fillerT fillerY fillerU fillerI fillerO fillerP
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 0 0 0 0 0 0 0 0 0 0
2 4.9 3 1.4 0.2 setosa 0 0 0 0 0 0 0 0 0 0
3 4.7 3.2 1.3 0.2 setosa 0 0 0 1 0 0 0 0 0 0
4 4.6 3.1 1.5 0.2 setosa 0 0 0 0 0 0 0 0 0 0
5 5 3.6 1.4 0.2 setosa 0 0 0 0 0 0 0 1 0 0
Problem: We can use starts_with("filler") to reference the filler columns, and we can use select_if(~ sum(abs(.)) != 0) to keep the non-zero columns, but we cannot put starts_with() inside select_if(), since we get the error:
Error:
! `starts_with()` must be used within a *selecting* function.
ℹ See ?tidyselect::faq-selection-context for details.
Run `rlang::last_trace()` to see where the error occurred.
Question: How do you combine starts_with() and select_if()?
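select_if() has been superseded. Use where() inside select() instead. Selection helpers can be combined with & and negated with !, so the whole condition fits in a single select() call: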
library(dplyr)
df %>%
select(!(starts_with("filler") & where(~ all(.x == 0))))
# # A tibble: 5 × 7
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerR fillerI
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0 0
# 2 4.9 3 1.4 0.2 setosa 0 0
# 3 4.7 3.2 1.3 0.2 setosa 1 0
# 4 4.6 3.1 1.5 0.2 setosa 0 0
# 5 5 3.6 1.4 0.2 setosa 0 1
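If the same pattern is needed in several places, it can be wrapped in a small function. The following is only a sketch: drop_zero_cols() is a hypothetical name (not a dplyr function) and the toy tibble is made up for illustration. It also adds an is.numeric() guard and isTRUE(), on the assumption that a column containing NA should be kept rather than making the where() predicate return NA.

```r
library(dplyr)

# Hypothetical helper: drop columns whose name starts with `prefix`
# and whose values are all exactly zero.
drop_zero_cols <- function(data, prefix) {
  data %>%
    select(!(starts_with(prefix) &
      # isTRUE() turns an NA result into FALSE, so columns with NA are kept
      where(~ is.numeric(.x) && isTRUE(all(.x == 0)))))
}

# Toy data (assumption, for illustration only)
toy <- tibble(
  id      = 1:3,
  fillerA = c(0, 0, 0),   # all zero, prefixed: dropped
  fillerB = c(0, 1, 0),   # prefixed but not all zero: kept
  fillerC = c(0, NA, 0),  # contains NA: kept under the assumption above
  other   = c(0, 0, 0)    # all zero but not prefixed: kept
)

result <- drop_zero_cols(toy, "filler")
names(result)
```

Only fillerA is removed here, because it is the only column that both matches the prefix and is entirely zero.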
Answer: Nest every column except the filler columns with nest(data = !starts_with("filler")), run select_if() with is.list(.) || as part of the predicate, then unnest the data. Telling the select statement to accept lists is necessary, because it errors out if sum() is attempted on the nested list column. This looks like:
library(dplyr)
library(tidyr)
df %>%
  nest(data = !starts_with("filler")) %>%
  select_if(~ is.list(.) || sum(abs(.)) != 0) %>%
  relocate(data) %>%
  unnest(data)
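Which gives the expected result (note that the rows come back in a different order, because nest() groups together rows with identical filler values):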
# A tibble: 5 × 7
Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerR fillerI
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 0 0
2 4.9 3 1.4 0.2 setosa 0 0
3 4.6 3.1 1.5 0.2 setosa 0 0
4 4.7 3.2 1.3 0.2 setosa 1 0
5 5 3.6 1.4 0.2 setosa 0 1
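To see how the nest-based approach relates to the select() plus where() approach, the two can be run side by side. This is a rough sketch on a cut-down version of the example data (only three filler columns, chosen here for brevity); the names via_where and via_nest are made up for the comparison.

```r
library(dplyr)
library(tidyr)

# Cut-down example data (assumption: three filler columns are enough to compare)
df <- iris %>%
  as_tibble() %>%
  slice(1:5) %>%
  mutate(
    fillerQ = rep(0, 5),
    fillerR = c(0, 0, 1, 0, 0),
    fillerI = c(0, 0, 0, 0, 1)
  )

# Approach 1: select() with where()
via_where <- df %>%
  select(!(starts_with("filler") & where(~ all(.x == 0))))

# Approach 2: nest, select_if(), unnest
via_nest <- df %>%
  nest(data = !starts_with("filler")) %>%
  select_if(~ is.list(.) || sum(abs(.)) != 0) %>%
  relocate(data) %>%
  unnest(data)

# Both keep the same columns in the same order
identical(names(via_where), names(via_nest))

# Row order can differ, so compare after sorting on a common key
all.equal(
  arrange(via_where, Sepal.Length),
  arrange(via_nest, Sepal.Length)
)
```

If the original row order matters, the select() version is the safer choice, since it never regroups the rows.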