I have a data set (df) with many columns, the names of which contain letters and numbers. For example
pr.2013 pr.2012 pr. 2011 cd.2013 cd.2012 cd.2011
and so on and I want to order them alphabetically to this format
cd.2013 cd.2012 cd.2011 pr.2013 pr.2012 pr. 2011
I tried
df1<-df %>% select(order(colnames(.)))
but with no success. Is there any way to do that in R. Manually is not an option as I have a large number of similar columns.
Many thanks in advance!
2
We assume
- the input shown reproducibly in the Note at the end
- that we want to sort by the prefix (part before dot)
- we wish to maintain the same order for the suffix (part after the dot)
Note that the tools package comes with R so it does not have to be installed.
library(dplyr)
library(tools)
ix <- dat %>% names %>% file_path_sans_ext %>% order
dat %>% select(all_of(ix))
## cd.2013 cd.2012 cd.2011 pr.2013 pr.2012 pr.2011
## 1 4 5 6 1 2 3
Note
Input data frame shown reproducibly.
dat <- data.frame(pr.2013 = 1L, pr.2012 = 2L, pr.2011 = 3L,
cd.2013 = 4L, cd.2012 = 5L, cd.2011 = 6L)
You can use order
+ sub
like below
> x <- c("pr.2013", "pr.2012", "pr. 2011", "cd.2013", "cd.2012", "cd.2011")
> x[order(sub("\..*", "", x))]
[1] "cd.2013" "cd.2012" "cd.2011" "pr.2013" "pr.2012" "pr. 2011"
If there’s really lots of columns, it might be better to create a separate sorting vector and then simply select accordingly. Try this:
# Pkgs (stringr, tibble, dplyr) -------------------------------------------
library(tidyverse)
# Toy data ----------------------------------------------------------------
my_df <- tibble::tribble(
~pr.2013, ~pr.2012, ~pr.2011, ~cd.2013, ~cd.2012, ~cd.2011,
1L, 2L, 3L, 4L, 5L, 6L)
# Sort --------------------------------------------------------------------
# Make a tibble with colnames
# Arrange it by letters fist and descending numbers last
aux <- tibble(value = colnames(my_df)) %>%
arrange(str_extract(value, "\D+"), desc(str_extract(value, "\d+$"))) %>%
pull(value)
# Sort the tibble/df with selection-helper `all_of` or `any_of`
new_df <- select(my_df, all_of(aux))
rm(aux)
# Output ------------------------------------------------------------------
new_df
#> # A tibble: 1 × 6
#> cd.2013 cd.2012 cd.2011 pr.2013 pr.2012 pr.2011
#> <int> <int> <int> <int> <int> <int>
#> 1 4 5 6 1 2 3
Created on 2024-09-16 with reprex v2.1.0
If this is your data
pr.2012 pr.2011 pr.2013 cd.2012 cd.2013 cd.2011
1 1 2 3 4 5 6
2 2 3 4 5 6 7
3 3 4 5 6 7 8
Using the strsplit
by .
and arrange
d colnames to select the variables in the right order.
library(dplyr)
df[,match(data.frame(t(data.frame(strsplit(colnames(df), "\.")))) %>%
arrange(X1, desc(X2)) %>%
reframe(paste0(X1, ".", X2)) %>%
unlist(use.names=F), colnames(df))]
cd.2013 cd.2012 cd.2011 pr.2013 pr.2012 pr.2011
1 5 4 6 3 1 2
2 6 5 7 4 2 3
3 7 6 8 5 3 4
Data
df <- structure(list(pr.2012 = 1:3, pr.2011 = 2:4, pr.2013 = 3:5, cd.2012 =
4:6, cd.2013 = 5:7, cd.2011 = 6:8), class = "data.frame", row.names =
c(NA, -3L))
5
Here is a base R solution:
df <- data.frame(
pr.2012 = letters[1:10],
pr.2013 = letters[11:20],
cd.2012 = LETTERS[1:10],
cd.2013 = LETTERS[11:20]
)
print(df)
pr.2012 pr.2013 cd.2012 cd.2013
1 a k A K
2 b l B L
3 c m C M
4 d n D N
5 e o E O
6 f p F P
7 g q G Q
8 h r H R
9 i s I S
10 j t J T
# order column names alphabetically & reverse order
df <- df[,order(sort(colnames(df),decreasing = T))]
print(df)
cd.2013 cd.2012 pr.2013 pr.2012
1 K A k a
2 L B l b
3 M C m c
4 N D n d
5 O E o e
6 P F p f
7 Q G q g
8 R H r h
9 S I s i
10 T J t j
2