I am working with a data.table in R in which several fields can each contain multiple values separated by semicolons. I am trying to split these values into individual rows, so that each value gets its own row. The challenge is that the fields are related: the n-th value in one field corresponds to the n-th value in the other, so they need to be split and expanded together.
Example data:
library(data.table)
df <- data.table(probe = c('A', 'B', 'C'), # there are many columns in real data
                 gene = c('geneA', 'geneB;geneC', 'geneD;geneH;geneI;geneO'),
                 type = c('mRNA', 'mRNA;miRNA', 'mRNA;miRNA;mRNA;miRNA')
)
df
   probe                    gene                  type
1:     A                   geneA                  mRNA
2:     B             geneB;geneC            mRNA;miRNA
3:     C geneD;geneH;geneI;geneO mRNA;miRNA;mRNA;miRNA
Expected output:
df.new <- data.table(probe = c('A', 'B', 'B', 'C', 'C', 'C', 'C'),
                     gene = c('geneA', 'geneB', 'geneC', 'geneD', 'geneH', 'geneI', 'geneO'),
                     type = c('mRNA', 'mRNA', 'miRNA', 'mRNA', 'miRNA', 'mRNA', 'miRNA')
)
df.new
   probe  gene  type
1:     A geneA  mRNA
2:     B geneB  mRNA
3:     B geneC miRNA
4:     C geneD  mRNA
5:     C geneH miRNA
6:     C geneI  mRNA
7:     C geneO miRNA
Thank you for your kind suggestions.
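
For reference, one possible way to get the expected output is sketched below. It is not a definitive solution and rests on two assumptions: within each row, `gene` and `type` contain the same number of semicolon-separated values, and the remaining columns can serve as grouping keys. The names `split_cols` and `keep_cols` are introduced here only for illustration.

```r
library(data.table)

df <- data.table(probe = c('A', 'B', 'C'),
                 gene = c('geneA', 'geneB;geneC', 'geneD;geneH;geneI;geneO'),
                 type = c('mRNA', 'mRNA;miRNA', 'mRNA;miRNA;mRNA;miRNA'))

# Columns to split (hypothetical helper names, not part of data.table)
split_cols <- c('gene', 'type')
# All other columns are kept and repeated for each split value
keep_cols <- setdiff(names(df), split_cols)

# Within each group, split every column in split_cols on ';'.
# Assumes the split columns yield the same number of pieces per row.
df.new <- df[, lapply(.SD, function(x) unlist(strsplit(x, ';', fixed = TRUE))),
             by = keep_cols, .SDcols = split_cols]

df.new
```

Grouping by the columns that are not split keeps them in the result and repeats their values once per split value. If the number of pieces differs between `gene` and `type` in some row, this sketch will not pair them correctly, so checking the counts first may be worthwhile.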