I have many files in the same folder, and I am trying to merge them on their first column, which is named “gene_name”. Here are three example files to be merged:
file1:
"gene_name","tpm"
"HS3ST1",0.096383
"PDK4",0.227807
"BAIAP2L1",1.52907
"CACNG3",0.0760646
file2:
"gene_name","tpm"
"HS3ST1",0.0513056
"PDK4",0
"BAIAP2L1",1.1508
"CACNG3",0
"PDK4",2
file3:
"gene_name","tpm"
"HS3ST1",0.096383
"PDK4",0.227807
"BAIAP2L1",1.52907
"CACNG3",0.0760646
"BAIAP2L1",2.65
"CACNG3",3.6548
The main problems are:
1- The files do not all have the same number of rows, and some files contain the same “gene_name” more than once.
2- The order of the gene names differs between files.
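The two problems behave differently under base R's merge(). A minimal sketch using values taken from file1 and file2 shows that differing row order is harmless, because rows are matched by key, while a duplicated gene_name produces one output row per match:

```r
# Two tables with gene_name in different orders; b repeats "PDK4" (values from file1/file2)
a <- data.frame(gene_name = c("HS3ST1", "PDK4"), tpm = c(0.096383, 0.227807))
b <- data.frame(gene_name = c("PDK4", "HS3ST1", "PDK4"), tpm = c(0, 0.0513056, 2))

# merge() pairs rows by gene_name, so row order does not matter...
m <- merge(a, b, by = "gene_name", suffixes = c(".file1", ".file2"))
print(m)
# ...but each duplicated key in b creates an extra row: 3 rows here, two of them PDK4
nrow(m)
```

This is why the duplicates need to be collapsed to one row per gene before merging.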
What I want is:
First, if a gene_name is repeated within a file, keep only one occurrence of it: the one with the highest “tpm” (the second column).
Then, merge all files in that folder on the “gene_name” column.
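The deduplication step on its own can be sketched with base R's aggregate(), assuming each file has been read into a data frame with columns gene_name and tpm. The values below are file2's, including its duplicated PDK4 row:

```r
# file2 from the question, with "PDK4" appearing twice (tpm 0 and 2)
file2 <- data.frame(
  gene_name = c("HS3ST1", "PDK4", "BAIAP2L1", "CACNG3", "PDK4"),
  tpm       = c(0.0513056, 0, 1.1508, 0, 2)
)

# One row per gene, keeping the maximum tpm; aggregate() returns rows sorted by gene_name
dedup <- aggregate(tpm ~ gene_name, data = file2, FUN = max)
print(dedup)
```

An alternative that keeps the original row order is to sort by tpm in decreasing order and then drop rows where duplicated(gene_name) is TRUE.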
Here is the expected output:
"gene_name","file1","file2","file3"
"HS3ST1",0.096383,0.0513056,0.096383
"PDK4",0.227807,2,0.227807
"BAIAP2L1",1.52907,1.1508,2.65
"CACNG3",0.0760646,0,3.6548
How can I do this in R?
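A possible base R sketch of the whole pipeline, under a few assumptions: every file is a .csv with the same two columns (gene_name, tpm), and the output columns should be named after the files. The temporary folder `data_dir` and the helper `read_one` are hypothetical; the lines that write the example files exist only to make the sketch runnable, so point `data_dir` at the real folder instead.

```r
# Hypothetical folder for this sketch: a temp directory filled with the three example files
data_dir <- file.path(tempdir(), "tpm_files")
dir.create(data_dir, showWarnings = FALSE)
examples <- list(
  file1 = data.frame(gene_name = c("HS3ST1", "PDK4", "BAIAP2L1", "CACNG3"),
                     tpm = c(0.096383, 0.227807, 1.52907, 0.0760646)),
  file2 = data.frame(gene_name = c("HS3ST1", "PDK4", "BAIAP2L1", "CACNG3", "PDK4"),
                     tpm = c(0.0513056, 0, 1.1508, 0, 2)),
  file3 = data.frame(gene_name = c("HS3ST1", "PDK4", "BAIAP2L1", "CACNG3", "BAIAP2L1", "CACNG3"),
                     tpm = c(0.096383, 0.227807, 1.52907, 0.0760646, 2.65, 3.6548))
)
for (nm in names(examples)) {
  write.csv(examples[[nm]], file.path(data_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Read one file, keep the highest tpm per gene, and name the value column after the file
read_one <- function(path) {
  df <- read.csv(path)
  df <- aggregate(tpm ~ gene_name, data = df, FUN = max)
  names(df)[2] <- tools::file_path_sans_ext(basename(path))
  df
}

paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
# Full outer join on gene_name, one file at a time; genes missing from a file become NA
merged <- Reduce(function(x, y) merge(x, y, by = "gene_name", all = TRUE),
                 lapply(paths, read_one))
print(merged)
```

The values match the expected output, but merge() sorts by gene_name, so the rows come out alphabetically (BAIAP2L1, CACNG3, HS3ST1, PDK4) rather than in the order shown above. all = TRUE is a design choice: it keeps genes that appear in only some files, filling the gaps with NA; drop it to keep only genes present in every file. The result can be saved with write.csv(merged, "merged.csv", row.names = FALSE).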