I have a text file with 8 columns. I want to sort columns 1 and 3 based on the number before the first “_” in the string. After filtering, the values in column 8 must be summed. Also, the output must include the number of rows counted.
1_BAG:GA000870 0 3_BAG:GA000872 0 1 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 1 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 1 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 1 76916027 77270614 3.57
1_BAG:GA000870 0 4_BIR:GA000873 0 1 84053552 84926168 3.44
1_BAG:GA000870 0 3_BAG:GA000872 0 1 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 1 158402969 158851710 4.65
1_BAG:GA000870 0 2_BAG:GA000871 0 1 98233953 98697761 3.13
1_BAG:GA000870 0 3_BAG:GA000872 0 1 215248496 215968924 4.23
1_BAG:GA000870 0 2_BAG:GA000871 0 1 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 1 188056283 188607409 3.15
1_BAG:GA000870 0 5_BIR:GA000874 0 1 53259825 53566799 3.79
1_BAG:GA000870 0 2_BAG:GA000871 0 1 234917668 235040966 4.42
1_BAG:GA000870 0 7_BIR:GA000876 0 1 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 1 69902487 70086221 3.16
1_BAG:GA000870 0 10_BIR:GA000879 0 1 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 1 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 1 61122372 61257787 4.28
1_BAG:GA000870 0 9_BIR:GA000878 0 1 63147235 63544990 4.17
1_BAG:GA000870 0 9_BIR:GA000878 0 1 86339676 86933974 4.58
1_BAG:GA000870 0 10_BIR:GA000879 0 1 202173634 202991051 10.04
1_BAG:GA000870 0 3_BAG:GA000872 0 2 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 2 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 2 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 2 76916027 77270614 3.57
1_BAG:GA000870 0 4_BIR:GA000873 0 2 84053552 84926168 3.44
1_BAG:GA000870 0 3_BAG:GA000872 0 2 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 2 158402969 158851710 4.65
1_BAG:GA000870 0 2_BAG:GA000871 0 2 98233953 98697761 3.13
1_BAG:GA000870 0 3_BAG:GA000872 0 2 215248496 215968924 4.23
1_BAG:GA000870 0 2_BAG:GA000871 0 2 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 2 188056283 188607409 3.15
1_BAG:GA000870 0 5_BIR:GA000874 0 2 53259825 53566799 3.79
1_BAG:GA000870 0 2_BAG:GA000871 0 2 234917668 235040966 4.42
1_BAG:GA000870 0 7_BIR:GA000876 0 2 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 2 69902487 70086221 3.16
1_BAG:GA000870 0 10_BIR:GA000879 0 2 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 2 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 2 61122372 61257787 4.28
1_BAG:GA000870 0 9_BIR:GA000878 0 2 63147235 63544990 4.17
1_BAG:GA000870 0 9_BIR:GA000878 0 2 86339676 86933974 4.58
1_BAG:GA000870 0 10_BIR:GA000879 0 2 202173634 202991051 10.04
1_BAG:GA000870 0 3_BAG:GA000872 0 3 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 3 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 3 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 3 76916027 77270614 3.57
1_BAG:GA000870 0 4_BIR:GA000873 0 3 84053552 84926168 3.44
1_BAG:GA000870 0 3_BAG:GA000872 0 3 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 3 158402969 158851710 4.65
1_BAG:GA000870 0 2_BAG:GA000871 0 3 98233953 98697761 3.13
1_BAG:GA000870 0 3_BAG:GA000872 0 3 215248496 215968924 4.23
1_BAG:GA000870 0 2_BAG:GA000871 0 3 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 3 188056283 188607409 3.15
1_BAG:GA000870 0 5_BIR:GA000874 0 3 53259825 53566799 3.79
1_BAG:GA000870 0 2_BAG:GA000871 0 3 234917668 235040966 4.42
1_BAG:GA000870 0 7_BIR:GA000876 0 3 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 3 69902487 70086221 3.16
1_BAG:GA000870 0 10_BIR:GA000879 0 3 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 3 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 3 61122372 61257787 4.28
1_BAG:GA000870 0 9_BIR:GA000878 0 3 63147235 63544990 4.17
1_BAG:GA000870 0 9_BIR:GA000878 0 3 86339676 86933974 4.58
1_BAG:GA000870 0 10_BIR:GA000879 0 3 202173634 202991051 10.04
I have sorted the columns using
sort -k1 -k3 file.txt > sorted_file.txt
Output:
1_BAG:GA000870 0 10_BIR:GA000879 0 1 202173634 202991051 10.04
1_BAG:GA000870 0 10_BIR:GA000879 0 1 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 1 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 1 61122372 61257787 4.28
1_BAG:GA000870 0 10_BIR:GA000879 0 2 202173634 202991051 10.04
1_BAG:GA000870 0 10_BIR:GA000879 0 2 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 2 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 2 61122372 61257787 4.28
1_BAG:GA000870 0 10_BIR:GA000879 0 3 202173634 202991051 10.04
1_BAG:GA000870 0 10_BIR:GA000879 0 3 41952810 42141252 3.26
1_BAG:GA000870 0 10_BIR:GA000879 0 3 42852945 43299952 3.23
1_BAG:GA000870 0 10_BIR:GA000879 0 3 61122372 61257787 4.28
1_BAG:GA000870 0 2_BAG:GA000871 0 1 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 1 188056283 188607409 3.15
1_BAG:GA000870 0 2_BAG:GA000871 0 1 234917668 235040966 4.42
1_BAG:GA000870 0 2_BAG:GA000871 0 1 98233953 98697761 3.13
1_BAG:GA000870 0 2_BAG:GA000871 0 2 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 2 188056283 188607409 3.15
1_BAG:GA000870 0 2_BAG:GA000871 0 2 234917668 235040966 4.42
1_BAG:GA000870 0 2_BAG:GA000871 0 2 98233953 98697761 3.13
1_BAG:GA000870 0 2_BAG:GA000871 0 3 163144718 163344256 4.39
1_BAG:GA000870 0 2_BAG:GA000871 0 3 188056283 188607409 3.15
1_BAG:GA000870 0 2_BAG:GA000871 0 3 234917668 235040966 4.42
1_BAG:GA000870 0 2_BAG:GA000871 0 3 98233953 98697761 3.13
1_BAG:GA000870 0 3_BAG:GA000872 0 1 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 1 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 1 158402969 158851710 4.65
1_BAG:GA000870 0 3_BAG:GA000872 0 1 215248496 215968924 4.23
1_BAG:GA000870 0 3_BAG:GA000872 0 1 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 1 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 1 76916027 77270614 3.57
1_BAG:GA000870 0 3_BAG:GA000872 0 2 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 2 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 2 158402969 158851710 4.65
1_BAG:GA000870 0 3_BAG:GA000872 0 2 215248496 215968924 4.23
1_BAG:GA000870 0 3_BAG:GA000872 0 2 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 2 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 2 76916027 77270614 3.57
1_BAG:GA000870 0 3_BAG:GA000872 0 3 10203774 10809556 3.22
1_BAG:GA000870 0 3_BAG:GA000872 0 3 112131164 112504018 5.47
1_BAG:GA000870 0 3_BAG:GA000872 0 3 158402969 158851710 4.65
1_BAG:GA000870 0 3_BAG:GA000872 0 3 215248496 215968924 4.23
1_BAG:GA000870 0 3_BAG:GA000872 0 3 44282371 46620360 8.09
1_BAG:GA000870 0 3_BAG:GA000872 0 3 58765794 59231535 4.36
1_BAG:GA000870 0 3_BAG:GA000872 0 3 76916027 77270614 3.57
1_BAG:GA000870 0 4_BIR:GA000873 0 1 84053552 84926168 3.44
1_BAG:GA000870 0 4_BIR:GA000873 0 2 84053552 84926168 3.44
1_BAG:GA000870 0 4_BIR:GA000873 0 3 84053552 84926168 3.44
1_BAG:GA000870 0 5_BIR:GA000874 0 1 53259825 53566799 3.79
1_BAG:GA000870 0 5_BIR:GA000874 0 2 53259825 53566799 3.79
1_BAG:GA000870 0 5_BIR:GA000874 0 3 53259825 53566799 3.79
1_BAG:GA000870 0 7_BIR:GA000876 0 1 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 1 69902487 70086221 3.16
1_BAG:GA000870 0 7_BIR:GA000876 0 2 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 2 69902487 70086221 3.16
1_BAG:GA000870 0 7_BIR:GA000876 0 3 30235795 30845165 3.27
1_BAG:GA000870 0 7_BIR:GA000876 0 3 69902487 70086221 3.16
1_BAG:GA000870 0 9_BIR:GA000878 0 1 63147235 63544990 4.17
1_BAG:GA000870 0 9_BIR:GA000878 0 1 86339676 86933974 4.58
1_BAG:GA000870 0 9_BIR:GA000878 0 2 63147235 63544990 4.17
1_BAG:GA000870 0 9_BIR:GA000878 0 2 86339676 86933974 4.58
1_BAG:GA000870 0 9_BIR:GA000878 0 3 86339676 86933974 4.58
1_BAG:GA000870 0 9_BIR:GA000878 0 3 63147235 63544990 4.17
How can I get the following output?
1_BAG:GA000870 10_BIR:GA000879 62.43 12
1_BAG:GA000870 2_BAG:GA000871 45.27 12
1_BAG:GA000870 3_BAG:GA000872 100.77 21
…
jjseq is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.