I have a text file with the following format:
>1
ABCXYZ...
>2
LMNOPQRS...
>3
where each block of text (e.g. ABCXYZ...) goes on for thousands of lines.
I have been using a Python script to split these files on the ">" character and then count the number of lines and characters between each ">". However, the files are very large and the script takes a long time to run. Is there a simple way to do this with awk or another tool on the Linux command line?
This is far from a complete answer, but grep -n shows the line number of every line where the > character appears. From there, you can use awk to perform calculations on those line numbers:
Prompt>grep -n ">" file1.txt
1:>1
3:>2
5:>3
Prompt>grep -n ">" file1.txt | awk -F ":" '{print $1}'
1
3
5
…
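To take this one step further, the counting itself could probably be done in a single awk pass, without going through grep first. The following is only a minimal sketch, not a tested solution for your real data: the sample file contents, the temporary file and the variable names (seen, name, lines, chars) are all made up for illustration, and it assumes every header line starts with ">". It prints the header name, the number of lines and the number of characters for each section, not counting newlines. Note that length counts characters or bytes depending on the awk implementation and locale.

```shell
# Hypothetical sample file mirroring the layout in the question.
tmpfile=$(mktemp)
printf '%s\n' '>1' 'ABCXYZ' 'DEF' '>2' 'LMNOPQRS' '>3' > "$tmpfile"

# One pass over the file: a line starting with ">" closes the previous
# section and opens a new one; every other line is added to the totals.
result=$(awk '
    /^>/ {
        if (seen) print name, lines, chars   # report the section just finished
        seen = 1
        name = substr($0, 2)                 # header text without the ">"
        lines = 0
        chars = 0
        next
    }
    { lines++; chars += length($0) }         # newline is not counted
    END { if (seen) print name, lines, chars }
' "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```

On a real file you would drop the sample-data lines and run the awk program directly on file1.txt. Since awk reads the file only once and keeps just a few counters in memory, it should cope with very large files, though I have not timed it against your Python script.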