I have the following input file:
VARA {APPLE}
VARB {
ORANGE
BANANA
}
VARC {
KIWI{0}
KIWI{1}
BERRY
}
VARD {CHERRY{0} MELON}
I would like to get the following output when processing this file:
VARA APPLE
VARB ORANGE BANANA
VARC KIWI{0} KIWI{1} BERRY
VARD CHERRY{0} MELON
Before writing a script, I wonder if there is an elegant way to solve this using just grep. Otherwise, I would probably read the input file character by character, count the curly braces to find each matching closing brace, and print the text above.
This cannot be done with grep alone, since the input also contains curly brackets that you want to keep, and a regex by itself cannot distinguish them from the ones you want to remove.
One way to do this could be using multiple steps, which makes it much easier (to read and to write regex):
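import re

# Sample input. Step 2 assumes the lines inside a multiline block are indented.
A = """
VARA {APPLE}
VARB {
    ORANGE
    BANANA
}
VARC {
    KIWI{0}
    KIWI{1}
    BERRY
}
VARD {CHERRY{0} MELON}
"""
# 1. Unwrap inline blocks: NAME {content} -> NAME content
A = re.sub(r'(?m)^(\S+)\s+{([^\r\n]+)}$', r'\1 \2', A)
# 2. Collapse each newline plus indentation into a single space
A = re.sub(r'\s{2,}', ' ', A)
# 3. Drop the opening brace of each multiline block
A = re.sub(r'{\s+', '', A)
# 4. Drop the closing brace of each multiline block
A = re.sub(r'\s+}', '', A)
print(A)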
Prints
VARA APPLE
VARB ORANGE BANANA
VARC KIWI{0} KIWI{1} BERRY
VARD CHERRY{0} MELON
(?m)^(\S+)\s+{([^\r\n]+)}$ removes the inline {}.
\s{2,} removes all multiline extra spaces.
{\s+ and \s+} remove the remaining multiline {}.
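The character-by-character approach mentioned in the question can also be sketched in a few lines. This is only a minimal illustration, not a tested general solution: the function name `flatten_blocks` is made up for this example, and the code assumes that braces are balanced and that every entry has the form NAME { ... }. It tracks the brace depth, drops only the outermost pair, and normalizes whitespace, so it does not depend on indentation.

```python
def flatten_blocks(text):
    """Collapse each top-level NAME { ... } block onto a single line.

    Hypothetical helper; assumes balanced braces.
    """
    lines = []   # finished output lines
    buf = []     # characters collected for the current entry
    depth = 0    # current brace nesting level

    for ch in text:
        if ch == '{':
            depth += 1
            if depth == 1:
                continue  # skip the outer opening brace
        elif ch == '}':
            depth -= 1
            if depth == 0:
                # Outer closing brace: the entry is complete.
                # split/join collapses newlines and runs of spaces.
                lines.append(' '.join(''.join(buf).split()))
                buf = []
                continue
        buf.append(ch)  # inner braces such as KIWI{0} are kept
    return '\n'.join(lines)


sample = "VARA {APPLE}\nVARB {\nORANGE\nBANANA\n}\nVARD {CHERRY{0} MELON}\n"
print(flatten_blocks(sample))
```

Any text after the last closing brace is silently discarded in this sketch, and unbalanced input is not detected; a real script would probably want to report both cases.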