While parsing a text file with hundreds of lines in Python 3.12, I am stuck on how to extract substrings from lines that contain several parenthesized groups, such as the following:
line1: 'group(something)'
line2: 'other(something) and group(something,something,something) and not group(something,something)'
I’m only interested in extracting the ‘something’ values inside ‘group()’ or ‘not group()’.
I defined a function to extract the ‘something’ inside each ‘group()’:
def find_within(search_in, search_for, until):
    return search_in[search_in.find(search_for) + len(search_for) : search_in.find(until)]
group1 = find_within(line1, 'group(', ')') # extracting group from the first line
group2 = find_within(line2, 'and group(', ')') # extracting group from the second line
not_group = find_within(line2, 'not group(', ')') # extracting 'not group' from the second line
This function handles lines that contain only a single ‘group()’, like line1, but not lines with multiple parenthesized items, like line2. For those it returns an empty string instead.
I tried to modify the function by using ‘rfind’ as in:
def find_within(search_in, search_for, until):
    return search_in[search_in.find(search_for) + len(search_for) : search_in.rfind(until)]
but the output for group2 was: something,something,something) and not group(something,something
whereas I expected:
group1 = 'something'
group2 = 'something,something,something'
not_group = 'something,something'
It seems that .find() looks for the first occurrence of the closing parenthesis in the whole string. When the ‘something’ I want is in the middle of the string, .find() returns the closing parenthesis of an earlier pair that comes before ‘group(something)’, so the end index is smaller than the start index and the slice is an empty string ('').
The .rfind() looks for the last occurrence of the closing parenthesis, so the slice runs all the way to the last parenthesis in the string.
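To make the diagnosis above concrete, here is a minimal sketch of what I think is happening, together with one variation I am considering. It assumes that the optional start argument of str.find() is the right tool for this; the name find_within_from is just a hypothetical helper for illustration, not something from my actual script.

```python
line1 = 'group(something)'
line2 = ('other(something) and group(something,something,something) '
         'and not group(something,something)')


def find_within_from(search_in, search_for, until):
    """Hypothetical variant: look for `until` only after `search_for`."""
    start = search_in.find(search_for)
    if start == -1:
        return None  # search_for does not occur in the string
    start += len(search_for)
    # The second argument tells find() where to begin searching, so an
    # earlier ')' that belongs to 'other(...)' is skipped.
    end = search_in.find(until, start)
    if end == -1:
        return None  # no closing delimiter after the match
    return search_in[start:end]


# Diagnosis: the first ')' closes 'other(...)' and sits before the text
# following 'and group(', so the original slice has start > end.
first_close = line2.find(')')
group_start = line2.find('and group(') + len('and group(')
print(first_close < group_start)              # True
print(repr(line2[group_start:first_close]))   # ''

print(find_within_from(line1, 'group(', ')'))
print(find_within_from(line2, 'and group(', ')'))
print(find_within_from(line2, 'not group(', ')'))
```

This seems to give the three values I expected for these two sample lines, but it stops at the first closing parenthesis after the match, so I assume it would not cope with nested parentheses inside a ‘group()’.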
Is there a better way of dealing with .find() or do I need to resort to RegEx?
Thank you in advance.