This is the code for my project, with self.keywords being a list containing keywords:
escaped_keywords = list(map(re.escape, self.keywords))
joined_keywords = '|'.join(escaped_keywords)
pattern = rf"\b({joined_keywords})\b|([^\w\s])"
temp = re.split(pattern, line)
The pattern variable holds, well, the pattern. Now I need a pattern that treats everything within " and ' as a string, meaning that even if symbols show up within the string, as in "hello, world!", it won’t split on the , and the !.
Right now, if the pattern is fed "hello, world!", it returns:
'"', 'hello', ',', 'world', '!', '"'
While the desired output would be:
'"', 'hello, world!', '"'
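To make the current behaviour reproducible outside the class, here is a minimal sketch. The keyword list is a made-up stand-in for self.keywords, split_line is a hypothetical helper name, and the filtering step (dropping None and empty pieces, trimming spaces) is an assumption about how the raw re.split result gets cleaned up to look like the output above.

```python
import re

keywords = ["if", "else"]  # hypothetical stand-in for self.keywords
escaped_keywords = list(map(re.escape, keywords))
joined_keywords = "|".join(escaped_keywords)
pattern = rf"\b({joined_keywords})\b|([^\w\s])"


def split_line(line):
    # re.split returns None for every group that did not take part in a match
    # and empty strings between adjacent matches, so those are dropped here
    # (assumed post-processing; it is not shown in the question itself).
    parts = re.split(pattern, line)
    return [p.strip() for p in parts if p and p.strip()]


print(split_line('"hello, world!"'))
```

The quotes, the comma and the exclamation mark each match the [^\w\s] alternative on their own, which is why the string gets broken apart.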
I tried this: "(.?)"|'(.?)'
I tried to match the quoted string, but I have no idea what I’m doing, since I was only experimenting in a regex playground. This did work, though, in that it captured the " and ', but while doing so it also captured all the symbols inside the string:
"Hello, World!"
From this, it captured every instance of ", of , and of !, which I don’t want, since the pattern is supposed to ignore the things within strings, including ' and every other symbol.
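One possible direction, sketched under assumptions rather than offered as a definitive answer: put a quoted-string alternative first in the pattern, so the regex engine consumes the whole string before the symbol alternative gets a chance to look inside it. The function name split_keeping_strings, the standalone keywords parameter (in place of self.keywords) and the cleanup loop are all hypothetical. Escaped quotes inside a string (such as \") and strings spanning several lines are not handled.

```python
import re


def split_keeping_strings(line, keywords):
    """Split like the original pattern, but keep quoted strings in one piece."""
    joined_keywords = "|".join(map(re.escape, keywords))
    # Group 1: opening quote, group 2: string body, group 3: the same quote again.
    string_part = r"""(["'])(.*?)(\1)"""
    pattern = re.compile(rf"{string_part}|\b({joined_keywords})\b|([^\w\s])")

    # re.split yields: text, g1..gN, text, g1..gN, ..., text
    stride = pattern.groups + 1
    tokens = []
    for i, part in enumerate(pattern.split(line)):
        if part is None:          # group that did not take part in this match
            continue
        if i % stride != 2:       # position 2 in each run is the string body
            part = part.strip()   # trim everything except string contents
        if part:
            tokens.append(part)
    return tokens


print(split_keeping_strings('"hello, world!"', ["if", "else"]))
# ['"', 'hello, world!', '"']
```

Because the backreference \1 requires the closing quote to be the same character as the opening one, a ' inside a "..." string (or the reverse) stays part of the string body. An unterminated quote falls through to the symbol alternative and is split as before.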