I have the following script that tries to get the first occurrence of the word “symbol”, but it keeps returning the last one (it prints “symbol in the middle and end”). How can I get the first occurrence with re.search? I need to use re.search so that I can get the contents before and after the first “symbol”.
import re
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
regex = r"[\s\S]*(symbol[\s\S]*)"
match = re.search(regex, test_string)
if match:
    result = match.group(1)
    print(result)
Your current pattern has multiple issues, one of which is that you should be using a lazy dot (.*?) to find the first occurrence of symbol.
import re
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
match = re.search(r'.*?\bsymbol (.*?)(?=(?: \bsymbol\b|$))', test_string)
if match:
    result = match.group(1)
    print(result)  # but I want the first occurrence of
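To make the effect of the lazy quantifier concrete, here is a minimal sketch that runs the same pattern twice, once greedy and once lazy. The variable names are only for illustration.

```python
import re

text = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"

# Greedy: (.*) consumes as much as possible, so the literal "symbol"
# ends up matching the LAST occurrence in the string.
greedy = re.search(r"(.*)symbol(.*)", text)

# Lazy: (.*?) consumes as little as possible, so the literal "symbol"
# matches the FIRST occurrence.
lazy = re.search(r"(.*?)symbol(.*)", text)

print(repr(greedy.group(2)))  # ' in the middle and end'
print(repr(lazy.group(1)))    # 'I have three '
```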
All of the methods below are fast for small strings; the performance difference becomes more pronounced with larger strings or heavier operations such as regular expressions or manual looping. partition() is typically among the fastest, since it only has to scan once for a literal delimiter.
Solution using partition():
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
before, word, after = test_string.partition("symbol")
print(f"Text before: {before}")
print(f"Text after: {after}")
Solution using re.search():
import re
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
regex = "(.*?)symbol(.*)"
match = re.search(regex, test_string)
if match:
    before = match.group(1)  # Text before the first "symbol"
    after = match.group(2)  # Text after the first "symbol"
    print(f"Before the first 'symbol': {before}")
    print(f"After the first 'symbol': {after}")
else:
    print("No match found.")
Solution using find():
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
index = test_string.find("symbol")
if index != -1:
    print(f"Text before: {test_string[:index]}")
    print(f"Text after: {test_string[index + len('symbol'):]}")
Solution using split():
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
before, after = test_string.split("symbol", 1)
print(f"Text before: {before}")
print(f"Text after: {after}")
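One caveat with the split() version: if the delimiter is absent, split() returns a one-element list and the two-name unpacking fails. A small sketch of that case, using a made-up input string, compared with partition(), which always returns three parts:

```python
text = "no delimiter in this string"  # illustrative input without "symbol"

# split() returns a single-element list when the delimiter is missing.
parts = text.split("symbol", 1)
print(parts)  # ['no delimiter in this string']

# Unpacking that list into two names raises ValueError.
try:
    before, after = parts
except ValueError as exc:
    print(f"unpacking failed: {exc}")

# partition() always returns a 3-tuple; sep is empty when nothing was found.
before, sep, after = text.partition("symbol")
print(repr(before), repr(sep), repr(after))
```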
Solution using index():
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
index = test_string.index("symbol")
before = test_string[:index]
after = test_string[index + len("symbol"):]
print(f"Text before: {before}")
print(f"Text after: {after}")
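The performance remark at the top of this answer can be checked with a rough timeit sketch like the one below. The input size, repetition count, and function names are arbitrary choices for illustration, and the numbers will vary by machine and Python version, so no timings are shown.

```python
import re
import timeit

# Synthetic input: the delimiter sits in the middle of a long string.
text = "x" * 10_000 + "symbol" + "y" * 10_000 + "symbol"

def with_partition():
    before, _, after = text.partition("symbol")
    return before, after

def with_find():
    i = text.find("symbol")  # assumes the delimiter is present
    return text[:i], text[i + len("symbol"):]

def with_split():
    before, after = text.split("symbol", 1)
    return before, after

# DOTALL so that "." would also cross newlines in multi-line input.
pattern = re.compile(r"(.*?)symbol(.*)", re.DOTALL)

def with_regex():
    m = pattern.search(text)
    return m.group(1), m.group(2)

for fn in (with_partition, with_find, with_split, with_regex):
    seconds = timeit.timeit(fn, number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```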
You don’t need re.search for this. If you want the text before and after the first occurrence of "symbol", you can just use str.partition:
before, match, after = test_string.partition('symbol')
if match:
    do_whatever_with(before, after)
And if you were searching for something that actually needed a regex, the way to go would be
match = re.search(r'symbol', string) # pretending this needs a regex
if match:
    before = string[:match.start()]
    after = string[match.end():]
rather than trying to use [\s\S]* to match what comes before and after.
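As a sketch of the match.start() / match.end() approach with a pattern that does need a regex, the helper below (split_around_first is a hypothetical name, not a standard-library function) mirrors the shape of str.partition for the first regex match. The case-insensitive, whole-word pattern is just an example.

```python
import re

def split_around_first(pattern, text):
    """Return (before, matched, after) for the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # Slice the original string around the span of the first match.
    return text[:match.start()], match.group(0), text[match.end():]

sample = "I have three Symbol but I want the first occurrence of symbol"
result = split_around_first(r"(?i)\bsymbol\b", sample)
print(result)
# ('I have three ', 'Symbol', ' but I want the first occurrence of symbol')
```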
The problem is that [\s\S]* matches every character that is either whitespace (\s) or not whitespace (\S). This is equivalent to “any character” (what . matches with the re.DOTALL flag; without it, . does not match a newline). And it matches as many of those as possible (*), so the symbol it finds will be as late as possible.
Remove them.
import re
test_string = "I have three symbol but I want the first occurrence of symbol instead of the symbol in the middle and end"
regex = "(symbol.*)"
match = re.search(regex, test_string)
if match:
    result = match.group(1)
    print(result)
# Prints:
# symbol but I want the first occurrence of symbol instead of the symbol in the middle and end
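One difference to keep in mind when swapping [\s\S]* for .* is how newlines are handled. The sketch below uses a made-up two-line string to show that .* stops at the end of the line unless re.DOTALL is passed, while [\s\S]* always runs to the end of the input.

```python
import re

text = "first symbol on line one\nsecond symbol on line two"  # illustrative input

# Without DOTALL, "." does not match "\n", so the capture stops at the line end.
no_flag = re.search(r"(symbol.*)", text)

# With DOTALL, "." matches any character, the same as [\s\S].
with_flag = re.search(r"(symbol.*)", text, flags=re.DOTALL)
char_class = re.search(r"(symbol[\s\S]*)", text)

print(repr(no_flag.group(1)))    # 'symbol on line one'
print(repr(with_flag.group(1)))  # 'symbol on line one\nsecond symbol on line two'
print(with_flag.group(1) == char_class.group(1))  # True
```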