I have the following code:
number_lookaheads = '|'.join(f'(?={number})' for number in code_network)
stationary_phone_regex = rf'(({number_lookaheads})[\d]{{8,18}})*(?=\D)'
stationary_phone_matches = re.finditer(stationary_phone_regex, text_nospaces)
stationary_phone_matches = [match.group() for match in stationary_phone_matches if match.group()]
Basically, what it does:
- creates a string of the form '(?=123)|(?=456)|(?=789)'
- creates a regex that matches a sequence of 8 to 18 digits if one of the lookaheads (number_lookaheads) matches at its start
- finds all non-overlapping matches of the regex pattern
- creates a list of the matched strings from the iterator produced by re.finditer(), excluding empty matches.
This code works on a string without spaces:
import re
# Example list of area codes (Vorwahlen)
code_network = ['123', '456', '789']  # replace with the actual area codes
number_lookaheads = '|'.join(f'(?={number})' for number in code_network)
stationary_phone_regex = rf'(({number_lookaheads})[\d]{{8,18}})*(?=\D)'
text = 'Some sample text with numbers like 12345678 and 43789654321 and 33333312345671333333 and 12345679'
text_nospaces = text.replace(" ", "").replace("/", "").replace("-", "").replace('(0)','')
text_nospaces += ' '
stationary_phone_matches = re.finditer(stationary_phone_regex, text_nospaces)
stationary_phone_matches = [match.group() for match in stationary_phone_matches if match.group()]
print(stationary_phone_matches)
Unfortunately, it also extracts substrings of longer digit sequences (for example, from 33333312345671333333), which is not desired.
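To make the problem concrete, here is the same pattern rerun on the sample text with its result listed. The name `current_matches` is only illustrative. Besides the unwanted substring, the result also shows the second number losing its leading 43.

```python
import re

code_network = ['123', '456', '789']
number_lookaheads = '|'.join(f'(?={number})' for number in code_network)
stationary_phone_regex = rf'(({number_lookaheads})[\d]{{8,18}})*(?=\D)'

text = 'Some sample text with numbers like 12345678 and 43789654321 and 33333312345671333333 and 12345679'
text_nospaces = text.replace(" ", "") + ' '

# Keep only the non-empty matches, as in the original code
current_matches = [m.group() for m in re.finditer(stationary_phone_regex, text_nospaces) if m.group()]
print(current_matches)
# ['12345678', '789654321', '12345671333333', '12345679']
```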
What I would like to achieve: I want to extract numbers that either:
- start with a code from code_network and follow stationary_phone_regex, or
- start with 43, followed by a code from code_network, and then follow stationary_phone_regex.
So in my case it should extract only:
- 12345678
- 43789654321
- 12345679
How can I do it?
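One direction that might work, as a minimal sketch rather than a verified solution, is to require digit boundaries on both sides of the match, with (?<!\d) before it and (?!\d) after it, and to allow an optional 43 in front of the network code. The names `bounded_phone_regex` and `bounded_matches` are illustrative. The sketch assumes the 8 to 18 digits are counted from the network code, so the 43 prefix is not included in that length.

```python
import re

code_network = ['123', '456', '789']  # sample codes, replace with the real ones
number_lookaheads = '|'.join(f'(?={re.escape(number)})' for number in code_network)

# (?<!\d)  the match must not start inside a longer run of digits
# (?:43)?  optional 43 prefix (assumed not to count toward the 8-18 digits)
# (?:...)  one of the network codes must come next
# (?!\d)   the match must not end inside a longer run of digits
bounded_phone_regex = rf'(?<!\d)(?:43)?(?:{number_lookaheads})\d{{8,18}}(?!\d)'

text = 'Some sample text with numbers like 12345678 and 43789654321 and 33333312345671333333 and 12345679'
text_nospaces = text.replace(" ", "").replace("/", "").replace("-", "").replace('(0)', '')

# No capturing groups in the pattern, so findall returns the full matches
bounded_matches = re.findall(bounded_phone_regex, text_nospaces)
print(bounded_matches)
# ['12345678', '43789654321', '12345679']
```

With (?!\d) in place of (?=\D), the trailing space appended to text_nospaces should no longer be needed, because the negative lookahead also succeeds at the end of the string.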