I have a list like this:
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']
Every string in list_o_text definitely contains one or more 5- or 6-digit numbers.
I recently found the re module, yet I’m having trouble finding the proper function to search for them.
Attempt with findall()
import re
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']
for n in range(len(list_o_text)):
    find = re.findall(r'\d{5}+', list_o_text[n])
    print(find)
OUTPUT:
['99999', '22222']
['44444']
Note: the six-digit number ‘999999’ is not found in its entirety
Attempt with search()
import re
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']
for n in range(len(list_o_text)):
    find = re.search(r'\d{5}+', list_o_text[n])
    print(find)
OUTPUT:
<re.Match object; span=(28, 33), match='99999'>
<re.Match object; span=(20, 25), match='44444'>
Note: this returns match objects with positions rather than the numbers themselves, and the spans still don’t cover the full 6-digit numbers
Attempt with search().group()
import re
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']
for n in range(len(list_o_text)):
    find = re.search(r'\d{5}+', list_o_text[n]).group()
    print(find)
OUTPUT:
99999
44444
Note: the six-digit number ‘999999’ is not found in its entirety
CONVOLUTED SOLUTION
I used all three methods, yet can’t shake the feeling that it could be simpler.
INPUT:
import re
list_o_text = ['Random string 1 2 3 45 6789 999999 22222', 'Example tech report 444444']
for n in range(len(list_o_text)):
    find_all = re.findall(r'\d{5}+', list_o_text[n])
    # 1st loop result is ['99999', '22222']
    for five_d_num in find_all:
        # locate where the 5-digit match starts, then grab the whole digit run from there
        find_start = re.search(five_d_num, list_o_text[n]).start()
        find = re.search(r'\d+', list_o_text[n][find_start:]).group()
        print(find)
OUTPUT:
999999
22222
444444
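The pattern \d{5}+ is not what you need; you want \d{5,6}. As your output shows, \d{5}+ matches exactly five digits at a time, whereas \d{5,6} matches a run of five or six digits.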
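As a minimal sketch of that fix applied to the list from the question, the loop below runs re.findall with \d{5,6} on each string. The helper name find_numbers is only illustrative and not part of the original code.

```python
import re

list_o_text = [
    'Random string 1 2 3 45 6789 999999 22222',
    'Example tech report 444444',
]


def find_numbers(text):
    """Return every run of five or six digits found in text (hypothetical helper)."""
    return re.findall(r'\d{5,6}', text)


# One list of matches per input string.
results = [find_numbers(text) for text in list_o_text]

for found in results:
    print(found)
# ['999999', '22222']
# ['444444']
```

Because the quantifier is greedy, the six-digit number is returned whole instead of being cut off after five digits.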
I highly recommend regex101.com for constructing and testing regex patterns. The site comes with a detailed breakdown of the pattern’s components.
If you only need 5- and 6-digit numbers, then [0-9]{5,6} can be used. But if you want 5 digits or more, you can use [0-9]{5,}.
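One caveat worth checking against your data: [0-9]{5,6} will also match the first six digits of a longer run. If numbers with seven or more digits should be skipped entirely, one possible approach (an assumption about the requirement, not something stated in the question) is to wrap the pattern in digit lookarounds:

```python
import re

sample = 'report 7777777 12345'

# Plain quantifier: takes the first six digits of the seven-digit run.
loose = re.findall(r'[0-9]{5,6}', sample)

# Lookarounds: the match must not be preceded or followed by another digit,
# so only runs that are exactly five or six digits long qualify.
strict = re.findall(r'(?<![0-9])[0-9]{5,6}(?![0-9])', sample)

print(loose)   # ['777777', '12345']
print(strict)  # ['12345']
```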
You can use {} in the format {min_count,max_count} (with no space after the comma) right after a character class [] or a group (e.g., (?:) or ()).
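A short sketch of both placements, using made-up sample strings: the quantifier follows a character class in one pattern and a non-capturing group in the other.

```python
import re

# Quantifier after a character class: runs of two to four digits.
class_hits = re.findall(r'[0-9]{2,4}', '7 42 12345')

# Quantifier after a non-capturing group: 'ab' repeated two or three times.
group_hits = re.findall(r'(?:ab){2,3}', 'ab abab ababab')

print(class_hits)  # ['42', '1234']
print(group_hits)  # ['abab', 'ababab']
```

A non-capturing group is used here because re.findall returns the captured group contents, rather than the whole match, when the pattern contains a capturing group.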
import re
list_o_text = ['Random string 1 2 3 45 6789 999999 22222',
               'Example tech report 444444',
               'Example tech report 7777777 8888888888888']
output_a = re.findall(r'[0-9]{5,}', ' '.join(list_o_text))
output_b = ' '.join(output_a)
print(output_a)
print(output_b)
The code produces two outputs (a list and a space-separated string); use whichever you prefer.
Prints:
['999999', '22222', '444444', '7777777', '8888888888888']
999999 22222 444444 7777777 8888888888888