I have a function that is called for every line of a text.
    def tokenize_line(line: str, cmd=''):
        matches = re.finditer(Patterns.SUPPORTED_TOKENS, line)
        tokens_found, not_found, start_idx = [], [], 0
        print(matches)
        for match in matches:
            pass
        # Rest of code
The result of print(matches) is something like: <callable_iterator object at 0x0000021201445000>
However, when I convert the iterator into a list:
matches = list(re.finditer(Patterns.SUPPORTED_TOKENS, line))
or when I iterate with for:
    for match in matches:
        print(match)
…Python freezes.
This issue occurs inconsistently. For example:
tokenize_line('$color AS $length') # Works fine
tokenize_line('FALSE + $length IS GT 7 + $length IS 4') # Freezes
So, the problem arises when converting the callable_iterator into a list or iterating over it.
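As far as I understand, this fits how re.finditer works: it returns a lazy iterator, so creating and printing it does no scanning, and the regex engine only runs when the next match is requested. Here is a minimal sketch of that behaviour. The pattern and the first_tokens helper are hypothetical stand-ins for illustration, not Patterns.SUPPORTED_TOKENS or my real code:

```python
import re

# Hypothetical stand-in pattern, much simpler than Patterns.SUPPORTED_TOKENS.
pattern = re.compile(r"\$\w+|\b[A-Z]+\b|\d+")


def first_tokens(line, limit=3):
    """Pull at most `limit` matches from the lazy iterator."""
    it = pattern.finditer(line)  # returns immediately, nothing is scanned yet
    tokens = []
    for _ in range(limit):
        match = next(it, None)  # the regex engine only runs here
        if match is None:
            break
        tokens.append(match.group())
    return tokens


if __name__ == "__main__":
    print(first_tokens('$color AS $length'))
```

So a slow pattern would only show up at the point where list() or the for loop starts pulling matches, which is where I see the freeze.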
Here is the pattern (Patterns.SUPPORTED_TOKENS) I’m using:
(°p\d+°|°a\d+°|°m\d+°)|((?<!\S)(?:!'(?:\\.|[^'\n\\])*'|!"(?:\\.|[^\n"\\])*")(?!\S))|((?:'(?:\\.|[^'\n\\])*'|"(?:\\.|[^\n"\\])*"))|(({(.*)}))|((?<!\S)([@$][\w]*(?:\.[\w]*)*)(?!\S))|((?<!\d)-?\d*\.?\d+)|(\*\*|[+\-*()/%^]|==|&&|\|\||!=|>=|<=|>|<|~~|!~~|::|!::)|([:/])|(\b(?:AS|AND|AT|:|BETWEEN|BY|FROM|IN|INTO|ON|OF|OR|THAN|TO|USING|WITH)\b)|(\b[a-zA-Z_][a-zA-Z0-9_]* *((?:[^;()'""]*|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\([^)]*\))*?;))|((\b(?:EMPTY|STRING|NUMBER|BOOL|ARRAY|MAP|TRUE|FALSE|NULL|UNKNOWN|DOTALL|IGNORECASE|MULTILINE|ARRAY_ARRAY|ARRAY_STRING|ARRAY_MAP|ARRAY_NUMBER|ARRAY_NULL|DOT|SPACE|NEWLINE|SEMICOLON|COLON|HASH|COMMA|TAB)\b)|(\b(?:IS NOT LT|IS NOT GT|IS NOT GEQ|IS NOT LEQ|IS NOT|IS LT|IS GT|IS GEQ|IS LEQ|IS|NOT IN|NOT|IN|HAS NOT|HAS|AND|OR)\b))
Explanation of the Regular Expression Pattern:
Custom Tokens: Matches specific custom tokens that start with particular characters and are followed by digits.
Quoted Strings: Matches both single and double-quoted strings, including those with escape characters.
Curly Braces Content: Matches anything enclosed in curly braces.
Variables: Matches variables that start with specific characters (like @ or $) and can include dots for nested properties.
Numbers: Matches both integers and floating-point numbers, including negative numbers.
Operators: Matches various mathematical and logical operators.
Colons and Slashes: Matches specific punctuation characters like colons and slashes.
Keywords: Matches certain keywords that are reserved in the language.
Function Definitions: Matches function definitions or similar structures, ensuring they follow specific syntax rules.
Data Types and Modifiers: Matches keywords that represent data types or modifiers.
Logical Operators: Matches complex logical operators used in conditional expressions.
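Since the freeze only happens for some inputs, one way to narrow it down might be to time a single alternative on its own with inputs of growing length. The sketch below does that for a simplified approximation of the function-definition alternative, on the assumption that an identifier followed by text with no ';' is the interesting case. FUNC_FRAGMENT and time_fragment are hypothetical names, and the fragment is not the exact sub-pattern from my code:

```python
import re
import time

# Simplified approximation of the function-definition alternative:
# an identifier, optional spaces, then a lazily repeated group that must end in ';'.
FUNC_FRAGMENT = re.compile(
    r"""\b[a-zA-Z_][a-zA-Z0-9_]* *((?:[^;()'"]*|\([^)]*\))*?;)"""
)


def time_fragment(n):
    """Match an identifier followed by n characters and no ';'.

    Returns (match_or_None, elapsed_seconds).
    """
    text = "FALSE " + "x" * n
    start = time.perf_counter()
    match = FUNC_FRAGMENT.match(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Kept deliberately small: if the nested quantifiers backtrack heavily,
    # the time can grow very quickly with n.
    for n in range(8, 17, 2):
        match, seconds = time_fragment(n)
        print(n, match, f"{seconds:.4f}s")
```

If the time climbs steeply as n grows, that would suggest the nested quantifiers in this alternative (a `*` inside a `*?` group) are backtracking excessively when no ';' is present. The same check could be repeated for each of the other alternatives.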
Any help to understand why this happens and how to fix it would be greatly appreciated!