I am using regex to parse the contents of a file in Python. I use re.sub to remove comments before parsing, by removing everything from a # to the end of its line.
import re
filetext = """
# This is the contents of a file
var: 800 # A var I defined
# This is a comment
# Indented comment
# Use "=" to set parameters explicitly
param1 = 123
param2 = 456
# Block of code
block1: {
# var_name = val
var1 = 7
var2 = 9
}
# Block of code 2
block2: {
# var_name = val
var1 = 7
var2 = 9
}
"""
scrubbed_text = re.sub(r'#.*', '', filetext, re.M)
print(scrubbed_text)
Output:
var: 800
param1 = 123
param2 = 456
block1: {
var1 = 7
var2 = 9
}
block2: {
# var_name = val
var1 = 7
var2 = 9
}
It removes every comment from the text except the one inside “block2” at the end. If I remove re.M from the re.sub call, it removes all the comments, which is what I want.
This behavior baffles me. Why does re.M (multiline mode) cause that one match to be skipped?
Why does re.sub not work for all matches in multiline mode?
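A likely explanation, offered as a sketch rather than a definitive diagnosis: the signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is read as count, not flags. re.M has the integer value 8, which would cap the call at 8 replacements; the sample text contains 9 comments, so the last one survives. The snippet below reproduces this with a hypothetical ten-comment string (the name sample and its contents are made up for illustration) and passes the value by keyword to make the effect explicit.

```python
import re

# Hypothetical sample: ten comments, two more than int(re.M).
sample = "\n".join(f"x{i} = {i}  # comment {i}" for i in range(10))

# re.sub(pattern, repl, string, count=0, flags=0): the fourth positional
# slot is count, so passing re.M there behaves like count=8.
print(int(re.M))  # 8

# Equivalent to the call in the question: at most 8 replacements are made.
limited = re.sub(r'#.*', '', sample, count=int(re.M))
print(limited.count('#'))  # 2 comments left behind

# Passing the flag by keyword lets every match be replaced.
scrubbed = re.sub(r'#.*', '', sample, flags=re.M)
print(scrubbed.count('#'))  # 0
```

re.M only changes how ^ and $ match, and this pattern uses neither, so the flag can also be dropped entirely, which is consistent with the observation that the call works without it.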
Tags: python, regex, comments, python-re