This is a somewhat language-specific answer, but the question doesn't state a language.
The book “Dive Into Python” suggests implementing comments using verbose regular expressions:
Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways:
- Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage returns. They’re not matched at all. (If you want to match a space in a verbose regular expression, you’ll need to escape it by putting a backslash in front of it.)
- Comments are ignored. A comment in a verbose regular expression is just like a comment in Python code: it starts with a # character and goes until the end of the line. In this case it’s a comment within a multi-line string instead of within your source code, but it works the same way.
Example:
```python
import re

pattern = """
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """

# re.VERBOSE makes the engine ignore the whitespace and the # comments above.
# Prints a match object (not None), i.e. 'M' is a valid Roman numeral.
print(re.search(pattern, 'M', re.VERBOSE))
```
Source and further details here
This method has a slight disadvantage: the caller must know that the pattern is written in verbose format and call it accordingly (in Python, by passing the re.VERBOSE flag).
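In Python, one way to soften that disadvantage is to make the flag travel with the pattern. The sketch below shows two standard options from the `re` module: the inline `(?x)` flag placed at the very start of the pattern, or compiling once with `re.compile(..., re.VERBOSE)` and sharing the compiled object. The names `ROMAN_NUMERAL` and `ROMAN_NUMERAL_RE` are hypothetical.

```python
import re

# Option 1: the inline (?x) flag switches on verbose mode from inside the pattern.
# It must be the very first thing in the string (newer Pythons raise an error otherwise).
ROMAN_NUMERAL = r"""(?x)
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
"""

# Option 2: compile once with re.VERBOSE; callers use the compiled object
# and never need to know how the pattern was written.
ROMAN_NUMERAL_RE = re.compile(
    r"""
    ^ M{0,4} (CM|CD|D?C{0,3}) (XC|XL|L?X{0,3}) (IX|IV|V?I{0,3}) $
    """,
    re.VERBOSE,
)

# Neither call below passes a flag.
print(bool(re.search(ROMAN_NUMERAL, "MCMXCIV")))  # True
print(bool(ROMAN_NUMERAL_RE.search("MCMXCIV")))   # True
print(bool(ROMAN_NUMERAL_RE.search("MCMC")))      # False
```

Either way, the knowledge that the pattern is verbose lives with the pattern rather than at every call site.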
Commenting regular expressions
Are there any common practices for commenting regular expressions: inline comments referring to different parts of the regex, or a general comment for the whole expression?
In my view, a good practice is to state concisely in comments the general idea of the regular expression. This saves other developers (or sometimes yourself) the hassle of copy-pasting the regex into a tool like RegExr just to understand what it does.
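A small Python illustration of such a comment: it says in plain words what the pattern accepts and gives a sample, so nobody has to decode the regex. The pattern (a CSS hex color) and the names `HEX_COLOR` and `is_hex_color` are hypothetical.

```python
import re

# Matches a CSS hex color: '#' followed by exactly 3 or 6 hex digits
# in either case, e.g. "#fff" or "#1a2B3c".
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}")


def is_hex_color(text: str) -> bool:
    """Return True if the whole string is a hex color."""
    return HEX_COLOR.fullmatch(text) is not None


print(is_hex_color("#1a2b3c"))  # True
print(is_hex_color("#12345"))   # False
```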
Typically, I will write a regex and explain not its individual pieces but its purpose: the what and the why. This is a bit like asking “What should my comments look like?”, to which one would answer “Don’t write what the code is doing; write why the code is doing what it does.”
Unless you are trying to teach someone about regexes via comments in code, I don’t think explaining what each individual piece does is necessary. When working with other programmers, you can safely assume that they know something as universal as regular expressions.
I guess it really depends on how you’re putting the regex together. Generally speaking I think it would be a bad idea to put comments within the actual regex string itself (not possible in most scenarios, as far as I know). If you really need to comment specific portions of a regular expression (are you trying to teach someone?), then break each chunk into separate strings on their own lines, and comment each line using the normal commenting process for your programming language. Otherwise, pleinolijf’s answer is pretty good.
example:
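A sketch of that approach in Python, where adjacent string literals are joined at compile time so each chunk can carry its own ordinary comment. The pattern (a simple `key=value` line) and the name `KEY_VALUE` are hypothetical.

```python
import re

KEY_VALUE = re.compile(
    r"^"
    r"([A-Za-z_]\w*)"   # key: an identifier such as max_retries
    r"\s*=\s*"          # '=' with optional spaces around it
    r"(.*?)"            # value: everything else, captured lazily
    r"\s*$"             # ignore trailing whitespace
)

match = KEY_VALUE.match("max_retries = 5  ")
print(match.groups())  # ('max_retries', '5')
```

Because the comments are normal source comments, no special regex flag is needed and the caller does not have to know how the pattern was assembled.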
I usually define a string constant whose name describes the overall purpose of the regular expression.
For example:
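A minimal Python sketch of the idea; the constant name `ISO_DATE_YYYY_MM_DD`, the helper `find_iso_dates`, and the date format are hypothetical.

```python
import re

# The name says what the pattern is for. Months 01-12, days 01-31;
# it does not reject impossible dates such as 2024-02-30.
ISO_DATE_YYYY_MM_DD = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"


def find_iso_dates(text: str) -> list[str]:
    """Return every YYYY-MM-DD date found in text."""
    return [m.group(0) for m in re.finditer(ISO_DATE_YYYY_MM_DD, text)]


print(find_iso_dates("Released 2024-03-15, patched 2024-04-02."))
# ['2024-03-15', '2024-04-02']
```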
You could add a comment above this constant to give it a description, but usually the constant name itself should be enough.
In some scenarios, the developer(s) may be using regular expressions to match text outside of their typical domain. The original developers may have gone through a lot of iterations capturing various edge cases that might only have been discovered through that iterative process. Thus, subsequent developers may not be aware of a lot of the edge cases that the original developer(s) dealt with, even if they are aware of the general case.
In cases such as these, it may be worthwhile to document examples of the variations. Where this documentation lives may depend on how much of it there is (e.g., it need not be in the code).
One way to approach it is to assume that future developers will have only basic knowledge, such as how regular expressions work, but none of the knowledge that you either (1) had before developing the regular expressions and that future developers wouldn’t necessarily share, or (2) gained during development (e.g., edge cases that were discovered).
For example, if during development you say something like “Oh, I didn’t know that X could take this form,” then it’s worth documenting that (and maybe the part of the regex that handles that variation).
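One lightweight way to keep that knowledge next to the code, sketched in Python: list the known variations beside the pattern and check them automatically, so the documentation cannot silently go stale. The release-tag format and its leading-`v` edge case are hypothetical.

```python
import re

# Release tags: MAJOR.MINOR with an optional .PATCH.
# Known variations (hypothetical):
#   "1.2"     - minor releases omit the patch number
#   "1.2.3"   - regular release
#   "v1.2.3"  - edge case: some older tags carry a lowercase 'v' prefix
RELEASE_TAG = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+))?")

SHOULD_MATCH = ["1.2", "1.2.3", "v1.2.3"]
SHOULD_NOT_MATCH = ["1", "1..2", "V1.2"]

for tag in SHOULD_MATCH:
    assert RELEASE_TAG.fullmatch(tag), tag
for tag in SHOULD_NOT_MATCH:
    assert not RELEASE_TAG.fullmatch(tag), tag
print("all documented examples behave as expected")
```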
Comments should add useful information that is not obvious from the code.
Few applications need every last cycle. If you are pattern matching massive data sets, there may be a better way (or there may not), but for most things the extra execution time is not a big deal.
And remember: the next person to come across your code and fix a bug might be you in six months’ time, and there is no way you are going to remember what it was supposed to do.
Extract the regex into a separate class with a meaningful name. Then document the code with automated tests.
This will ensure
Naturally, your class may host several regexes.
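A minimal Python sketch of this idea: a small class whose attribute names say what each pattern is for, plus `unittest` tests that double as executable documentation. The class names `LogPatterns` and `LogPatternsTest`, the patterns, and the log format are all hypothetical.

```python
import re
import unittest


class LogPatterns:
    """Regexes for parsing (hypothetical) application log lines."""

    # e.g. "2024-03-15 12:00:01 ERROR Disk full"
    LINE = re.compile(
        r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) "  # date, time
        r"(DEBUG|INFO|WARN|ERROR) (.*)"              # level, message
    )
    # e.g. "user=alice"
    USER = re.compile(r"user=(\w+)")


class LogPatternsTest(unittest.TestCase):
    def test_line_splits_into_date_time_level_message(self):
        m = LogPatterns.LINE.fullmatch("2024-03-15 12:00:01 ERROR Disk full")
        self.assertIsNotNone(m)
        self.assertEqual(m.groups(), ("2024-03-15", "12:00:01", "ERROR", "Disk full"))

    def test_line_rejects_unknown_level(self):
        self.assertIsNone(LogPatterns.LINE.fullmatch("2024-03-15 12:00:01 FATAL boom"))

    def test_user_extracts_name(self):
        self.assertEqual(LogPatterns.USER.search("login ok user=alice").group(1), "alice")


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["log_patterns"], exit=False)
```

The test names read as a specification, so a reader can learn what each pattern is meant to accept without parsing the regex itself.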
Filed under: softwareengineering - @ 14:44
Tags: coding-style, comments, documentation