Commenting regular expressions

Are there any common practices for commenting regular expressions: inline comments that refer to different parts of the regex, or one general comment for the whole expression?

4

In my view, a good practice is to concisely state in comments what the general idea of the regular expression is. This saves other developers (or sometimes yourself) the hassle of copy-pasting the regex into a tool like RegExr just to understand what it does.
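As a minimal sketch of such a summary comment (the `ISO_DATE` name and the date pattern are assumptions chosen for illustration, not part of the question), the comment states the intent and the known limits rather than walking through the syntax:

```python
import re

# Matches a calendar date written as YYYY-MM-DD, such as "2024-03-15".
# Checks the shape only; it does not reject impossible dates like 2024-02-31.
ISO_DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

print(bool(ISO_DATE.fullmatch("2024-03-15")))  # True
print(bool(ISO_DATE.fullmatch("2024-13-01")))  # False: there is no month 13
```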

4

This is a somewhat language-specific answer, but no language is stated in the question.

The book “Dive Into Python” suggests implementing comments using Verbose Regular Expressions:

Python allows you to do this with something called verbose regular
expressions. A verbose regular expression is different from a compact
regular expression in two ways:

  • Whitespace is ignored. Spaces, tabs, and carriage returns are not
    matched as spaces, tabs, and carriage returns. They’re not matched at
    all. (If you want to match a space in a verbose regular expression,
    you’ll need to escape it by putting a backslash in front of it.)
  • Comments are ignored. A comment in a verbose regular expression is
    just like a comment in Python code: it starts with a # character and
    goes until the end of the line. In this case it’s a comment within a
    multi-line string instead of within your source code, but it works the
    same way.

Example:

>>> pattern = """
^                   # beginning of string
M{0,4}              # thousands - 0 to 4 M's
(CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                    #            or 500-800 (D, followed by 0 to 3 C's)
(XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                    #        or 50-80 (L, followed by 0 to 3 X's)
(IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                    #        or 5-8 (V, followed by 0 to 3 I's)
$                   # end of string
"""
>>> re.search(pattern, 'M', re.VERBOSE)

Source and further details here

This method has a slight disadvantage: the caller must know that the pattern is written in verbose format and pass the matching flag (here, re.VERBOSE) when using it.
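One way to soften that disadvantage in Python, sketched here under the assumption that the pattern can be compiled once and shared, is to bind the flag to the pattern itself. The names `ROMAN_NUMERAL` and `INLINE_PATTERN` are illustrative only, and the comments are shortened from the book's version:

```python
import re

# Compiling with re.VERBOSE stores the flag in the pattern object, so
# callers can write ROMAN_NUMERAL.search(text) without passing any flag.
ROMAN_NUMERAL = re.compile(
    r"""
    ^                   # beginning of string
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """,
    re.VERBOSE,
)

# Alternative: an inline (?x) at the very start of the string switches on
# verbose mode, so even a plain re.search(pattern, text) call works.
INLINE_PATTERN = r"""(?x)
    ^ M{0,4} (CM|CD|D?C{0,3}) (XC|XL|L?X{0,3}) (IX|IV|V?I{0,3}) $
"""

print(bool(ROMAN_NUMERAL.search("MCMXCIV")))       # True
print(bool(re.search(INLINE_PATTERN, "MCMXCIV")))  # True
```

Either way the verbose format travels with the pattern instead of being something each caller has to remember.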

4

Typically, I will write a regex and not explain its individual pieces, but rather what its purpose is. That is, the what and the why. This is a bit like asking “What should my comments look like?”, to which one would say “Don’t write what the code is doing, write why the code is doing what it does.”

// Strip the leading "?" and remove the query parameters "offset=<integer>" and
// "count=<integer>" so that we are left with the pattern of the request.
var search = location.search.substring(1).replace(/offset=[0-9]+?&/g, "").replace(/count=[0-9]+?&/g, "");

Unless you are trying to teach someone about regexes via comments in code, I don’t think it is necessary to explain what each individual piece does. When working with other programmers, you can safely assume that they know something as basic as the global flag on a regular expression.

1

I guess it really depends on how you’re putting the regex together. Generally speaking I think it would be a bad idea to put comments within the actual regex string itself (not possible in most scenarios, as far as I know). If you really need to comment specific portions of a regular expression (are you trying to teach someone?), then break each chunk into separate strings on their own lines, and comment each line using the normal commenting process for your programming language. Otherwise, pleinolijf’s answer is pretty good.

example:

string myregex = "\\s"  // Match any whitespace character once
+ "\\n"  // Match one newline character
+ "[a-zA-Z]";  // Match any letter
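The same chunk-per-line idea can be sketched in Python, where adjacent string literals inside parentheses are concatenated automatically, so no `+` operator is needed. The constant name `WHITESPACE_NEWLINE_LETTER` is an assumption made for this example:

```python
import re

# Adjacent raw string literals are joined into one pattern string, so each
# piece can carry an ordinary Python comment on its own line.
WHITESPACE_NEWLINE_LETTER = re.compile(
    r"\s"        # one whitespace character
    r"\n"        # one newline character
    r"[a-zA-Z]"  # one ASCII letter
)

print(bool(WHITESPACE_NEWLINE_LETTER.search("end \nNext")))  # True
print(bool(WHITESPACE_NEWLINE_LETTER.search("no newline")))  # False
```

Unlike a verbose regular expression, this form needs no special flag: the comments live in the source code, not in the pattern string.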

I usually define a string constant whose name describes the overall purpose of the regular expression.

For example:

const string FloatingPointNumberPattern = @"[-+]?[0-9]*\.?[0-9]+";

You could add a comment above this constant to give it a description, but usually the constant name itself should be enough.

1

In some scenarios, the developer(s) may be using regular expressions to match text outside of their typical domain. The original developers may have gone through a lot of iterations capturing various edge cases that might only have been discovered through that iterative process. Thus, subsequent developers may not be aware of a lot of the edge cases that the original developer(s) dealt with, even if they are aware of the general case.

In cases such as these, it may be worthwhile to document examples of the variations. Where this documentation lives may depend on how much of it there is (e.g., it need not be in the code).

One way to approach it is to assume that future developers will have only basic knowledge, such as how regular expressions work, but none of the knowledge that you either (1) had before developing the regular expression and that they would not necessarily share, or (2) gained during development (e.g., edge cases that were discovered).

For example, if during development you say something like “Oh, I didn’t know that X could take this form,” then it’s worth documenting that (and maybe the part of the regex that handles that variation).
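A minimal sketch of what such documentation could look like when it does live in the code. The phone-number domain, the sample inputs, and the notes on where each variation was found are all invented for illustration:

```python
import re

# Variations discovered during development (hypothetical examples):
#   "555-123-4567"        plain form
#   "(555) 123-4567"      area code in parentheses
#   "555.123.4567"        dots as separators
#   "555-123-4567 x89"    trailing extension
PHONE = re.compile(
    r"""
    \(?(\d{3})\)?       # area code; parentheses seen in web-form input
    [\s.-]?             # separator; dots turned up in imported spreadsheets
    (\d{3})             # exchange
    [\s.-]?             # separator
    (\d{4})             # line number
    (?:\s*x(\d+))?      # optional extension such as "x89"
    """,
    re.VERBOSE,
)

for sample in ("555-123-4567", "(555) 123-4567", "555.123.4567", "555-123-4567 x89"):
    print(sample, "->", PHONE.fullmatch(sample).groups())
```

The list of samples at the top records what was learned, and the per-line comments point at the part of the expression that handles each variation.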

Comments should add useful information that is not obvious from the code.

  1. Make it easy to understand what the expression is supposed to do at a requirements level, either in the code itself or in a comment. What is the intent behind the expression: is it to validate email addresses, or to pick out Canadian phone numbers?
  2. Make it easy to understand what the expression is actually doing, i.e. what it evaluates to. First try to make this clear by splitting up the expression: if you first check for hyphens and then remove all numbers, make that a two-part operation with variables holding the intermediate values. It will be much easier to read, and the reader can step through your logic one step at a time. (There is a famous answer on SE to a question from someone trying to decipher old code that uses bit manipulation with ‘>>’ to find out whether certain flags are set. The answer lays out not only what the code really does but also how the question’s author should go about deconstructing this kind of code in the future, which is exactly what I am trying to describe, but I can’t seem to find the question to link to.)
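The second point can be sketched as follows. The function name `strip_hyphens_and_digits` and the sample input are assumptions; the point is only that each step gets its own small expression and a named intermediate value:

```python
import re


def strip_hyphens_and_digits(raw: str) -> str:
    """Remove hyphens, then digits, in two separately readable steps."""
    # Step 1: drop every hyphen.
    without_hyphens = re.sub(r"-", "", raw)
    # Step 2: drop every digit from what is left.
    without_digits = re.sub(r"[0-9]", "", without_hyphens)
    return without_digits


print(strip_hyphens_and_digits("AB-12-CD"))  # ABCD
```

A single `re.sub(r"[-0-9]", "", raw)` would do the same work in one pass, but the two-step form lets a reader inspect `without_hyphens` in a debugger before moving on.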

Few applications need every last cycle. If you are pattern matching massive data sets, then maybe there is a better way, maybe not, but for most things the extra execution time is not that big a deal.

And remember: the next person to come across your code and fix a bug might be you in six months’ time, and there is no way you are going to remember what it was supposed to do.

Extract the RegEx into a separate class with a meaningful name. Then I’d document the code with automated tests.

This will:

  • Ensure that the code actually works, including for corner cases
  • Ensure that a quick “bugfix” doesn’t break a lot of corner cases
  • Possibly document optimizations where backtracking is disabled

Naturally, your class may host several regexes.
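A rough sketch of this approach in Python, using the standard `unittest` module. The class name `FloatingPointNumber`, its `is_valid` method, and the chosen test inputs are assumptions for illustration; a real class would carry whatever cases matter in its domain:

```python
import re
import unittest


class FloatingPointNumber:
    """Recognizes simple decimal numbers such as "-1.5" or ".25"."""

    _PATTERN = re.compile(r"[-+]?[0-9]*\.?[0-9]+")

    @classmethod
    def is_valid(cls, text: str) -> bool:
        # fullmatch requires the whole string to be a number.
        return cls._PATTERN.fullmatch(text) is not None


class FloatingPointNumberTest(unittest.TestCase):
    """The test cases double as documentation of the intended behavior."""

    def test_accepts_common_forms(self):
        for text in ("42", "-1.5", "+0.25", ".5"):
            with self.subTest(text=text):
                self.assertTrue(FloatingPointNumber.is_valid(text))

    def test_rejects_corner_cases(self):
        for text in ("", "1.", "1.2.3", "abc"):
            with self.subTest(text=text):
                self.assertFalse(FloatingPointNumber.is_valid(text))
```

Saved in a module, the tests can be run with `python -m unittest`. A reader who wonders whether "1." is accepted can find the answer in the test names instead of decoding the pattern.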
