There is a rule in writing: When you break a section into subsections, typically it should have more than one subsection. This goes back to the old high school composition rules: if you have an A. section you also need a B. section.
To check a chapter of a book I’m writing in LaTeX I do this:
$ egrep -h '^\\(chapter|(sub)*section)\**{' CHAPTERFILE.tex | sed -e 's/\*{/{/g'
\chapter{My Chapter}
\section{Topic}
\subsection{Subtopic}
\subsection{Another Subtopic}
\subsection{Subtopic the Third}
\section{More}
\subsection{More more}
\subsection{Less is more}
\section{Odyssey}
\subsection{Marge and Homer}
\subsubsection{Bart}
\subsubsection{Lisa}
\subsubsection{Maggie}
\subsection{Example}
\section{Lore}
\section{Data}
What is an efficient algorithm to verify that the rule has been obeyed? The rule applies to subsections in a section, subsubsections in a subsection, and so on (though in my case there are no subsubsubsections).
Thanks to Bart’s advice, here is the code I created. It is hacky in places but it is simple and found bugs in my LaTeX files:
#!/usr/bin/env python3
"""Report any chapter or (sub)section that has exactly one subpart."""
import fileinput
import sys

# Sectioning commands and the tree depth each one belongs to.
TOKENS = [
    ['chapter', 0],
    ['section', 1],
    ['subsection', 2],
    ['subsubsection', 3],
    ['subsubsubsection', 4],
]


def getlevel(line):
    """Report the line's level (number of subsections)."""
    # Accept the command with or without its leading backslash.
    name = line.lstrip('\\')
    for token, level in TOKENS:
        if name.startswith(token):
            return level
    raise ValueError('not a sectioning command: %r' % (line,))


def store(lst, level, value):
    """Store value in the level'th position, dropping anything deeper."""
    while len(lst) < level:
        lst.append('EMPTY')  # placeholder for a skipped level
    return lst[:level] + [value]


def segments():
    """Read the input text, yield the path from the root to each heading."""
    path = []
    for line in fileinput.input():
        # The title is the text between the first '{' and the first '}'.
        value = line[line.index('{') + 1 : line.index('}')]
        path = store(path, getlevel(line), value)
        yield path


def get_tree(segs):
    """Build a tree of nested dicts from the paths yielded by generator segs."""
    tree = {}
    for s in segs():
        t = tree
        for i in s:
            t = t.setdefault(i, {})
    return tree


def audit(t):
    """Audit: leaves are ok, nodes are ok if they have >1 children."""
    error = False
    for k, v in t.items():
        if audit(v):
            error = True
        if len(v) == 1:
            print('ERROR: part with only one subpart: %s' % (k,))
            error = True
    return error


def main():
    tree = get_tree(segments)
    return 1 if audit(tree) else 0


if __name__ == '__main__':
    sys.exit(main())
What you describe is a tree structure where each non-leaf node has at least two children. As leaf nodes don’t have any children (by definition), once you have built your tree, go over all the nodes and verify that none of them have only a single child.
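As a rough illustration of that check, here is a minimal sketch. It assumes the tree is stored as nested dicts mapping each heading title to its children (a leaf maps to an empty dict); the function name `single_child_nodes` and the sample data are made up for this example.

```python
def single_child_nodes(tree):
    """Return the titles of all nodes that have exactly one child.

    `tree` maps each heading title to a dict of its children;
    a leaf maps to an empty dict. (Assumed representation.)
    """
    offenders = []
    for title, children in tree.items():
        if len(children) == 1:
            offenders.append(title)
        # Recurse so that deeper levels are checked too.
        offenders.extend(single_child_nodes(children))
    return offenders


# Hypothetical sample: "More" has a single subsection.
example = {
    "My Chapter": {
        "Topic": {"Subtopic": {}, "Another Subtopic": {}},
        "More": {"More more": {}},
        "Lore": {},
    }
}
print(single_child_nodes(example))
```

Each node is visited once, so the check is linear in the number of headings.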
To build your tree, you can use the structure of the LaTeX sectioning commands: count the number of occurrences of "sub" in the sectioning command to determine which level of the tree it belongs on.
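One possible way to turn the "count the subs" idea into code is sketched below. Note that this variant does not build an explicit tree: it keeps a stack of the headings that are still open and counts their direct children in a single pass, which is a substitution on my part rather than what the answer literally describes. The names `level_of`, `parse_headings`, and `lonely_parents` are hypothetical, and the regular expression assumes one heading per line with the title inside a single pair of braces.

```python
import re

# Optional backslash, the command, an optional star, then {title}.
HEADING = re.compile(r"\\?(chapter|(?:sub)*section)\*?\{(.*)\}")


def level_of(command):
    """Depth of a sectioning command: chapter=0, section=1, subsection=2, ..."""
    if command == "chapter":
        return 0
    return 1 + command.count("sub")


def parse_headings(lines):
    """Yield (level, title) for each line that is a sectioning command."""
    for line in lines:
        match = HEADING.match(line.strip())
        if match:
            yield level_of(match.group(1)), match.group(2)


def lonely_parents(headings):
    """Return titles of headings that have exactly one direct child."""
    stack = []      # entries are [level, title, child_count] for open headings
    offenders = []

    def close_down_to(level):
        # Headings at this level or deeper can receive no more children.
        while stack and stack[-1][0] >= level:
            _, title, children = stack.pop()
            if children == 1:
                offenders.append(title)

    for level, title in headings:
        close_down_to(level)
        if stack:
            # Counted as a child of the nearest open ancestor,
            # even if a level was skipped in between.
            stack[-1][2] += 1
        stack.append([level, title, 0])
    close_down_to(0)
    return offenders


# Hypothetical sample: "More" has a single subsection.
sample = r"""
\chapter{My Chapter}
\section{Topic}
\subsection{Subtopic}
\subsection{Another Subtopic}
\section{More}
\subsection{More more}
\section{Lore}
""".splitlines()
print(lonely_parents(parse_headings(sample)))
```

The stack never holds more entries than the nesting depth, so this uses little memory even for long documents.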