This question attempts to extend the code from Jason S (thanks, Jason), found here: /questions/39171989/docutils-traverse-sections
Original from Jason S
import docutils

def doctree_resolved(app, doctree, docname):
    for section in doctree.traverse(docutils.nodes.section):
        title = section.next_node(docutils.nodes.Titular)
        if title:
            print(title.astext())

def setup(app):
    app.connect('doctree-resolved', doctree_resolved)
Now, suppose I want to capture the text of only H2 subsections (or at least all subsections if that’s the only option).
Ultimately, I’m trying to build a dictionary that maps subsection titles to their respective URLs/paths.
What is the best way to do so?
My revision of the original above is unsuccessful, but hopefully you can understand my approach by reading the code. A simple for-loop that builds lists does not work, I believe, because of the line title = section.next_node(docutils.nodes.Titular), specifically the next_node call. I looked but cannot find documentation on next_node.
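To make the behaviour of next_node concrete, here is a minimal sketch on a hand-built tree. It assumes only that docutils is installed; the section id "intro" and the sample texts are made up for illustration. As far as I can tell, next_node returns the first node matching the condition, or None when nothing matches.

```python
from docutils import nodes

# Build a tiny tree by hand: one section holding a title and a paragraph.
# The id "intro" and the texts are illustrative values only.
section = nodes.section(ids=["intro"])
section += nodes.title(text="Introduction")
section += nodes.paragraph(text="Body text.")

# next_node() searches below this node and returns the first node that
# matches the condition (here: any Titular node such as a title).
title = section.next_node(nodes.Titular)
title_text = title.astext()

# A section without any title yields None, hence the `if title:` guard.
empty = nodes.section()
no_title = empty.next_node(nodes.Titular)

print(title_text, no_title)
```

So the call itself only picks the first title under each section; it does not say anything about how deeply that section is nested.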
So, what might be a different way to achieve capturing the H2 subsections from each document so I can build a dictionary where each subsection has a path/url? NOTE: I have not yet attempted to construct the full URL in the code below.
import docutils

# Parallel lists: one entry per titled section found
docname_list = []
section_list = []

def doctree_resolved(app, doctree, docname):
    for section in doctree.traverse(docutils.nodes.section):
        title = section.next_node(docutils.nodes.Titular)
        if title:
            print(title.astext())
            docname_list.append(docname)
            section_list.append(title.astext())
...
# Pair each document name with a section title
url_dict = dict(zip(docname_list, section_list))
...
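For comparison, one possible way to single out H2 subsections is to look at nesting depth rather than at next_node. The sketch below is not a confirmed solution. It assumes that the H1 of each document is a section directly under the document node (which requires the doctitle transform to be off, as I understand is the case in Sphinx), so that an H2 is a section whose parent is such a top-level section. The function name h2_titles and the sample reStructuredText are hypothetical.

```python
from docutils import nodes
from docutils.core import publish_doctree


def h2_titles(doctree):
    """Map each second-level section title to the section's first id."""
    result = {}
    for section in doctree.traverse(nodes.section):
        parent = section.parent
        # Assumption: an H1 is a section directly under the document,
        # so an H2 is a section nested directly inside such a section.
        is_h2 = (isinstance(parent, nodes.section)
                 and isinstance(parent.parent, nodes.document))
        if not is_h2:
            continue
        title = section.next_node(nodes.Titular)
        if title and section["ids"]:
            result[title.astext()] = section["ids"][0]
    return result


# Hypothetical sample document: one H1, two H2s, one H3.
source = """
Guide
=====

Install
-------

Text.

Usage
-----

Details
~~~~~~~

More text.
"""

# Keep the H1 as a section instead of promoting it to the document title.
doctree = publish_doctree(source, settings_overrides={"doctitle_xform": False})
titles = h2_titles(doctree)
print(titles)
```

Inside a doctree-resolved handler, the same function could presumably be called on the doctree argument, and each id combined with the docname (for example docname plus "#" plus the id) to form a path. That last step is not shown here and would depend on the builder in use.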