The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time. To understand how the parser works,
please first become familiar with the docutils.statemachine module. In the
description below, references are made to classes defined in this module;
please see the individual classes for details.
Parsing proceeds as follows:
- The state machine examines each line of input, checking each of the
transition patterns of the state Body, in order, looking for a match.
The implicit transitions (blank lines and indentation) are checked before
any others. The 'text' transition is a catch-all (matches anything).
- The method associated with the matched transition pattern is called.
- Some transition methods are self-contained, appending elements to the
document tree (Body.doctest parses a doctest block). The parser's
current line index is advanced to the end of the element, and parsing
continues with step 1.
- Other transition methods trigger the creation of a nested state machine,
whose job is to parse a compound construct ('indent' does a block quote,
'bullet' does a bullet list, 'overline' does a section [first checking
for a valid section header], etc.).
- In the case of lists and explicit markup, a one-off state machine is
created and run to parse contents of the first item.
- A new state machine is created and its initial state is set to the
appropriate specialized state (BulletList in the case of the
'bullet' transition; see SpecializedBody for more detail). This
state machine is run to parse the compound element (or series of
explicit markup elements), and returns as soon as a non-member element
is encountered. For example, the BulletList state machine ends as
soon as it encounters an element which is not a list item of that
bullet list. The optional omission of inter-element blank lines is
enabled by this nested state machine.
- The current line index is advanced to the end of the elements parsed,
and parsing continues with step 1.
- The result of the 'text' transition depends on the next line of text.
The current state is changed to Text, under which the second line is
examined. If the second line is:
- Indented: The element is a definition list item, and parsing proceeds
similarly to step 2.B, using the DefinitionList state.
- A line of uniform punctuation characters: The element is a section
header; again, parsing proceeds as in step 2.B, and Body is still
used.
- Anything else: The element is a paragraph, which is examined for
inline markup and appended to the parent element. Processing
continues with step 1.