Parse a grid table using parse().
Here's an example of a grid table:
+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | - Table cells       |
+------------------------+ span rows. | - contain           |
| body row 4             |            | - body elements.    |
+------------------------+------------+---------------------+
Intersections use '+', row separators use '-' (except for one optional head/body row separator, which uses '='), and column separators use '|'.
Passing the above table to the parse() method will result in the following data structure:
([24, 12, 10, 10],
 [[(0, 0, 1, ['Header row, column 1']),
   (0, 0, 1, ['Header 2']),
   (0, 0, 1, ['Header 3']),
   (0, 0, 1, ['Header 4'])]],
 [[(0, 0, 3, ['body row 1, column 1']),
   (0, 0, 3, ['column 2']),
   (0, 0, 3, ['column 3']),
   (0, 0, 3, ['column 4'])],
  [(0, 0, 5, ['body row 2']),
   (0, 2, 5, ['Cells may span columns.']),
   None,
   None],
  [(0, 0, 7, ['body row 3']),
   (1, 0, 7, ['Cells may', 'span rows.', '']),
   (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
   None],
  [(0, 0, 9, ['body row 4']), None, None, None]])
The first item is a list containing column widths (colspecs). The second item is a list of head rows, and the third is a list of body rows. Each row contains a list of cells. Each cell is either None (for a cell unused because of another cell's span), or a tuple. A cell tuple contains four items: the number of extra rows used by the cell in a vertical span (morerows); the number of extra columns used by the cell in a horizontal span (morecols); the line offset of the first line of the cell contents; and the cell contents, a list of lines of text.
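The morerows/morecols encoding can be checked by expanding a list of rows back into a full grid. The sketch below (the `occupancy` helper is hypothetical, not part of the parser) records, for each (row, column) slot, the position of the cell that covers it; a None entry in the input should correspond exactly to a slot claimed by some other cell's span.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int, int, List[str]]  # (morerows, morecols, offset, lines)


def occupancy(rows: List[List[Optional[Cell]]]) -> List[List[Optional[Tuple[int, int]]]]:
    """Return a grid giving, for each (row, col), the position of the covering cell."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    grid: List[List[Optional[Tuple[int, int]]]] = [[None] * width for _ in range(height)]
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell is None:
                continue  # this slot is covered by another cell's span
            morerows, morecols, _offset, _lines = cell
            # A cell covers 1 + morerows rows and 1 + morecols columns.
            for rr in range(r, r + morerows + 1):
                for cc in range(c, c + morecols + 1):
                    if grid[rr][cc] is not None:
                        raise ValueError(f"cells overlap at ({rr}, {cc})")
                    grid[rr][cc] = (r, c)
    # Every slot should now be claimed by exactly one cell.
    for rr, grid_row in enumerate(grid):
        for cc, owner in enumerate(grid_row):
            if owner is None:
                raise ValueError(f"no cell covers ({rr}, {cc})")
    return grid
```

For the four body rows above, slot (3, 1) ends up owned by (2, 1), the "Cells may span rows." cell, and slots (2, 3), (3, 2) and (3, 3) by (2, 2).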
There are no implemented interfaces.
double_width_pad_char (type: str): '\x00', the padding character for East Asian double-width text.
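A plausible use of double_width_pad_char, sketched here as an assumption rather than the parser's actual code: after every wide or fullwidth character, insert one pad character so that string indices match display columns and the table borders line up again; once cell text is cut out, the pad characters are removed. "Wide" is taken to mean that `unicodedata.east_asian_width` returns 'W' or 'F', and the helper names `pad_double_width` and `unpad` are hypothetical.

```python
import unicodedata

DOUBLE_WIDTH_PAD_CHAR = '\x00'


def pad_double_width(line: str, pad_char: str = DOUBLE_WIDTH_PAD_CHAR) -> str:
    """Append pad_char after each wide/fullwidth character in line."""
    out = []
    for ch in line:
        out.append(ch)
        # 'W' (wide) and 'F' (fullwidth) characters occupy two display columns.
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            out.append(pad_char)
    return ''.join(out)


def unpad(text: str, pad_char: str = DOUBLE_WIDTH_PAD_CHAR) -> str:
    """Remove the padding again once cell text has been extracted."""
    return text.replace(pad_char, '')
```

With this padding, the row '| 表 |' gets the same length as its border '+----+', so its closing '|' sits at the same index as the border's closing '+'.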
head_body_separator_pat (type: SRE_Pattern): a compiled regular expression that matches the head/body row separator line (the optional '=' line between head rows and body rows).
check_parse_complete()
Each text column should have been completely seen.
find_head_body_sep()
Look for a head/body row separator line; store the line index.
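A minimal sketch of the separator search, under assumptions of its own: the separator is recognized by `HEAD_BODY_SEP` below, an approximation that is not necessarily the pattern stored in head_body_separator_pat; once found, its '=' characters are rewritten as '-' so that cell tracing can treat it as an ordinary row separator. Raising ValueError for a repeated or misplaced separator is also this sketch's choice; the real parser reports markup problems via TableMarkupError.

```python
import re
from typing import List, Optional, Tuple

# Approximate pattern (an assumption of this sketch): '+', then '=' and '+'
# characters, ending in '=+', optionally followed by trailing spaces.
HEAD_BODY_SEP = re.compile(r'\+=[=+]+=\+ *$')


def find_head_body_sep(block: List[str]) -> Tuple[Optional[int], List[str]]:
    """Return (separator line index or None, block with that line normalized)."""
    sep: Optional[int] = None
    for i, line in enumerate(block):
        if HEAD_BODY_SEP.match(line):
            if sep is not None:
                raise ValueError(
                    f'multiple head/body separators (table lines {sep + 1} and {i + 1})')
            sep = i
    if sep is not None and sep in (0, len(block) - 1):
        raise ValueError('the head/body separator may not be the first or last line')
    # Rewrite '=' as '-' on the separator line so it reads as an ordinary row border.
    normalized = [line.replace('=', '-') if i == sep else line
                  for i, line in enumerate(block)]
    return sep, normalized
```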
mark_done(top, left, bottom, right)
For keeping track of how much of each text column has been seen.
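The bookkeeping behind mark_done() and check_parse_complete() can be sketched with one list, `done`, holding for each character column the index of the last text line seen so far (-1 initially). The function names mirror the methods above, but the signatures, the ValueError, and the exact invariant checked are assumptions of this sketch.

```python
from typing import List


def mark_done(done: List[int], top: int, left: int, bottom: int, right: int) -> None:
    """Record that text columns left..right-1 have been seen through line bottom-1."""
    for col in range(left, right):
        # A new cell must start directly below what has already been seen.
        if done[col] != top - 1:
            raise ValueError(
                f'column {col}: seen through line {done[col]}, but cell starts at line {top}')
        done[col] = bottom - 1


def parse_complete(done: List[int], bottom: int, right: int) -> bool:
    """True if every text column 0..right-1 was seen down to line bottom-1."""
    return all(done[col] == bottom - 1 for col in range(right))
```

Here `bottom` and `right` in parse_complete() are the indices of the table's last line and last character column; the right border column itself is never marked.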
parse(block)
Analyze the text block and return a table data structure.
Given a plaintext-graphic table in block (list of lines of text; no whitespace padding), parse the table, construct and return the data necessary to construct a CALS table or equivalent.
Raise TableMarkupError if there is any problem with the markup.
parse_table()
Start with a queue of upper-left corners, containing the upper-left corner of the table itself. Trace out one rectangular cell, remember it, and add its upper-right and lower-left corners to the queue of potential upper-left corners of further cells. Process the queue in top-to-bottom order, keeping track of how much of each text column has been seen.
We'll end up knowing all the row and column boundaries, cell positions and their dimensions.
scan_cell(top, left)
Starting at the top-left corner, start tracing out a cell.
scan_down(top, left, right)
Look for the bottom-right corner of the cell, making note of all row boundaries.
scan_left(top, left, bottom, right)
Noting column boundaries, look for the bottom-left corner of the cell. It must line up with the starting point.
scan_right(top, left)
Look for the top-right corner of the cell, and make note of all column boundaries ('+').
scan_up(top, left, bottom, right)
Noting row boundaries, see if we can return to the starting point.
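The corner-queue algorithm of parse_table() and the four scan steps can be sketched together as one function. This is a simplified reimplementation under stated assumptions, not the docutils code: `trace_cells` is a hypothetical name; it returns only cell rectangles (top, left, bottom, right) rather than also recording row and column boundaries; it treats '=' like '-' so an unnormalized head/body separator still traces; it uses a heap to process corners in top-to-bottom order; and it raises ValueError instead of TableMarkupError when some text column is left unseen.

```python
import heapq
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), all '+' corners


def trace_cells(block: List[str]) -> List[Rect]:
    """Trace every cell of a rectangular grid table; return corner coordinates."""
    height, width = len(block), len(block[0])
    done = [-1] * width  # last text line seen in each character column

    def scan_right(top: int, left: int) -> Optional[Tuple[int, int]]:
        # Walk the top border; try each '+' as a candidate top-right corner.
        for col in range(left + 1, width):
            ch = block[top][col]
            if ch == '+':
                bottom = scan_down(top, left, col)
                if bottom is not None:
                    return bottom, col
            elif ch not in '-=':
                return None
        return None

    def scan_down(top: int, left: int, right: int) -> Optional[int]:
        # Walk down the right border; try each '+' as the bottom-right corner.
        for row in range(top + 1, height):
            ch = block[row][right]
            if ch == '+':
                if scan_left(top, left, row, right):
                    return row
            elif ch != '|':
                return None
        return None

    def scan_left(top: int, left: int, bottom: int, right: int) -> bool:
        # The bottom border must be unbroken and end in '+' under the start corner.
        line = block[bottom]
        if any(line[col] not in '+-=' for col in range(left + 1, right)):
            return False
        return line[left] == '+' and scan_up(top, left, bottom)

    def scan_up(top: int, left: int, bottom: int) -> bool:
        # The left border must lead back up to the starting corner.
        return all(block[row][left] in '+|' for row in range(top + 1, bottom))

    cells: List[Rect] = []
    corners = [(0, 0)]  # heap of candidate upper-left corners, top-to-bottom
    while corners:
        top, left = heapq.heappop(corners)
        # Skip corners on the bottom/right edge or inside text already seen.
        if top == height - 1 or left == width - 1 or top <= done[left]:
            continue
        found = scan_right(top, left)
        if found is None:
            continue
        bottom, right = found
        for col in range(left, right):
            done[col] = bottom - 1  # the mark_done() bookkeeping
        cells.append((top, left, bottom, right))
        heapq.heappush(corners, (top, right))  # upper-right corner
        heapq.heappush(corners, (bottom, left))  # lower-left corner
    if any(done[col] != height - 2 for col in range(width - 1)):
        raise ValueError('malformed table: parse incomplete')
    return cells
```

Applied to the example table above, this sketch should find 14 cells, including (4, 25, 6, 60) for the column-spanning cell in body row 2 and (6, 25, 10, 38) for the row-spanning cell.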
setup(block)
structure_from_cells()
From the data collected by scan_cell(), convert to the final data structure.
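One way to turn traced rectangles into the (colspecs, head rows, body rows) structure described earlier is sketched below. Assumptions: the hypothetical `structure_from_cells` function takes the block plus a list of (top, left, bottom, right) rectangles (an input format of this sketch, not scan_cell()'s actual output); it derives row and column boundaries from the cell corners rather than from separators recorded while scanning; the offset is taken as the line after the cell's top border; and the text clean-up (right-strip each line, then remove common indentation) is inferred from the example output rather than taken from the parser.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right) corner coordinates
Cell = Tuple[int, int, int, List[str]]  # (morerows, morecols, offset, lines)


def cell_text(block: List[str], top: int, left: int, bottom: int, right: int) -> List[str]:
    """Cut out a cell's interior, right-strip each line, remove common indentation."""
    lines = [line[left + 1:right].rstrip() for line in block[top + 1:bottom]]
    indent = min((len(ln) - len(ln.lstrip()) for ln in lines if ln), default=0)
    return [ln[indent:] for ln in lines]


def structure_from_cells(block: List[str], cells: List[Rect],
                         head_body_sep: Optional[int] = None):
    """Convert traced rectangles into (colspecs, headrows, bodyrows)."""
    rowseps = sorted({c[0] for c in cells} | {c[2] for c in cells})
    colseps = sorted({c[1] for c in cells} | {c[3] for c in cells})
    rowindex = {line: i for i, line in enumerate(rowseps)}
    colindex = {col: i for i, col in enumerate(colseps)}
    # Column width = distance between adjacent column separators, minus the border.
    colspecs = [b - a - 1 for a, b in zip(colseps, colseps[1:])]
    rows: List[List[Optional[Cell]]] = [[None] * len(colspecs) for _ in rowseps[:-1]]
    for top, left, bottom, right in cells:
        r, c = rowindex[top], colindex[left]
        morerows = rowindex[bottom] - r - 1  # extra rows spanned
        morecols = colindex[right] - c - 1  # extra columns spanned
        rows[r][c] = (morerows, morecols, top + 1,
                      cell_text(block, top, left, bottom, right))
    # Rows above the head/body separator are head rows; with no separator, all are body.
    split = rowindex[head_body_sep] if head_body_sep is not None else 0
    return colspecs, rows[:split], rows[split:]
```

Fed the example table, its 14 cell rectangles and head_body_sep=2, this sketch should reproduce the data structure shown for parse().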
There are no known subclasses.