This is the basic HTML parser class.
It supports all entity names required by the XHTML 1.0 Recommendation. It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
There are no implemented interfaces.
entitydefs (type: dict)
{'zwnj': '‌', 'aring': '\xe5', 'gt': '>', 'yen': '\xa5', 'ograve': '\xf2', 'Chi': 'Χ', 'bull': '•', 'Egrave': '\xc8', 'trade': '™', 'Ntilde': '\xd1', 'upsih': 'ϒ', 'Yacute': '\xdd', 'asymp': '≈', 'radic': '√', 'otimes': '⊗', 'nabla': '∇', 'aelig': '\xe6', 'oelig': 'œ', 'equiv': '≡', 'Psi': 'Ψ', 'auml': '\xe4', 'circ': 'ˆ', 'Acirc': '\xc2', 'Epsilon': 'Ε', 'Yuml': 'Ÿ', 'Eta': 'Η', 'lt': '<', 'Icirc': '\xce', 'Upsilon': 'Υ', 'ndash': '–', 'there4': '∴', 'Prime': '″', 'prime': '′', 'psi': 'ψ', 'Kappa': 'Κ', 'rsaquo': '›', 'Tau': 'Τ', 'darr': '↓', 'ocirc': '\xf4', 'lrm': '‎', 'zwj': '‍', 'cedil': '\xb8', 'Ecirc': '\xca', 'not': '\xac', 'amp': '&', 'AElig': '\xc6', 'oslash': '\xf8', 'acute': '\xb4', 'lceil': '⌈', 'laquo': '\xab', 'shy': '\xad', 'rdquo': '”', 'ge': '≥', 'Igrave': '\xcc', 'reg': '\xae', 'Ograve': '\xd2', 'euro': '€', 'dArr': '⇓', 'sdot': '⋅', 'nbsp': '\xa0', 'lfloor': '⌊', 'lArr': '⇐', 'Auml': '\xc4', 'larr': '←', 'Atilde': '\xc3', 'Otilde': '\xd5', 'szlig': '\xdf', 'clubs': '♣', 'diams': '♦', 'agrave': '\xe0', 'Ocirc': '\xd4', 'Iota': 'Ι', 'Theta': 'Θ', 'Pi': 'Π', 'OElig': 'Œ', 'Scaron': 'Š', 'frac14': '\xbc', 'egrave': '\xe8', 'sub': '⊂', 'iexcl': '\xa1', 'frac12': '\xbd', 'sbquo': '‚', 'ordf': '\xaa', 'sum': '∑', 'prop': '∝', 'Uuml': '\xdc', 'ntilde': '\xf1', 'sup': '⊃', 'theta': 'θ', 'prod': '∏', 'nsub': '⊄', 'hArr': '⇔', 'rlm': '‏', 'THORN': '\xde', 'infin': '∞', 'yuml': '\xff', 'Mu': 'Μ', 'le': '≤', 'Eacute': '\xc9', 'thinsp': ' ', 'ecirc': '\xea', 'bdquo': '„', 'Sigma': 'Σ', 'fnof': 'ƒ', 'Aring': '\xc5', 'tilde': '˜', 'frac34': '\xbe', 'emsp': ' ', 'mdash': '—', 'uarr': '↑', 'permil': '‰', 'Ugrave': '\xd9', 'rarr': '→', 'Agrave': '\xc0', 'chi': 'χ', 'forall': '∀', 'eth': '\xf0', 'rceil': '⌉', 'iuml': '\xef', 'gamma': 'γ', 'lambda': 'λ', 'harr': '↔', 'rang': '〉', 'xi': 'ξ', 'dagger': '†', 'divide': '\xf7', 'Ouml': '\xd6', 'image': 'ℑ', 'alefsym': 'ℵ', 'igrave': '\xec', 'otilde': '\xf5', 'Oacute': '\xd3', 'sube': '⊆', 'alpha': 'α', 'frasl': '⁄', 
'ETH': '\xd0', 'lowast': '∗', 'Nu': 'Ν', 'plusmn': '\xb1', 'Euml': '\xcb', 'real': 'ℜ', 'sup1': '\xb9', 'sup2': '\xb2', 'sup3': '\xb3', 'Oslash': '\xd8', 'Aacute': '\xc1', 'cent': '\xa2', 'oline': '‾', 'Beta': 'Β', 'perp': '⊥', 'Delta': 'Δ', 'loz': '◊', 'pi': 'π', 'iota': 'ι', 'empty': '∅', 'euml': '\xeb', 'brvbar': '\xa6', 'iacute': '\xed', 'para': '\xb6', 'micro': '\xb5', 'cup': '∪', 'weierp': '℘', 'uuml': '\xfc', 'part': '∂', 'icirc': '\xee', 'delta': 'δ', 'omicron': 'ο', 'upsilon': 'υ', 'copy': '\xa9', 'Iuml': '\xcf', 'Lambda': 'Λ', 'Xi': 'Ξ', 'kappa': 'κ', 'ccedil': '\xe7', 'Ucirc': '\xdb', 'cap': '∩', 'mu': 'μ', 'scaron': 'š', 'lsquo': '‘', 'isin': '∈', 'Zeta': 'Ζ', 'supe': '⊇', 'deg': '\xb0', 'and': '∧', 'tau': 'τ', 'pound': '\xa3', 'hellip': '…', 'curren': '\xa4', 'int': '∫', 'ucirc': '\xfb', 'rfloor': '⌋', 'ensp': ' ', 'crarr': '↵', 'ugrave': '\xf9', 'notin': '∉', 'exist': '∃', 'uArr': '⇑', 'cong': '≅', 'Dagger': '‡', 'oplus': '⊕', 'times': '\xd7', 'atilde': '\xe3', 'piv': 'ϖ', 'ni': '∋', 'Phi': 'Φ', 'lsaquo': '‹', 'quot': '"', 'Uacute': '\xda', 'Omicron': 'Ο', 'ang': '∠', 'ne': '≠', 'iquest': '\xbf', 'eta': 'η', 'yacute': '\xfd', 'Rho': 'Ρ', 'uacute': '\xfa', 'Alpha': 'Α', 'zeta': 'ζ', 'Omega': 'Ω', 'nu': 'ν', 'sim': '∼', 'sect': '\xa7', 'phi': 'φ', 'sigmaf': 'ς', 'macr': '\xaf', 'minus': '−', 'Ccedil': '\xc7', 'ordm': '\xba', 'epsilon': 'ε', 'beta': 'β', 'rArr': '⇒', 'rho': 'ρ', 'aacute': '\xe1', 'eacute': '\xe9', 'omega': 'ω', 'middot': '\xb7', 'Gamma': 'Γ', 'Iacute': '\xcd', 'lang': '〈', 'spades': '♠', 'rsquo': '’', 'uml': '\xa8', 'thorn': '\xfe', 'ouml': '\xf6', 'thetasym': 'ϑ', 'or': '∨', 'raquo': '\xbb', 'acirc': '\xe2', 'ldquo': '“', 'hearts': '♥', 'sigma': 'σ', 'oacute': '\xf3'}
anchor_bgn(href, name, type)
This method is called at the start of an anchor region.
The arguments correspond to the attributes of the <A> tag with the same names. The default implementation maintains a list of hyperlinks (defined by the HREF attribute for <A> tags) within the document. The list of hyperlinks is available as the data attribute anchorlist.
anchor_end()
This method is called at the end of an anchor region.
The default implementation adds a textual footnote marker using an index into the list of hyperlinks created by the anchor_bgn() method.
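The bookkeeping described for anchor_bgn() and anchor_end() can be sketched roughly as follows. This is an illustration only, not the class's actual source: it assumes Python 3's html.parser.HTMLParser as a stand-in base class, the AnchorCollector name and its text attribute are hypothetical, and the "[n]" marker format is an assumption.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical stand-in that mimics the documented anchor handling."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []  # hyperlinks seen so far, in document order
        self.anchor = None    # href of the anchor currently open, if any
        self.text = []        # collected character data (assumed output sink)

    def handle_starttag(self, tag, attrs):
        # Plays the role of anchor_bgn(): remember the HREF of each <A> tag.
        if tag == "a":
            self.anchor = dict(attrs).get("href")
            if self.anchor:
                self.anchorlist.append(self.anchor)

    def handle_endtag(self, tag):
        # Plays the role of anchor_end(): emit a footnote marker whose
        # number is the 1-based index of the link in anchorlist.
        if tag == "a" and self.anchor:
            self.text.append("[%d]" % len(self.anchorlist))
            self.anchor = None

    def handle_data(self, data):
        self.text.append(data)


collector = AnchorCollector()
collector.feed('See <a href="http://example.com/">this</a> page')
collector.close()
rendered = "".join(collector.text)
```

After the run, collector.anchorlist holds the single URL and rendered carries the marker right after the link text.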
close()
Handle the remaining data.
ddpop(bl=0)
do_base(attrs)
do_br(attrs)
do_dd(attrs)
do_dt(attrs)
do_hr(attrs)
do_img(attrs)
do_isindex(attrs)
do_li(attrs)
do_link(attrs)
do_meta(attrs)
do_nextid(attrs)
do_p(attrs)
do_plaintext(attrs)
end_a()
end_address()
end_b()
end_blockquote()
end_body()
end_cite()
end_code()
end_dir()
end_dl()
end_em()
end_h1()
end_h2()
end_h3()
end_h4()
end_h5()
end_h6()
end_head()
end_html()
end_i()
end_kbd()
end_listing()
end_menu()
end_ol()
end_pre()
end_samp()
end_strong()
end_title()
end_tt()
end_ul()
end_var()
end_xmp()
error(message)
feed(data)
Call this as often as you want, with as little or as much text as you want (may include '\n').
(This just saves the text; all the processing is done by goahead().)
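As a rough illustration of incremental feeding, the sketch below passes a document to a parser in arbitrary chunks, including one that splits a tag in the middle. It assumes Python 3's html.parser.HTMLParser as a stand-in, since its feed() follows the same buffering idea; EventLogger, parse_in_chunks and joined_text are hypothetical names.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper that records parser callbacks in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_in_chunks(chunks):
    """Feed each chunk separately, then flush whatever is still buffered."""
    parser = EventLogger()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.events


def joined_text(events):
    """Concatenate all character data, however it was split up."""
    return "".join(value for kind, value in events if kind == "data")


events = parse_in_chunks(["<p", ">Hel", "lo</p>"])
```

Character data may arrive in several pieces, so callers that need the full text should join the pieces instead of relying on a single callback.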
finish_endtag(tag)
finish_shorttag(tag, data)
finish_starttag(tag, attrs)
get_starttag_text()
getpos()
Return current line number and offset.
goahead(end)
handle_charref(name)
Handle character reference, no need to override.
handle_comment(data)
handle_data(data)
handle_decl(decl)
handle_endtag(tag, method)
handle_entityref(name)
Handle entity references.
There should be no need to override this method; it can be tailored by setting up the self.entitydefs mapping appropriately.
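The idea of tailoring entity handling through the mapping, without overriding any method, might look like the sketch below. It is self-contained and does not use the parser class itself: the ENTITYDEFS subset is copied from the entitydefs listing above, expand_entityrefs is a hypothetical helper, and leaving unknown references untouched is an assumption (the class documented here has a separate unknown_entityref() hook for that case).

```python
import re

# A small subset of the entitydefs mapping listed above.
ENTITYDEFS = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\xa9"}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entityrefs(text, entitydefs):
    """Replace each &name; whose name is in entitydefs; leave the rest as-is."""

    def replace(match):
        return entitydefs.get(match.group(1), match.group(0))

    return _ENTITY_REF.sub(replace, text)


# Tailoring: extend the mapping instead of changing the lookup code.
custom_defs = dict(ENTITYDEFS, project="ExampleProject")  # hypothetical entry

plain = expand_entityrefs("a &lt; b &amp; c", ENTITYDEFS)
tailored = expand_entityrefs("&copy; &project; &bogus;", custom_defs)
```

Here only the dictionary changes between the two calls; the lookup logic stays the same.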
handle_image(src, alt, *args)
This method is called to handle images.
The default implementation simply passes the alt value to the handle_data() method.
handle_pi(data)
handle_starttag(tag, method, attrs)
parse_comment(i, report=1)
parse_declaration(i)
parse_endtag(i)
parse_marked_section(i, report=1)
parse_pi(i)
parse_starttag(i)
report_unbalanced(tag)
reset()
save_bgn()
Begins saving character data in a buffer instead of sending it to the formatter object.
Retrieve the stored data via the save_end() method. Use of the save_bgn() / save_end() pair may not be nested.
save_end()
Ends buffering character data and returns all data saved since the preceding call to the save_bgn() method.
If the nofill flag is false, whitespace is collapsed to single spaces. A call to this method without a preceding call to the save_bgn() method will raise a TypeError exception.
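The save_bgn() / save_end() contract described above can be sketched with a small standalone class. This is an assumption-laden illustration, not the parser's source: SaveBuffer, its output list (standing in for the formatter object) and the explicit TypeError message are all hypothetical.

```python
class SaveBuffer:
    """Hypothetical model of the save_bgn()/save_end() buffering contract."""

    def __init__(self, nofill=False):
        self.nofill = nofill   # when true, whitespace is preserved verbatim
        self.savedata = None   # None means "not currently saving"
        self.output = []       # stands in for the formatter object

    def handle_data(self, data):
        # While saving, data goes to the buffer instead of the formatter.
        if self.savedata is not None:
            self.savedata += data
        else:
            self.output.append(data)

    def save_bgn(self):
        # Starting again discards any unfinished buffer: no nesting.
        self.savedata = ""

    def save_end(self):
        if self.savedata is None:
            raise TypeError("save_end() called without save_bgn()")
        data, self.savedata = self.savedata, None
        if not self.nofill:
            # Collapse every run of whitespace to a single space.
            data = " ".join(data.split())
        return data


buf = SaveBuffer()
buf.handle_data("before ")
buf.save_bgn()
buf.handle_data("  A   title\n")
buf.handle_data("here ")
title = buf.save_end()
```

Because save_bgn() simply resets the buffer, a second call before save_end() loses the text saved so far, which is one way to read the rule that the pair may not be nested.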
setliteral(*args)
Enter literal mode (CDATA).
Intended for derived classes only.
setnomoretags()
Enter literal mode (CDATA) till EOF.
Intended for derived classes only.
start_a(attrs)
start_address(attrs)
start_b(attrs)
start_blockquote(attrs)
start_body(attrs)
start_cite(attrs)
start_code(attrs)
start_dir(attrs)
start_dl(attrs)
start_em(attrs)
start_h1(attrs)
start_h2(attrs)
start_h3(attrs)
start_h4(attrs)
start_h5(attrs)
start_h6(attrs)
start_head(attrs)
start_html(attrs)
start_i(attrs)
start_kbd(attrs)
start_listing(attrs)
start_menu(attrs)
start_ol(attrs)
start_pre(attrs)
start_samp(attrs)
start_strong(attrs)
start_title(attrs)
start_tt(attrs)
start_ul(attrs)
start_var(attrs)
start_xmp(attrs)
unknown_charref(ref)
unknown_decl(data)
unknown_endtag(tag)
unknown_entityref(ref)
unknown_starttag(tag, attrs)
updatepos(i, j)
There are no known subclasses.