Browsable HTML Grammar

Grammar extracted by Vadim Zaytsev; see the Grammar Zoo entry for details: html/cordy/extracted
Source used for this grammar: James R. Cordy, SomeDeveloper, TXL 10.5 HTML Grammar, September 2007

Summary

Total 18 production rules with 41 top alternatives and 153 symbols.
Vocabulary: 48 = 32 nonterminals + 16 terminals + 0 labels + 0 markers.
Total 32 nonterminal symbols: 18 defined (program, element, tag, tag_elements, singleton_tag, singleton_tag_end, singleton_id, comment_tag, comment_text, tag_beg, tag_end, attributes, attribute, attribute_id, equals_attribute_value, attribute_value, text, text_unit), 1 root (program), 0 top (—), 14 bottom (NL⁸, SPON, punctuation², fileref, url, stringlit, x_id², SP⁵, id⁷, token², EX, SPOFF, number², IN).
Total 16 terminal symbols: 7 keywords ("br", "hr", "img", "meta", "base", "basefont", "dt"), 0 letters (—), 0 numerics (—), 9 signs ("<"⁵, ">"⁷, "</"³, "/>", "|", "<!", "=", ")", "(").
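
The totals above can be cross-checked mechanically. The following Python sketch records the number of top alternatives per rule as listed in the Syntax section of this page, together with the bottom nonterminals and terminals from the summary, and recomputes the headline counts. The names `ALTERNATIVES` and `summary_counts` are illustrative and not part of any Grammar Zoo tooling; the symbol total (153) is not recomputed here.

```python
# Top alternatives per defined nonterminal, read off the Syntax section.
ALTERNATIVES = {
    "program": 1, "element": 6, "tag": 1, "tag_elements": 1,
    "singleton_tag": 2, "singleton_tag_end": 1, "singleton_id": 8,
    "comment_tag": 1, "comment_text": 2, "tag_beg": 1, "tag_end": 1,
    "attributes": 1, "attribute": 1, "attribute_id": 2,
    "equals_attribute_value": 1, "attribute_value": 5,
    "text": 1, "text_unit": 5,
}

# Nonterminals that are used but not defined on this page.
BOTTOM = ["NL", "SPON", "punctuation", "fileref", "url", "stringlit", "x_id",
          "SP", "id", "token", "EX", "SPOFF", "number", "IN"]

KEYWORDS = ["br", "hr", "img", "meta", "base", "basefont", "dt"]
SIGNS = ["<", ">", "</", "/>", "|", "<!", "=", ")", "("]


def summary_counts() -> dict:
    """Recompute the summary figures from the tables above."""
    nonterminals = len(ALTERNATIVES) + len(BOTTOM)
    terminals = len(KEYWORDS) + len(SIGNS)
    return {
        "rules": len(ALTERNATIVES),
        "alternatives": sum(ALTERNATIVES.values()),
        "nonterminals": nonterminals,
        "terminals": terminals,
        "vocabulary": nonterminals + terminals,
    }
```

Calling `summary_counts()` should reproduce the figures stated in the summary: 18 rules, 41 top alternatives, 32 nonterminals, 16 terminals and a vocabulary of 48.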

Syntax

program ::=
	element*

element ::=
	singleton_tag
	tag
	text
	comment_tag
	tag_beg
	tag_end
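
The order of these alternatives matters if they are tried top to bottom with backtracking, as in TXL: a balanced `tag` is preferred, and `tag_beg` or `tag_end` only apply when no balanced reading exists. The Python sketch below illustrates that fallback under strong simplifying assumptions: tags carry no attributes, comments and the `"/>"` form of `singleton_tag` are ignored, and only the keyword alternatives of `singleton_id` are recognised. The token shapes and all function names are hypothetical and chosen for illustration only.

```python
import re
from typing import List, Optional, Tuple

# Keyword alternatives of singleton_id; x_id is omitted because its lexical
# definition is not given on this page.
SINGLETON_IDS = frozenset({"br", "hr", "img", "meta", "base", "basefont", "dt"})

# Assumed token shapes: </name>, <name>, or a run of text without "<".
_TOKEN = re.compile(r"</([A-Za-z][A-Za-z0-9]*)>|<([A-Za-z][A-Za-z0-9]*)>|([^<]+)")

Token = Tuple[str, str]


def tokenize(source: str) -> List[Token]:
    """Split source into ("open" | "close" | "text", value) pairs."""
    tokens: List[Token] = []
    for match in _TOKEN.finditer(source):
        close_name, open_name, text = match.groups()
        if close_name:
            tokens.append(("close", close_name))
        elif open_name:
            tokens.append(("open", open_name))
        elif text.strip():
            tokens.append(("text", text.strip()))
    return tokens


def parse_elements(tokens: List[Token], pos: int,
                   stop_at: Optional[str] = None) -> Tuple[list, int]:
    """element*: stop at end of input or just before the close tag stop_at."""
    elements = []
    while pos < len(tokens):
        kind, value = tokens[pos]
        if kind == "close" and value == stop_at:
            break
        node, pos = parse_element(tokens, pos)
        elements.append(node)
    return elements, pos


def parse_element(tokens: List[Token], pos: int) -> Tuple[tuple, int]:
    """Try singleton_tag, tag, text, tag_beg, tag_end in that spirit."""
    kind, value = tokens[pos]
    if kind == "open" and value in SINGLETON_IDS:
        return ("singleton_tag", value), pos + 1
    if kind == "open":
        children, end = parse_elements(tokens, pos + 1, stop_at=value)
        if end < len(tokens):  # stopped on the matching close tag
            return ("tag", value, children), end + 1
        # No matching close tag: discard the attempt and fall back to tag_beg.
        return ("tag_beg", value), pos + 1
    if kind == "text":
        return ("text", value), pos + 1
    return ("tag_end", value), pos + 1  # unmatched close tag


def parse(source: str) -> list:
    """program ::= element*"""
    elements, _ = parse_elements(tokenize(source), 0)
    return elements
```

With this sketch, `parse("<b>hi</b>")` yields a single `tag` node, whereas `parse("<p>one<br>two")` yields a `tag_beg` for the unclosed `p` followed by the remaining elements as siblings. The retry after a failed `tag` attempt makes the sketch potentially slow on deeply unbalanced input; it is meant to show the ordering, not to be an efficient parser.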

tag ::=
	"<" id attributes ">" (NL IN) tag_elements EX "</" id ">" NL

tag_elements ::=
	element*

singleton_tag ::=
	"<" singleton_id attributes ">" singleton_tag_end? NL
	"<" id attributes "/>" NL

singleton_tag_end ::=
	"</" singleton_id ">"

singleton_id ::=
	"br"
	"hr"
	"|" "img"
	"meta"
	"base"
	"basefont"
	x_id
	"dt"

comment_tag ::=
	"<!" comment_text* ">" NL

comment_text ::=
	punctuation SP
	token

tag_beg ::=
	"<" id attributes ">" NL

tag_end ::=
	"</" id ">" NL

attributes ::=
	SPOFF attribute* SPON

attribute ::=
	SP attribute_id equals_attribute_value?

attribute_id ::=
	id
	x_id

equals_attribute_value ::=
	"=" attribute_value

attribute_value ::=
	stringlit
	number
	id
	url
	fileref
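
The attribute rules above can be illustrated with a small Python sketch that splits the text between a tag name and the closing `>` into name/value pairs. The lexical shapes of `id`, `x_id`, `stringlit`, `number`, `url` and `fileref` are not defined on this page, so the regular expression below is an assumption: names are letter-led identifiers, and a value is either a quoted string or a run of characters without whitespace, quotes or angle brackets. `SP`, `SPOFF` and `SPON` are treated as spacing cues that consume no input, which is how TXL formatting nonterminals behave. The function name `parse_attributes` is illustrative.

```python
import re
from typing import List, Optional, Tuple

# attribute ::= SP attribute_id equals_attribute_value?
_ATTRIBUTE = re.compile(
    r"""\s+                                   # whitespace before each attribute
        (?P<name>[A-Za-z_][A-Za-z0-9_:.-]*)   # attribute_id (assumed shape)
        (?:=                                  # equals_attribute_value is optional
           (?P<value>"[^"]*"|'[^']*'|[^\s"'<>=]+)
        )?""",
    re.VERBOSE,
)


def parse_attributes(text: str) -> List[Tuple[str, Optional[str]]]:
    """attribute*: return (name, value) pairs; value is None when absent."""
    attributes = []
    position = 0
    while position < len(text):
        match = _ATTRIBUTE.match(text, position)
        if match is None:
            if text[position:].strip():
                raise ValueError(f"cannot parse attributes at offset {position}")
            break  # only trailing whitespace is left
        attributes.append((match.group("name"), match.group("value")))
        position = match.end()
    return attributes
```

For example, `parse_attributes(' href="a.html" disabled width=10')` returns three pairs, with `None` as the value of `disabled`. Quoted values keep their quotes, mirroring a `stringlit` token; the sketch does not try to tell `number`, `id`, `url` and `fileref` apart.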

text ::=
	text_unit+ NL

text_unit ::=
	punctuation SP
	")" SP
	SP "("
	token
	"<" number

Maintained by Dr. Vadim Zaytsev a.k.a. @grammarware. Last updated in September 2015.