Definitions and Examples from the XML 1.0 Specification

2.1 Well-Formed XML Documents

Document

[1] document ::= prolog element Misc*

2.2 Characters

Character Range

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
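
As an illustration of production [2], the following Python sketch tests whether a code point falls within the Char ranges. The function name is_xml_char is hypothetical and does not come from the specification.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if the code point cp matches production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # excludes the surrogate block
        or 0xE000 <= cp <= 0xFFFD        # excludes #xFFFE and #xFFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )
```

A caller could reject a whole string with all(is_xml_char(ord(c)) for c in text).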

2.3 Common Syntactic Constructs

White Space

[3] S ::= (#x20 | #x9 | #xD | #xA)+

Names and Tokens

[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

[5] Name ::= (Letter | '_' | ':') (NameChar)*

[6] Names ::= Name (S Name)*

[7] Nmtoken ::= (NameChar)+

[8] Nmtokens ::= Nmtoken (S Nmtoken)*
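
The difference between Name and Nmtoken can be sketched with two regular expressions. This is an ASCII-only approximation: the full Letter, Digit, CombiningChar and Extender classes are left out, and the names is_ascii_name and is_ascii_nmtoken are hypothetical.

```python
import re

# ASCII-only approximation of productions [4], [5] and [7].
NAME_CHAR = r"[A-Za-z0-9._:-]"
NAME_RE = re.compile(rf"[A-Za-z_:]{NAME_CHAR}*")
NMTOKEN_RE = re.compile(rf"{NAME_CHAR}+")


def is_ascii_name(s: str) -> bool:
    """A Name must start with a letter, '_' or ':'."""
    return NAME_RE.fullmatch(s) is not None


def is_ascii_nmtoken(s: str) -> bool:
    """An Nmtoken may start with any name character."""
    return NMTOKEN_RE.fullmatch(s) is not None
```

Under this approximation "1st" and "-x" are name tokens but not names, while "xml:lang" is both.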

Literals

[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"

[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")

[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"

[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

2.4 Character Data and Markup

Character Data

[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

2.5 Comments

Comments

[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

Example:
<!-- declarations for <head> & <body> -->
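
Production [15] implies that the body of a comment can neither contain "--" nor end in "-". A minimal Python sketch of that check follows; the function name is hypothetical, and the Char test on each character is omitted for brevity.

```python
def is_wellformed_comment(text: str) -> bool:
    """Check the delimiter and hyphen rules of production [15]."""
    if len(text) < 7 or not (text.startswith("<!--") and text.endswith("-->")):
        return False
    body = text[4:-3]
    # "--" may not occur in the body, and the body may not end with "-"
    # (which would make the comment close with "--->").
    return "--" not in body and not body.endswith("-")
```

The example comment above passes, since "<" and "&" need no escaping inside a comment.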

2.6 Processing Instructions

Processing Instructions

[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

2.7 CDATA Sections

CDATA Sections

[18] CDSect ::= CDStart CData CDEnd

[19] CDStart ::= '<![CDATA['

[20] CData ::= (Char* - (Char* ']]>' Char*))

[21] CDEnd ::= ']]>'

Example:
<![CDATA[<greeting>Hello, world!</greeting>]]>
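
Because CData may not contain "]]>" (production [20]), text containing that string has to be split over two adjacent sections. The sketch below shows one way to do this in Python; the helper name to_cdata and the wrapper element doc are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" so CDEnd never appears in CData."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


section = to_cdata("<greeting>Hello, world!</greeting>")
# Parsing the section inside a wrapper element yields the original text back.
parsed = ET.fromstring("<doc>" + section + "</doc>").text
```

Here section equals the example above, and parsed is the literal string "<greeting>Hello, world!</greeting>" rather than an element.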

2.8 Prolog and Document Type Declaration

Example:
<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
Example:
<greeting>Hello, world!</greeting>

Prolog

[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')

[25] Eq ::= S? '=' S?

[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+

[27] Misc ::= Comment | PI | S

Document Type Definition

[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'

[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment

External Subset

[30] extSubset ::= TextDecl? extSubsetDecl

[31] extSubsetDecl ::= ( markupdecl | conditionalSect | PEReference | S )*

Example:
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
Example:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
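
As a rough check of the last example, the document with an internal subset can be loaded with Python's standard xml.dom.minidom module. This is only a sketch: minidom does not validate, and the variable names are illustrative.

```python
from xml.dom import minidom

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>"""

dom = minidom.parseString(DOCUMENT)
doctype_name = dom.doctype.name              # Name from production [28]
root_name = dom.documentElement.tagName      # root element type
text = dom.documentElement.firstChild.data   # character data of the root
```

The Name in the document type declaration and the root element type are both "greeting" here.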

2.9 Standalone Document Declaration

Standalone Document Declaration

[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

Example:
<?xml version="1.0" standalone='yes'?>
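
The fields of an XML declaration can be observed through the XmlDeclHandler callback of Python's xml.parsers.expat module. The sketch below is one possible way to read them; the function name read_xml_decl and the root element doc are hypothetical.

```python
import xml.parsers.expat


def read_xml_decl(document: bytes) -> dict:
    """Return the version, encoding and standalone values of the XML declaration."""
    seen = {}

    def on_decl(version, encoding, standalone):
        # standalone is 1 for 'yes', 0 for 'no' and -1 when omitted.
        seen.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(document, True)
    return seen


decl = read_xml_decl(b"<?xml version=\"1.0\" standalone='yes'?><doc/>")
```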

2.10 White Space Handling

Example:
    <!ATTLIST poem   xml:space (default|preserve) 'preserve'>

2.11 End-of-Line Handling

2.12 Language Identification

Language Identification

[33] LanguageID ::= Langcode ('-' Subcode)*

[34] Langcode ::= ISO639Code | IanaCode | UserCode

[35] ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])

[36] IanaCode ::= ('i' | 'I') '-' ([a-z] | [A-Z])+

[37] UserCode ::= ('x' | 'X') '-' ([a-z] | [A-Z])+

[38] Subcode ::= ([a-z] | [A-Z])+

Example:
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
  </sp>
Example:
xml:lang  NMTOKEN  #IMPLIED
Example:
    <!ATTLIST poem   xml:lang NMTOKEN 'fr'>
    <!ATTLIST gloss  xml:lang NMTOKEN 'en'>
    <!ATTLIST note   xml:lang NMTOKEN 'en'>
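
A possible way to read xml:lang from a parsed document is sketched below with Python's xml.etree.ElementTree, which exposes the attribute under the XML namespace name. The helper effective_lang is hypothetical; it assumes the usual rule that a value applies to descendants until it is overridden.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the xml:lang value in effect for target, or None."""
    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        if elem is target:
            return lang, True
        for child in elem:
            found_lang, found = walk(child, lang)
            if found:
                return found_lang, True
        return None, False

    return walk(root, None)[0]


speech = ET.fromstring(
    '<sp who="Faust" desc="leise" xml:lang="de">'
    "<l>Habe nun, ach! Philosophie,</l>"
    "</sp>"
)
line = speech.find("l")
line_lang = effective_lang(speech, line)
```

The l element carries no xml:lang of its own, so it takes "de" from the enclosing sp element.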

3. Logical Structures

Element

[39] element ::= EmptyElemTag | STag content ETag

Start-tag

[40] STag ::= '<' Name (S Attribute)* S? '>'

[41] Attribute ::= Name Eq AttValue

Example:
<termdef id="dt-dog" term="dog">

End-tag

[42] ETag ::= '</' Name S? '>'

Example:
</termdef>

Content of Elements

[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*

Tags for Empty Elements

[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

Example:
<IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>
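
The two spellings of an empty element in the example can be compared after parsing. The following sketch uses Python's xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

paired = ET.fromstring("<br></br>")   # start-tag immediately followed by end-tag
empty = ET.fromstring("<br/>")        # empty-element tag

# Compare tag, attributes, text and child count of the two parses.
same = (paired.tag, paired.attrib, paired.text, len(paired)) == (
    empty.tag, empty.attrib, empty.text, len(empty)
)

img = ET.fromstring('<IMG align="left"\n src="http://www.w3.org/Icons/WWW/w3c_home" />')
```

With this parser both forms produce an element with no text and no children, and white space between attributes, including a line break, is accepted.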

3.2 Element Type Declarations

Element Type Declaration

[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'

[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

Example:
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>

3.2.1 Element Content

Element-content Models

[47] children ::= (choice | seq) ('?' | '*' | '+')?

[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?

[49] choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'

[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'

Example:
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
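
Since element-content models use the same operators as regular expressions, a model can be translated into a regex over the sequence of child element names. The Python sketch below is one such translation under simplifying assumptions: names are ASCII only, parameter-entity references such as %div.mix; are not expanded, and the functions model_to_regex and children_match are hypothetical.

```python
import re

# Tokens that need rewriting: white space, the ',' connector, and names.
TOKEN = re.compile(r"\s+|,|[A-Za-z_:][A-Za-z0-9._:-]*")


def model_to_regex(model: str) -> re.Pattern:
    """Translate a model such as "(front, body, back?)" into a compiled regex."""
    def rewrite(match):
        token = match.group()
        if token == "," or token.isspace():
            return ""                       # sequence is plain concatenation
        # Each name matches itself followed by one separating space.
        return "(?:" + re.escape(token) + " )"

    return re.compile(TOKEN.sub(rewrite, model))


def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None
```

For instance, children_match("(front, body, back?)", ["front", "body"]) holds, while the reversed order does not.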

3.2.2 Mixed Content

Mixed-content Declaration

[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

Example:
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>

3.3 Attribute-List Declarations

Attribute-list Declaration

[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'

[53] AttDef ::= S Name S AttType S DefaultDecl

3.3.1 Attribute Types

Attribute Types

[54] AttType ::= StringType | TokenizedType | EnumeratedType

[55] StringType ::= 'CDATA'

[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

Enumerated Attribute Types

[57] EnumeratedType ::= NotationType | Enumeration

[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'

[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'

3.3.2 Attribute Defaults

Attribute Defaults

[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

Example:
<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
<!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered">
<!ATTLIST form
          method  CDATA   #FIXED "POST">
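
The effect of a declared default can be observed with a parser that reads the internal subset. The sketch below uses Python's xml.etree.ElementTree, whose underlying Expat parser reports defaulted attributes; the document itself is an assumption built from the list declaration above.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE list [
<!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered">
]>
<list/>"""

root = ET.fromstring(DOCUMENT)
# The attribute is absent from the start-tag, so the declared default applies.
list_type = root.get("type")
```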

3.3.3 Attribute-Value Normalization

3.4 Conditional Sections

Conditional Section

[61] conditionalSect ::= includeSect | ignoreSect

[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'

[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'

[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*

[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

Example:
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
 
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>

4. Physical Structures

4.1 Character and Entity References

Character Reference

[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
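
The two forms of character reference can be decoded as in the following Python sketch. The function name decode_char_ref is hypothetical, and the check that the resulting character matches production [2] is omitted.

```python
import re

# Production [66]: hexadecimal form first, then decimal form.
CHARREF_RE = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def decode_char_ref(ref: str) -> str:
    """Return the character named by a single character reference."""
    match = CHARREF_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        return chr(int(hex_digits, 16))   # "&#x" prefix: hexadecimal
    return chr(int(dec_digits, 10))       # "&#" prefix: decimal
```

Both "&#x3C;" and "&#60;" decode to the less-than sign.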

Entity Reference

[67] Reference ::= EntityRef | CharRef

[68] EntityRef ::= '&' Name ';'

[69] PEReference ::= '%' Name ';'

Example:
Type <key>less-than</key> (&#x3C;) to save options.
This document was prepared on &docdate; and
is classified &security-level;.
Example:
<!-- declare the parameter entity "ISOLat2"... -->
<!ENTITY % ISOLat2
         SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
<!-- ... now reference it. -->
%ISOLat2;

4.2 Entity Declarations

Entity Declaration

[70] EntityDecl ::= GEDecl | PEDecl

[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'

[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'

[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)

[74] PEDef ::= EntityValue | ExternalID

4.2.1 Internal Entities

Example:
<!ENTITY Pub-Status "This is a pre-release of the
 specification.">

4.2.2 External Entities

External Entity Declaration

[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral

[76] NDataDecl ::= S 'NDATA' S Name

Example:
<!ENTITY open-hatch
         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY open-hatch
         PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
         "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY hatch-pic
         SYSTEM "../grafix/OpenHatch.gif"
         NDATA gif >

4.3 Parsed Entities

4.3.1 The Text Declaration

Text Declaration

[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

4.3.2 Well-Formed Parsed Entities

Well-Formed External Parsed Entity

[78] extParsedEnt ::= TextDecl? content

[79] extPE ::= TextDecl? extSubsetDecl

4.3.3 Character Encoding in Entities

Encoding Declaration

[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  "'" EncName "'" )

[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*

Example:
<?xml encoding='UTF-8'?>
<?xml encoding='EUC-JP'?>
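
Production [81] translates directly into a regular expression. A minimal Python sketch, with a hypothetical function name, follows; it checks only the syntax of the name, not whether the encoding is known to any processor.

```python
import re

# Production [81]: a Latin letter, then letters, digits, '.', '_' or '-'.
ENCNAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_enc_name(name: str) -> bool:
    """Return True if name matches production [81] EncName."""
    return ENCNAME_RE.fullmatch(name) is not None
```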

4.4 XML Processor Treatment of Entities and References

4.4.2 Included

4.4.3 Included If Validating

4.4.4 Forbidden

4.4.5 Included in Literal

Example:
<!ENTITY % YN '"Yes"' >
<!ENTITY WhatHeSaid "He said %YN;" >
Example:
<!ENTITY EndAttr "27'" >
<element attribute='a-&EndAttr;>

4.4.6 Notify

4.4.7 Bypassed

4.4.8 Included as PE

4.5 Construction of Internal Entity Replacement Text

Example:
<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus, 
&#xA9; 1947 %pub;. &rights;" >
Example:
La Peste: Albert Camus, 
© 1947 Éditions Gallimard. &rights;

4.6 Predefined Entities

Example:
<!ENTITY lt     "&#38;#60;"> 
<!ENTITY gt     "&#62;"> 
<!ENTITY amp    "&#38;#38;"> 
<!ENTITY apos   "&#39;"> 
<!ENTITY quot   "&#34;"> 
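
When generating XML, the predefined entities are normally produced by an escaping routine rather than written by hand. The sketch below uses escape and unescape from Python's xml.sax.saxutils; the two dictionaries for quotation marks are assumptions added so that all five predefined entities are covered.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; quotes are passed in explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = "<head> & <body>"
escaped = escape(raw)
restored = unescape(escaped)

attr_raw = 'say "hi"'
attr_escaped = escape(attr_raw, QUOTES)
attr_restored = unescape(attr_escaped, UNQUOTES)
```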

4.7 Notation Declarations

Notation Declarations

[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'

[83] PublicID ::= 'PUBLIC' S PubidLiteral

4.8 Document Entity

5. Conformance

5.1 Validating and Non-Validating Processors

5.2 Using XML Processors

6. Notation

Example:
symbol ::= expression

Characters

[84] Letter ::= BaseChar | Ideographic

[85] BaseChar ::= [A-Z] | [a-z] | [À-Ö] | [Ø-ö] | [ø-ÿ] | [Ā-ı] | [Ĵ-ľ] | [Ł-ň] | [Ŋ-ž] | [ƀ-ǃ] | [Ǎ-ǰ] | [Ǵ-ǵ] | [Ǻ-ȗ] | [ɐ-ʨ] | [ʻ-ˁ] | Ά | [Έ-Ί] | Ό | [Ύ-Ρ] | [Σ-ώ] | [ϐ-ϖ] | ϚϜϞϠ | [Ϣ-ϳ] | [Ё-Ќ] | [Ў-я] | [ё-ќ] | [ў-ҁ] | [Ґ-ӄ] | [Ӈ-ӈ] | [Ӌ-ӌ] | [Ӑ-ӫ] | [Ӯ-ӵ] | [Ӹ-ӹ] | [Ա-Ֆ] | ՙ | [ա-ֆ] | [א-ת] | [װ-ײ] | [ء-غ] | [ف-ي] | [ٱ-ڷ] | [ں-ھ] | [ۀ-ێ] | [ې-ۓ] | ە | [ۥ-ۦ] | [-] |  | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] |  | [-] |  | [-] | [-] | [-] |  | [-] | [-] |  | [-] |  | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] |  | [-] | [-] |  | [-] |  | [-] |  | [-] | [-] |  | [-] | [-] |  | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] |  | [-] | [-] | [-] | [-] | [-]

[86] Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]

[87] CombiningChar ::= [̀-ͅ] | [͠-͡] | [҃-҆] | [֑-֡] | [֣-ֹ] | [ֻ-ֽ] | ֿ | [ׁ-ׂ] | ׄ | [ً-ْ] | ٰ | [ۖ-ۜ] | [۝-۟] | [۠-ۤ] | [ۧ-ۨ] | [۪-ۭ] | [-] |  | [-] |  | [-] | [-] | [-] | ি | [-] | [-] | [-] |  | [-] | ਿ | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] | [-] |  | [-] | [-] |  | [-] | [-] | [-] | [-] | ༿ | [-] | [-] | [-] |  | [-] | [-] |  | [-] |  | [-] | 

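[88] Digit ::= [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]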

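[89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]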

D. Expansion of Entity and Character References (Non-Normative)

Example:
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >
Example:
<p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity
(&amp;amp;).</p>
Example:
An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&amp;).
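
The expansion steps shown in the three examples above can be reproduced with a parser that processes the internal subset. The sketch below uses Python's xml.etree.ElementTree; the enclosing element doc and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

DOCUMENT = '''<!DOCTYPE doc [
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >
]>
<doc>&example;</doc>'''

root = ET.fromstring(DOCUMENT)
# The reference &example; is replaced by the parsed replacement text,
# so the p element becomes a child of doc.
paragraph = root.find("p")
content = paragraph.text
```

The value of content corresponds to the last of the three examples: one level of escaping is removed when the entity is declared and another when the reference is expanded.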
Example:
<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '&#37;zz;'>
<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
%xx;
]>
<test>This sample shows a &tricky; method.</test>

[End]

Reformatted by Sean M. Burke