1 files changed, 251 insertions, 59 deletions
diff --git a/spec.txt b/spec.txt
index fce8792..e3cf027 100644
--- a/spec.txt
+++ b/spec.txt
@@ -2,8 +2,8 @@
 title: CommonMark Spec
 author:
 - John MacFarlane
-version: 2
-date: 2014-09-19
+version: 0.3
+date: 2014-10-24
 ...
 
 # Introduction
@@ -192,10 +192,10 @@ In the examples, the `→` character is used to represent tabs.
 # Preprocessing
 
 A [line](#line) <a id="line"></a>
-is a sequence of zero or more characters followed by a line
-ending (CR, LF, or CRLF) or by the end of
-file.
+is a sequence of zero or more [characters](#character) followed by a
+line ending (CR, LF, or CRLF) or by the end of file.
 
+A [character](#character)<a id="character"></a> is a Unicode code point.
 This spec does not specify an encoding; it thinks of lines as composed
 of characters rather than bytes.  A conforming parser may be limited
 to a certain encoding.
@@ -662,7 +662,10 @@ ATX headers can be empty:
 A [setext header](#setext-header) <a id="setext-header"></a>
 consists of a line of text, containing at least one nonspace character,
 with no more than 3 spaces indentation, followed by a [setext header
-underline](#setext-header-underline).  A [setext header
+underline](#setext-header-underline).  The line of text must be
+one that, were it not followed by the setext header underline,
+would be interpreted as part of a paragraph:  it cannot be a code
+block, header, block quote, horizontal rule, or list.  A [setext header
 underline](#setext-header-underline) <a id="setext-header-underline"></a>
 is a sequence of `=` characters or a sequence of `-` characters, with no
 more than 3 spaces indentation and any number of trailing
@@ -863,6 +866,56 @@ Setext headers cannot be empty:
 <p>====</p>
 .
 
+Setext header text lines must not be interpretable as block
+constructs other than paragraphs.  So, the line of dashes
+in these examples gets interpreted as a horizontal rule:
+
+.
+---
+---
+.
+<hr />
+<hr />
+.
+
+.
+- foo
+-----
+.
+<ul>
+<li>foo</li>
+</ul>
+<hr />
+.
+
+.
+    foo
+---
+.
+<pre><code>foo
+</code></pre>
+<hr />
+.
+
+.
+> foo
+-----
+.
+<blockquote>
+<p>foo</p>
+</blockquote>
+<hr />
+.
+
+If you want a header with `> foo` as its literal text, you can
+use backslash escapes:
+
+.
+\> foo
+------
+.
+<h2>&gt; foo</h2>
+.
 
 ## Indented code blocks
 
@@ -1355,8 +1408,8 @@ name is one of the following (case-insensitive):
 `output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`,
 `section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`,
 `fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`,
-`footer`, `tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`,
-`video`, `script`, `style`.
+`tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`,
+`script`, `style`.
 
 An [HTML block](#html-block) <a id="html-block"></a> begins with an
 [HTML block tag](#html-block-tag), [HTML comment](#html-comment),
@@ -1447,11 +1500,11 @@ A processing instruction:
 
 .
 <?php
-  echo 'foo'
+  echo '>';
 ?>
 .
 <?php
-  echo 'foo'
+  echo '>';
 ?>
 .
 
@@ -2010,7 +2063,7 @@ The following rules define [block quotes](#block-quote):
 <a id="block-quote"></a>
 
 1.  **Basic case.**  If a string of lines *Ls* constitute a sequence
-    of blocks *Bs*, then the result of appending a [block quote
+    of blocks *Bs*, then the result of prepending a [block quote
     marker](#block-quote-marker) to the beginning of each line in *Ls*
     is a [block quote](#block-quote) containing *Bs*.
 
@@ -3005,6 +3058,21 @@ A list item may be empty:
 </ul>
 .
 
+A list item can contain a header:
+
+.
+- # Foo
+- Bar
+  ---
+  baz
+.
+<ul>
+<li><h1>Foo</h1></li>
+<li><h2>Bar</h2>
+<p>baz</p></li>
+</ul>
+.
+
 ### Motivation
 
 John Gruber's Markdown spec says the following about list items:
@@ -3214,7 +3282,7 @@ A list is [loose](#loose) if it any of its constituent list items are
 separated by blank lines, or if any of its constituent list items
 directly contain two block-level elements with a blank line between
 them.  Otherwise a list is [tight](#tight).  (The difference in HTML output
-is that paragraphs in a loose with are wrapped in `<p>` tags, while
+is that paragraphs in a loose list are wrapped in `<p>` tags, while
 paragraphs in a tight list are not.)
 
 Changing the bullet or ordered list delimiter starts a new list:
@@ -3686,9 +3754,9 @@ raw HTML:
 .
 
 .
-<http://google.com?find=\*>
+<http://example.com?find=\*>
 .
-<p><a href="http://google.com?find=%5C*">http://google.com?find=\*</a></p>
+<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
 .
 
 .
@@ -3727,21 +3795,25 @@ foo
 
 ## Entities
 
-With the goal of making this standard as HTML-agnostic as possible, all HTML valid HTML Entities in any
-context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing
-the entity itself) before they are stored in the AST.
+With the goal of making this standard as HTML-agnostic as possible, all
+valid HTML entities in any context are recognized as such and
+converted into Unicode characters before they are stored in the AST.
 
-This allows implementations that target HTML output to trivially escape the entities when generating HTML,
-and simplifies the job of implementations targetting other languages, as these will only need to handle the
-UTF8 chars and need not be HTML-entity aware.
+This allows implementations that target HTML output to trivially escape
+the entities when generating HTML, and simplifies the job of
+implementations targeting other languages, as these will only need to
+handle the Unicode characters and need not be HTML-entity aware.
 
 [Named entities](#name-entities) <a id="named-entities"></a> consist of `&`
-+ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
-is used as an authoritative source of the valid entity names and their corresponding codepoints.
++ any of the valid HTML5 entity names + `;`. The
+[following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
+is used as an authoritative source of the valid entity names and their
+corresponding codepoints.
 
-Conforming implementations that target Markdown don't need to generate entities for all the valid
-named entities that exist, with the exception of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`),
-which always need to be written as entities for security reasons.
+Conforming implementations that target HTML don't need to generate
+entities for all the valid named entities that exist, with the exception
+of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), which
+always need to be written as entities for security reasons.
 
 .
 &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
@@ -3750,9 +3822,10 @@ which always need to be written as entities for security reasons.
 .
 
 [Decimal entities](#decimal-entities) <a id="decimal-entities"></a>
-consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognised
-and tranformed into their corresponding UTF8 codepoints. Invalid Unicode codepoints will be written
-as the "unknown codepoint" character (`0xFFFD`)
+consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
+entities need to be recognized and transformed into their corresponding
+Unicode codepoints. Invalid Unicode codepoints will be written as the
+"unknown codepoint" character (`0xFFFD`).
 
 .
 &#35; &#1234; &#992; &#98765432;
@@ -3779,7 +3852,8 @@ Here are some nonentities:
 .
 
 Although HTML5 does accept some entities without a trailing semicolon
-(such as `&copy`), these are not recognized as entities here, because it makes the grammar too ambiguous:
+(such as `&copy`), these are not recognized as entities here, because it
+makes the grammar too ambiguous:
 
 .
 &copy
@@ -3787,7 +3861,8 @@ Although HTML5 does accept some entities without a trailing semicolon
 <p>&amp;copy</p>
 .
 
-Strings that are not on the list of HTML5 named entities are not recognized as entities either:
+Strings that are not on the list of HTML5 named entities are not
+recognized as entities either:
 
 .
 &MadeUpEntity;
@@ -4035,7 +4110,7 @@ for efficient parsing strategies that do not backtrack:
     (a) it is not part of a sequence of four or more unescaped `*`s,
     (b) it is not followed by whitespace, and
     (c) either it is not followed by a `*` character or it is
-        followed immediately by strong emphasis.
+        followed immediately by emphasis or strong emphasis.
 
 2.  A single `_` character [can open emphasis](#can-open-emphasis) iff
 
@@ -4043,7 +4118,7 @@ for efficient parsing strategies that do not backtrack:
     (b) it is not followed by whitespace,
     (c) it is not preceded by an ASCII alphanumeric character, and
     (d) either it is not followed by a `_` character or it is
-        followed immediately by strong emphasis.
+        followed immediately by emphasis or strong emphasis.
 
 3.  A single `*` character [can close emphasis](#can-close-emphasis)
     <a id="can-close-emphasis"></a> iff
@@ -4088,16 +4163,42 @@ for efficient parsing strategies that do not backtrack:
     (c) it is not followed by an ASCII alphanumeric character.
 
 9.  Emphasis begins with a delimiter that [can open
-    emphasis](#can-open-emphasis) and includes inlines parsed
-    sequentially until a delimiter that [can close
+    emphasis](#can-open-emphasis) and ends with a delimiter that [can close
     emphasis](#can-close-emphasis), and that uses the same
-    character (`_` or `*`) as the opening delimiter, is reached.
+    character (`_` or `*`) as the opening delimiter.  The inlines
+    between the opening delimiter and the closing delimiter are the
+    contents of the emphasis inline.
 
 10. Strong emphasis begins with a delimiter that [can open strong
-    emphasis](#can-open-strong-emphasis) and includes inlines parsed
-    sequentially until a delimiter that [can close strong
-    emphasis](#can-close-strong-emphasis), and that uses the
-    same character (`_` or `*`) as the opening delimiter, is reached.
+    emphasis](#can-open-strong-emphasis) and ends with a delimiter that
+    [can close strong emphasis](#can-close-strong-emphasis), and that uses the
+    same character (`_` or `*`) as the opening delimiter.  The inlines
+    between the opening delimiter and the closing delimiter are the
+    contents of the strong emphasis inline.
+
+Where rules 1--10 above are compatible with multiple parsings,
+the following principles resolve ambiguity:
+
+11. An interpretation `<strong>...</strong>` is always preferred to
+    `<em><em>...</em></em>`.
+
+12. An interpretation `<strong><em>...</em></strong>` is always
+    preferred to `<em><strong>...</strong></em>`.
+
+13. Earlier closings are preferred to later closings.  Thus,
+    when two potential emphasis or strong emphasis spans overlap,
+    the first takes precedence: for example, `*foo _bar* baz_`
+    is parsed as `<em>foo _bar</em> baz_` rather than
+    `*foo <em>bar* baz</em>`.  For the same reason,
+    `**foo*bar**` is parsed as `<em><em>foo</em>bar</em>*`
+    rather than `<strong>foo*bar</strong>`.
+
+14. Inline code spans, links, images, and HTML tags group more tightly
+    than emphasis.  So, when there is a choice between an interpretation
+    that contains one of these elements and one that does not, the
+    former always wins.  Thus, for example, `*[foo*](bar)` is
+    parsed as `*<a href="bar">foo*</a>` rather than as
+    `<em>[foo</em>](bar)`.
 
 These rules can be illustrated through a series of examples.
 
@@ -4345,6 +4446,32 @@ __this is a double underscore (`__`)__
 <p><strong>this is a double underscore (<code>__</code>)</strong></p>
 .
 
+Or use the other emphasis character:
+
+.
+*_*
+.
+<p><em>_</em></p>
+.
+
+.
+_*_
+.
+<p><em>*</em></p>
+.
+
+.
+*__*
+.
+<p><em>__</em></p>
+.
+
+.
+_**_
+.
+<p><em>**</em></p>
+.
+
 `*` delimiters allow intra-word emphasis; `_` delimiters do not:
 
 .
@@ -4520,6 +4647,36 @@ __foo _bar_ baz__
 <p><strong>foo <em>bar</em> baz</strong></p>
 .
 
+.
+**foo, *bar*, baz**
+.
+<p><strong>foo, <em>bar</em>, baz</strong></p>
+.
+
+.
+__foo, _bar_, baz__
+.
+<p><strong>foo, <em>bar</em>, baz</strong></p>
+.
+
+But note:
+
+.
+*foo**bar**baz*
+.
+<p><em>foo</em><em>bar</em><em>baz</em></p>
+.
+
+.
+**foo*bar*baz**
+.
+<p><em><em>foo</em>bar</em>baz**</p>
+.
+
+The difference is that in the two preceding cases,
+the internal delimiters [can close emphasis](#can-close-emphasis),
+while in the cases with spaces, they cannot.
+
 Note that you cannot nest emphasis directly inside emphasis
 using the same delimeter, or strong emphasis directly inside
 strong emphasis:
@@ -4601,7 +4758,7 @@ However, a string of four or more `****` can never close emphasis:
 <p>*foo****</p>
 .
 
-Note that there are some asymmetries here:
+We retain symmetry in these cases:
 
 .
 *foo**
@@ -4609,7 +4766,7 @@ Note that there are some asymmetries here:
 **foo*
 .
 <p><em>foo</em>*</p>
-<p>**foo*</p>
+<p>*<em>foo</em></p>
 .
 
 .
@@ -4618,18 +4775,12 @@ Note that there are some asymmetries here:
 **foo* bar*
 .
 <p><em>foo <em>bar</em></em></p>
-<p>**foo* bar*</p>
+<p><em><em>foo</em> bar</em></p>
 .
 
 More cases with mismatched delimiters:
 
 .
-**foo* bar*
-.
-<p>**foo* bar*</p>
-.
-
-.
 *bar***
 .
 <p><em>bar</em>**</p>
@@ -4638,7 +4789,7 @@ More cases with mismatched delimiters:
 .
 ***foo*
 .
-<p>***foo*</p>
+<p>**<em>foo</em></p>
 .
 
 .
@@ -4650,7 +4801,7 @@ More cases with mismatched delimiters:
 .
 ***foo**
 .
-<p>***foo**</p>
+<p>*<strong>foo</strong></p>
 .
 
 .
@@ -4659,6 +4810,46 @@ More cases with mismatched delimiters:
 <p>***foo <em>bar</em></p>
 .
 
+The following cases illustrate rule 13:
+
+.
+*foo _bar* baz_
+.
+<p><em>foo _bar</em> baz_</p>
+.
+
+.
+**foo bar* baz**
+.
+<p><em><em>foo bar</em> baz</em>*</p>
+.
+
+The following cases illustrate rule 14:
+
+.
+*[foo*](bar)
+.
+<p>*<a href="bar">foo*</a></p>
+.
+
+.
+*![foo*](bar)
+.
+<p>*<img src="bar" alt="foo*" /></p>
+.
+
+.
+*<img src="foo" title="*"/>
+.
+<p>*<img src="foo" title="*"/></p>
+.
+
+.
+*a`a*`
+.
+<p>*a<code>a*</code></p>
+.
+
 ## Links
 
 A link contains a [link label](#link-label) (the visible text),
@@ -4817,9 +5008,10 @@ in Markdown:
 <p><a href="foo):">link</a></p>
 .
 
-URL-escaping and should be left alone inside the destination, as all URL-escaped characters
-are also valid URL characters. HTML entities in the destination will be parsed into their UTF8
-codepoints, as usual, and optionally URL-escaped when written as HTML.
+URL-escaped characters should be left alone inside the destination, as
+all URL-escaped characters are also valid URL characters. HTML entities in
+the destination will be parsed into their Unicode codepoints, as usual, and
+optionally URL-escaped when written as HTML.
 
 .
 [link](foo%20b&auml;)
@@ -5504,9 +5696,9 @@ spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#e-m
 Examples of email autolinks:
 
 .
-<foo@bar.baz.com>
+<foo@bar.example.com>
 .
-<p><a href="mailto:foo@bar.baz.com">foo@bar.baz.com</a></p>
+<p><a href="mailto:foo@bar.example.com">foo@bar.example.com</a></p>
 .
 
 .
@@ -5548,15 +5740,15 @@ These are not autolinks:
 .
 
 .
-http://google.com
+http://example.com
 .
-<p>http://google.com</p>
+<p>http://example.com</p>
 .
 
 .
-foo@bar.baz.com
+foo@bar.example.com
 .
-<p>foo@bar.baz.com</p>
+<p>foo@bar.example.com</p>
 .
 
 ## Raw HTML