From 5cea66f5e271dc93285be2edd4e9d205ebcaf9b5 Mon Sep 17 00:00:00 2001
From: John MacFarlane - bar #5 bolt #foobar #hashtag #→foo Bar foofoo
.
-More than six `#` characters is not a header:
+More than six `#` characters is not a heading:
.
####### foo
@@ -613,23 +613,31 @@ More than six `#` characters is not a header:
.
At least one space is required between the `#` characters and the
-header's contents, unless the header is empty. Note that many
+heading's contents, unless the heading is empty. Note that many
implementations currently do not require the space. However, the
space was required by the
[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
and it helps prevent things like the following from being parsed as
-headers:
+headings:
.
#5 bolt
-#foobar
+#hashtag
.
foo #
.
-ATX headers need not be separated from surrounding content by blank
+ATX headings need not be separated from surrounding content by blank
lines, and they can interrupt paragraphs:
.
@@ -764,7 +772,7 @@ Bar foo
Foo
.
-The header content can be indented up to three spaces, and need
+The heading content can be indented up to three spaces, and need
not line up with the underlining:
.
@@ -866,7 +874,7 @@ Foo
.
-The setext header underline can be indented up to three spaces, and
+The setext heading underline can be indented up to three spaces, and
may have trailing spaces:
.
@@ -886,7 +894,7 @@ Foo
---
of dashes"/>
.
-The setext header underline cannot be a [lazy continuation
+The setext heading underline cannot be a [lazy continuation
line] in a list item or block quote:
.
@@ -960,7 +968,7 @@ line] in a list item or block quote:
Baz
.
-Setext headers cannot be empty:
+Setext headings cannot be empty:
.
@@ -1004,9 +1012,9 @@ Setext headers cannot be empty:
====
.
-Setext header text lines must not be interpretable as block
+Setext heading text lines must not be interpretable as block
constructs other than paragraphs. So, the line of dashes
-in these examples gets interpreted as a horizontal rule:
+in these examples gets interpreted as a thematic break:
.
---
@@ -1045,7 +1053,7 @@ in these examples gets interpreted as a horizontal rule:
foo
-foo
[bar]
.
-However, it can directly follow other block elements, such as headers
-and horizontal rules, and it need not be followed by a blank line.
+However, it can directly follow other block elements, such as headings
+and thematic breaks, and it need not be followed by a blank line.
.
# [Foo]
@@ -4036,7 +4044,7 @@ A list may be the first block in a list item:
.
-A list item can contain a header:
+A list item can contain a heading:
.
- # Foo
@@ -4854,7 +4862,7 @@ not have their usual Markdown meanings:
\`not code`
1\. not a list
\* not a list
-\# not a header
+\# not a heading
\[foo]: /url "not a reference"
.
*not emphasized*
@@ -4863,7 +4871,7 @@ not have their usual Markdown meanings:
`not code`
1. not a list
* not a list
-# not a header
+# not a heading
[foo]: /url "not a reference"
.
@@ -4949,21 +4957,21 @@ foo
.
-## Entities
+## Entity and numeric character references
-With the goal of making this standard as HTML-agnostic as possible, all
-valid HTML entities (except in code blocks and code spans)
-are recognized as such and converted into Unicode characters before
-they are stored in the AST. This means that renderers to formats other
-than HTML need not be HTML-entity aware. HTML renderers may either escape
-Unicode characters as entities or leave them as they are. (However,
-`"`, `&`, `<`, and `>` must always be rendered as entities.)
+All valid HTML entity references and numeric character
+references, except those occurring in code blocks, code spans,
+and raw HTML, are recognized as such and treated as equivalent to the
+corresponding Unicode characters. Conforming CommonMark parsers
+need not store information about whether a particular character
+was represented in the source using a Unicode character or
+an entity reference.
-[Named entities](@name-entities) consist of `&` + any of the valid
+[Entity references](@entity-references) consist of `&` + any of the valid
HTML5 entity names + `;`. The
-[following document](https://html.spec.whatwg.org/multipage/entities.json)
-is used as an authoritative source of the valid entity names and their
-corresponding code points.
+document
# Ӓ Ϡ � �
.
-[Hexadecimal entities](@hexadecimal-entities) consist of `&#` + either
-`X` or `x` + a string of 1-8 hexadecimal digits + `;`. They will also
-be parsed and turned into the corresponding Unicode code points in the
-AST.
+[Hexadecimal numeric character
+references](@hexadecimal-numeric-character-references) consist of `&#` +
+either `X` or `x` + a string of 1-8 hexadecimal digits + `;`.
+They too are parsed as the corresponding Unicode character (this
+time specified with a hexadecimal numeral instead of decimal).
.
" ആ ಫ
@@ -5002,14 +5012,16 @@ AST.
Here are some nonentities:
.
-  &x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;
+  &x;
+&ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;
.
-  &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;
+  &x; &#; &#x;
+&ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;
.
-Although HTML5 does accept some entities without a trailing semicolon
-(such as `&copy`), these are not recognized as entities here, because it
-makes the grammar too ambiguous:
+Although HTML5 does accept some entity references
+without a trailing semicolon (such as `&copy`), these are not
+recognized here, because it makes the grammar too ambiguous:
.
&copy
@@ -5018,7 +5030,7 @@ makes the grammar too ambiguous:
.
Strings that are not on the list of HTML5 named entities are not
-recognized as entities either:
+recognized as entity references either:
.
&MadeUpEntity;
@@ -5026,9 +5038,9 @@ recognized as entities either:
&MadeUpEntity;
.
-Entities are recognized in any context besides code spans or
-code blocks, including raw HTML, URLs, [link title]s, and
-[fenced code block] [info string]s:
+Entity and numeric character references are recognized in any
+context besides code spans or code blocks or raw HTML, including
+URLs, [link title]s, and [fenced code block][] [info string]s:
.
@@ -5059,7 +5071,8 @@ foo
.
-Entities are treated as literal text in code spans and code blocks:
+Entity and numeric character references are treated as literal
+text in code spans and code blocks, and in raw HTML:
.
`föö`
@@ -5074,6 +5087,12 @@ Entities are treated as literal text in code spans and code blocks:
.
+.
+
+.
+
+.
+
## Code spans
A [backtick string](@backtick-string)
@@ -6597,11 +6616,11 @@ A link can contain fragment identifiers and queries:
[link](http://example.com#fragment)
-[link](http://example.com?foo=bar&baz#fragment)
+[link](http://example.com?foo=3#frag)
.
-
+
.
Note that a backslash before a non-escapable character is
@@ -6614,9 +6633,13 @@ just a backslash:
.
URL-escaping should be left alone inside the destination, as all
-URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into the corresponding Unicode
-code points, as usual, and optionally URL-escaped when written as HTML.
+URL-escaped characters are also valid URL characters. Entity and
+numeric character references in the destination will be parsed
+into the corresponding Unicode code points, as usual. These may
+be optionally URL-escaped when written as HTML, but this spec
+does not enforce any particular policy for rendering URLs in
+HTML or other formats. Renderers may make different decisions
+about how to escape or normalize URLs in the output.
.
[link](foo%20bä)
@@ -6646,7 +6669,8 @@ Titles may be in single quotes, double quotes, or parentheses:
link
.
-Backslash escapes and entities may be used in titles:
+Backslash escapes and entity and numeric character references
+may be used in titles:
.
[link](/url "title \""")
@@ -6674,15 +6698,16 @@ But it is easy to work around this by using a different quote type:
title, and its test suite included a test demonstrating this.
But it is hard to see a good rationale for the extra complexity this
brings, since there are already many ways---backslash escaping,
-entities, or using a different quote type for the enclosing title---to
-write titles containing double quotes. `Markdown.pl`'s handling of
-titles has a number of other strange features. For example, it allows
-single-quoted titles in inline links, but not reference links. And, in
-reference links but not inline links, it allows a title to begin with
-`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing
-quotation mark, though 1.0.2b8 does not. It seems preferable to adopt
-a simple, rational rule that works the same way in inline links and
-link reference definitions.)
+entity and numeric character references, or using a different
+quote type for the enclosing title---to write titles containing
+double quotes. `Markdown.pl`'s handling of titles has a number
+of other strange features. For example, it allows single-quoted
+titles in inline links, but not reference links. And, in
+reference links but not inline links, it allows a title to begin
+with `"` and end with `)`. `Markdown.pl` 1.0.1 even allows
+titles with no closing quotation mark, though 1.0.2b8 does not.
+It seems preferable to adopt a simple, rational rule that works
+the same way in inline links and link reference definitions.)
[Whitespace] is allowed around the destination and title:
@@ -6813,7 +6838,7 @@ There are three kinds of [reference link](@reference-link)s:
and [shortcut](#shortcut-reference-link).
A [full reference link](@full-reference-link)
-consists of a [link text], optional [whitespace], and a [link label]
+consists of a [link text] immediately followed by a [link label]
that [matches] a [link reference definition] elsewhere in the document.
A [link label](@link-label) begins with a left bracket (`[`) and ends
@@ -6983,14 +7008,15 @@ purposes of determining matching:
.
-There can be [whitespace] between the [link text] and the [link label]:
+No [whitespace] is allowed between the [link text] and the
+[link label]:
.
[foo] [bar]
[bar]: /url "title"
.
-
+[foo] bar
.
.
@@ -6999,9 +7025,37 @@ There can be [whitespace] between the [link text] and the [link label]:
[bar]: /url "title"
.
-
+[foo]
+bar
.
+This is a departure from John Gruber's original Markdown syntax
+description, which explicitly allows whitespace between the link
+text and the link label. It brings reference links in line with
+[inline link]s, which (according to both original Markdown and
+this spec) cannot have whitespace after the link text. More
+importantly, it prevents inadvertent capture of consecutive
+[shortcut reference link]s. If whitespace is allowed between the
+link text and the link label, then in the following we will have
+a single reference link, not two shortcut reference links, as
+intended:
+
+``` markdown
+[foo]
+[bar]
+
+[foo]: /url1
+[bar]: /url2
+```
+
+(Note that [shortcut reference link]s were introduced by Gruber
+himself in a beta version of `Markdown.pl`, but never included
+in the official syntax description. Without shortcut reference
+links, it is harmless to allow space between the link text and
+link label; but once shortcut references are introduced, it is
+too dangerous to allow this, as it frequently leads to
+unintended results.)
+
When there are multiple matching [link reference definition]s,
the first is used:
@@ -7065,6 +7119,16 @@ backslash-escaped:
.
+Note that in this example `]` is not backslash-escaped:
+
+.
+[bar\\]: /uri
+
+[bar\\]
+.
+
+.
+
A [link label] must contain at least one [non-whitespace character]:
.
@@ -7092,7 +7156,7 @@ A [link label] must contain at least one [non-whitespace character]:
A [collapsed reference link](@collapsed-reference-link)
consists of a [link label] that [matches] a
[link reference definition] elsewhere in the
-document, optional [whitespace], and the string `[]`.
+document, followed by the string `[]`.
The contents of the first link label are parsed as inlines,
which are used as the link's text. The link's URI and title are
provided by the matching reference link definition. Thus,
@@ -7125,8 +7189,8 @@ The link labels are case-insensitive:
.
-As with full reference links, [whitespace] is allowed
-between the two sets of brackets:
+As with full reference links, [whitespace] is not
+allowed between the two sets of brackets:
.
[foo]
@@ -7134,7 +7198,8 @@ between the two sets of brackets:
[foo]: /url "title"
.
-
+foo
+[]
.
A [shortcut reference link](@shortcut-reference-link)
@@ -7355,7 +7420,7 @@ My ![foo bar](/path/to/train.jpg "title" )
Reference-style:
.
-![foo] [bar]
+![foo][bar]
[bar]: /url
.
@@ -7363,7 +7428,7 @@ Reference-style:
.
.
-![foo] [bar]
+![foo][bar]
[BAR]: /url
.
@@ -7398,7 +7463,7 @@ The labels are case-insensitive:
.
-As with full reference links, [whitespace] is allowed
+As with reference links, [whitespace] is not allowed
between the two sets of brackets:
.
@@ -7407,7 +7472,8 @@ between the two sets of brackets:
[foo]: /url "title"
.
-
+
+[]
.
Shortcut:
@@ -7749,16 +7815,9 @@ _boolean zoop:33=zoop:33 />
Custom tag names can be used:
.
-Foo
foo &<]]>
.
-Entities are preserved in HTML attributes:
+Entity and numeric character references are preserved in HTML
+attributes:
.
-
+foo
.
-
+
.
Backslash escapes do not work in HTML attributes:
.
-
+foo
.
-
+
.
.
@@ -8104,7 +8162,7 @@ list items, and so on---is constructed. Text is assigned to these
blocks but not parsed. Link reference definitions are parsed and a
map of links is constructed.
-2. In the second phase, the raw text contents of paragraphs and headers
+2. In the second phase, the raw text contents of paragraphs and headings
are parsed into sequences of Markdown inline elements (strings,
code spans, links, emphasis, and so on), using the map of link
references constructed in phase 1.
@@ -8167,10 +8225,10 @@ matched block.
3. Finally, we look at the remainder of the line (after block
markers like `>`, list markers, and indentation have been consumed).
This is text that can be incorporated into the last open
-block (a paragraph, code block, header, or raw HTML).
+block (a paragraph, code block, heading, or raw HTML).
-Setext headers are formed when we detect that the second line of
-a paragraph is a setext header line.
+Setext headings are formed when we detect that the second line of
+a paragraph is a setext heading line.
Reference link definitions are detected when a paragraph is closed;
the accumulated text lines are parsed to see if they begin with
@@ -8279,7 +8337,7 @@ We thus obtain the final tree:
Once all of the input has been parsed, all open blocks are closed.
We then "walk the tree," visiting every node, and parse raw
-string contents of paragraphs and headers as inlines. At this
+string contents of paragraphs and headings as inlines. At this
point we have seen all the link reference definitions, so we can
resolve reference links as we go.
--
cgit v1.2.3
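
The reference syntax reworded in the "Entity and numeric character references" hunks above can be sketched in Python. This is only an illustration, not code from this patch or from any CommonMark implementation. The names `decode_references` and `_code_point_to_char` are hypothetical, and the name table is Python's standard `html.entities.html5` rather than the WHATWG JSON document the spec cites. Two details are assumptions because they are not legible in this patch: the 1-8 digit limit on decimal references, and the replacement of zero or invalid code points by U+FFFD. The sketch also does not skip code spans, code blocks, or raw HTML, which the spec text says must be left as literal text.

```python
import re
from html.entities import html5  # keys include the semicolon, e.g. "copy;"

# A reference is `&` + name + `;`, `&#` + decimal digits + `;`, or
# `&#` + `X`/`x` + hexadecimal digits + `;`. The semicolon is mandatory.
_REFERENCE = re.compile(
    r"&(?:([A-Za-z][A-Za-z0-9]*)|#([0-9]{1,8})|#[Xx]([0-9A-Fa-f]{1,8}));"
)

REPLACEMENT = "\ufffd"


def _code_point_to_char(value: int) -> str:
    # Assumption: zero, surrogates, and out-of-range values map to U+FFFD.
    if value == 0 or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return REPLACEMENT
    return chr(value)


def decode_references(text: str) -> str:
    """Replace entity and numeric character references with characters."""

    def replace(match: re.Match) -> str:
        name, decimal, hexadecimal = match.groups()
        if name is not None:
            # Names missing from the HTML5 table stay as literal text.
            return html5.get(name + ";", match.group(0))
        if decimal is not None:
            return _code_point_to_char(int(decimal))
        return _code_point_to_char(int(hexadecimal, 16))

    return _REFERENCE.sub(replace, text)
```

Requiring the semicolon inside the pattern is what keeps `&copy` and `&MadeUpEntity;` as literal text, matching the two negative examples in the patch; a caller would run this only on text that has already been identified as ordinary inline content.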