Diffstat (limited to 'test/spec.txt')
-rw-r--r-- test/spec.txt 84
1 files changed, 56 insertions, 28 deletions
diff --git a/test/spec.txt b/test/spec.txt
index bdb9569..3aa4ee4 100644
--- a/test/spec.txt
+++ b/test/spec.txt
@@ -2,7 +2,7 @@
title: CommonMark Spec
author: John MacFarlane
version: 0.21
-date:
+date: 2015-07-14
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
...
@@ -204,9 +204,13 @@ In the examples, the `→` character is used to represent tabs.
Any sequence of [character]s is a valid CommonMark
document.
-A [character](@character) is a unicode code point.
+A [character](@character) is a Unicode code point. Although some
+code points (for example, combining accents) do not correspond to
+characters in an intuitive sense, all code points count as characters
+for purposes of this spec.
+
This spec does not specify an encoding; it thinks of lines as composed
-of characters rather than bytes. A conforming parser may be limited
+of [character]s rather than bytes. A conforming parser may be limited
to a certain encoding.
A [line](@line) is a sequence of zero or more [character]s
@@ -227,13 +231,13 @@ form feed (`U+000C`), or carriage return (`U+000D`).
[Whitespace](@whitespace) is a sequence of one or more [whitespace
character]s.
-A [unicode whitespace character](@unicode-whitespace-character) is
-any code point in the unicode `Zs` class, or a tab (`U+0009`),
+A [Unicode whitespace character](@unicode-whitespace-character) is
+any code point in the Unicode `Zs` class, or a tab (`U+0009`),
carriage return (`U+000D`), newline (`U+000A`), or form feed
(`U+000C`).
[Unicode whitespace](@unicode-whitespace) is a sequence of one
-or more [unicode whitespace character]s.
+or more [Unicode whitespace character]s.
A [space](@space) is `U+0020`.
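
As an illustration of the definition above, here is a minimal Python sketch of a Unicode whitespace test. The function names are hypothetical, and the result for `Zs` depends on the Unicode database bundled with the Python version in use.

```python
import unicodedata

# Code points the definition names explicitly, in addition to class Zs:
# tab, carriage return, newline, form feed.
_EXTRA_WHITESPACE = {"\u0009", "\u000D", "\u000A", "\u000C"}


def is_unicode_whitespace_char(ch: str) -> bool:
    """Return True if the single code point `ch` is a Unicode whitespace character."""
    return unicodedata.category(ch) == "Zs" or ch in _EXTRA_WHITESPACE


def is_unicode_whitespace(s: str) -> bool:
    """Return True if `s` is a sequence of one or more Unicode whitespace characters."""
    return len(s) > 0 and all(is_unicode_whitespace_char(ch) for ch in s)
```

For example, `is_unicode_whitespace_char("\u00A0")` is true because NO-BREAK SPACE is in class `Zs`, while `is_unicode_whitespace("")` is false because the sequence must contain at least one character.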
@@ -247,7 +251,7 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
A [punctuation character](@punctuation-character) is an [ASCII
punctuation character] or anything in
-the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
+the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
## Tabs
@@ -1648,7 +1652,7 @@ followed by one of the strings (case-insensitive) `address`,
`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
`html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`,
`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`,
-`section`, `source`, `title`, `summary`, `table`, `tbody`, `td`,
+`section`, `source`, `summary`, `table`, `tbody`, `td`,
`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
by [whitespace], the end of the line, the string `>`, or
the string `/>`.\
@@ -2831,8 +2835,8 @@ foo</p>
.
Laziness only applies to lines that would have been continuations of
-paragraphs had they been prepended with `>`. For example, the
-`>` cannot be omitted in the second line of
+paragraphs had they been prepended with [block quote marker]s.
+For example, the `> ` cannot be omitted in the second line of
``` markdown
> foo
@@ -2851,7 +2855,7 @@ without changing the meaning:
<hr />
.
-Similarly, if we omit the `>` in the second line of
+Similarly, if we omit the `> ` in the second line of
``` markdown
> - foo
@@ -2874,7 +2878,7 @@ then the block quote ends after the first line:
</ul>
.
-For the same reason, we can't omit the `>` in front of
+For the same reason, we can't omit the `> ` in front of
subsequent lines of an indented or fenced code block:
.
@@ -2901,6 +2905,30 @@ foo
<pre><code></code></pre>
.
+Note that in the following case, we have a paragraph
+continuation line:
+
+.
+> foo
+ - bar
+.
+<blockquote>
+<p>foo
+- bar</p>
+</blockquote>
+.
+
+To see why, note that in
+
+```markdown
+> foo
+> - bar
+```
+
+the `- bar` is indented too far to start a list, and can't
+be an indented code block because indented code blocks cannot
+interrupt paragraphs, so it is a [paragraph continuation line].
+
A block quote can be empty:
.
@@ -4849,17 +4877,17 @@ foo
With the goal of making this standard as HTML-agnostic as possible, all
valid HTML entities (except in code blocks and code spans)
-are recognized as such and converted into unicode characters before
+are recognized as such and converted into Unicode characters before
they are stored in the AST. This means that renderers to formats other
than HTML need not be HTML-entity aware. HTML renderers may either escape
-unicode characters as entities or leave them as they are. (However,
+Unicode characters as entities or leave them as they are. (However,
`"`, `&`, `<`, and `>` must always be rendered as entities.)
[Named entities](@name-entities) consist of `&`
+ any of the valid HTML5 entity names + `;`. The
[following document](https://html.spec.whatwg.org/multipage/entities.json)
is used as an authoritative source of the valid entity names and their
-corresponding codepoints.
+corresponding code points.
.
&nbsp; &amp; &copy; &AElig; &Dcaron;
@@ -4874,9 +4902,9 @@ corresponding codepoints.
[Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be replaced by
-the "unknown codepoint" character (`U+FFFD`). For security reasons,
-the codepoint `U+0000` will also be replaced by `U+FFFD`.
+Unicode code points. Invalid Unicode code points will be replaced by
+the "unknown code point" character (`U+FFFD`). For security reasons,
+the code point `U+0000` will also be replaced by `U+FFFD`.
.
&#35; &#1234; &#992; &#98765432; &#0;
@@ -4887,7 +4915,7 @@ the codepoint `U+0000` will also be replaced by `U+FFFD`.
[Hexadecimal entities](@hexadecimal-entities)
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
+ `;`. They will also be parsed and turned into the corresponding
-unicode codepoints in the AST.
+Unicode code points in the AST.
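
The decimal and hexadecimal rules above could be sketched in Python roughly as follows. The names `decode_numeric_entities` and `_code_point_to_char` are hypothetical. Treating surrogates and values above `U+10FFFF` as the "invalid" code points is an assumption of this sketch, not something the text spells out.

```python
import re

REPLACEMENT = "\uFFFD"

# `&#` + 1-8 decimal digits + `;`, or `&#` + `X`/`x` + 1-8 hex digits + `;`.
_NUMERIC_ENTITY = re.compile(r"&#(?:([0-9]{1,8})|[Xx]([0-9A-Fa-f]{1,8}));")


def _code_point_to_char(value: int) -> str:
    # U+0000 is replaced for security reasons. Which other values count as
    # invalid is an assumption: surrogates and anything beyond U+10FFFF.
    if value == 0 or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return REPLACEMENT
    return chr(value)


def decode_numeric_entities(text: str) -> str:
    """Replace decimal and hexadecimal entities in `text` with their code points."""

    def replace(match: "re.Match[str]") -> str:
        decimal, hexadecimal = match.group(1), match.group(2)
        value = int(decimal, 10) if decimal is not None else int(hexadecimal, 16)
        return _code_point_to_char(value)

    return _NUMERIC_ENTITY.sub(replace, text)
```

With this sketch, `&#35;` becomes `#`, `&#X22;` becomes `"`, and both `&#0;` and `&#98765432;` become `U+FFFD`. A nine-digit sequence such as `&#123456789;` does not match the pattern and is left untouched.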
.
&#X22; &#XD06; &#xcab;
@@ -5179,18 +5207,18 @@ followed by a `*` character, or a sequence of one or more `_`
characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@left-flanking-delimiter-run) is
-a [delimiter run] that is (a) not followed by [unicode whitespace],
+a [delimiter run] that is (a) not followed by [Unicode whitespace],
and (b) either not followed by a [punctuation character], or
-preceded by [unicode whitespace] or a [punctuation character].
+preceded by [Unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.
A [right-flanking delimiter run](@right-flanking-delimiter-run) is
-a [delimiter run] that is (a) not preceded by [unicode whitespace],
+a [delimiter run] that is (a) not preceded by [Unicode whitespace],
and (b) either not preceded by a [punctuation character], or
-followed by [unicode whitespace] or a [punctuation character].
+followed by [Unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.
Here are some examples of delimiter runs.
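
As a minimal sketch, the two flanking definitions above could be checked in Python like this. The function names are hypothetical. Using `string.punctuation` for the ASCII punctuation characters is an assumption that its contents coincide with the spec's list.

```python
import string
import unicodedata
from typing import Tuple

_PUNCT_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}
_EXTRA_WHITESPACE = {"\t", "\r", "\n", "\f"}


def is_unicode_whitespace_char(ch: str) -> bool:
    return unicodedata.category(ch) == "Zs" or ch in _EXTRA_WHITESPACE


def is_punctuation(ch: str) -> bool:
    # Assumption: string.punctuation matches the ASCII punctuation characters.
    return ch in string.punctuation or unicodedata.category(ch) in _PUNCT_CLASSES


def classify_run(line: str, start: int, end: int) -> Tuple[bool, bool]:
    """Classify the delimiter run line[start:end] as (left_flanking, right_flanking)."""
    # The beginning and the end of the line count as Unicode whitespace,
    # so a space stands in for a missing neighbour.
    before = line[start - 1] if start > 0 else " "
    after = line[end] if end < len(line) else " "

    ws_before = is_unicode_whitespace_char(before)
    ws_after = is_unicode_whitespace_char(after)
    punct_before = is_punctuation(before)
    punct_after = is_punctuation(after)

    left = (not ws_after) and (not punct_after or ws_before or punct_before)
    right = (not ws_before) and (not punct_before or ws_after or punct_after)
    return left, right
```

Under these assumptions, `classify_run("***abc", 0, 3)` is left-flanking only, `classify_run("abc***", 3, 6)` is right-flanking only, `classify_run("abc***def", 3, 6)` is both, and `classify_run("abc *** def", 4, 7)` is neither.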
@@ -6511,8 +6539,8 @@ just a backslash:
URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into the corresponding unicode
-codepoints, as usual, and optionally URL-escaped when written as HTML.
+the destination will be parsed into the corresponding Unicode
+code points, as usual, and optionally URL-escaped when written as HTML.
.
[link](foo%20b&auml;)
@@ -6721,7 +6749,7 @@ characters inside the square brackets.
One label [matches](@matches)
another just in case their normalized forms are equal. To normalize a
-label, perform the *unicode case fold* and collapse consecutive internal
+label, perform the *Unicode case fold* and collapse consecutive internal
[whitespace] to a single space. If there are multiple
matching reference link definitions, the one that comes first in the
document is used. (It is desirable in such cases to emit a warning.)
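
The normalization described above can be sketched in Python as follows. The function names are hypothetical, and the set of characters treated as whitespace (space, tab, newline, line tabulation, form feed, carriage return) is an assumption. The sketch collapses every whitespace run and does not strip leading or trailing whitespace, since the passage only mentions internal whitespace.

```python
import re

# Assumed whitespace set: space, tab, newline, line tabulation,
# form feed, carriage return.
_WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")


def normalize_label(label: str) -> str:
    """Case-fold `label` and collapse each whitespace run to a single space."""
    collapsed = _WHITESPACE_RUN.sub(" ", label)
    # str.casefold() applies Unicode case folding, which is more aggressive
    # than lower(): for example, it maps "ß" to "ss".
    return collapsed.casefold()


def labels_match(a: str, b: str) -> bool:
    """Return True if the normalized forms of the two labels are equal."""
    return normalize_label(a) == normalize_label(b)
```

For example, `labels_match("Foo  Bar", "foo bar")` and `labels_match("ΑΓΩ", "αγω")` both hold. A reference map built on this would keep the first definition seen for each normalized label, for instance by inserting with `dict.setdefault`.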