From bd271515770a17f3c320eb394f2012ccd51a417b Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 9 Sep 2014 22:30:54 -0700 Subject: spec: change nesting order of strong/emph in ***a***. --- spec.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 4a9e9fd..88c8dea 100644 --- a/spec.txt +++ b/spec.txt @@ -4392,13 +4392,13 @@ The rules are sufficient for the following nesting patterns: . ***foo bar*** . -

foo bar

+

foo bar

. . ___foo bar___ . -

foo bar

+

foo bar

. . -- cgit v1.2.3 From 905b5d4d11cf1e56137fea1e68eb503863f1b113 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Wed, 10 Sep 2014 08:42:39 -0700 Subject: Revert "spec: change nesting order of strong/emph in ***a***." This reverts commit 49a03b7666e2901d1ab2813fc0bdd23968d22979. --- spec.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 88c8dea..4a9e9fd 100644 --- a/spec.txt +++ b/spec.txt @@ -4392,13 +4392,13 @@ The rules are sufficient for the following nesting patterns: . ***foo bar*** . -

foo bar

+

foo bar

. . ___foo bar___ . -

foo bar

+

foo bar

. . -- cgit v1.2.3 From e245f1a2d5ec76807633806a5af1ebe52fe5bd6d Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Wed, 10 Sep 2014 08:56:20 -0700 Subject: Updated spec (but not yet examples) with new rules. These reflect the current parsing algorithm. We now get a symmetry that we lacked before: **a* b* *a *b** are both emphasis within emphasis. One asymmetry remains: **a* has no emphasis, while *a** has emphasis. Further tweaking of the algorithm could regularize this. --- spec.txt | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 4a9e9fd..37f92c5 100644 --- a/spec.txt +++ b/spec.txt @@ -4024,7 +4024,7 @@ for efficient parsing strategies that do not backtrack: (a) it is not part of a sequence of four or more unescaped `*`s, (b) it is not followed by whitespace, and (c) either it is not followed by a `*` character or it is - followed immediately by strong emphasis. + followed immediately by emphasis or strong emphasis. 2. A single `_` character [can open emphasis](#can-open-emphasis) iff @@ -4032,7 +4032,7 @@ for efficient parsing strategies that do not backtrack: (b) it is not followed by whitespace, (c) is is not preceded by an ASCII alphanumeric character, and (d) either it is not followed by a `_` character or it is - followed immediately by strong emphasis. + followed immediately by emphasis or strong emphasis. 3. A single `*` character [can close emphasis](#can-close-emphasis) iff @@ -4088,6 +4088,11 @@ for efficient parsing strategies that do not backtrack: emphasis](#can-close-strong-emphasis), and that uses the same character (`_` or `*`) as the opening delimiter, is reached. +11. In case of ambiguity, strong emphasis takes precedence. Thus, + `**foo**` is `foo`, not `foo`, + and `***foo***` is `foo`, not + `foo` or `foo`. + These rules can be illustrated through a series of examples. 
Simple emphasis: -- cgit v1.2.3 From 5cd513026fe49e83cfd544a7b375bf4fa1466b21 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Wed, 10 Sep 2014 09:00:40 -0700 Subject: Updated test cases in spec to reflect last change. --- spec.txt | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 37f92c5..e1aa502 100644 --- a/spec.txt +++ b/spec.txt @@ -4612,17 +4612,11 @@ Note that there are some asymmetries here: **foo* bar* .

foo bar

-

**foo* bar*

+

foo bar

. More cases with mismatched delimiters: -. -**foo* bar* -. -

**foo* bar*

-. - . *bar*** . -- cgit v1.2.3 From 8a2b85da34e1de10abaf55b212b0660a7917b5d8 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 09:05:27 -0700 Subject: Removed spurious 'and', reflowed. --- spec.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index bc2e381..c520272 100644 --- a/spec.txt +++ b/spec.txt @@ -4817,9 +4817,10 @@ in Markdown:

link

. -URL-escaping and should be left alone inside the destination, as all URL-escaped characters -are also valid URL characters. HTML entities in the destination will be parsed into their UTF8 -codepoints, as usual, and optionally URL-escaped when written as HTML. +URL-escaping should be left alone inside the destination, as all +URL-escaped characters are also valid URL characters. HTML entities in +the destination will be parsed into their UTF8 codepoints, as usual, and +optionally URL-escaped when written as HTML. . [link](foo%20bä) -- cgit v1.2.3 From 4dc7bbb0c3fb1057c921dedc2f83786caaa6f0ad Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 09:05:27 -0700 Subject: Removed spurious 'and', reflowed. --- spec.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 0a62b80..990ae8c 100644 --- a/spec.txt +++ b/spec.txt @@ -4816,9 +4816,10 @@ in Markdown:

link

. -URL-escaping and should be left alone inside the destination, as all URL-escaped characters -are also valid URL characters. HTML entities in the destination will be parsed into their UTF8 -codepoints, as usual, and optionally URL-escaped when written as HTML. +URL-escaping should be left alone inside the destination, as all +URL-escaped characters are also valid URL characters. HTML entities in +the destination will be parsed into their UTF8 codepoints, as usual, and +optionally URL-escaped when written as HTML. . [link](foo%20bä) -- cgit v1.2.3 From 3d99baba064091f74b9da78eaed38fcf4875af46 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 22:21:03 -0700 Subject: Adjusted tests for new js parser. --- spec.txt | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 990ae8c..db62f53 100644 --- a/spec.txt +++ b/spec.txt @@ -4525,6 +4525,24 @@ __foo _bar_ baz__

foo bar baz

. +But note: + +. +*foo**bar**baz* +. +

foobarbaz

+. + +. +**foo*bar*baz** +. +

foobarbaz**

+. + +The difference is that in the two preceding cases, +the internal delimiters [can close emphasis](#can-close-emphasis), +while in the cases with spaces, they cannot. + Note that you cannot nest emphasis directly inside emphasis using the same delimeter, or strong emphasis directly inside strong emphasis: @@ -4606,7 +4624,7 @@ However, a string of four or more `****` can never close emphasis:

*foo****

. -Note that there are some asymmetries here: +We retain symmetry in these cases: . *foo** @@ -4614,7 +4632,7 @@ Note that there are some asymmetries here: **foo* .

foo*

-

**foo*

+

*foo

. . @@ -4637,7 +4655,7 @@ More cases with mismatched delimiters: . ***foo* . -

***foo*

+

**foo

. . @@ -4649,7 +4667,7 @@ More cases with mismatched delimiters: . ***foo** . -

***foo**

+

*foo

. . -- cgit v1.2.3 From d3c3e749f4f7b95a9604f751cf993fd488a15b19 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 22:24:53 -0700 Subject: Cleaned up entity section of spec. We convert entities to unicode characters, not UTF-8 sequences. (Though they might ultimately be output that way.) --- spec.txt | 41 ++++++++++++++++++++++++----------------- 1 file changed, 24 insertions(+), 17 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index db62f53..489b9c0 100644 --- a/spec.txt +++ b/spec.txt @@ -3727,21 +3727,25 @@ foo ## Entities -With the goal of making this standard as HTML-agnostic as possible, all HTML valid HTML Entities in any -context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing -the entity itself) before they are stored in the AST. +With the goal of making this standard as HTML-agnostic as possible, all +valid HTML entities in any context are recognized as such and +converted into unicode characters before they are stored in the AST. -This allows implementations that target HTML output to trivially escape the entities when generating HTML, -and simplifies the job of implementations targetting other languages, as these will only need to handle the -UTF8 chars and need not be HTML-entity aware. +This allows implementations that target HTML output to trivially escape +the entities when generating HTML, and simplifies the job of +implementations targetting other languages, as these will only need to +handle the unicode chars and need not be HTML-entity aware. [Named entities](#name-entities) consist of `&` -+ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json) -is used as an authoritative source of the valid entity names and their corresponding codepoints. ++ any of the valid HTML5 entity names + `;`. 
The +[following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json) +is used as an authoritative source of the valid entity names and their +corresponding codepoints. -Conforming implementations that target Markdown don't need to generate entities for all the valid -named entities that exist, with the exception of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), -which always need to be written as entities for security reasons. +Conforming implementations that target HTML don't need to generate +entities for all the valid named entities that exist, with the exception +of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), which +always need to be written as entities for security reasons. .   & © Æ Ď ¾ ℋ ⅆ ∲ @@ -3750,9 +3754,10 @@ which always need to be written as entities for security reasons. . [Decimal entities](#decimal-entities) -consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognised -and tranformed into their corresponding UTF8 codepoints. Invalid Unicode codepoints will be written -as the "unknown codepoint" character (`0xFFFD`) +consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these +entities need to be recognised and tranformed into their corresponding +UTF8 codepoints. Invalid Unicode codepoints will be written as the +"unknown codepoint" character (`0xFFFD`) . # Ӓ Ϡ � @@ -3779,7 +3784,8 @@ Here are some nonentities: . Although HTML5 does accept some entities without a trailing semicolon -(such as `&copy`), these are not recognized as entities here, because it makes the grammar too ambiguous: +(such as `&copy`), these are not recognized as entities here, because it +makes the grammar too ambiguous: . &copy @@ -3787,7 +3793,8 @@ Although HTML5 does accept some entities without a trailing semicolon

&copy

. -Strings that are not on the list of HTML5 named entities are not recognized as entities either: +Strings that are not on the list of HTML5 named entities are not +recognized as entities either: . &MadeUpEntity; @@ -4836,7 +4843,7 @@ in Markdown: URL-escaping should be left alone inside the destination, as all URL-escaped characters are also valid URL characters. HTML entities in -the destination will be parsed into their UTF8 codepoints, as usual, and +the destination will be parsed into their UTF-8 codepoints, as usual, and optionally URL-escaped when written as HTML. . -- cgit v1.2.3 From 8122177e49f9d28b6606ce8168788113508e3306 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 22:45:19 -0700 Subject: Added test case from issue #147. --- spec.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 2a7e3de..fa2a877 100644 --- a/spec.txt +++ b/spec.txt @@ -4532,6 +4532,18 @@ __foo _bar_ baz__

foo bar baz

. +. +**foo, *bar*, baz** +. +

foo, bar, baz

+. + +. +__foo, _bar_, baz__ +. +

foo, bar, baz

+. + But note: . -- cgit v1.2.3 From 735f77b2a6a016abd56dfd1717de5a4b14528c36 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 7 Oct 2014 23:00:56 -0700 Subject: Added cases from #51 to spec. Closes #51. --- spec.txt | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index fa2a877..7b447f1 100644 --- a/spec.txt +++ b/spec.txt @@ -4357,6 +4357,32 @@ __this is a double underscore (`__`)__

this is a double underscore (__)

. +Or use the other emphasis character: + +. +*_* +. +

_

+. + +. +_*_ +. +

*

+. + +. +*__* +. +

__

+. + +. +_**_ +. +

**

+. + `*` delimiters allow intra-word emphasis; `_` delimiters do not: . -- cgit v1.2.3 From 1806a06c34aeec717e521b86d9e70894ff632e41 Mon Sep 17 00:00:00 2001 From: Will Bond Date: Wed, 8 Oct 2014 11:29:47 -0400 Subject: Remove duplicate `footer` --- spec.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 7b447f1..0c09c43 100644 --- a/spec.txt +++ b/spec.txt @@ -1355,8 +1355,8 @@ name is one of the following (case-insensitive): `output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, `section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, `fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, -`footer`, `tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, -`video`, `script`, `style`. +`tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`, +`script`, `style`. An [HTML block](#html-block) begins with an [HTML block tag](#html-block-tag), [HTML comment](#html-comment), -- cgit v1.2.3
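The decimal-entity rule touched by the "Cleaned up entity section of spec" commit above (`&#` + 1--8 arabic digits + `;`, converted to a unicode character, with invalid codepoints written as `0xFFFD`) can be sketched as follows. This is only an illustration, not code from the repository: the function name `decode_decimal_entities` is hypothetical, and treating out-of-range values and surrogates as the "invalid" codepoints is an assumption, since the patch does not define the term.

```python
import re

# `&#` + 1--8 ASCII digits + `;`, as worded in the spec text in the patch.
DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")

# The "unknown codepoint" character named in the spec text.
REPLACEMENT_CHARACTER = "\ufffd"


def decode_decimal_entities(text: str) -> str:
    """Replace decimal entities with unicode characters (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        codepoint = int(match.group(1))
        # Assumption: "invalid" means outside the Unicode range or a surrogate.
        if codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            return REPLACEMENT_CHARACTER
        return chr(codepoint)

    return DECIMAL_ENTITY.sub(replace, text)
```

Strings that do not match the pattern, such as a reference with nine digits or one without the trailing semicolon, are left untouched, which mirrors the "nonentities" cases mentioned in the same hunk.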