Diffstat (limited to 'spec.txt')
-rw-r--r-- | spec.txt | 84 |
1 files changed, 54 insertions, 30 deletions
@@ -3727,21 +3727,25 @@ foo

 ## Entities

-With the goal of making this standard as HTML-agnostic as possible, all HTML valid HTML Entities in any
-context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing
-the entity itself) before they are stored in the AST.
+With the goal of making this standard as HTML-agnostic as possible, all
+valid HTML entities in any context are recognized as such and
+converted into unicode characters before they are stored in the AST.

-This allows implementations that target HTML output to trivially escape the entities when generating HTML,
-and simplifies the job of implementations targetting other languages, as these will only need to handle the
-UTF8 chars and need not be HTML-entity aware.
+This allows implementations that target HTML output to trivially escape
+the entities when generating HTML, and simplifies the job of
+implementations targetting other languages, as these will only need to
+handle the unicode chars and need not be HTML-entity aware.

 [Named entities](#name-entities) <a id="named-entities"></a> consist of `&`
-+ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
-is used as an authoritative source of the valid entity names and their corresponding codepoints.
++ any of the valid HTML5 entity names + `;`. The
+[following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
+is used as an authoritative source of the valid entity names and their
+corresponding codepoints.

-Conforming implementations that target Markdown don't need to generate entities for all the valid
-named entities that exist, with the exception of `&quot;` (`"`), `&amp;` (`&`), `&lt;` (`<`) and `&gt;` (`>`),
-which always need to be written as entities for security reasons.
+Conforming implementations that target HTML don't need to generate
+entities for all the valid named entities that exist, with the exception
+of `&quot;` (`"`), `&amp;` (`&`), `&lt;` (`<`) and `&gt;` (`>`), which
+always need to be written as entities for security reasons.

 .
 & © Æ Ď ¾ ℋ ⅆ ∲
@@ -3750,9 +3754,10 @@ which always need to be written as entities for security reasons.
 .

 [Decimal entities](#decimal-entities) <a id="decimal-entities"></a>
-consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognised
-and tranformed into their corresponding UTF8 codepoints. Invalid Unicode codepoints will be written
-as the "unknown codepoint" character (`0xFFFD`)
+consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
+entities need to be recognised and tranformed into their corresponding
+UTF8 codepoints. Invalid Unicode codepoints will be written as the
+"unknown codepoint" character (`0xFFFD`)

 .
 # Ӓ Ϡ �
@@ -3779,7 +3784,8 @@ Here are some nonentities:
 .

 Although HTML5 does accept some entities without a trailing semicolon
-(such as `&copy`), these are not recognized as entities here, because it makes the grammar too ambiguous:
+(such as `&copy`), these are not recognized as entities here, because it
+makes the grammar too ambiguous:

 .
 &copy
@@ -3787,7 +3793,8 @@ Although HTML5 does accept some entities without a trailing semicolon
 <p>&amp;copy</p>
 .

-Strings that are not on the list of HTML5 named entities are not recognized as entities either:
+Strings that are not on the list of HTML5 named entities are not
+recognized as entities either:

 .
 &MadeUpEntity;
@@ -4035,7 +4042,7 @@ for efficient parsing strategies that do not backtrack:
     (a) it is not part of a sequence of four or more unescaped `*`s,
     (b) it is not followed by whitespace, and
     (c) either it is not followed by a `*` character or it is
-        followed immediately by strong emphasis.
+        followed immediately by emphasis or strong emphasis.

 2. A single `_` character [can open emphasis](#can-open-emphasis) iff
@@ -4043,7 +4050,7 @@ for efficient parsing strategies that do not backtrack:
     (b) it is not followed by whitespace,
     (c) it is not preceded by an ASCII alphanumeric character, and
     (d) either it is not followed by a `_` character or it is
-        followed immediately by strong emphasis.
+        followed immediately by emphasis or strong emphasis.

 3. A single `*` character [can close emphasis](#can-close-emphasis) <a id="can-close-emphasis"></a> iff
@@ -4099,6 +4106,11 @@ for efficient parsing strategies that do not backtrack:
     emphasis](#can-close-strong-emphasis), and that uses the same
     character (`_` or `*`) as the opening delimiter, is reached.

+11. In case of ambiguity, strong emphasis takes precedence. Thus,
+    `**foo**` is `<strong>foo</strong>`, not `<em><em>foo</em></em>`,
+    and `***foo***` is `<strong><em>foo</em></strong>`, not
+    `<em><strong>foo</strong></em>` or `<em><em><em>foo</em></em></em>`.
+
 These rules can be illustrated through a series of examples.

 Simple emphasis:
@@ -4520,6 +4532,24 @@ __foo _bar_ baz__
 <p><strong>foo <em>bar</em> baz</strong></p>
 .

+But note:
+
+.
+*foo**bar**baz*
+.
+<p><em>foo</em><em>bar</em><em>baz</em></p>
+.
+
+.
+**foo*bar*baz**
+.
+<p><em><em>foo</em>bar</em>baz**</p>
+.
+
+The difference is that in the two preceding cases,
+the internal delimiters [can close emphasis](#can-close-emphasis),
+while in the cases with spaces, they cannot.
+
 Note that you cannot nest emphasis directly inside emphasis
 using the same delimeter, or strong emphasis directly inside
 strong emphasis:
@@ -4601,7 +4631,7 @@ However, a string of four or more `****` can never close emphasis:
 <p>*foo****</p>
 .

-Note that there are some asymmetries here:
+We retain symmetry in these cases:

 .
 *foo**
@@ -4609,7 +4639,7 @@ Note that there are some asymmetries here:
 **foo*
 .
 <p><em>foo</em>*</p>
-<p>**foo*</p>
+<p>*<em>foo</em></p>
 .

 .
@@ -4618,18 +4648,12 @@ Note that there are some asymmetries here:
 **foo* bar*
 .
 <p><em>foo <em>bar</em></em></p>
-<p>**foo* bar*</p>
+<p><em><em>foo</em> bar</em></p>
 .

 More cases with mismatched delimiters:

 .
-**foo* bar*
-.
-<p>**foo* bar*</p>
-.
-
-.
 *bar***
 .
 <p><em>bar</em>**</p>
@@ -4638,7 +4662,7 @@ More cases with mismatched delimiters:
 .
 ***foo*
 .
-<p>***foo*</p>
+<p>**<em>foo</em></p>
 .

 .
@@ -4650,7 +4674,7 @@ More cases with mismatched delimiters:
 .
 ***foo**
 .
-<p>***foo**</p>
+<p>*<strong>foo</strong></p>
 .

 .
@@ -4819,7 +4843,7 @@ in Markdown:

 URL-escaping should be left alone inside the destination, as all
 URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into their UTF8 codepoints, as usual, and
+the destination will be parsed into their UTF-8 codepoints, as usual, and
 optionally URL-escaped when written as HTML.

 .
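The patched Entities section describes a decoding pass. That pass turns entity references into Unicode characters before they reach the AST. Below is a minimal Python sketch of that pass; it is not part of the commit. It makes these assumptions:

- Python's `html.entities.html5` table is an acceptable stand-in for the WHATWG `entities.json` list.
- Only the named and decimal forms described above are handled. Hexadecimal references are left untouched.
- The "invalid" codepoints mapped to U+FFFD are 0, UTF-16 surrogates, and anything above U+10FFFF. The patch does not spell this set out.

`decode_entities` and `_decimal_to_char` are hypothetical helper names.

```python
import re
from html.entities import html5

# Matches the two forms covered by the patched text:
#   named:   "&" + letters/digits + ";"          e.g. &amp;  &frac34;
#   decimal: "&#" + 1 to 8 ASCII digits + ";"    e.g. &#35;
# Hexadecimal references (&#x...;) are intentionally not matched here.
_ENTITY_RE = re.compile(r"&(?:#([0-9]{1,8})|([A-Za-z][A-Za-z0-9]*));")

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the "unknown codepoint" character


def _decimal_to_char(digits: str) -> str:
    """Return the character for a decimal codepoint, or U+FFFD if invalid.

    Assumption: "invalid" means 0, a UTF-16 surrogate (U+D800..U+DFFF),
    or anything above U+10FFFF; the patch itself does not define the set.
    """
    cp = int(digits)
    if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return REPLACEMENT_CHAR
    return chr(cp)


def decode_entities(text: str) -> str:
    """Replace recognized entity references in `text` with Unicode text.

    Unknown names and references without a trailing ";" are left as-is.
    """

    def replace(match: re.Match) -> str:
        digits, name = match.groups()
        if digits is not None:
            return _decimal_to_char(digits)
        # html5 keys carry the trailing ";" for the semicolon forms,
        # so "copy;" is found while an unknown name falls back unchanged.
        return html5.get(name + ";", match.group(0))

    return _ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(decode_entities("&amp; &copy; &#35; &#98765432; &copy &MadeUpEntity;"))
```

The regular expression requires the trailing `;`, so a bare `&copy` passes through unchanged. Unknown names such as `&MadeUpEntity;` also pass through, because the table lookup falls back to the original text. Both cases match the non-entity examples in the patch.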