Diffstat (limited to 'spec.txt')
 -rw-r--r--  spec.txt  35
1 files changed, 23 insertions, 12 deletions
@@ -3688,7 +3688,7 @@ raw HTML:
 .
 <http://google.com?find=\*>
 .
-<p><a href="http://google.com?find=\*">http://google.com?find=\*</a></p>
+<p><a href="http://google.com?find=%5C*">http://google.com?find=\*</a></p>
 .
 
 .
@@ -3727,25 +3727,37 @@ foo
 
 ## Entities
 
-Entities are parsed as entities, not as literal text, in all contexts
-except code spans and code blocks. Three kinds of entities are recognized.
+With the goal of making this standard as HTML-agnostic as possible, all valid HTML entities in any
+context are recognized as such and converted into their actual values (i.e. the UTF-8 characters they
+represent) before they are stored in the AST.
+
+This allows implementations that target HTML output to trivially escape these characters when generating HTML,
+and simplifies the job of implementations targeting other languages, as these only need to handle
+UTF-8 characters and need not be aware of HTML entities.
 
 [Named entities](#name-entities) <a id="named-entities"></a> consist of `&`
-+ a string of 2-32 alphanumerics beginning with a letter + `;`.
++ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
+is used as the authoritative source of the valid entity names and their corresponding code points.
+
+Conforming implementations that generate HTML do not need to output entities for all the valid
+named entities that exist, with the exception of `&quot;` (`"`), `&amp;` (`&`), `&lt;` (`<`) and `&gt;` (`>`),
+which must always be written as entities for security reasons.
 
 .
 &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
 .
-<p>&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;</p>
+<p>  &amp; © Æ Ď ¾ ℋ ⅆ ∲</p>
 .
 
 [Decimal entities](#decimal-entities) <a id="decimal-entities"></a>
-consist of `&#` + a string of 1--8 arabic digits + `;`.
+consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognized
+and transformed into their corresponding Unicode characters. Invalid Unicode code points are written
+as the replacement character (`U+FFFD`).
 
 .
- # Ӓ Ϡ �
+# Ӓ Ϡ �
 .
-<p> # Ӓ Ϡ �</p>
+<p># Ӓ Ϡ �</p>
 .
 
 [Hexadecimal entities](#hexadecimal-entities) <a id="hexadecimal-entities"></a>
@@ -3767,7 +3779,7 @@ Here are some nonentities:
 .
 
 Although HTML5 does accept some entities without a trailing semicolon
-(such as `&copy`), these are not recognized as entities here:
+(such as `&copy`), these are not recognized as entities here, because accepting them would make the grammar too ambiguous:
 
 .
 &copy
@@ -3775,13 +3787,12 @@ Although HTML5 does accept some entities without a trailing semicolon
 <p>&amp;copy</p>
 .
 
-On the other hand, many strings that are not on the list of HTML5
-named entities are recognized as entities here:
+Strings that are not on the list of HTML5 named entities are not recognized as entities either:
 
 .
 &MadeUpEntity;
 .
-<p>&MadeUpEntity;</p>
+<p>&amp;MadeUpEntity;</p>
 .
 
 Entities are recognized in any context besides code spans or
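The two-stage model the revised text describes (decode every valid entity to Unicode before it reaches the AST, then escape only `&`, `<`, `>` and `"` when rendering HTML) can be sketched in Python as below. This is a minimal illustration, not a reference implementation. It assumes Python's built-in `html.entities.html5` table as a stand-in for the WHATWG `entities.json` list. The helper names `decode_entities` and `escape_html`, the 1 to 8 digit limit for hexadecimal entities, and the choice of which code points count as invalid (NUL, surrogates, anything above U+10FFFF) are hypothetical choices made for this example.

```python
import html.entities
import re

# Matches the three entity forms; all of them require a trailing ';'.
#   hexadecimal: &#x4D2;  (1-8 hex digits: an assumption, by analogy with decimal)
#   decimal:     &#1234;  (1-8 digits, as in the spec text)
#   named:       &amp;    (looked up in the HTML5 entity table)
ENTITY_RE = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]{1,8})|#([0-9]{1,8})|([A-Za-z][A-Za-z0-9]*));"
)

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def _codepoint_to_str(cp: int) -> str:
    # Hypothetical validity rule: NUL, UTF-16 surrogates and values beyond
    # U+10FFFF are treated as invalid and mapped to U+FFFD.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


def _replace(m: re.Match) -> str:
    hex_digits, dec_digits, name = m.groups()
    if hex_digits is not None:
        return _codepoint_to_str(int(hex_digits, 16))
    if dec_digits is not None:
        return _codepoint_to_str(int(dec_digits))
    # Keys in html.entities.html5 include the trailing ';' (e.g. 'amp;').
    value = html.entities.html5.get(name + ";")
    # Unknown names such as &MadeUpEntity; stay literal text.
    return value if value is not None else m.group(0)


def decode_entities(text: str) -> str:
    """Replace every valid entity in text with the character(s) it denotes."""
    return ENTITY_RE.sub(_replace, text)


def escape_html(text: str) -> str:
    """Escape the four characters that must always be written as entities."""
    # '&' goes first so that entities produced here are not escaped again.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )
```

With these helpers, `escape_html(decode_entities("&copy; &MadeUpEntity;"))` yields `© &amp;MadeUpEntity;`, which matches the revised examples: the known entity becomes a literal character, while the unknown name is left as text and only its `&` is escaped on output. The standard library's `html.unescape` is avoided on purpose because it also accepts legacy names without a trailing semicolon, such as `&copy`, which the revised text rejects. Likewise, `html.escape` with `quote=True` also escapes `'`, so a hand-written `escape_html` limited to the four characters named above is used instead.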
