Age | Commit message | Author |
|
|
|
A link destination can't start with `<` unless it is
an angle-bracket link that also ends with `>`.
(If your URL really starts with `<`, URL-escape it.)
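A minimal sketch of the rule above, assuming a hypothetical helper `dest_start_ok` (not a function in cmark). It checks only the leading `<` condition and ignores backslash escapes and every other destination rule:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks only the rule from the commit message.
   A destination may start with '<' only if it is an angle-bracket
   destination that also ends with '>'. */
static bool dest_start_ok(const char *dest, size_t len) {
  if (len == 0 || dest[0] != '<')
    return true; /* the rule only concerns destinations starting with '<' */
  return len >= 2 && dest[len - 1] == '>';
}
```

With this check `<foo>` passes, `<foo` is rejected, and the URL-escaped form `%3Cfoo` passes.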
|
|
both have lengths that are multiples of 3.
See commonmark/commonmark#528.
|
|
|
|
|
|
We can't rely on anything in `subj` since it's been modified while parsing the
subject and could represent line info from a future line. This is simple and
works.
|
|
Closes #263.
|
|
|
|
These affect both parsing and writing commonmark.
|
|
To conform to recent spec change.
|
|
More than 32 nested balanced parens in a link is bananas.
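One way to enforce such a bound, as a sketch only: `parens_within_limit` and `MAX_PAREN_DEPTH` are hypothetical names, and the treatment of backslash escapes is an assumption rather than cmark's actual scanning code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAREN_DEPTH 32 /* limit taken from the commit message */

/* Hypothetical check: true if the parentheses in s[0..len) are balanced
   and never nest deeper than MAX_PAREN_DEPTH. A backslash skips the
   character that follows it. */
static bool parens_within_limit(const char *s, size_t len) {
  int depth = 0;
  for (size_t i = 0; i < len; i++) {
    if (s[i] == '\\' && i + 1 < len) {
      i++; /* skip the escaped character */
    } else if (s[i] == '(') {
      if (++depth > MAX_PAREN_DEPTH)
        return false; /* nested too deeply */
    } else if (s[i] == ')') {
      if (--depth < 0)
        return false; /* closing paren without an opener */
    }
  }
  return depth == 0;
}
```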
|
|
|
|
|
|
|
|
|
|
Closes #227.
|
|
|
|
|
|
|
|
|
|
A UBSAN warning can be triggered because the link title is an empty string:
src/inlines.c:113:20: runtime error: null pointer passed as argument 2, which is declared to never be null
which can be triggered by:
```
[f]:_
[f]
```
The length of the memcpy is zero so the NULL pointer is not dereferenced but it
is still undefined behaviour.
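A sketch of the usual guard for this class of warning, assuming a hypothetical wrapper `copy_bytes` (the actual fix in cmark may look different): skip the `memcpy` call entirely when the length is zero, so a NULL source is never passed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies len bytes, but never calls memcpy with a
   zero length, so a NULL src paired with len == 0 is harmless. */
static void *copy_bytes(void *dst, const void *src, size_t len) {
  if (len > 0)
    memcpy(dst, src, len);
  return dst;
}
```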
|
|
This also brings the code into closer alignment with the wording
of the spec.
See jgm/CommonMark#467.
|
|
Only ASCII punctuation characters are escapable,
per the spec.
Closes #192.
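A sketch of such a check, assuming a hypothetical helper `is_escapable`. The character list is the ASCII punctuation set; an explicit table is used instead of `ispunct` so the result does not depend on the locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The 32 ASCII punctuation characters. */
static const char ASCII_PUNCT[] = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/* Hypothetical helper: true only for ASCII punctuation. Letters, digits,
   and bytes of multi-byte UTF-8 sequences are not escapable. */
static bool is_escapable(char c) {
  /* strchr would match the terminating NUL, so exclude it explicitly. */
  return c != '\0' && strchr(ASCII_PUNCT, c) != NULL;
}
```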
|
|
Note, however, that this may not be needed at all:
the old code would have gone into an infinite loop
if the delimiter stack were not already freed.
If we can prove that the delimiter stack is empty
at this point, we could remove this; on the other hand,
it may not hurt to keep it here defensively.
Closes #189.
|
|
Strong now goes inside Emph rather than the reverse,
when both scopes are possible.
The code is much simpler.
This also avoids a spec inconsistency that cmark had previously:
`***hi***` became Strong (Emph "hi") but
`***hi****` became Emph (Strong "hi") "*".
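A sketch of how this nesting can fall out of a simple rule, assuming the spec's convention that a match uses two delimiters (strong) when both runs still have at least two, and one (emph) otherwise. The helpers `delims_to_use` and `describe_nesting` are hypothetical and model a single opener/closer pair only, not the full delimiter stack.

```c
#include <stddef.h>
#include <string.h>

/* Delimiters consumed by one match: 2 (strong) when both runs still have
   at least two delimiters, otherwise 1 (emph). */
static int delims_to_use(int opener_len, int closer_len) {
  return (opener_len >= 2 && closer_len >= 2) ? 2 : 1;
}

/* Hypothetical helper: writes the nesting, outermost first (for example
   "em strong"), into out and returns the number of closer delimiters left
   over as literal text. out_size must be at least 1. */
static int describe_nesting(int opener_len, int closer_len, char *out,
                            size_t out_size) {
  const char *layers[16];
  int n = 0;
  while (opener_len > 0 && closer_len > 0 && n < 16) {
    int used = delims_to_use(opener_len, closer_len);
    layers[n++] = (used == 2) ? "strong" : "em"; /* first match is innermost */
    opener_len -= used;
    closer_len -= used;
  }
  out[0] = '\0';
  for (int i = n - 1; i >= 0; i--) { /* the last match is the outermost */
    strncat(out, layers[i], out_size - strlen(out) - 1);
    if (i > 0)
      strncat(out, " ", out_size - strlen(out) - 1);
  }
  return closer_len;
}
```

For runs of 3 and 3 this yields "em strong" with nothing left over; for 3 and 4 it yields the same nesting with one literal `*` remaining, so the two inputs above are now treated consistently.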
|
|
|
|
Noticed the need for this through fuzzing.
|
|
We now use a much smaller array.
|
|
|
|
|
|
This reverts commit 9e643720ec903f3b448bd2589a0c02c2514805ae.
|
|
This reverts commit 4fbe344df43ed7f60a3d3a53981088334cb709fc.
|
|
We need to store the length of the original delimiter run,
instead of using the length of the remaining delimiters
after some have been subtracted.
Test case:
a***b* c*
Thanks to Raph Levien for reporting.
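A sketch of the idea, assuming a hypothetical `delim_run` record (cmark's real struct has more fields and different names). Consuming delimiters changes `length` only, so any check that adds two run lengths can keep reading `orig_length`:

```c
#include <stdbool.h>

/* Hypothetical delimiter record. */
typedef struct {
  int length;      /* delimiters still unused in this run */
  int orig_length; /* length of the run as it appeared in the source */
} delim_run;

/* Consume n delimiters from a run; the original length is left untouched. */
static void use_delims(delim_run *d, int n) {
  d->length -= n;
}

/* Length-sum check based on the original run lengths, not on what
   remains after earlier matches. */
static bool sum_is_multiple_of_3(const delim_run *a, const delim_run *b) {
  return (a->orig_length + b->orig_length) % 3 == 0;
}
```

In the test case, the opener `***` has original length 3 and each closer has length 1. After the first match the opener has 2 delimiters left; adding the remaining length to the second closer gives 3, while adding the original length gives 4.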
|
|
* Improve strbuf guarantees
  Introduce BUFSIZE_MAX macro and make sure that the strbuf implementation
  can handle strings up to this size.
* Abort early if document size exceeds internal limit
* Change types for source map offsets
  Switch to size_t for the public API, making the public headers
  C89-compatible again.
  Switch to bufsize_t internally, reducing memory usage and improving
  performance on 32-bit platforms.
* Make parser return NULL on internal index overflow
  Make S_parser_feed set an error and ignore subsequent chunks if the
  total input document size exceeds an internal limit. Make
  cmark_parser_finish return NULL if an error was encountered. Add
  public API functions to retrieve error code and error message.
  strbuf overflow in renderers and OOM in parser or renderers still
  cause an abort.
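The feed/finish behaviour described in the last item can be sketched as follows. Everything here is hypothetical: `toy_parser`, `toy_feed`, `toy_finish`, and the tiny `TOY_SIZE_LIMIT` are stand-ins for illustration, not cmark's API or its real limit.

```c
#include <stddef.h>

#define TOY_SIZE_LIMIT 16 /* stand-in for an internal limit such as BUFSIZE_MAX */

typedef struct {
  size_t total; /* bytes accepted so far, never above TOY_SIZE_LIMIT */
  int error;    /* nonzero once the limit has been exceeded */
} toy_parser;

/* Feed one chunk of chunk_len bytes. Once an overflow has been recorded,
   later chunks are ignored. */
static void toy_feed(toy_parser *p, size_t chunk_len) {
  if (p->error)
    return;
  /* Compare against the remaining room so the addition cannot wrap. */
  if (chunk_len > TOY_SIZE_LIMIT - p->total) {
    p->error = 1;
    return;
  }
  p->total += chunk_len;
}

/* Returns NULL if an error was recorded; otherwise a pointer to the
   accepted byte count, standing in for the parsed document. */
static const size_t *toy_finish(const toy_parser *p) {
  return p->error ? NULL : &p->total;
}
```

Recording the error at feed time and reporting it at finish time lets the caller keep a simple feed loop and check for failure once at the end.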
|
|
* open_new_blocks: always create child before advancing offset
* Source map
* Extent's typology
* In-depth python bindings
|
|
|
|
|
|
|
|
- Removed recursion in scan_to_closing_backticks
- Added an array of pointers to potential backtick closers
  to subject
- This array is used to avoid traversing the subject again
  when we've already seen all the potential backtick closers.
- Added a max bound of 1000 for backtick code span delimiters.
- This helps with pathological cases like:
      x
      x `
      x ``
      x ```
      x ````
      ...
Thanks to Martin Mitáš for identifying the problem and for
discussion of solutions.
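The scheme in the list above can be sketched like this. The `scanner` struct, `scanner_init`, and `scan_to_closer` are hypothetical simplifications and not cmark's code; the sketch assumes calls are made with non-decreasing `start` offsets, as happens when parsing left to right.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKTICKS 1000 /* bound taken from the commit message */

/* Hypothetical scanner state, not cmark's subject struct. */
typedef struct {
  const char *text;
  size_t len;
  size_t last_run[MAX_BACKTICKS + 1]; /* start of the latest run seen, per length */
  bool scanned_all;                   /* true once a scan has reached the end */
} scanner;

static void scanner_init(scanner *s, const char *text) {
  memset(s, 0, sizeof(*s));
  s->text = text;
  s->len = strlen(text);
}

/* Looks for a run of exactly open_len backticks at or after start, where
   start is the offset just past the opening run. Returns the offset just
   past the closing run, or 0 if there is none. Iterative, no recursion. */
static size_t scan_to_closer(scanner *s, size_t start, size_t open_len) {
  if (open_len > MAX_BACKTICKS)
    return 0; /* over-long delimiters are never matched */
  if (s->scanned_all && s->last_run[open_len] <= start)
    return 0; /* already known: no closer of this length after start */
  size_t pos = start;
  while (pos < s->len) {
    while (pos < s->len && s->text[pos] != '`')
      pos++; /* skip non-backtick characters */
    size_t run_start = pos;
    while (pos < s->len && s->text[pos] == '`')
      pos++; /* measure the backtick run */
    size_t run_len = pos - run_start;
    if (run_len == 0)
      break; /* reached the end of the input */
    if (run_len <= MAX_BACKTICKS)
      s->last_run[run_len] = run_start; /* remember this potential closer */
    if (run_len == open_len)
      return pos;
  }
  s->scanned_all = true; /* the whole input has now been seen */
  return 0;
}
```

Once one scan has reached the end of the input, a later failed lookup costs a single array comparison instead of another pass, which is what defuses inputs made of many unmatched backtick runs of different lengths.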
|
|
|
|
|
|
See jgm/CommonMark#427
|
|
|
|
|
|
This will need corresponding spec changes.
The change is this: when considering matches between an interior
delimiter run (one that can open and can close) and another delimiter
run, we require that the sum of the lengths of the two delimiter
runs mod 3 is not 0.
Thus, for example, in
*a**b*
1 23 4
delimiter 1 cannot match 2, since the lengths of the first delimiter
run (length 1) and the second (length 2, delimiters 2 and 3) sum to 3.
Thus we get `<em>a**b</em>` instead of `<em>a</em><em>b</em>`.
This gives better behavior on things like
*a**b**c*
which previously got parsed as
<em>a</em><em>b</em><em>c</em>
and now would be parsed as
<em>a<strong>b</strong>c</em>
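The check described above can be sketched as a small predicate. `runs_may_match` is a hypothetical name, and the sketch covers only the rule as stated in this message:

```c
#include <stdbool.h>

/* Hypothetical predicate for the rule in this commit message. The opener
   can open by definition, so it is interior if it can also close; the
   closer is interior if it can also open. If either run is interior, the
   pair may match only when the sum of the lengths is not a multiple of 3. */
static bool runs_may_match(int opener_len, bool opener_can_close,
                           int closer_len, bool closer_can_open) {
  bool interior = opener_can_close || closer_can_open;
  if (interior && (opener_len + closer_len) % 3 == 0)
    return false;
  return true;
}
```

For `*a**b*`, run 1 (length 1) against the interior run `**` (length 2) is rejected in both roles, while the first and last `*` (neither interior) may match, giving `<em>a**b</em>`.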
With this change we get four spec test failures, but in each
case the output seems more "intuitive":
```
Example 386 (lines 6490-6494) Emphasis and strong emphasis
*foo**bar**baz*
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em>foo</em><em>bar</em><em>baz</em></p>
+<p><em>foo<strong>bar</strong>baz</em></p>
Example 389 (lines 6518-6522) Emphasis and strong emphasis
*foo**bar***
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em>foo</em><em>bar</em>**</p>
+<p><em>foo<strong>bar</strong></em></p>
Example 401 (lines 6620-6624) Emphasis and strong emphasis
**foo*bar*baz**
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em><em>foo</em>bar</em>baz**</p>
+<p><strong>foo<em>bar</em>baz</strong></p>
Example 442 (lines 6944-6948) Emphasis and strong emphasis
**foo*bar**
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em><em>foo</em>bar</em>*</p>
+<p><strong>foo*bar</strong></p>
```
|
|
It is no longer needed; only the brackets struct needs it.
Thanks to @robinst.
|
|
See https://github.com/jgm/commonmark.js/pull/101
This uses a separate stack for brackets, instead of
putting them on the delimiter stack. This avoids the
need for looking through the delimiter stack for the next
bracket.
It also avoids a shortcut reference lookup when the reference
text contains brackets.
The change dramatically improved performance on the nested links
pathological test for commonmark.js. It has a smaller but measurable
effect here.
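A sketch of a dedicated bracket stack, assuming a hypothetical `bracket` record with `push_bracket` and `pop_bracket` helpers (cmark's real struct carries more state). The most recent bracket is always the top of the stack, so no search through emphasis delimiters is needed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bracket record kept on its own stack. */
typedef struct bracket {
  struct bracket *previous; /* next-older bracket, or NULL */
  size_t position;          /* offset of '[' or '![' in the input */
  bool image;               /* true for '![' */
} bracket;

/* Push a bracket and return the new top. If allocation fails, the stack
   is left unchanged and the old top is returned. */
static bracket *push_bracket(bracket *top, size_t position, bool image) {
  bracket *b = malloc(sizeof(*b));
  if (b == NULL)
    return top;
  b->previous = top;
  b->position = position;
  b->image = image;
  return b;
}

/* Pop the top bracket and return the new top (NULL when empty). */
static bracket *pop_bracket(bracket *top) {
  if (top == NULL)
    return NULL;
  bracket *previous = top->previous;
  free(top);
  return previous;
}
```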
|
|
This reverts commit c069cb55bcadfd0f45890d846ff412b3c892eb87.
|
|
We reuse the parser for reference labels, instead
of just assuming that a slice of the link text
will be a valid reference label. (It might contain
interior brackets, for example.)
|
|
|