Age | Commit message | Author |
|
In accordance with the spec change.
|
|
|
|
Closes #334.
|
|
Introduce multi-purpose data/len members in struct cmark_node. These
are mainly used to store literal text for inlines, code and HTML blocks.
Move the content strbuf for blocks from cmark_node to cmark_parser.
When finalizing nodes that allow inlines (paragraphs and headings),
detach the strbuf and store the block content in the node's data/len
members. Free the block content after processing inlines.
Reduces size of struct cmark_node by 8 bytes.
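A minimal sketch of the resulting layout, assuming the member names
from this message (the real struct has many more fields):

    #include <stdint.h>

    typedef int32_t bufsize_t;

    struct cmark_node {
        /* multi-purpose storage: literal text for inlines, code and
           HTML blocks; block content while inlines are being parsed */
        unsigned char *data;
        bufsize_t len;
        /* ... other members elided ... */
    };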
|
|
Use zero-terminated C strings and a separate length field instead of
cmark_chunks. Literal inline text will now be copied from the parent
block's content buffer, slowing the benchmark down by 10-15%.
The node struct now never references memory owned by other nodes, fixing #309.
Node accessors don't have to check for delayed creation of C strings,
so parsing and iterating all literals using the public API should
actually be faster than before.
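A before/after sketch of the accessor change, with simplified stand-in
types rather than the actual cmark source:

    #include <stdlib.h>
    #include <string.h>

    typedef struct { unsigned char *data; int len; int owned; } chunk;

    /* before: a chunk could borrow unterminated memory from the parent
       block, so the accessor had to copy and terminate on first read */
    static const char *chunk_to_cstr(chunk *c) {
        if (!c->owned) {
            unsigned char *p = malloc((size_t)c->len + 1);
            if (p == NULL)
                return NULL;
            memcpy(p, c->data, (size_t)c->len);
            p[c->len] = '\0';
            c->data = p;
            c->owned = 1;
        }
        return (const char *)c->data;
    }

    /* after: data is always an owned, NUL-terminated copy, so the
       accessor is a plain field read */
    typedef struct { unsigned char *data; int len; } node_str;

    static const char *node_get_literal(const node_str *n) {
        return (const char *)n->data;
    }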
|
|
Use zero-terminated C strings instead of cmark_chunks without storing
the length. The length of code literals will be re-added in a later
commit. The strlen overhead for code info should be negligible.
Reduces size of struct cmark_node by 8 bytes.
|
|
A setext header line after a link reference should not
create a header, according to the spec (e.g. `[foo]: /url`
followed by a `===` line yields a paragraph, not a heading).
See commonmark/commonmark-spec#395.
|
|
Keep track of the last position where a thematic break
failed to match on a line, to avoid rescanning unnecessarily.
See commonmark/cmark#284.
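A sketch of the idea, with a simplified stand-in scanner and
hypothetical memo fields (the real scanner is generated by re2c):

    /* simplified stand-in for the generated scanner; text is one line
       without its terminator */
    static int scan_thematic_break(const char *text, int len, int offset) {
        int count = 0;
        char ch = 0;
        for (int i = offset; i < len; i++) {
            char c = text[i];
            if (c == ' ' || c == '\t')
                continue;
            if (c == '\n' || c == '\r')
                break;
            if (c != '-' && c != '_' && c != '*')
                return 0;
            if (ch && c != ch)
                return 0;
            ch = c;
            count++;
        }
        return count >= 3;
    }

    typedef struct {
        int fail_line;    /* initialize both to -1 */
        int fail_offset;  /* position where a scan last failed */
    } tb_memo;

    /* with deeply nested containers, the same (line, offset) scan is
       attempted once per open block; cache the failure so only the
       first attempt does any work */
    static int scan_thematic_break_memo(tb_memo *m, int line, int offset,
                                        const char *text, int len) {
        if (line == m->fail_line && offset == m->fail_offset)
            return 0;  /* known failure: skip the rescan */
        if (scan_thematic_break(text, len, offset))
            return 1;
        m->fail_line = line;
        m->fail_offset = offset;
        return 0;
    }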
|
|
|
|
As with other static functions.
|
|
Use this to avoid unnecessary recursion in ends_with_blank_line.
Closes #284.
|
|
to avoid unnecessary repetition. Once we settle
whether a list item ends in a blank line, we don't
need to revisit this in considering parent list items.
See commonmark/cmark#284.
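A compact sketch of the caching described in this and the previous
entry, with hypothetical flag and field names:

    #include <stdbool.h>

    enum {
        F_LLB_CHECKED = 1 << 0,  /* cached answer is valid */
        F_LLB_BLANK   = 1 << 1,  /* the cached answer itself */
    };

    typedef struct node {
        struct node *last_child;
        bool is_list_or_item;
        bool last_line_blank;    /* maintained while lines are consumed */
        unsigned flags;
    } node;

    static bool ends_with_blank_line(node *n) {
        if (n->flags & F_LLB_CHECKED)
            return (n->flags & F_LLB_BLANK) != 0;
        bool blank;
        if (n->is_list_or_item && n->last_child)
            blank = ends_with_blank_line(n->last_child);  /* descend once */
        else
            blank = n->last_line_blank;
        /* settle the answer; parent items asking later stop here */
        n->flags |= F_LLB_CHECKED | (blank ? F_LLB_BLANK : 0);
        return blank;
    }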
|
|
We were needlessly redoing things we'd already done.
Now we skip the work if the first nonspace is greater
than the current offset.
This fixes pathological slowdown with deeply nested
lists (#255). For N = 3000, the time goes from over
17s to about 0.7s.
Thanks to @mity for diagnosing the problem.
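The gist of the guard in a simplified sketch (tab-stop handling
omitted):

    typedef struct {
        const char *line;
        int len;
        int offset;          /* current parse position on the line */
        int first_nonspace;  /* cached; reset to 0 on each new line */
    } parser_pos;

    static void find_first_nonspace(parser_pos *p) {
        if (p->first_nonspace > p->offset)
            return;  /* cached value is still ahead of us: keep it */
        int i = p->offset;
        while (i < p->len && (p->line[i] == ' ' || p->line[i] == '\t'))
            i++;
        p->first_nonspace = i;
    }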
|
|
See commonmark/CommonMark#497.
|
|
|
|
|
|
|
|
See https://github.com/jgm/cmark/issues/206.
|
|
as documented!
Closes #202.
|
|
|
|
|
|
The overflow could occur under the following condition:
the buffer ends with `\r` and the next memory address
contains `\n`.
Closes #184.
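The pattern, in a hedged sketch (not the literal cmark code):

    #include <stddef.h>

    /* buggy: if the buffer ends with '\r', buf[i] reads one byte past
       the end, and a stray '\n' there changes the result */
    static size_t skip_eol_buggy(const unsigned char *buf, size_t len,
                                 size_t i) {
        if (buf[i] == '\r') {
            i++;
            if (buf[i] == '\n')
                i++;
        }
        return i;
    }

    /* fixed: bounds check before reading the byte after '\r' */
    static size_t skip_eol_fixed(const unsigned char *buf, size_t len,
                                 size_t i) {
        if (buf[i] == '\r') {
            i++;
            if (i < len && buf[i] == '\n')
                i++;
        }
        return i;
    }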
|
|
This reverts commit 9e643720ec903f3b448bd2589a0c02c2514805ae.
|
|
This reverts commit 4fbe344df43ed7f60a3d3a53981088334cb709fc.
|
|
* Improve strbuf guarantees
Introduce BUFSIZE_MAX macro and make sure that the strbuf implementation
can handle strings up to this size.
* Abort early if document size exceeds internal limit
* Change types for source map offsets
Switch to size_t for the public API, making the public headers
C89-compatible again.
Switch to bufsize_t internally, reducing memory usage and improving
performance on 32-bit platforms.
* Make parser return NULL on internal index overflow
Make S_parser_feed set an error and ignore subsequent chunks if the
total input document size exceeds an internal limit. Make
cmark_parser_finish return NULL if an error was encountered. Add
public API functions to retrieve the error code and error message
(usage sketched below).
strbuf overflow in renderers and OOM in parser or renderers still
cause an abort.
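A usage sketch of the intended API; `cmark_parser_get_error` and
`cmark_parser_get_error_message` stand for the accessors described
above and are not functions from a released cmark:

    #include <stdio.h>
    #include "cmark.h"

    static char *render_file(FILE *fp) {
        cmark_parser *parser = cmark_parser_new(CMARK_OPT_DEFAULT);
        char buffer[4096];
        size_t bytes;

        while ((bytes = fread(buffer, 1, sizeof(buffer), fp)) > 0)
            cmark_parser_feed(parser, buffer, bytes);

        cmark_node *doc = cmark_parser_finish(parser);
        if (doc == NULL) {  /* input exceeded the internal limit */
            /* hypothetical accessors from this commit description */
            fprintf(stderr, "parse error %d: %s\n",
                    cmark_parser_get_error(parser),
                    cmark_parser_get_error_message(parser));
            cmark_parser_free(parser);
            return NULL;
        }
        char *html = cmark_render_html(doc, CMARK_OPT_DEFAULT);
        cmark_node_free(doc);
        cmark_parser_free(parser);
        return html;
    }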
|
|
* open_new_blocks: always create child before advancing offset
* Source map
* Extent typology
* In-depth Python bindings
|
|
The `alloc` member wasn't initialized.
This also makes it possible to add an assertion in `chunk_rtrim`,
which doesn't work for alloc'ed chunks.
|
|
|
|
- only check once for "not at end of line"
- check for NUL before checking for newline characters (the
previous patch would fail for a NUL followed by CR); a sketch follows
See #160.
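A simplified sketch of the corrected scan:

    #include <stdbool.h>

    /* one end-of-buffer test per character, and NUL tested before the
       newline characters, so a NUL followed by CR is handled correctly */
    static const unsigned char *find_eol(const unsigned char *p,
                                         const unsigned char *end,
                                         bool *found_line_end) {
        *found_line_end = false;
        while (p < end) {
            if (*p == '\0')
                break;
            if (*p == '\r' || *p == '\n') {
                *found_line_end = true;
                break;
            }
            p++;
        }
        return p;
    }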
|
|
|
|
Fix memory leak in list parsing
|
|
If `parse_list_marker` returns 1 but the second part of the `&&`
clause is false, we leak `data` here (see the sketch below).
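A self-contained sketch of the pattern, using stand-in functions
rather than the actual parser code:

    #include <stdlib.h>

    typedef struct { int start; } list_data;

    /* stand-in: allocates *out and returns 1 on success */
    static int parse_list_marker(const char *line, list_data **out) {
        if (line[0] != '-' && line[0] != '*')
            return 0;
        *out = calloc(1, sizeof(list_data));
        return *out != NULL;
    }

    static void demo(const char *line, int second_condition) {
        list_data *data = NULL;
        if (parse_list_marker(line, &data) && second_condition) {
            /* ... data is handed off to a new list node here ... */
            free(data);  /* stand-in for the hand-off */
        } else {
            /* fix: parse_list_marker may have allocated data even
               though the && short-circuited to false */
            free(data);
        }
    }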
|
|
|
|
|
|
Fixes #142.
|
|
|
|
|
|
Closes #141.
|
|
|
|
...they start with 1.
|
|
|
|
|
|
|
|
|
|
Reduce the storage size for the `cmark_code` struct
|
|
Save node information in flags instead of using one boolean for each
property.
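A sketch of the packing, with illustrative member and flag names:

    #include <stdbool.h>
    #include <stdint.h>

    /* before: one boolean member per property */
    struct node_before {
        bool open;
        bool last_line_blank;
        bool fenced;
    };

    /* after: one flags word holding a bit per property */
    enum {
        NODE_OPEN            = 1 << 0,
        NODE_LAST_LINE_BLANK = 1 << 1,
        NODE_FENCED          = 1 << 2,
    };

    struct node_after {
        uint16_t flags;
    };

    /* set and test with the usual bit operations */
    static void set_open(struct node_after *n) { n->flags |= NODE_OPEN; }
    static int  is_open(const struct node_after *n) {
        return (n->flags & NODE_OPEN) != 0;
    }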
|
|
|
|
|
|
The previous work on unbounded memory usage and overflows in the buffer
API had several shortcomings:
1. The total size of the buffer was limited by arbitrarily small
precision on the storage type for buffer indexes (typedef'd as
`bufsize_t`). This is not a good design pattern in secure applications,
particularly since it requires the addition of helper functions to cast
to/from the native `size` types and the custom type for the buffer, and
check for overflows.
2. The library was calling `abort` on overflow and memory allocation
failures. This is not a good practice for production libraries, since it
turns a potential RCE into a trivial, guaranteed DoS to the whole
application that is linked against the library. It defeats the whole
point of performing overflow or allocation checks when the checks will
crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded
(capped to the precision of the storage type) and could lead to DoS
attacks by simple memory exhaustion (particularly critical on 32-bit
platforms). This is not a good practice for a library that handles
arbitrary user input.
Hence, this patchset provides slight (but in my opinion critical)
improvements in this area, copying some of the patterns we've used in
the past for high throughput, security sensitive Markdown parsers:
1. The storage type for buffer sizes is now platform native (`ssize_t`).
Ideally, this would be a `size_t`, but several parts of the code expect
buffer indexes to be possibly negative. Either way, switching to a
`size` type is a strict improvement, particularly on 64-bit platforms.
All the helpers that assured that values cannot escape the `size` range
have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for
a buffer has been set to a safe value for production usage (32 MB) that
can be proven not to overflow in practice. Users that need to parse
particularly large Markdown documents can increase this value. A static,
compile-time check has been added to ensure that the maximum buffer size
cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library
now follows the convention of other Markdown implementations (such as
Hoedown and Sundown) and silently handles buffer overflows and
allocation failures by dropping data from the buffer. The result is
that pathological Markdown documents that try to exploit the library
will instead generate truncated (but valid and safe) outputs; a sketch
of this policy follows.
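A sketch of the capped-growth policy; the names, the exact cap and the
growth strategy here are illustrative rather than the literal patch:

    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    #define BUFSIZE_MAX (32 * 1024 * 1024)  /* 32 MB hard cap */

    typedef struct {
        unsigned char *ptr;
        ssize_t asize;  /* allocated size */
        ssize_t size;   /* used size */
    } strbuf;

    /* Append data; anything past the cap is silently dropped, so a
       pathological input yields truncated output instead of abort(). */
    static void strbuf_put(strbuf *buf, const unsigned char *data,
                           ssize_t len) {
        if (len > BUFSIZE_MAX - buf->size)
            len = BUFSIZE_MAX - buf->size;  /* drop the excess */
        if (len <= 0)
            return;
        if (buf->size + len > buf->asize) {
            ssize_t target = buf->asize ? buf->asize * 2 : 64;
            while (target < buf->size + len)
                target *= 2;  /* cannot overflow: sizes are capped far
                                 below SSIZE_MAX */
            if (target > BUFSIZE_MAX)
                target = BUFSIZE_MAX;
            unsigned char *p = realloc(buf->ptr, (size_t)target);
            if (p == NULL)
                return;  /* allocation failure: drop data, keep buffer */
            buf->ptr = p;
            buf->asize = target;
        }
        memcpy(buf->ptr + buf->size, data, (size_t)len);
        buf->size += len;
    }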
All tests after these small refactorings have been verified to pass.
---
NOTE: Regarding 32-bit overflows, generating test cases that crash the
library is trivial (any input document larger than 2 GB will crash
CMark), but most Python implementations have issues with large strings
to begin with, so a test case cannot be added to the pathological test
suite, since it's written in Python.
|
|
|