Age | Commit message | Author |
|
|
|
The previous work for unbounded memory usage and overflows on the buffer
API had several shortcomings:
1. The total size of the buffer was limited by arbitrarily small
precision on the storage type for buffer indexes (typedef'd as
`bufsize_t`). This is not a good design pattern in secure applications,
particularly since it requires the addition of helper functions to cast
to/from the native `size` types and the custom type for the buffer, and
check for overflows.
2. The library was calling `abort` on overflow and memory allocation
failures. This is not a good practice for production libraries, since it
turns a potential RCE into a trivial, guaranteed DoS to the whole
application that is linked against the library. It defeats the whole
point of performing overflow or allocation checks when the checks will
crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded
(capped to the precision of the storage type) and could lead to DoS
attacks by simple memory exhaustion (particularly critical on 32-bit
platforms). This is not a good practice for a library that handles
arbitrary user input.
Hence, this patchset provides slight (but in my opinion critical)
improvements in this area, copying some of the patterns we've used in
the past for high throughput, security sensitive Markdown parsers:
1. The storage type for buffer sizes is now platform native (`ssize_t`).
Ideally, this would be a `size_t`, but several parts of the code expect
buffer indexes to be possibly negative. Either way, switching to a
`size` type is a strict improvement, particularly on 64-bit platforms.
All the helpers that assured that values cannot escape the `size` range
have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for
a buffer has been set to a safe value for production usage (32 MB) that
can be proven not to overflow in practice. Users that need to parse
particularly large Markdown documents can increase this value. A static,
compile-time check has been added to ensure that the maximum buffer size
cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library
now follows the convention of other Markdown implementations (such as
Hoedown and Sundown) and silently handles buffer overflows and
allocation failures by dropping data from the buffer. The result is
that pathological Markdown documents that try to exploit the library
will instead generate truncated (but valid, and safe) outputs.
All tests after these small refactorings have been verified to pass.
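The bounded-growth pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this patchset: `demo_buf`, `DEMO_BUF_MAX`, and `demo_buf_put` are hypothetical names, and the 1.5x growth factor is an assumption. Only the 32 MB cap, the `ssize_t` index type, the compile-time check, and the truncate-instead-of-abort behaviour come from the message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for ssize_t and SSIZE_MAX */
#endif
#include <assert.h> /* static_assert (C11) */
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cap: 32 MB, the value quoted in the commit message. */
#define DEMO_BUF_MAX ((ssize_t)32 * 1024 * 1024)

/* Compile-time check: growing by 1.5x from any size up to the cap
   can never overflow ssize_t. */
static_assert(DEMO_BUF_MAX <= SSIZE_MAX / 2,
              "maximum buffer size could overflow on growth");

typedef struct {
    unsigned char *ptr;
    ssize_t size;  /* bytes in use */
    ssize_t asize; /* bytes allocated */
} demo_buf;

/* Appends up to len bytes and returns how many were actually stored.
   Data that does not fit under the cap, or that cannot be allocated,
   is dropped instead of aborting the process. */
static ssize_t demo_buf_put(demo_buf *buf, const void *data, ssize_t len) {
    if (len <= 0)
        return 0;
    if (len > DEMO_BUF_MAX - buf->size) /* truncate at the cap */
        len = DEMO_BUF_MAX - buf->size;
    if (len == 0)
        return 0;

    ssize_t needed = buf->size + len;
    if (needed > buf->asize) {
        ssize_t new_asize = needed + needed / 2; /* assumed growth factor */
        if (new_asize > DEMO_BUF_MAX)
            new_asize = DEMO_BUF_MAX;
        unsigned char *p = realloc(buf->ptr, (size_t)new_asize);
        if (p == NULL)
            return 0; /* allocation failure: drop the data */
        buf->ptr = p;
        buf->asize = new_asize;
    }
    memcpy(buf->ptr + buf->size, data, (size_t)len);
    buf->size = needed;
    return len;
}
```

Because every size is clamped to `DEMO_BUF_MAX` before any arithmetic, no run-time overflow check is needed on the growth path in this sketch.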
---
NOTE: Regarding 32 bit overflows, generating test cases that crash the
library is trivial (any input document larger than 2 GB will crash
CMark), but most Python implementations have issues with large strings
to begin with, so a test case cannot be added to the pathological test
suite, since it's written in Python.
|
|
|
|
This change allows us to pass the new test introduced in
75f231503d2b5854f1ff517402d2751811295bf7.
Previously when a list marker was followed only by spaces,
cmark expected the following content to be indented by
the same number of spaces. But in this case we should
treat the line just like a blank line and set list padding
accordingly.
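A rough sketch of the padding rule described above, assuming the CommonMark convention that a list item whose first line is blank after the marker gets a content indent of marker width plus one. `demo_list_padding` and its parameters are hypothetical, and tabs are ignored for brevity; this is not the code from the commit.

```c
#include <stddef.h>

/* Returns the content indentation ("padding") for a list item.
   line:         NUL-terminated text of the line
   marker_end:   index just past the list marker
   marker_width: width of the marker itself (1 for "-", 2 for "1.") */
static int demo_list_padding(const char *line, size_t marker_end,
                             int marker_width) {
    size_t i = marker_end;
    int spaces = 0;

    while (line[i] == ' ') {
        spaces++;
        i++;
    }

    /* Marker followed only by spaces: treat the line as blank, so the
       spaces do not raise the indentation required of later lines. */
    int rest_is_blank =
        (line[i] == '\0' || line[i] == '\n' || line[i] == '\r');

    if (rest_is_blank || spaces < 1 || spaces > 4)
        return marker_width + 1;
    return marker_width + spaces;
}
```

With this rule, `"-   \n"` yields a padding of 2, whereas `"-   foo"` yields 4.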
|
|
Adds an internal field to the parser struct to keep track
of last_buffer_ended_with_cr.
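To illustrate why such a flag is useful: when input is fed in chunks, a CRLF pair can be split across two buffers, and the leading LF of the second chunk must not be counted as another line ending. The sketch below is an assumption-laden illustration, not the parser's code; only the field name `last_buffer_ended_with_cr` comes from the message, while `demo_parser`, `demo_feed`, and `line_endings` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool last_buffer_ended_with_cr; /* field name from the commit message */
    size_t line_endings;            /* hypothetical: endings seen so far */
} demo_parser;

/* Counts line endings in one chunk. CR, LF, and CRLF each count once,
   even when a CRLF pair straddles two chunks. */
static void demo_feed(demo_parser *p, const char *buf, size_t len) {
    size_t i = 0;

    if (len == 0)
        return; /* empty chunk: keep the flag as it is */

    /* A leading LF completes a CRLF begun in the previous chunk. */
    if (p->last_buffer_ended_with_cr && buf[0] == '\n')
        i = 1;
    p->last_buffer_ended_with_cr = (buf[len - 1] == '\r');

    while (i < len) {
        if (buf[i] == '\r') {
            p->line_endings++;
            if (i + 1 < len && buf[i + 1] == '\n')
                i++; /* CRLF inside a single chunk */
        } else if (buf[i] == '\n') {
            p->line_endings++;
        }
        i++;
    }
}
```

Feeding `"a\r"` followed by `"\nb\n"` then counts two line endings, the same as feeding `"a\r\nb\n"` in one call.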
|
|
Fixes issue #114.
|
|
Newer MSVC versions support enough of C99 to be able to compile cmark
in plain C mode. Only the "inline" keyword is still unsupported.
We have to use "__inline" instead.
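One common way to handle this is a small portability macro, sketched below. The macro name `DEMO_INLINE` and the helper `demo_is_space` are hypothetical; the actual headers may use a different name or condition.

```c
/* Hypothetical portability macro: MSVC in plain C mode spells the
   keyword "__inline"; other compilers accept C99 "inline". */
#if defined(_MSC_VER) && !defined(__cplusplus)
#define DEMO_INLINE __inline
#else
#define DEMO_INLINE inline
#endif

/* Example use on a small helper. */
static DEMO_INLINE int demo_is_space(int c) {
    return c == ' ' || c == '\t';
}
```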
|
|
|
|
|
|
This reverts commit 4d2d486333c358eb3adf3d0649163e319a3b8b69.
This commit caused a valgrind invalid read.
==29731== Invalid read of size 4
==29731== at 0x40500E: S_process_line (blocks.c:1050)
==29731== by 0x403CF7: S_parser_feed (blocks.c:526)
==29731== by 0x403BC9: cmark_parser_feed (blocks.c:494)
==29731== by 0x433A95: main (main.c:168)
==29731== Address 0x51d5b60 is 64 bytes inside a block of size 128 free'd
==29731== at 0x4C27D4E: free (vg_replace_malloc.c:427)
==29731== by 0x4015F0: S_free_nodes (node.c:134)
==29731== by 0x401634: cmark_node_free (node.c:142)
==29731== by 0x4033B1: finalize (blocks.c:259)
==29731== by 0x40365E: add_child (blocks.c:337)
==29731== by 0x4046D8: try_new_container_starts (blocks.c:836)
==29731== by 0x404F12: S_process_line (blocks.c:1015)
==29731== by 0x403CF7: S_parser_feed (blocks.c:526)
==29731== by 0x403BC9: cmark_parser_feed (blocks.c:494)
==29731== by 0x433A95: main (main.c:168)
|
|
|
|
|
|
|
|
|
|
|
|
It's a programming error if the type is out of range.
|
|
https://github.com/MathieuDuponchelle/cmark into MathieuDuponchelle-refactor-S_processLine
|
|
|
|
It's the core of the program and I had too much trouble making
sense of it: two loops with many cases and other code
interspersed hurt my head.
All the tests still passed before rebasing, now I've got the
exact same set of issues as master.
|
|
|
|
|
|
|
|
|
|
E.g. in
```
- foo
<TAB><TAB>bar
```
we should consume two spaces from the second tab,
leaving the other two spaces inside the code block.
|
|
This keeps track of when we have gotten partway
through a tab when consuming initial indentation.
|
|
Closes #101.
This patch fixes `S_advance_offset` so that it doesn't gobble
a tab character when advancing less than the width of a tab.
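The idea can be sketched as a column-based advance that only steps over a tab once all of its columns have been consumed. This is an illustrative sketch, not the body of `S_advance_offset`: `demo_cursor`, `demo_advance_columns`, and the tab stop of 4 are assumptions.

```c
#include <stddef.h>

#define DEMO_TAB_STOP 4 /* assumed tab stop */

typedef struct {
    size_t offset;              /* byte index into the line */
    int column;                 /* virtual column, tabs expanded */
    int partially_consumed_tab; /* nonzero if we stopped inside a tab */
} demo_cursor;

/* Advances by `count` columns. A tab is stepped over only when every
   column it covers has been consumed; otherwise the byte offset stays
   on the tab and only the virtual column moves. */
static void demo_advance_columns(demo_cursor *c, const char *line,
                                 int count) {
    while (count > 0 && line[c->offset] != '\0') {
        if (line[c->offset] == '\t') {
            int to_tab = DEMO_TAB_STOP - (c->column % DEMO_TAB_STOP);
            if (count >= to_tab) {
                c->offset++; /* whole remainder of the tab consumed */
                c->column += to_tab;
                count -= to_tab;
                c->partially_consumed_tab = 0;
            } else {
                c->column += count; /* stop partway through the tab */
                count = 0;
                c->partially_consumed_tab = 1;
            }
        } else {
            c->offset++;
            c->column++;
            count--;
            c->partially_consumed_tab = 0;
        }
    }
}
```

On the line `"\t\tbar"`, advancing 2 columns (list indentation) and then 4 columns (code block indentation) leaves the cursor on the second tab at column 6, with two columns of that tab still unconsumed.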
|
|
|
|
|
|
|
|
API change. Sorry, but this is the time to break things,
before 1.0 is released. This matches the recent changes to
CommonMark.dtd.
|
|
Ultimately I think we can get rid of parser->curline and
avoid an unnecessary allocation per line.
|
|
CMARK_NODE_HRULE -> CMARK_NODE_THEMATIC_BREAK.
However we've defined the former as the latter to keep
backwards compatibility.
See jgm/CommonMark 8fa94cb460f5e516b0e57adca33f50a669d51f6c
|
|
Defined CMARK_NODE_HEADER to CMARK_NODE_HEADING to ease
the transition.
|
|
See jgm/CommonMark commit 0cdbcee4e840abd0ac7db93797b2b75ca4104314
Note that we have defined
cmark_node_get_header_level = cmark_node_get_heading_level
and
cmark_node_set_header_level = cmark_node_set_heading_level
for backwards compatibility in the API.
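A minimal sketch of how such aliases can be provided with the preprocessor. The `demo_` names and the one-field node type are hypothetical stand-ins; the real header may define the aliases differently.

```c
/* Hypothetical stand-in for the real node type. */
typedef struct {
    int heading_level;
} demo_node;

/* New names. */
static int demo_node_get_heading_level(const demo_node *n) {
    return n->heading_level;
}

static int demo_node_set_heading_level(demo_node *n, int level) {
    if (level < 1 || level > 6)
        return 0; /* reject out-of-range levels */
    n->heading_level = level;
    return 1;
}

/* Old names kept as aliases so existing callers still compile. */
#define demo_node_get_header_level demo_node_get_heading_level
#define demo_node_set_header_level demo_node_set_heading_level
```

Code written against the old names then calls the new functions without any source changes.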
|
|
|
|
|
|
This previously caused cmark to break out of a list,
thinking it had two consecutive blanks.
|
|
|
|
So `S_process_line` sees only unix style line endings.
Closes #72, avoiding mixed line endings.
Ultimately we probably want a better solution, allowing
the line ending style of the input file to be preserved.
This solution forces output with newlines.
|
|
Closes #71.
Added a test to api_test.
|
|
|
|
See jgm/CommonMark#332
|
|
* Reformatted all source files.
* Added 'format' target to Makefile.
* Removed 'astyle' target.
* Updated .editorconfig.
|
|
|
|
|
|
This should be added to the spec.
|
|
(It uses GNU extensions, and we don't need it anyway.)
|
|
* Rewrote spec for HTML blocks. A few other spec examples
also changed as a result.
* Removed old `html_block_tag` scanner. Added new
`html_block_start` and `html_block_start_7`, as well
as `html_block_end_n` for n = 1-5.
* Rewrote block parser for new HTML block spec.
|
|
|
|
This caused certain NULLs not to be replaced.
Found by 'make fuzztest'.
|