Age | Commit message (Collapse) | Author |
|
|
|
Closes #188.
@nwellnhof - could you have a look and let me know if you
think this is a bad idea or could be improved?
|
|
This reverts commit 4fbe344df43ed7f60a3d3a53981088334cb709fc.
|
|
* Improve strbuf guarantees
Introduce BUFSIZE_MAX macro and make sure that the strbuf implementation
can handle strings up to this size.
* Abort early if document size exceeds internal limit
* Change types for source map offsets
Switch to size_t for the public API, making the public headers
C89-compatible again.
Switch to bufsize_t internally, reducing memory usage and improving
performance on 32-bit platforms.
* Make parser return NULL on internal index overflow
Make S_parser_feed set an error and ignore subsequent chunks if the
total input document size exceeds an internal limit. Make
cmark_parser_finish return NULL if an error was encountered. Add
public API functions to retrieve error code and error message.
strbuf overflow in renderers and OOM in parser or renderers still
cause an abort.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The previous work for unbounded memory usage and overflows on the buffer
API had several shortcomings:
1. The total size of the buffer was limited by arbitrarily small
precision on the storage type for buffer indexes (typedef'd as
`bufsize_t`). This is not a good design pattern in secure applications,
particualarly since it requires the addition of helper functions to cast
to/from the native `size` types and the custom type for the buffer, and
check for overflows.
2. The library was calling `abort` on overflow and memory allocation
failures. This is not a good practice for production libraries, since it
turns a potential RCE into a trivial, guaranteed DoS to the whole
application that is linked against the library. It defeats the whole
point of performing overflow or allocation checks when the checks will
crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded
(capped to the precision of the storage type) and could lead to DoS
attacks by simple memory exhaustion (particularly critical in 32-bit
platforms). This is not a good practice for a library that handles
arbitrary user input.
Hence, this patchset provides slight (but in my opinion critical)
improvements on this area, copying some of the patterns we've used in
the past for high throughput, security sensitive Markdown parsers:
1. The storage type for buffer sizes is now platform native (`ssize_t`).
Ideally, this would be a `size_t`, but several parts of the code expect
buffer indexes to be possibly negative. Either way, switching to a
`size` type is an strict improvement, particularly in 64-bit platforms.
All the helpers that assured that values cannot escape the `size` range
have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for
a buffer has been set to a safe value for production usage (32mb) that
can be proven not to overflow in practice. Users that need to parse
particularly large Markdown documents can increase this value. A static,
compile-time check has been added to ensure that the maximum buffer size
cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library
now follows the convention of other Markdown implementations (such as
Hoedown and Sundown) and silently handles buffer overflows and
allocation failures by dropping data from the buffer. The result is
that pathological Markdown documents that try to exploit the library
will instead generate truncated (but valid, and safe) outputs.
All tests after these small refactorings have been verified to pass.
---
NOTE: Regarding 32 bit overflows, generating test cases that crash the
library is trivial (any input document larger than 2gb will crash
CMark), but most Python implementations have issues with large strings
to begin with, so a test case cannot be added to the pathological tests
suite, since it's written in Python.
|
|
Newer MSVC versions support enough of C99 to be able to compile cmark
in plain C mode. Only the "inline" keyword is still unsupported.
We have to use "__inline" instead.
|
|
This helps compiling on systems like luarocks that don't
have all the cmake configuration goodness.
Thanks to @carlmartus
|
|
|
|
Now all characters that satisfy cmark_isspace are
recognized as whitespace. Previously CR and TAB
(and others) weren't included.
Partially addresses #73.
|
|
* Reformatted all source files.
* Added 'format' target to Makefile.
* Removed 'astyle' target.
* Updated .editorconfig.
|
|
These are no longer needed, and cause complications for MSVC.
Also removed HAVE_VA_COPY and HAVE_C99_SNPRINTF feature tests.
|
|
|
|
|
|
|
|
|
|
This function was missing a couple of range checks that I'm too lazy
to fix.
|
|
Avoid potential overflow and allow for different bufsize types.
|
|
|
|
Replace macro ENSURE_SIZE with inline function S_strbuf_grow_by that
checks for overflow.
|
|
cmark_strbuf_grow will never truncate a buffer.
|
|
This simplifies overflow checks.
|
|
|
|
Always add 50% on top of target size. No need for a loop.
|
|
This makes it easier to change the type later.
No functional change. The rest of the code base still has to be
adjusted to use the new type.
Also add some TODO comments in buffer.c.
|
|
Users of the strbuf API are supposed to check for an OOM condition
after appending to strbufs, but:
* This is never done in the whole code base.
* The implementation was flawed because only `ptr` was set to the
OOM value without adjusting `size` and `asize`. After an error,
subsequent calls could very well lead to segfaults, contrary to the
documentation.
Change the code to always abort on errors with a message printed to
stderr. The only alternative is to propagate errors throughout the
whole library which seems infeasible.
|
|
|
|
On Windows, snprintf returns -1 if the output was truncated. Fall back to
Windows-specific _scprintf.
|
|
The old function called 'continue' when seeing a backslash,
but this gave incorrect results on input like:
\\*
since the next backslash would be treated as escaping the `*`
instead of being escaped itself.
|
|
|
|
MSVC doesn't support va_copy.
|
|
|
|
This reverts commit 485ef21b95e257e9d9cbcaa804c3c164f1f49a80.
Apparently the va_copy IS needed, because without this code
we get segfaults in some cases.
Closes #253.
@nwellnhof, can you have a look at this issue and comment?
I understand that this code was removed for portability reasons.
Is there an alternative solution?
|
|
|
|
Otherwise cmark's behavior varies unpredictably with the locale.
`is_punctuation` in utf8.h has also been adjusted so that everything
that counts all ASCII symbol characters count as punctuation, even
though some are not in P* character classes.
|
|
Reverts 225d720.
|
|
This isn't needed any more since we don't expose these in the API.
|
|
Inline functions must be defined in header files in order to be inlined in
other compilation units. This also fixes the MSVC build where out-of-line
versions weren't created and allows to remove the -fgnu89-inline flag.
|
|
va_copy isn't needed here. See
http://stackoverflow.com/questions/26953289
Remove it because it isn't part of C89 and not implemented by MSVC.
|
|
Fixes cross-platform issues.
|
|
Needed for C++ compatibility.
|
|
The new functions cmark_new_doc_parser,
cmark_free_doc_parser, cmark_process_line, and cmark_finish
allow you to feed lines one by one (possibly from several
files) to the parser and call finish when you're done.
This is now used in main for mulitple files.
|
|
|
|
|
|
|