buffer: proper safety checks for unbounded memory

The previous work for unbounded memory usage and overflows on the buffer API had several shortcomings: 1. The total size of the buffer was limited by arbitrarily small precision on the storage type for buffer indexes (typedef'd as `bufsize_t`). This is not a good design pattern in secure applications, particualarly since it requires the addition of helper functions to cast to/from the native `size` types and the custom type for the buffer, and check for overflows. 2. The library was calling `abort` on overflow and memory allocation failures. This is not a good practice for production libraries, since it turns a potential RCE into a trivial, guaranteed DoS to the whole application that is linked against the library. It defeats the whole point of performing overflow or allocation checks when the checks will crash the library and the enclosing program anyway. 3. The default size limits for buffers were essentially unbounded (capped to the precision of the storage type) and could lead to DoS attacks by simple memory exhaustion (particularly critical in 32-bit platforms). This is not a good practice for a library that handles arbitrary user input. Hence, this patchset provides slight (but in my opinion critical) improvements on this area, copying some of the patterns we've used in the past for high throughput, security sensitive Markdown parsers: 1. The storage type for buffer sizes is now platform native (`ssize_t`). Ideally, this would be a `size_t`, but several parts of the code expect buffer indexes to be possibly negative. Either way, switching to a `size` type is an strict improvement, particularly in 64-bit platforms. All the helpers that assured that values cannot escape the `size` range have been removed, since they are superfluous. 2. The overflow checks have been removed. Instead, the maximum size for a buffer has been set to a safe value for production usage (32mb) that can be proven not to overflow in practice. Users that need to parse particularly large Markdown documents can increase this value. A static, compile-time check has been added to ensure that the maximum buffer size cannot overflow on any growth operations. 3. The library no longer aborts on buffer overflow. The CMark library now follows the convention of other Markdown implementations (such as Hoedown and Sundown) and silently handles buffer overflows and allocation failures by dropping data from the buffer. The result is that pathological Markdown documents that try to exploit the library will instead generate truncated (but valid, and safe) outputs. All tests after these small refactorings have been verified to pass. --- NOTE: Regarding 32 bit overflows, generating test cases that crash the library is trivial (any input document larger than 2gb will crash CMark), but most Python implementations have issues with large strings to begin with, so a test case cannot be added to the pathological tests suite, since it's written in Python.
author: Vicent Marti <tanoku@gmail.com> 2016-05-23 16:43:13 +0200
committer: John MacFarlane <jgm@berkeley.edu> 2016-06-06 15:39:05 -0700
commit: b2107ae4f05d7fe4388d37025105005592d32ef7 (patch)
tree: 5d9f37aaf75c8f6352a2f68a2f4ae2d82e7f65e8 /src/chunk.h
parent: 258a88d6c903234831281d4495eff7382932b617 (diff)
1 files changed, 2 insertions, 2 deletions
diff --git a/src/chunk.h b/src/chunk.h
index 7007492..234a4f3 100644
--- a/src/chunk.h
+++ b/src/chunk.h
@@ -83,7 +83,7 @@ static CMARK_INLINE void cmark_chunk_set_cstr(cmark_chunk *c, const char *str) {
     c->data = NULL;
     c->alloc = 0;
   } else {
-    c->len = cmark_strbuf_safe_strlen(str);
+    c->len = strlen(str);
     c->data = (unsigned char *)malloc(c->len + 1);
     c->alloc = 1;
     memcpy(c->data, str, c->len + 1);
@@ -91,7 +91,7 @@ static CMARK_INLINE void cmark_chunk_set_cstr(cmark_chunk *c, const char *str) {
 }
 
 static CMARK_INLINE cmark_chunk cmark_chunk_literal(const char *data) {
-  bufsize_t len = data ? cmark_strbuf_safe_strlen(data) : 0;
+  bufsize_t len = data ? strlen(data) : 0;
   cmark_chunk c = {(unsigned char *)data, len, 0};
   return c;
 }
author	Vicent Marti <tanoku@gmail.com>	2016-05-23 16:43:13 +0200
committer	John MacFarlane <jgm@berkeley.edu>	2016-06-06 15:39:05 -0700
commit	b2107ae4f05d7fe4388d37025105005592d32ef7 (patch)
tree	5d9f37aaf75c8f6352a2f68a2f4ae2d82e7f65e8 /src/chunk.h
parent	258a88d6c903234831281d4495eff7382932b617 (diff)