cmark - My own fork of cmark for commonmark conversion

Age	Commit message	Author
2016-06-06	buffer: proper safety checks for unbounded memory	Vicent Marti
	The previous work for unbounded memory usage and overflows on the buffer API had several shortcomings:
	1. The total size of the buffer was limited by the arbitrarily small precision of the storage type for buffer indexes (typedef'd as `bufsize_t`). This is not a good design pattern in secure applications, particularly since it requires adding helper functions to cast between the native `size` types and the custom buffer type, and to check for overflows.
	2. The library was calling `abort` on overflow and memory allocation failures. This is not a good practice for production libraries, since it turns a potential RCE into a trivial, guaranteed DoS of the whole application that is linked against the library. It defeats the whole point of performing overflow or allocation checks when the checks will crash the library and the enclosing program anyway.
	3. The default size limits for buffers were essentially unbounded (capped to the precision of the storage type) and could lead to DoS attacks by simple memory exhaustion (particularly critical on 32-bit platforms). This is not a good practice for a library that handles arbitrary user input.
	Hence, this patchset provides slight (but in my opinion critical) improvements in this area, copying some of the patterns we've used in the past for high-throughput, security-sensitive Markdown parsers:
	1. The storage type for buffer sizes is now platform native (`ssize_t`). Ideally, this would be a `size_t`, but several parts of the code expect buffer indexes to be possibly negative. Either way, switching to a `size` type is a strict improvement, particularly on 64-bit platforms. All the helpers that ensured that values cannot escape the `size` range have been removed, since they are superfluous.
	2. The overflow checks have been removed. Instead, the maximum size for a buffer has been set to a safe value for production usage (32 MB) that can be proven not to overflow in practice. Users that need to parse particularly large Markdown documents can increase this value. A static, compile-time check has been added to ensure that the maximum buffer size cannot overflow on any growth operations.
	3. The library no longer aborts on buffer overflow. The CMark library now follows the convention of other Markdown implementations (such as Hoedown and Sundown) and silently handles buffer overflows and allocation failures by dropping data from the buffer. The result is that pathological Markdown documents that try to exploit the library will instead generate truncated (but valid, and safe) outputs.
	All tests have been verified to pass after these small refactorings.
	---
	NOTE: Regarding 32-bit overflows, generating test cases that crash the library is trivial (any input document larger than 2 GB will crash CMark), but most Python implementations have issues with large strings to begin with, so a test case cannot be added to the pathological test suite, since it's written in Python.
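The pattern described in this commit message (a fixed cap, a compile-time check on that cap, and silently dropping data instead of aborting) can be sketched roughly as follows. This is an illustration only, not the code from the patch: `demo_buf`, `DEMO_BUF_MAX`, `demo_buf_grow` and `demo_buf_put` are hypothetical names, and `ptrdiff_t` is used here as a portable stand-in for `ssize_t`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap of 32 MiB, mirroring the figure in the commit message. */
#define DEMO_BUF_MAX ((ptrdiff_t)32 * 1024 * 1024)

/* Compile-time check: doubling any size up to the cap must still fit in
 * ptrdiff_t. A negative array size breaks the build if it does not. */
typedef char demo_buf_max_fits[(DEMO_BUF_MAX < PTRDIFF_MAX / 2) ? 1 : -1];

typedef struct {
  unsigned char *ptr;
  ptrdiff_t asize; /* allocated bytes */
  ptrdiff_t size;  /* used bytes */
} demo_buf;

/* Make room for `target` bytes. Returns 0 (instead of aborting) when the
 * request exceeds the cap or the allocation fails. */
static int demo_buf_grow(demo_buf *buf, ptrdiff_t target) {
  ptrdiff_t new_size;
  unsigned char *p;

  if (target < 0 || target > DEMO_BUF_MAX)
    return 0;
  if (target <= buf->asize)
    return 1;

  new_size = buf->asize > 0 ? buf->asize : 64;
  while (new_size < target)
    new_size *= 2; /* cannot overflow, see the compile-time check above */
  if (new_size > DEMO_BUF_MAX)
    new_size = DEMO_BUF_MAX;

  p = realloc(buf->ptr, (size_t)new_size);
  if (p == NULL)
    return 0;
  buf->ptr = p;
  buf->asize = new_size;
  return 1;
}

/* Append `len` bytes; on overflow or allocation failure the data is dropped
 * and the buffer keeps its previous, valid contents. */
static void demo_buf_put(demo_buf *buf, const void *data, ptrdiff_t len) {
  if (len <= 0 || len > DEMO_BUF_MAX - buf->size)
    return;
  if (!demo_buf_grow(buf, buf->size + len))
    return;
  memcpy(buf->ptr + buf->size, data, (size_t)len);
  buf->size += len;
}
```

A caller never sees a crash in this sketch: an oversized append leaves `size` unchanged, so the output is truncated rather than corrupted.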
2016-06-06	Fix character type detection in commonmark.c	Nick Wellnhofer
	- Implement cmark_isalpha.
	- Check for ASCII character before implicit cast to char.
	- Use internal ctype functions in commonmark.c.
	Fixes test failures on Windows and undefined behavior.
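To illustrate why the ASCII check has to come before the narrowing cast, here is a small sketch. The functions `demo_isalpha` and `demo_is_ascii_alpha` are hypothetical and assume an ASCII execution character set; they are not the library's `cmark_isalpha`.

```c
#include <stdint.h>

/* Hypothetical locale-independent isalpha for ASCII input only. */
int demo_isalpha(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Reject anything outside the ASCII range BEFORE narrowing to char.
 * Without this check, a code point such as U+0141 would be truncated
 * to 0x41 ('A') and wrongly classified as a letter. */
int demo_is_ascii_alpha(int32_t codepoint) {
  if (codepoint < 0 || codepoint >= 0x80)
    return 0;
  return demo_isalpha((char)codepoint);
}
```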
2016-06-02	commonmark renderer: fixed code block as first in list item.	John MacFarlane
	We don't want a blank line before a code block when it's the first thing in a list item.
2016-06-01	renderer: no_linebreaks instead of no_wrap.	John MacFarlane
	We generally want this option to prohibit any breaking in things like headers (not just wraps, but softbreaks).
2016-04-09	Reformatted.	John MacFarlane

2016-04-09	Fixed a number of issues relating to line wrapping.	John MacFarlane
	- Extend CMARK_OPT_NOBREAKS to all renderers and add `--nobreaks`.
	- Do not autowrap, regardless of width parameter, if CMARK_OPT_NOBREAKS is set.
	- Fixed CMARK_OPT_HARDBREAKS for LaTeX and man renderers.
	- Ensure that no auto-wrapping occurs if CMARK_OPT_NOBREAKS is enabled, or if output is CommonMark and CMARK_OPT_HARDBREAKS is enabled.
	- Updated man pages.
2016-03-12	Don't use variable length arrays	Nick Wellnhofer
	They're not supported by MSVC.
2016-03-12	Switch from "inline" to "CMARK_INLINE"	Nick Wellnhofer
	Newer MSVC versions support enough of C99 to be able to compile cmark in plain C mode. Only the "inline" keyword is still unsupported. We have to use "__inline" instead.
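A macro of this kind is typically defined along the following lines. This is a sketch under the assumption that `_MSC_VER` is the right compiler test; `DEMO_INLINE` and `demo_max` are illustrative names, not the definitions from the commit.

```c
/* Pick a spelling of "inline" that the compiler accepts in plain C mode. */
#if defined(_MSC_VER) && !defined(__cplusplus)
#define DEMO_INLINE __inline
#else
#define DEMO_INLINE inline
#endif

/* Helpers are then declared with the macro instead of the bare keyword. */
static DEMO_INLINE int demo_max(int a, int b) { return a > b ? a : b; }
```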
2016-02-28	Fix ctype(3) usage on NetBSD	Kamil Rytarowski
	We need to cast value passed to isspace(3) to unsigned char to explicitly prevent possibly undefined behavior.
	/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c: In function 'S_render_node':
	/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:273:9: warning: array subscript has type 'char' [-Wchar-subscripts]
	(code_len > 2 && !isspace(code[0]) &&
	^
	/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:274:10: warning: array subscript has type 'char' [-Wchar-subscripts]
	!(isspace(code[code_len - 1]) && isspace(code[code_len - 2]))) &&
	^
	/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:274:10: warning: array subscript has type 'char' [-Wchar-subscripts]
	CTYPE(3) Library Functions Manual CTYPE(3)
	NAME
	isalpha, isupper, islower, isdigit, isxdigit, isalnum, isspace, ispunct, isprint, isgraph, iscntrl, isblank, toupper, tolower, - character classification and mapping functions
	LIBRARY
	Standard C Library (libc, -lc)
	CAVEATS
	The first argument of these functions is of type int, but only a very restricted subset of values are actually valid. The argument must either be the value of the macro EOF (which has a negative value), or must be a non-negative value within the range representable as unsigned char. Passing invalid values leads to undefined behavior.
	NetBSD 7.99 February 25, 2015 NetBSD 7.99
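The cast described above looks like this in practice. The helper `demo_count_leading_space` is a hypothetical example, not a function from commonmark.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Count leading whitespace bytes. Each char is cast to unsigned char before
 * it reaches isspace(), so bytes >= 0x80 are never passed as negative ints
 * on platforms where plain char is signed. */
size_t demo_count_leading_space(const char *s) {
  size_t n = 0;
  while (s[n] != '\0' && isspace((unsigned char)s[n]))
    n++;
  return n;
}
```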
2016-01-18	Automatic code reformat.	John MacFarlane

2016-01-18	Merge branch 'master' of https://github.com/mbenelli/cmark into mbenelli-master	John MacFarlane

2016-01-17	commonmark: is_autolink - handle case where link has no children.	John MacFarlane

2016-01-17	Improved escaping in commonmark renderer.	John MacFarlane
	We try not to escape punctuation unless we absolutely have to. So, `)` and `.` are no longer escaped whenever they occur after digits; now they are only escaped if they are genuinely in a position where they'd cause a list item. This required a couple of changes to render.c.
	- `renderer->begin_content` is only set to false AFTER a string of digits at the beginning of the line. (This is slightly unprincipled.)
	- We never break before a numeral (also slightly unprincipled).
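The escaping rule can be approximated with a predicate like the one below. It is a simplified sketch of the idea (only digits before the delimiter, whitespace or end of text after it), assuming a plain C string for the line; `demo_needs_list_escape` is a hypothetical name and the real logic in render.c tracks state incrementally instead.

```c
#include <stddef.h>

/* Return 1 if the '.' or ')' at line[i] could start an ordered list item:
 * it must be preceded only by digits (at least one) from the start of the
 * line and followed by a space or the end of the line. */
int demo_needs_list_escape(const char *line, size_t i) {
  size_t k;

  if (line[i] != '.' && line[i] != ')')
    return 0;
  if (i == 0)
    return 0;
  for (k = 0; k < i; k++) {
    if (line[k] < '0' || line[k] > '9')
      return 0;
  }
  return line[i + 1] == ' ' || line[i + 1] == '\0';
}
```

Under this rule `1. foo` needs an escape before the period, while `3.14` and `v1.0` are left alone.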
2016-01-17	Commonmark renderer: use HTML comment to separate list from	John MacFarlane
	following list or code block. This has several advantages. First, the rule that two blank lines break out of a list is still controversial in CommonMark, and it isn't used in other implementations; HTML comments will always work. Second, two blank lines break out of all lists, whereas an HTML comment can be used to break out of just one level of nesting.
2016-01-17	commonmark renderer: use 4-space indent for bullet lists.	John MacFarlane
	This makes the output compatible with more implementations.
2016-01-16	Use 2 space + cr for line break in commonmark output.	John MacFarlane
	This is more portable. Closes #90.
2016-01-08	Fixed get_containing_block logic in src/commonmark.c.	John MacFarlane
	This did not allow for the possibility that a node might have no containing block, causing the commonmark renderer to segfault if passed an inline node with no block parent.
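The fix amounts to letting the upward walk end at NULL when no block ancestor exists. A minimal sketch with a hypothetical two-field node type (the real cmark node is much richer):

```c
#include <stddef.h>

typedef enum { DEMO_KIND_BLOCK, DEMO_KIND_INLINE } demo_kind;

typedef struct demo_node {
  demo_kind kind;
  struct demo_node *parent;
} demo_node;

/* Walk up the tree until a block node is found. Returns NULL when the node
 * has no containing block, so callers must check the result instead of
 * dereferencing it unconditionally. */
demo_node *demo_get_containing_block(demo_node *node) {
  while (node != NULL) {
    if (node->kind == DEMO_KIND_BLOCK)
      return node;
    node = node->parent;
  }
  return NULL;
}
```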
2015-12-28	Commonmark renderer: ensure that literal characters get escaped	John MacFarlane
	when they're at the beginning of a block, e.g. > \- foo
2015-12-28	Rename NODE_HTML -> NODE_HTML_BLOCK, NODE_INLINE_HTML -> NODE_HTML_INLINE.	John MacFarlane
	API change. Sorry, but this is the time to break things, before 1.0 is released. This matches the recent changes to CommonMark.dtd.
2015-12-22	Rename hrule -> thematic_break.	John MacFarlane
	CMARK_NODE_HRULE -> CMARK_NODE_THEMATIC_BREAK. However we've defined the former as the latter to keep backwards compatibility. See jgm/CommonMark 8fa94cb460f5e516b0e57adca33f50a669d51f6c
2015-12-22	CMARK_NODE_HEADER -> CMARK_NODE_HEADING.	John MacFarlane
	Defined CMARK_NODE_HEADER to CMARK_NODE_HEADING to ease the transition.
2015-12-22	Rename 'header' -> 'heading'.	John MacFarlane
	See jgm/CommonMark commit 0cdbcee4e840abd0ac7db93797b2b75ca4104314. Note that we have defined cmark_node_get_header_level = cmark_node_get_heading_level and cmark_node_set_header_level = cmark_node_set_heading_level for backwards compatibility in the API.
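The three renames above keep old code compiling by defining each old name in terms of the new one. A sketch of that technique, using hypothetical `demo_`/`DEMO_` names rather than the actual cmark.h declarations:

```c
typedef enum {
  DEMO_NODE_HEADING = 1,
  DEMO_NODE_THEMATIC_BREAK = 2
} demo_node_type;

/* Old spellings stay available as aliases for the new ones. */
#define DEMO_NODE_HEADER DEMO_NODE_HEADING
#define DEMO_NODE_HRULE DEMO_NODE_THEMATIC_BREAK

typedef struct {
  demo_node_type type;
  int heading_level;
} demo_node;

/* New accessor name; returns 0 for nodes that are not headings. */
int demo_node_get_heading_level(const demo_node *node) {
  return node->type == DEMO_NODE_HEADING ? node->heading_level : 0;
}

/* Old accessor name forwards to the new one. */
#define demo_node_get_header_level demo_node_get_heading_level
```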
2015-12-19	Commonmark renderer: ensure html blocks surrounded by blanks.	John MacFarlane
	Otherwise we get failures of roundtrip tests.
2015-12-19	Changed API for CUSTOM_BLOCK and CUSTOM_INLINE.	John MacFarlane
	Instead of using their `as.literal` content, we now give each custom node two literal fields, one to be printed on entering the node (before rendering the children, if any), the other on exiting (after rendering children). This gives us the flexibility to have custom nodes with children.
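The enter/exit design can be pictured as follows. This is a sketch only: `demo_custom` and `demo_render_custom` are hypothetical, and the already-rendered children are passed in as a string to keep the example short.

```c
#include <stddef.h>
#include <stdio.h>

/* A custom node carries two literals instead of one. */
typedef struct {
  const char *on_enter; /* emitted before the children */
  const char *on_exit;  /* emitted after the children */
} demo_custom;

/* Writes on_enter + children + on_exit into out (truncating if needed) and
 * returns the length the full output would have, as snprintf does. */
int demo_render_custom(const demo_custom *node, const char *children,
                       char *out, size_t out_size) {
  return snprintf(out, out_size, "%s%s%s", node->on_enter, children,
                  node->on_exit);
}
```

With the earlier single-literal design there was no place to put a closing delimiter after the children, which is what the second field provides.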
2015-12-19	Rename RAW_BLOCK -> CUSTOM_BLOCK, RAW_INLINE -> CUSTOM_INLINE.	John MacFarlane

2015-12-19	Added RAW_BLOCK and RAW_INLINE node types.	John MacFarlane
	These are passed through verbatim by all writers, with no escaping. They are never generated by the parser, and do not correspond to CommonMark elements. They are designed to be inserted by filters that postprocess the AST. For example, a filter might convert specially marked code blocks to svg diagrams in HTML and tikz diagrams in LaTeX, passing these through to the renderer as a RAW_BLOCK.
2015-12-10	Fix warnings about dropping const qualifier	Kevin Wojniak

2015-12-01	Fix "declaration shadows a local variable"	Kevin Wojniak

2015-11-02	Replaced sprintf with snprintf.	Marco Benelli

2015-10-22	commonmark: fix size_t to int	Kevin Wojniak
	This fixes an MSVC warning "conversion from 'size_t' to 'int', possible loss of data"
2015-07-27	Use clang-format, llvm style, for formatting.	John MacFarlane
	* Reformatted all source files.
	* Added 'format' target to Makefile.
	* Removed 'astyle' target.
	* Updated .editorconfig.
2015-07-25	Avoided another use of strbuf_printf.	John MacFarlane

2015-07-25	commonmark renderer - use regular sprintf for list markers.	John MacFarlane
	This avoids an allocation and use of strbuf_printf.
2015-07-14	astyle reformatting.	John MacFarlane

2015-07-12	commonmark renderer - escape !.	John MacFarlane
	Now all round-trip tests pass.
2015-07-12	commonmark writer - escape all #'s, not just at beginning of line.	John MacFarlane
	This is needed for #s at the end of atx headers.
2015-07-12	Fixed soft breaks in commonmark writer.	John MacFarlane

2015-07-12	Small cleanups.	John MacFarlane
	Moved begin_line setting into render.c, so you don't need to worry about it in outc.
2015-07-12	Use cmark_render_code_point in renderers.	John MacFarlane

2015-07-12	Removed options field from renderer struct.	John MacFarlane
	Added options argument to render_node function, and rearrange argument order.
2015-07-12	cmark_render: ensure final newline.	John MacFarlane
	This allows us to remove direct manipulation of buffer from the latex and commonmark renderers.
2015-07-12	commonmark renderer - don't need to manually adjust need_cr.	John MacFarlane

2015-07-11	Fixed some windows warnings.	John MacFarlane

2015-07-11	Restructured common renderer code.	John MacFarlane
	* Added functions for cr, blankline, out to renderer object.
	* Removed lit (we'll handle this with a macro).
	* Changed type of out so it takes a regular string instead of a chunk.
	* Use macros LIT, OUT, BLANKLINE, CR in renderers to simplify code. (Not sure about this, but `renderer->out(renderer, ...)` sure is verbose.)
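The macro idea from this commit can be sketched as below. The struct layout, the fixed-size buffer and the single-argument `OUT` are simplifying assumptions for illustration; the macros rely on a variable named `renderer` being in scope at the call site.

```c
#include <stddef.h>
#include <string.h>

typedef struct demo_renderer {
  char buf[256];
  size_t len;
  void (*out)(struct demo_renderer *, const char *);
  void (*cr)(struct demo_renderer *);
} demo_renderer;

/* Append a string, truncating silently when the buffer is full. */
static void demo_out(demo_renderer *r, const char *s) {
  size_t room = sizeof(r->buf) - 1 - r->len;
  size_t n = strlen(s);
  if (n > room)
    n = room;
  memcpy(r->buf + r->len, s, n);
  r->len += n;
  r->buf[r->len] = '\0';
}

/* Emit a newline unless the output already ends with one. */
static void demo_cr(demo_renderer *r) {
  if (r->len > 0 && r->buf[r->len - 1] != '\n')
    demo_out(r, "\n");
}

void demo_renderer_init(demo_renderer *r) {
  r->len = 0;
  r->buf[0] = '\0';
  r->out = demo_out;
  r->cr = demo_cr;
}

/* Shorthand for the verbose renderer->out(renderer, ...) calls. */
#define OUT(s) renderer->out(renderer, s)
#define CR() renderer->cr(renderer)

void demo_render_paragraph(demo_renderer *renderer, const char *text) {
  OUT(text);
  CR();
}
```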
2015-07-11	Rename cmark_render_state -> cmark_renderer.	John MacFarlane

2015-07-11	Factored out common bits of rendering into separate render module.	John MacFarlane
	* Added render.c, render.h.
	* Moved common functions and definitions from latex.c and commonmark.c to render.c, render.h.
	* Added a wrapper, cmark_render, that creates a renderer given a character-escaper and a node renderer.
	Closes #63.
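The shape of such a wrapper, taking a character-escaper and a node renderer as callbacks, might look like the sketch below. All names and signatures here are hypothetical and much simpler than the real `cmark_render`; the "node" is reduced to a plain string.

```c
#include <stddef.h>
#include <string.h>

typedef struct demo_renderer demo_renderer;
typedef void (*demo_outc_fn)(demo_renderer *r, unsigned char c);
typedef void (*demo_render_node_fn)(demo_renderer *r, const char *text);

struct demo_renderer {
  char buf[128];
  size_t len;
  demo_outc_fn outc; /* format-specific character escaper */
};

/* Append one raw character, dropping it if the buffer is full. */
void demo_put(demo_renderer *r, char c) {
  if (r->len + 1 < sizeof(r->buf)) {
    r->buf[r->len++] = c;
    r->buf[r->len] = '\0';
  }
}

/* Generic wrapper: sets up the shared state, runs the format-specific node
 * renderer, then copies the result to the caller's buffer. */
void demo_render(const char *text, demo_outc_fn outc,
                 demo_render_node_fn render_node, char *out, size_t out_size) {
  demo_renderer r;
  r.len = 0;
  r.buf[0] = '\0';
  r.outc = outc;
  render_node(&r, text);
  if (out_size > 0) {
    strncpy(out, r.buf, out_size - 1);
    out[out_size - 1] = '\0';
  }
}

/* Example escaper for a LaTeX-like format: backslash-escape a few specials. */
void demo_latex_outc(demo_renderer *r, unsigned char c) {
  if (c == '#' || c == '%' || c == '&')
    demo_put(r, '\\');
  demo_put(r, (char)c);
}

/* Example node renderer: sends every character through the escaper. */
void demo_text_node(demo_renderer *r, const char *text) {
  for (; *text != '\0'; text++)
    r->outc(r, (unsigned char)*text);
}
```

Each output format then only supplies its own pair of callbacks, while buffer handling lives in one place.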
2015-07-05	astyle formatting improvements.	John MacFarlane

2015-07-05	commonmark writer: correctly handle email autolinks.	John MacFarlane

2015-06-07	Avoid strlen in commonmark.c	Nick Wellnhofer

2015-06-07	Convert code base to strbuf_t	Nick Wellnhofer
	There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32.