Age | Commit message | Author |
|
|
This reverts commit 54c990d17385156958556d86feca0c6e24da94e7.
|
|
This reverts commit abc45c57d368383eb05ca5fbb79d33b0370b419c.
|
|
This reverts commit 745b877835fed47e06daa3295aaf86312867f6f1.
|
|
This reverts commit c5732b26bb4d98cbec9de48cefad480cb880eb45.
|
|
This reverts commit 9f760cefdee9dbc18e6294d78d139b629062fad7.
|
|
Closes #334.
|
|
This is kivikakk's commit 62166fe3b6b07068ed4c4207113e3c4b060ad4a8
in cmark-gfm.
|
|
This commit ports Vicent Marti's fix in cmark-gfm.
(384cc9db4cd7a90f59c0751e58eb7b3023d38b85)
His commit message follows:
As explained on the previous commit, it is trivial to DoS the CMark
parser by generating a document where all the link reference names hash
to the same bucket in the hash table.
This will cause the lookup process for each reference to take linear
time on the amount of references in the document, and with enough link
references to lookup, the end result is a pathological O(N^2) that
causes medium-sized documents to finish parsing in 5+ minutes.
To avoid this issue, we propose the present commit.
Based on the fact that all reference lookup/resolution in a Markdown
document is always performed as a last step during the parse process,
we've reimplemented reference storage as follows:
1. New references are always inserted at the end of a linked list. This
is an O(1) operation, and does not check whether an existing (duplicate)
reference with the same label already exists in the document.
2. Upon the first call to `cmark_reference_lookup` (when it is expected
that no further references will be added to the reference map), the
linked list of references is written into a fixed-size array.
3. The fixed size array can then be efficiently sorted in-place in O(n
log n). This operation only happens once. We perform this sort in a
_stable_ manner to ensure that the earliest link reference in the
document always has preference, as the spec dictates. To accomplish
this, every reference is tagged with a generation number when initially
inserted in the linked list.
4. The sorted array is then compacted in O(n). Since it was sorted in a
stable way, the first reference for each label is preserved and the
duplicates are removed, matching the spec.
5. We can now simply perform a binary search for the current
`cmark_reference_lookup` query in O(log n). Any further lookup calls
will also be O(log n), since the sorted references table only needs to
be generated once.
The resulting implementation is notably simple (as it uses standard
library builtins `qsort` and `bsearch`), whilst performing better than
the fixed size hash table in documents that have a high number of
references and never becoming pathological regardless of the input.
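
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the code from cmark or cmark-gfm: the names `ref`, `ref_map`, `ref_map_add`, `ref_map_sort`, and `ref_map_lookup` are hypothetical, labels are assumed to be already normalized and NUL-terminated, and stability is obtained by using the insertion counter as a tie-breaker in the `qsort` comparator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not cmark's actual structs. */
typedef struct ref {
  struct ref *next;  /* linked-list link used while parsing */
  const char *label; /* assumed normalized and NUL-terminated */
  const char *url;
  unsigned int age;  /* insertion order ("generation number") */
} ref;

typedef struct {
  ref *head, *tail;
  ref **sorted;       /* built lazily on the first lookup */
  unsigned int count; /* references inserted */
  unsigned int size;  /* unique labels after compaction */
} ref_map;

/* Step 1: O(1) append with no duplicate check. */
static int ref_map_add(ref_map *map, const char *label, const char *url) {
  ref *r;
  assert(map != NULL && label != NULL);
  r = calloc(1, sizeof(*r));
  if (!r) return 0;
  r->label = label;
  r->url = url;
  r->age = map->count++;
  if (map->tail) map->tail->next = r; else map->head = r;
  map->tail = r;
  return 1;
}

/* Order by label, then by age, so that qsort behaves as a stable sort. */
static int cmp_sort(const void *a, const void *b) {
  const ref *ra = *(ref *const *)a;
  const ref *rb = *(ref *const *)b;
  int c = strcmp(ra->label, rb->label);
  if (c != 0) return c;
  return (ra->age > rb->age) - (ra->age < rb->age);
}

static int cmp_search(const void *key, const void *elem) {
  const ref *r = *(ref *const *)elem;
  return strcmp((const char *)key, r->label);
}

/* Steps 2-4: copy the list into an array, sort it, drop duplicates. */
static int ref_map_sort(ref_map *map) {
  unsigned int i = 0, last = 0;
  ref *r;
  ref **arr = malloc(map->count * sizeof(*arr));
  if (!arr) return 0;
  for (r = map->head; r; r = r->next) arr[i++] = r;
  qsort(arr, map->count, sizeof(*arr), cmp_sort);
  for (i = 1; i < map->count; i++) {
    /* Within a run of equal labels the earliest reference comes first. */
    if (strcmp(arr[i]->label, arr[last]->label) != 0) arr[++last] = arr[i];
  }
  map->sorted = arr;
  map->size = last + 1;
  return 1;
}

/* Step 5: O(log n) binary search; the table is built only once.
   Assumes no references are added after the first lookup. */
static const ref *ref_map_lookup(ref_map *map, const char *label) {
  ref **hit;
  if (map->count == 0) return NULL;
  if (!map->sorted && !ref_map_sort(map)) return NULL;
  hit = bsearch(label, map->sorted, map->size, sizeof(*map->sorted), cmp_search);
  return hit ? *hit : NULL;
}

static void ref_map_free(ref_map *map) {
  ref *r = map->head;
  while (r) { ref *next = r->next; free(r); r = next; }
  free(map->sorted);
  memset(map, 0, sizeof(*map));
}
```

With this sketch, adding `foo`, `bar`, and a second `foo` and then looking up `foo` returns the first `foo` entry, and the compacted table holds two entries.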
|
|
This is taken from GitHub's fix:
https://github.com/github/cmark-gfm/commit/66a0836dc91e1653f7931e1218446664493da520
|
|
Closes #332.
|
|
See #332
|
|
API change: This adds a new exported function in cmark.h.
Closes #330.
|
|
In a recent commit, the check was changed to strcmp, but we really
have to use strncmp.
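
The commit message does not say which check is involved, so the following is only a generic illustration of the difference. It assumes a length-delimited buffer that continues past the text being matched; the helper `has_prefix` is hypothetical. `strcmp` compares up to the terminating NUL and therefore reports a mismatch when anything follows the prefix, whereas `strncmp` stops after the given number of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the buffer `data` of length `len` start with
   the NUL-terminated string `prefix`? */
static int has_prefix(const char *data, size_t len, const char *prefix) {
  size_t n;
  assert(data != NULL && prefix != NULL);
  n = strlen(prefix);
  /* Check the length first so that strncmp never reads past `len` bytes. */
  return len >= n && strncmp(data, prefix, n) == 0;
}
```

For example, `has_prefix("CDATA[x", 7, "CDATA[")` is true, while `strcmp("CDATA[x", "CDATA[")` is nonzero because of the trailing `x`.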
|
|
Introduced by a recent commit. Found by OSS-Fuzz.
|
|
This resorts to variable substitution to ensure that the embedded path is
correct. Without this, the path at the time of configuration is used. In
the case of the Swift project, this ended up searching in the *source*
directory rather than the *build* directory. This ensures that we
export the file to an absolute location and use the same location in
the `cmarkConfig.cmake` file by means of CMake's `configure_file`
substitution.
|
|
Adjust the include of the CMake file to use a location relative to
cmarkConfig.cmake, which enables use without regard to the absolute path.
|
|
Introduce multi-purpose data/len members in struct cmark_node. This
is mainly used to store literal text for inlines, code and HTML blocks.
Move the content strbuf for blocks from cmark_node to cmark_parser.
When finalizing nodes that allow inlines (paragraphs and headings),
detach the strbuf and store the block content in the node's data/len
members. Free the block content after processing inlines.
Reduces size of struct cmark_node by 8 bytes.
|
|
Allows reducing the size of struct cmark_node later.
|
|
Fix another place where an "allocated" cmark_chunk was used.
|
|
Use zero-terminated C strings and a separate length field instead of
cmark_chunks. Literal inline text will now be copied from the parent
block's content buffer, slowing the benchmark down by 10-15%.
The node struct never references memory of other nodes now, fixing #309.
Node accessors don't have to check for delayed creation of C strings,
so parsing and iterating all literals using the public API should
actually be faster than before.
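
A rough sketch of the idea, under stated assumptions: the `literal` struct and the `literal_set` and `literal_free` helpers below are hypothetical and are not the fields or functions used in cmark. The point is that the text is copied out of the parent block's buffer into an owned, zero-terminated string with its length stored alongside, so the copy stays valid when the parent buffer is changed or freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for illustration; not cmark's node layout. */
typedef struct {
  char *data; /* owned, NUL-terminated copy */
  size_t len; /* length excluding the terminator */
} literal;

/* Copy `len` bytes from `src` (e.g. a slice of a parent buffer). */
static int literal_set(literal *lit, const char *src, size_t len) {
  char *copy;
  assert(lit != NULL && src != NULL);
  copy = malloc(len + 1);
  if (!copy) return 0;
  memcpy(copy, src, len);
  copy[len] = '\0';
  free(lit->data); /* release any previous value */
  lit->data = copy;
  lit->len = len;
  return 1;
}

static void literal_free(literal *lit) {
  free(lit->data);
  lit->data = NULL;
  lit->len = 0;
}
```

The extra `malloc` and `memcpy` per literal are the kind of copying cost the commit message refers to; in exchange, an accessor can return `data` directly without building a C string on demand.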
|
|
Reduces size of struct cmark_node by 8 bytes.
|
|
Use zero-terminated C strings instead of cmark_chunks without storing
the length. This introduces a few additional strlen computations,
but overhead should be low.
Allows reducing the size of struct cmark_node later.
|
|
Use zero-terminated C strings instead of cmark_chunks without storing
the length. The length of code literals will be re-added in a later
commit. strlen overhead for code info should be negligible.
Reduces size of struct cmark_node by 8 bytes.
|
|
When using multiprocessing on Windows, the main program must be
guarded with a __name__ check.
|
|
These checks don't seem to be required and broke pathological_tests.py
on Windows where multiprocessing sets __name__ to "__mp_main__".
|
|
The flag is only required for old MSVC versions.
|
|
When CMARK_OPT_SMART is enabled, we escape literal `-`,
`.`, and quote characters when needed to avoid their
being "smartified."
See e.g. jgm/pandoc#6041 for an application.
|
|
This is an internal change, as this isn't part of the
public API.
|