Age | Commit message (Collapse) | Author |
|
Also command line option `--validate-utf8`.
This option causes cmark to check for valid UTF-8,
replacing invalid sequences with the replacement
character, U+FFFD.
Reinstated api tests for utf8.
|
|
|
|
We now replace null characters in the line splitting code.
|
|
Now it just replaces bad UTF-8 sequences and NULLs.
This restores benchmarks to near their previous levels.
|
|
We no longer preprocess tabs to spaces before parsing.
Instead, we keep track of both the byte offset and
the (virtual) column as we parse block starts.
This allows us to handle tabs without converting
to spaces first. Tabs are left as tabs in the output.
Added `column` and `first_nonspace_column` fields to `parser`.
Added utility function to advance the offset, computing
the virtual column too.
Note that we don't need to deal with UTF-8 here at all.
Only ASCII occurs in block starts.
Significant performance improvement due to the fact that
we're not doing UTF-8 validation -- though we might want
to add that back in.
|
|
|
|
This isn't actually needed.
|
|
Guard against too large chunks passed via the API.
|
|
There are probably a couple of places I missed. But this will only
be a problem if we use a 64-bit bufsize_t at some point. Then, we'll
get warnings from -Wshorten-64-to-32.
|
|
|
|
Added fields `offset`, `first_nonspace`, `indent`, and `blank`
to `cmark_parser` struct.
This just removes some repetition in the code.
|
|
|
|
This fixes cases like:
```
1. a
2. b
3. c
```
|
|
|
|
From btrask's alternate code in the comment on
https://github.com/jgm/cmark/pull/18.
Note: this gives a 1-2% performance boot in our benchmark,
probably enough to make it worth while.
|
|
Conflicts:
src/blocks.c
|
|
Closes #52.
|
|
|
|
|
|
|
|
|
|
|
|
By the time we check for a list start, we've already checked
for an HRULE, so we don't need to repeat that check here.
Thanks to Robin Stocker for pointing out a similar redundancy
in commonmark.js.
|
|
|
|
Closes #9, confirmed with ASAN.
Avoid using `parser->current` in the loop that creates new
blocks, since `finalize` in `add_child` may have removed
the current parser (if it contains only reference definitions).
This isn't a great solution; in the long run we need to rewrite
to make the logic clearer and to make it harder to make
mistakes like this one.
|
|
This arose when a paragraph containing only reference links and
blank space was finalized. Finalization would remove the
node. `finalize` returns the parent node, but the problem
arose because we had both `cur` and `parser->current`, and
only one was being updated. Solution: remove `cur`, which is
a holdover from before we had `parser->current`.
I believe this will close #9 -- @JordanMilne can you test and confirm?
|
|
For consistency with the API.
|
|
|
|
Also to some non-exported functions in blocks and inlines.
|
|
|
|
Added assertion to raise an error if finalize is called
on a closed block (as was happening undetected because of
the fallback behavior).
|
|
|
|
|
|
This is a more logical arrangement and follows recent changes to
the JS implementation.
|
|
|
|
Closes #286.
|
|
Minor code reformatting:
This corrects an overzealous global replace from earlier.
|
|
|
|
|
|
Otherwise cmark's behavior varies unpredictably with the locale.
`is_punctuation` in utf8.h has also been adjusted so that everything
that counts all ASCII symbol characters count as punctuation, even
though some are not in P* character classes.
|
|
|
|
API exports cmark_node_get_column.
XML writer indicates start and end line and column for block-level
nodes.
|
|
|
|
Also break_out_of_lists.
|
|
|
|
In the last few commits we were using as.code.fenced and as.literal at
the same time for NODE_CODE_BLOCK, which obviously led to problems.
|
|
Reverts 225d720.
|
|
|
|
|
|
This is for consistency with the other types of nodes that have
literal strings as contents.
|