Markdown parser C library
MD4C Readme
- Home: http://github.com/mity/md4c
- Wiki: http://github.com/mity/md4c/wiki
- Issue tracker: http://github.com/mity/md4c/issues
MD4C stands for "Markdown for C" and that's exactly what this project is about.
What is Markdown
In short, Markdown is the markup language this README.md file is written in.
If you are unfamiliar with it, the CommonMark specification is a good place to start.
What is MD4C
MD4C is a Markdown parser implementation in C, with the following features:
- Compliance: Generally, MD4C aims to be compliant with the latest version of the CommonMark specification. Currently, we are fully compliant with CommonMark 0.29.
- Extensions: MD4C supports some commonly requested and accepted extensions. See below.
- Performance: MD4C is very fast.
- Compactness: The MD4C parser is implemented in one source file and one header file. There are no dependencies other than the standard C library.
- Embedding: The MD4C parser is easy to reuse in other projects. Its API is very straightforward: there is actually just one function, md_parse().
- Push model: MD4C parses the complete document and calls a few callback functions provided by the application to inform it about the start/end of every block, the start/end of every span, and any textual contents.
- Portability: MD4C builds and works on Windows and POSIX-compliant OSes. (It should be simple to make it run on most other platforms as well, at least as long as the platform provides the C standard library, including heap memory management.)
- Encoding: MD4C by default expects UTF-8 encoding of the input document. But it can be compiled to recognize ASCII-only control characters (i.e. to disable all Unicode-specific code), or (on Windows) to expect UTF-16 (i.e. what is on Windows commonly called just "Unicode"). See more details below.
- Permissive license: MD4C is available under the MIT license.
Using MD4C
Parsing Markdown
If you only need to parse a Markdown document, include md4c.h and link against the MD4C library (-lmd4c); alternatively, add md4c.[hc] directly to your code base, as the parser is implemented in a single C source file.
The main provided function is md_parse(). It takes a text in the Markdown syntax and a pointer to a structure which provides pointers to several callback functions.
As md_parse() processes the input, it calls the callbacks (when entering or leaving any Markdown block or span, and when outputting any textual content of the document), allowing the application to convert it into another format or render it onto the screen.
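The callback flow described above can be pictured with a small toy. This is only a sketch of the push model and does not use MD4C itself: all the toy_* and on_* names are hypothetical, the real types and the callback structure are declared in md4c.h, and toy_replay() merely replays the events a push parser might report for the input "Hello *world*".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins only: MD4C's real block/span types and its callback
 * structure are declared in md4c.h and carry more information. */
typedef enum { TOY_BLOCK_P } toy_block;
typedef enum { TOY_SPAN_EM } toy_span;

typedef struct {
    int (*enter_block)(toy_block type, void *userdata);
    int (*leave_block)(toy_block type, void *userdata);
    int (*enter_span)(toy_span type, void *userdata);
    int (*leave_span)(toy_span type, void *userdata);
    int (*text)(const char *text, size_t size, void *userdata);
} toy_callbacks;

/* Application-side state, reached through the userdata pointer. */
typedef struct {
    char out[128];
    size_t len;
} toy_output;

/* Append n bytes to the output; a non-zero return stops the replay. */
static int toy_emit(toy_output *o, const char *s, size_t n)
{
    assert(o != NULL);
    if (n >= sizeof o->out - o->len)
        return -1;
    memcpy(o->out + o->len, s, n);
    o->len += n;
    o->out[o->len] = '\0';
    return 0;
}

/* The application's callbacks: here they render a tiny HTML fragment. */
int on_enter_block(toy_block t, void *ud) { (void)t; return toy_emit(ud, "<p>", 3); }
int on_leave_block(toy_block t, void *ud) { (void)t; return toy_emit(ud, "</p>", 4); }
int on_enter_span(toy_span t, void *ud)   { (void)t; return toy_emit(ud, "<em>", 4); }
int on_leave_span(toy_span t, void *ud)   { (void)t; return toy_emit(ud, "</em>", 5); }
int on_text(const char *s, size_t n, void *ud) { return toy_emit(ud, s, n); }

/* Stand-in for a parser: replays the events for "Hello *world*". */
int toy_replay(const toy_callbacks *cb, void *userdata)
{
    int rc;
    if ((rc = cb->enter_block(TOY_BLOCK_P, userdata)) != 0) return rc;
    if ((rc = cb->text("Hello ", 6, userdata)) != 0) return rc;
    if ((rc = cb->enter_span(TOY_SPAN_EM, userdata)) != 0) return rc;
    if ((rc = cb->text("world", 5, userdata)) != 0) return rc;
    if ((rc = cb->leave_span(TOY_SPAN_EM, userdata)) != 0) return rc;
    return cb->leave_block(TOY_BLOCK_P, userdata);
}
```

The point of the design is that the application never sees a document tree: it only reacts to events in document order, so it can stream its output and keep whatever state it needs behind the userdata pointer.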
Converting to HTML
If you need to convert Markdown to HTML, include md4c-html.h and link against the MD4C-HTML library (-lmd4c-html); alternatively, add the sources md4c.[hc], md4c-html.[hc] and entity.[hc] to your code base.
To convert a Markdown input, call the md_html() function. It takes the Markdown input and calls the provided callback function. The callback is fed with chunks of the HTML output. A typical callback implementation just appends the chunks to a buffer or writes them to a file.
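A buffer-appending callback of the kind mentioned above might look like the following sketch. The names html_buffer and append_chunk are hypothetical, and the callback is assumed to receive a chunk pointer, a chunk size and a user data pointer; MD_CHAR and MD_SIZE are spelled here as char and unsigned so that the block compiles without md4c-html.h. Check the header for the exact declaration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable output buffer handed to the callback through userdata. */
typedef struct {
    char *data;       /* NUL-terminated accumulated output, or NULL */
    size_t size;      /* bytes stored, not counting the terminator */
    size_t capacity;  /* bytes allocated */
    int failed;       /* set once an allocation fails */
} html_buffer;

/* Appends one output chunk to the buffer (assumed callback shape). */
void append_chunk(const char *chunk, unsigned chunk_size, void *userdata)
{
    html_buffer *buf = userdata;
    assert(buf != NULL);

    if (buf->failed)
        return;

    if (buf->size + chunk_size + 1 > buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity : 64;
        char *p;

        while (new_cap < buf->size + chunk_size + 1)
            new_cap *= 2;
        p = realloc(buf->data, new_cap);
        if (p == NULL) {
            buf->failed = 1;  /* keep what we have; caller checks the flag */
            return;
        }
        buf->data = p;
        buf->capacity = new_cap;
    }

    if (chunk_size > 0)
        memcpy(buf->data + buf->size, chunk, chunk_size);
    buf->size += chunk_size;
    buf->data[buf->size] = '\0';
}
```

With the library available, such a callback and a zero-initialized html_buffer would be passed to md_html() together with the input; since the callback cannot report errors through a return value in this sketch, the caller inspects the failed flag afterwards and frees data when done.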
Markdown Extensions
The default behavior is to recognize only Markdown syntax defined by the CommonMark specification.
However, with appropriate flags, the behavior can be tuned to enable some extensions:
- With the flag MD_FLAG_COLLAPSEWHITESPACE, a non-trivial whitespace is collapsed into a single space.
- With the flag MD_FLAG_TABLES, GitHub-style tables are supported.
- With the flag MD_FLAG_TASKLISTS, GitHub-style task lists are supported.
- With the flag MD_FLAG_STRIKETHROUGH, strike-through spans are enabled (text enclosed in tilde marks, e.g. ~foo bar~).
- With the flag MD_FLAG_PERMISSIVEURLAUTOLINKS, permissive URL autolinks (not enclosed in < and >) are supported.
- With the flag MD_FLAG_PERMISSIVEEMAILAUTOLINKS, permissive e-mail autolinks (not enclosed in < and >) are supported.
- With the flag MD_FLAG_PERMISSIVEWWWAUTOLINKS, permissive WWW autolinks without any scheme specified (e.g. www.example.com) are supported. MD4C then assumes the http: scheme.
- With the flag MD_FLAG_LATEXMATHSPANS, LaTeX math spans ($...$) and LaTeX display math spans ($$...$$) are supported. (Note though that the HTML renderer outputs them verbatim in a custom tag <x-equation>.)
- With the flag MD_FLAG_WIKILINKS, wiki-style links ([[link label]] and [[target article|link label]]) are supported. (Note that the HTML renderer outputs them in a custom tag <x-wikilink>.)
- With the flag MD_FLAG_UNDERLINE, underscore (_) denotes an underline instead of an ordinary emphasis or strong emphasis.
A few features of CommonMark (those some people see as mis-features) may be disabled with the following flags:
- With the flag MD_FLAG_NOHTMLSPANS or MD_FLAG_NOHTMLBLOCKS, raw inline HTML or raw HTML blocks, respectively, are disabled.
- With the flag MD_FLAG_NOINDENTEDCODEBLOCKS, indented code blocks are disabled.
Input/Output Encoding
The CommonMark specification declares that any sequence of Unicode code points is a valid CommonMark document.
But, under closer inspection, Unicode plays a role only in a few very specific situations when parsing Markdown documents:
- For detection of word boundaries when processing emphasis and strong emphasis, some classification of Unicode characters (whether a character is whitespace or punctuation) is needed.
- For (case-insensitive) matching of a link reference label with the corresponding link reference definition, Unicode case folding is used.
- For translating HTML entities (e.g. &amp;) and numeric character references (e.g. &#35; or &#xcab;) into their Unicode equivalents.
  However, note that MD4C leaves this translation to the renderer/application, as the renderer is supposed to know the output encoding and whether it really needs to perform this kind of translation. (For example, when the renderer outputs HTML, it may leave the entities untranslated and defer the work to a web browser.)
MD4C relies on this property of CommonMark, and the implementation is, to a large degree, encoding-agnostic. Most of the MD4C code only assumes that the encoding of your choice is compatible with ASCII, i.e. that the codepoints below 128 have the same numeric values as ASCII.
Any input MD4C does not understand is simply seen as part of the document text and sent to the renderer's callback functions unchanged.
The two situations (word boundary detection and link reference matching) where MD4C has to understand Unicode are handled according to the following preprocessor macros (as defined at the time MD4C is being built):
- If the preprocessor macro MD4C_USE_UTF8 is defined, MD4C assumes UTF-8 for the word boundary detection and for the case-insensitive matching of link labels.
  When none of these macros is explicitly used, this is the default behavior.
- On Windows, if the preprocessor macro MD4C_USE_UTF16 is defined, MD4C uses WCHAR instead of char and assumes UTF-16 encoding in those situations. (UTF-16 is what Windows developers usually call just "Unicode" and what the Win32 API generally works with.)
  Note that because this macro also affects the types in md4c.h, you have to define the macro both when building MD4C and when including md4c.h.
  Also note this is only supported in the parser (md4c.[hc]). The HTML renderer does not support this, and you will have to write your own custom renderer to use this feature.
- If the preprocessor macro MD4C_USE_ASCII is defined, MD4C assumes nothing but ASCII input.
  That effectively means that non-ASCII whitespace or punctuation characters won't be recognized as such and that link reference matching will work in a case-insensitive way only for ASCII letters ([a-zA-Z]).
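To make the ASCII-only limitation concrete, here is a small illustration of case-insensitive comparison that folds only [a-zA-Z]. It is not MD4C's code: ascii_fold and ascii_label_equal are hypothetical helpers, and real link label matching involves more than case folding. The sketch only shows why a non-ASCII pair such as "Ä" and "ä" stops matching in such a mode.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only ASCII upper-case letters; every other byte is left untouched. */
static unsigned char ascii_fold(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (unsigned char)(ch - 'A' + 'a') : ch;
}

/* Hypothetical helper: returns 1 when both labels are equal under
 * ASCII-only case folding, 0 otherwise. */
int ascii_label_equal(const char *a, size_t a_len, const char *b, size_t b_len)
{
    assert((a != NULL || a_len == 0) && (b != NULL || b_len == 0));

    if (a_len != b_len)
        return 0;
    for (size_t i = 0; i < a_len; i++) {
        if (ascii_fold((unsigned char)a[i]) != ascii_fold((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```

Under this rule "Foo" and "fOO" compare equal, while the UTF-8 byte sequences for "Ä" (C3 84) and "ä" (C3 A4) do not, because bytes outside the ASCII letter range are compared verbatim.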
Documentation
The API of the parser is quite well documented in the comments in md4c.h.
Similarly, the Markdown-to-HTML API is described in its header md4c-html.h.
There is also a project wiki which provides some more comprehensive documentation. However, note that it is incomplete and some details may be somewhat outdated.
FAQ
Q: How does MD4C compare to a parser XY?
A: Some other implementations combine Markdown parser and HTML generator into a single entangled code hidden behind an interface which just allows the conversion from Markdown to HTML. They are often unusable if you want to process the input in any other way.
Even when the parsing is available as a standalone feature, most parsers (if not all of them; at least within the scope of the C/C++ languages) are full DOM-like parsers: they construct an abstract syntax tree (AST) representation of the whole Markdown document. That takes time and leads to a bigger memory footprint.
That is completely fine as long as you really need it. If you don't need the full AST, there is a very high chance that using MD4C will be substantially faster and less hungry in terms of memory consumption.
Last but not least, some Markdown parsers are implemented in a naive way. When fed with a smartly crafted input pattern, they may exhibit quadratic (or even worse) parsing times. What MD4C can still parse in a fraction of a second may turn into long minutes or possibly hours with them. Hence, when such a naive parser is used to process input from an untrusted source, the possibility of denial-of-service attacks becomes a real danger.
A lot of our effort went into providing linear parsing times no matter what kind of crazy input the MD4C parser is fed with. (If you encounter an input pattern which leads to super-linear parsing times, please do not hesitate to report it as a bug.)
Q: Does MD4C perform any input validation?
A: No. And we are proud of it. :-)
CommonMark specification states that any sequence of Unicode characters is a valid Markdown document. (In practice, this more or less always means UTF-8 encoding.)
In other words, according to the specification, it does not matter whether some Markdown syntax construction is in some way broken or not. If it is broken, it will simply not be recognized and the parser should see it just as a verbatim text.
MD4C takes this a step further: It sees any sequence of bytes as a valid input, following completely the GIGO philosophy (garbage in, garbage out). I.e. any ill-formed UTF-8 byte sequence will propagate to the respective callback as a part of the text.
If you need to validate that the input is, say, a well-formed UTF-8 document, you have to do it on your own. The easiest way to do this is to simply validate the whole document before passing it to the MD4C parser.
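A pre-validation step of that kind could be sketched as follows. The function is_valid_utf8 is a hypothetical helper, not part of MD4C; it rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n bytes at s form well-formed
 * UTF-8, 0 otherwise. */
int is_valid_utf8(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    assert(s != NULL || n == 0);

    while (i < n) {
        unsigned char c = p[i];
        unsigned long cp, min;
        size_t len;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or invalid lead byte */
        }

        if (n - i < len)
            return 0;                   /* sequence truncated by end of input */

        for (size_t k = 1; k < len; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */

        i += len;
    }
    return 1;
}
```

An application would call it once on the whole input and hand the document to the parser only when it returns 1.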
License
MD4C is covered by the MIT license; see the file LICENSE.md.
Links to Related Projects
Ports and bindings to other languages:
- commonmark-d: Port of MD4C to D language.
- markdown-wasm: Port of MD4C to WebAssembly.
- PyMD4C: Python bindings for MD4C.
Software using MD4C:
- QOwnNotes: A plain-text file notepad and todo-list manager with markdown support and ownCloud / Nextcloud integration.
- Qt: Cross-platform C++ GUI framework.
- Textosaurus: Cross-platform text editor based on Qt and Scintilla.
- 8th: Cross-platform concatenative programming language.
field | value |
---|---|
version | 0.4.8 |
license | MIT |
repository | https://pkg.cppget.org/1/stable |
download | libmd4c-0.4.8.tar.gz |
sha256 | 8c48151d8aa4846f1db9ea9838e10cdbfc0f344a8893cf54803abb1a74924e66 |
project | md4c |
url | github.com/mity/md4c |
doc-url | github.com/mity/md4c/wiki |
src-url | github.com/mity/md4c |
package-url | github.com/build2-packaging/md4c |
package-email | packaging@build2.org (mailing list) |
Changes
MD4C Change Log
Version 0.4.8
Fixes:
- #149: A HTML block started in a container block (and not explicitly finished in the block) could eat 1 line of actual contents.
- #150: Fix md2html utility to output proper DOCTYPE and HTML tags when the --full-html command line option is used, according to the expected output format (HTML or XHTML).
- #152: Suppress recognition of a permissive autolink if it would otherwise form a complete body of an outer inline link.
- #153, #154: Set MD_BLOCK_UL_DETAIL::mark and MD_BLOCK_OL_DETAIL::mark_delimiter correctly, even when the blocks are nested on the same line in complicated ways.
- #155: Avoid reading 1 character beyond the input size in some complex cases.
Version 0.4.7
Changes:
- Add MD_TABLE_DETAIL structure into the API. The structure describes the column count and row count of the table, and a pointer to it is passed into the application-provided block callback with the MD_BLOCK_TABLE block type.
Fixes:
- #131: Fix handling of a reference image nested in a reference link.
- #135: Handle unmatched parenthesis pairs inside permissive URL and WWW auto-links in a way more compatible with GFM.
- #138: The tag <tbody></tbody> is now suppressed whenever the table has zero body rows.
- #139: Recognize a list item mark even when EOF follows it.
- #142: Fix reference link definition label matching in the case when the label ends with a Unicode character with a non-trivial case folding mapping.
Version 0.4.6
Fixes:
- #130: Fix ISANYOF macro, which could provide unexpected results when encountering a zero byte in the input text, in some cases leading to a broken internal state of the parser.
  The bug could result in denial of service and possibly also have other security implications. Applications are advised to update to 0.4.6.
Version 0.4.5
Fixes:
- #118: Fix HTML renderer's MD_HTML_FLAG_VERBATIM_ENTITIES flag, exposed in the md2html utility via --fverbatim-entities.
- #124: Fix handling of indentation of 16 or more spaces in the fenced code blocks.
Version 0.4.4
Changes:
- Make Unicode-specific code compliant to Unicode 13.0.
New features:
- The HTML renderer, developed originally as the heart of the md2html utility, is now built as a standalone library, in order to simplify its reuse in applications.
- With MD_HTML_FLAG_SKIP_UTF8_BOM, the HTML renderer now skips the UTF-8 byte order mark (BOM) if the input begins with it, before passing the input to the Markdown parser.
  The md2html utility automatically enables the flag (unless it is custom-built with -DMD4C_USE_ASCII).
- With MD_HTML_FLAG_XHTML, the HTML renderer generates XHTML instead of HTML.
  This effectively means <br /> instead of <br>, <hr /> instead of <hr>, and <img ... /> instead of <img ...>.
  The md2html utility now understands the command line option -x or --xhtml enabling the XHTML mode.
Fixes:
- #113: Add missing folding info data for the following Unicode characters: U+0184, U+018a, U+01b2, U+01b5, U+01f4, U+0372, U+038f, U+1c84, U+1fb9, U+1fbb, U+1fd9, U+1fdb, U+1fe9, U+1feb, U+1ff9, U+1ffb, U+2c7f, U+2ced, U+a77b, U+a792, U+a7c9.
  Due to the bug, the link definition label matching did not work in a case-insensitive way for these characters.
Version 0.4.3
New features:
- With MD_FLAG_UNDERLINE, spans enclosed in underscore (_foo_) are seen as underline (MD_SPAN_UNDERLINE) rather than an ordinary emphasis or strong emphasis.
Changes:
- The implementation of the wiki-links extension (with MD_FLAG_WIKILINKS) has been simplified.
  - A noticeable increase of MD4C's memory footprint introduced by the extension implementation in 0.4.0 has been removed.
  - The priority handling towards other inline elements has been unified. (This affects an obscure case where the syntax of an image in place of the wiki-link destination made the wiki-link invalid. Now all inline spans in the wiki-link destination, including images, are suppressed.)
  - The length limitation of 100 characters now always applies to the wiki-link destination.
- Recognition of strike-through spans (with the flag MD_FLAG_STRIKETHROUGH) has become much stricter and, arguably, more reasonable.
  - Only single tildes (~) and double tildes (~~) are recognized as strike-through marks. Longer ones are not anymore.
  - The lengths of the opener and closer marks have to be the same.
  - The tildes cannot open a strike-through span if a whitespace follows.
  - The tildes cannot close a strike-through span if a whitespace precedes.
  This change follows the changes of behavior in cmark-gfm some time ago, so it is also beneficial from the compatibility point of view.
- When building MD4C by hand instead of using its CMake-based build, the UTF-8 support was by default disabled, unless explicitly asked for by defining the preprocessor macro MD4C_USE_UTF8.
  This has been changed and the UTF-8 mode now becomes the default, no matter how md4c.c is compiled. If you need to disable it and use the ASCII-only mode, you have to explicitly define the macro MD4C_USE_ASCII when compiling it.
  (The CMake-based build as provided in our repository explicitly asked for the UTF-8 support with -DMD4C_USE_UTF8. I.e. if you are using the MD4C library built with our vanilla CMakeLists.txt files, this change should not affect you.)
Fixes:
- Fixed some string length handling in the special MD4C_USE_UTF16 build.
  (This does not affect you unless you are on Windows and explicitly define the macro when building MD4C.)
- #100: Fixed an off-by-one error in the maximal length limit of some segments of e-mail addresses used in autolinks.
- #107: Fix mis-detection of asterisk-encoded emphasis in some corner cases when length of the opener and closer differs, as in ***foo *bar baz***.
Version 0.4.2
Fixes:
- #98: Fix mis-detection of asterisk-encoded emphasis in some corner cases when length of the opener and closer differs, as in **a *b c** d*.
Version 0.4.1
Unfortunately, 0.4.0 was released with a badly updated ChangeLog. Fixing this is the only change in 0.4.1.
Version 0.4.0
New features:
- With MD_FLAG_LATEXMATHSPANS, LaTeX math spans ($...$) and LaTeX display math spans ($$...$$) are now recognized. (Note though that the HTML renderer outputs them verbatim in a custom <x-equation> tag.)
  Contributed by Tilman Roeder.
- With MD_FLAG_WIKILINKS, Wiki-style links ([[...]]) are now recognized. (Note though that the HTML renderer renders them as a custom <x-wikilink> tag.)
  Contributed by Nils Blomqvist.
Changes:
- Parsing of tables (with MD_FLAG_TABLES) is now closer to the way cmark-gfm parses tables, as we do not require every row of the table to contain a pipe | anymore.
  As a consequence, paragraphs now cannot interrupt tables. A paragraph which follows the table has to be delimited with a blank line.
Fixes:
- #94: md_build_ref_def_hashtable(): Do not allocate more memory than strictly needed.
- #95: md_is_container_mark(): Ordered list mark requires at least one digit.
- #96: Some fixes for link label comparison.
Version 0.3.4
Changes:
- Make Unicode-specific code compliant to Unicode 12.1.
- Structure MD_BLOCK_CODE_DETAIL got a new member fenced_char. The application can use it to detect the character used to form the block fences (` or ~). In the case of an indented code block, it is set to zero.
Fixes:
- #77: Fix maximal count of digits for numerical character references, as requested by CommonMark specification 0.29.
- #78: Fix link reference definition label matching for Unicode characters where the folding mapping leads to multiple codepoints, as e.g. in ẞ -> SS.
- #83: Fix recognition of an empty blockquote which interrupts a paragraph.
Version 0.3.3
Changes:
- Make permissive URL autolink and permissive WWW autolink extensions stricter.
  This brings the behavior closer to GFM and mitigates the risk of false positives. In particular, the domain has to contain at least one dot, and parentheses can be part of the link destination only if ( and ) are balanced.
Fixes:
- #73: Some raw HTML inputs could lead to quadratic parsing times.
- #74: Fix input leading to a crash. Found by fuzzing.
- #76: Fix handling of parentheses in some corner cases of permissive URL autolink and permissive WWW autolink extensions.
Version 0.3.2
Changes:
- Changes mandated by CommonMark specification 0.29.
  Most importantly, the white-space trimming rules for code spans have changed. At most one space/newline is trimmed from beginning/end of the code span (if the code span contains some non-space contents, and if it begins and ends with space at the same time). In all other cases the spaces in the code span are now left intact.
  Other changes in behavior are in corner cases only. Refer to CommonMark 0.29 notes for more info.
Fixes:
- #68: Some specific HTML blocks were not recognized when EOF follows without any end-of-line character.
- #69: Strike-through span not working correctly when its opener mark is directly followed by other opener mark; or when other closer mark directly precedes its closer mark.
Version 0.3.1
Fixes:
- #58, #59, #60, #63, #66: Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg for finding all those issues.
- #61: Flag MD_FLAG_NOHTMLSPANS erroneously affected also recognition of CommonMark autolinks.
Version 0.3.0
New features:
- Add extension for GitHub-style task lists:
      * [x] foo
      * [x] bar
      * [ ] baz
  (It has to be explicitly enabled with MD_FLAG_TASKLISTS.)
- Added support for building as a shared library. On non-Windows platforms, this is now the default behavior; on Windows, a static library is still the default. The CMake option BUILD_SHARED_LIBS can be used to request one or the other explicitly.
  Contributed by Lisandro Damián Nicanor Pérez Meyer.
- Renamed structure MD_RENDERER to MD_PARSER and refactored its contents a little bit. Note this is source-level incompatible, and initialization code in apps may need to be updated.
  The aim of the change is to be more friendly for long-term ABI compatibility we shall maintain, starting with this release.
- Added CHANGELOG.md (this file).
- Make sure md_process_table_row() reports the same count of table cells for all table rows, no matter how broken the input is. The cell count is derived from the table underline line. Bogus cells in other rows are silently ignored. Missing cells in other rows are reported as empty ones.
Fixes:
- CID 1475544: Calling md_free_attribute() on uninitialized data.
- #47: Using bad offsets in md_is_entity_str(), in some cases leading to buffer overflow.
- #51: Segfault in md_process_table_cell().
- #53: With MD_FLAG_PERMISSIVEURLAUTOLINKS or MD_FLAG_PERMISSIVEWWWAUTOLINKS we could generate bad output for ordinary Markdown links, if a non-space character immediately follows, like e.g. in [link](http://github.com)X.
Version 0.2.7
This was the last version before the changelog was added.