Why all you’ll ever need is Markdown

an introduction to Pandoc

Johan Hidding
Netherlands eScience Center
Jul 2, 2019

Markdown is widely used for writing documents. It is supported by many blog engines, GitHub READMEs, you name it. There are many varieties, dialects and extensions of Markdown. The original syntax is described on the Daring Fireball website. One place where many flavours of Markdown meet is Pandoc. Pandoc can convert Markdown to many different formats such as HTML, LaTeX, PDF (through LaTeX), RTF, DOCX, EPUB and even back to Markdown of a different flavour. There are some little-known features (or extensions) of Markdown that make it very versatile and suitable for any rich text content, especially if you use Pandoc.

The take-away message is: you will never again have to write a document in LaTeX or HTML directly, not for writing notes, reports or papers, no PowerPoint for presentations, no awkward WYSIWYG editor for web content (I’m talking to you, Medium!); just use Markdown.

You might ask “But Johan, how do I then …?”, shush! The answer is always going to be: Pandoc.

Primer

Markdown is a way of writing up rich content (i.e. text, document hierarchy, images, lists, quotes, etc.) in plain text using a human readable format. The format is aimed to mimic in plain text the formatted result:

You can read Daring Fireball for more syntax. The basics of Markdown are highly intuitive, but in many instances standard Markdown does not suffice: there is no support for equations, citation management, cross-references or numbered figures. However, Pandoc supports many extensions and flavours of Markdown that have evolved over the years. Markdown feels a bit like a natural language. This has led to some criticism of the use of Markdown in technical documents.

I will highlight some extensions, all clearly documented and supported by Pandoc, that transform Markdown into a versatile and extendable format, suitable for even the most technical and demanding documents, yet easy to use when no such demands arise.

Flavours and extensions

There are many extensions to Markdown, some of which are more widely used than others. One of the most influential flavours of Markdown is the superset by GitHub, its major addition being that of fenced code blocks.

Attributes

Nothing good ever comes from PHP, except for PHP Markdown, which adds attributes to Markdown.

Any element in a Markdown document can be adorned with (CSS style) attributes by appending them in curly braces: {#id .class .class key=value}. For example,

when converted to HTML looks like

or to LaTeX

In the case of LaTeX, the color attribute as well as the class are ignored, because Pandoc (by default) doesn’t know what to do with them. The #proof id attached to this header can be used further on in the document to make cross-references.
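To make the attribute syntax concrete, here is a minimal Python sketch that splits an attribute string into its id, classes and key-value pairs. It is not how Pandoc parses attributes internally; the function name `parse_attributes` and the class `.important` are made up for illustration, while `#proof` and `color` echo the example discussed above.

```python
import re


def parse_attributes(attr):
    """Split a '{#id .class key=value}' string into (id, classes, key-values).

    A simplified illustration only: Pandoc's own parser handles more cases.
    """
    inner = attr.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attributes must be wrapped in curly braces")
    identifier, classes, pairs = "", [], {}
    # Tokens are separated by whitespace; a quoted value may contain spaces.
    for token in re.findall(r'[^\s"]+(?:"[^"]*")?', inner[1:-1]):
        if token.startswith("#"):
            identifier = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            pairs[key] = value.strip('"')
    return identifier, classes, pairs


# '.important' is a hypothetical class name used only for this demo.
print(parse_attributes("{#proof .important color=red}"))
```

The three parts returned here correspond to the three fields Pandoc stores for every element that carries attributes: one identifier, a list of classes, and a list of key-value pairs.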

The attribute syntax also applies to code blocks: a line starting with ```python is equivalent to ``` {.python}.

DIV elements

A <div> element can be added using three (or more) colons.

This can be used to write down any non-standard element. So how is this rendered? If your output format is HTML, change the style sheet. For generating LaTeX however we need to do a bit more work. This is where Pandoc comes in. Pandoc has support for filtering elements and creating relevant output, but we’ll get back to that.

YAML metadata

Information that is usually contained in an HTML <head> element can be included in the YAML metadata block. This is a block at the top of the document, delimited by lines of three hyphens.

This is also the place where you can configure options for Pandoc and its filters.
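As a rough illustration of where the metadata block sits, the following Python sketch separates it from the body of a document. The helper names `split_front_matter` and `simple_fields` and the title in the sample are assumptions made for this example; the naive `key: value` reader only covers flat fields, and real YAML needs a proper YAML parser.

```python
def split_front_matter(text):
    """Return (metadata, body) for a document that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        # A metadata block is closed by a line of three hyphens or three dots.
        if lines[index].strip() in ("---", "..."):
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    return "", text


def simple_fields(metadata):
    """Read flat 'key: value' lines only; nested YAML is deliberately ignored."""
    fields = {}
    for line in metadata.splitlines():
        key, separator, value = line.partition(":")
        if separator and not line.startswith((" ", "-", "#")):
            fields[key.strip()] = value.strip()
    return fields


# The title is a made-up value for this demo.
sample = "---\ntitle: Last theorem\nbibliography: ref.bib\n---\n\nBody text.\n"
metadata, body = split_front_matter(sample)
print(simple_fields(metadata))
```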

Citations

Citations are managed using the pandoc-citeproc filter. For those who have worked with BibTeX (or another bibliography database) before: you include the bibliography by adding a line to the YAML header block. Say you have stored your references in ref.bib, and want to create a dedicated section called “References” to list the citations:

There are many ways to cite papers, the syntax for which is documented in the Pandoc manual. A basic citation looks like [@Hidding2014], where Hidding2014 is an entry in the bibliography. Several editors (VSCode and Emacs) even support autocompletion on the included bibliography.
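A small sanity check can catch citations that have no matching entry in the bibliography. The Python sketch below is only an approximation: the regular expressions cover common citation keys and BibTeX entries rather than the full syntax, and the function names as well as the key Doe2020 are invented for the example.

```python
import re

# Approximate pattern: '@' not preceded by a word character (to skip e-mail
# addresses), followed by a key of letters, digits, '_' and internal ':.-'.
CITATION_KEY = re.compile(r"(?<!\w)@([A-Za-z0-9_]+(?:[:.\-][A-Za-z0-9_]+)*)")


def citation_keys(markdown):
    """Return the citation keys used in the text, in order of first use."""
    seen = []
    for key in CITATION_KEY.findall(markdown):
        if key not in seen:
            seen.append(key)
    return seen


def missing_keys(markdown, bibtex):
    """Return the cited keys that have no entry in the BibTeX source."""
    defined = set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex))
    return [key for key in citation_keys(markdown) if key not in defined]


# 'Doe2020' is a hypothetical key that is missing from the bibliography.
text = "As shown before [@Hidding2014; @Doe2020, p. 3], mail me at a@b.org."
bib = "@article{Hidding2014,\n  title = {An example entry}\n}\n"
print(citation_keys(text))
print(missing_keys(text, bib))
```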

Equations

Equations can be entered using the famous LaTeX DSL for equations. Use single $ characters to delimit an inline equation and double $$ to delimit a full width equation.

When converting to a LaTeX based output this will work trivially. For HTML it is probably best to use MathJax, enabled in Pandoc with the --mathjax and --standalone flags.

I promised to show you how Markdown can be extended with arbitrary functionality using Pandoc. It is inevitable that I will get a bit more technical here, but the rewards are high.

Pandoc basics

Pandoc reads and writes documents of many formats. It does so by converting to and from a native intermediate representation.

We’ll create a document named last-theorem.md and enter the following:

We can see how Pandoc reads a document by running

which will give (slightly edited for readability)

Yes, this is Haskell syntax. It just means what you think it means. Using native output in Pandoc will be very useful if you start developing your own filters. For now it just serves to illustrate how Pandoc works. If you’re not a hacker, you’ll never have to look at this again 🤓👍.

Let’s create a PDF from this mathematics wizardry.

Resulting in a nice PDF rendering:

Typeset paper on a famous theorem

Pandoc filters

There is a big problem with the above example: the equation is not numbered! Pandoc filters let you change the intermediate representation before it is written out. Let’s try the pandoc-eqnos filter. This filter is written in Python (using pandocfilters), but any language that can read and write the intermediate JSON representation works.
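To give an impression of what such a filter does, here is a minimal sketch of a JSON filter in plain Python, using only the standard library. It is not the pandoc-eqnos code: the names `walk`, `upper_case` and `run_filter` are chosen for this example, the transformation (upper-casing all text) is a toy, and the API version in the sample document is an assumed value. The sketch relies on the JSON form of the syntax tree, in which every element is an object with a type tag `t` and, usually, contents `c`.

```python
import io
import json


def walk(node, action):
    """Rebuild the tree depth-first, offering every tagged element to `action`.

    `action` returns a replacement element, or None to keep the element.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value, action) for key, value in node.items()}
        if "t" in node:
            replacement = action(node)
            return node if replacement is None else replacement
    return node


def upper_case(element):
    """Toy transformation: turn every Str element into upper case."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None


def run_filter(stdin, stdout):
    """Read a JSON document, transform it, and write it back out."""
    json.dump(walk(json.load(stdin), upper_case), stdout)


# In-memory demo; the API version below is an assumed value.
doc = {"pandoc-api-version": [1, 22], "meta": {}, "blocks": [
    {"t": "Para", "c": [{"t": "Str", "c": "hello"}, {"t": "Space"},
                        {"t": "Str", "c": "world"}]}]}
target = io.StringIO()
run_filter(io.StringIO(json.dumps(doc)), target)
print(target.getvalue())
```

Saved as an executable script that calls `run_filter(sys.stdin, sys.stdout)`, this could be passed to Pandoc with the --filter option. In practice a helper library saves you from writing the tree traversal yourself.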

We’ll need to change the last-theorem.md document a bit. To get an equation numbered, add an id to the equation.

Also add a sentence to the end.

Now to invoke the equation numbering filter, add the --filter pandoc-eqnos argument to the command line.

Typeset paper on a famous theorem with numbered equations

Pandoc command lines can grow out of hand rather quickly. It is advisable to manage your Pandoc settings in a Bash script or Makefile, whatever you prefer.

Lua filters

Pandoc has built-in support for filters written in Lua. Filters written in Lua are generally faster than those written in Python or other external languages. Instead of passing the document to an external program as JSON, Lua filters work directly on the abstract syntax tree as Pandoc represents it internally.

Let’s add a feature. Add the following to our budding math paper:

To parse this, Pandoc needs the extension fenced_divs enabled.

At the end of the output will be the expression:

Once we generate HTML from this example, we can add the proper CSS to the .warning class to change the looks of the warning. However, in the generated PDF there’s no change. Let’s make a filter that creates a nice coloured box in the LaTeX output. In the file warning-div.lua we define a filter that runs on all Div elements.

The filter checks whether the div has the class warning; if so, it adds a LaTeX RawBlock at the start and at the end of the div.
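The same logic can be sketched in Python on the JSON form of the syntax tree. This is an analogue for illustration, not the contents of warning-div.lua. The environment name `warning` is an assumption made for this example, as are the function names, and for brevity the sketch only descends into nested Div elements.

```python
def raw_latex(code):
    """Build a RawBlock element that is only emitted for LaTeX output."""
    return {"t": "RawBlock", "c": ["latex", code]}


def box_warning(element):
    """Wrap the contents of a Div with class 'warning' in a LaTeX environment.

    Returns None for every other element. The environment name 'warning'
    is an assumption made for this sketch.
    """
    if element.get("t") != "Div":
        return None
    (identifier, classes, keyvals), blocks = element["c"]
    if "warning" not in classes:
        return None
    wrapped = ([raw_latex("\\begin{warning}")] + blocks
               + [raw_latex("\\end{warning}")])
    return {"t": "Div", "c": [[identifier, classes, keyvals], wrapped]}


def transform_blocks(blocks):
    """Apply box_warning to a list of blocks, descending into nested Divs only."""
    result = []
    for block in blocks:
        if block.get("t") == "Div":
            attr, inner = block["c"]
            block = {"t": "Div", "c": [attr, transform_blocks(inner)]}
        result.append(box_warning(block) or block)
    return result


# Demo on a hand-written tree: one warning Div and one ordinary paragraph.
para = {"t": "Para", "c": [{"t": "Str", "c": "Careful!"}]}
blocks = [{"t": "Div", "c": [["", ["warning"], []], [para]]}, para]
for block in transform_blocks(blocks)[0]["c"][1]:
    print(block["t"], block["c"])
```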

Running Pandoc with --lua-filter=warning-div.lua now converts the div element to a LaTeX string

This is not standard LaTeX, so we’ll need to define a macro in warning.tex

We can run Pandoc again to generate the PDF. The -H (--include-in-header) option includes the contents of a file in the header of the generated output.

Did I mention Pandoc command lines tend to grow out of hand?
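One way to keep this manageable, besides a Bash script or Makefile, is to assemble the command line in a small script. The Python sketch below only builds and prints the command. The function name `pandoc_command` and the output file name last-theorem.pdf are assumptions, while the options themselves are the ones used in this article.

```python
import shlex


def pandoc_command(source, output, filters=(), lua_filters=(), headers=()):
    """Assemble a Pandoc command line as a list of arguments."""
    command = ["pandoc", source, "-o", output]
    for name in filters:
        command += ["--filter", name]
    for path in lua_filters:
        command.append(f"--lua-filter={path}")
    for path in headers:
        command += ["-H", path]
    return command


# The output file name is an assumption made for this demo.
command = pandoc_command(
    "last-theorem.md",
    "last-theorem.pdf",
    filters=["pandoc-eqnos"],
    lua_filters=["warning-div.lua"],
    headers=["warning.tex"],
)
print(shlex.join(command))
# To actually run Pandoc: import subprocess; subprocess.run(command, check=True)
```

Keeping the arguments in a list rather than one long string avoids quoting problems when file names contain spaces.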

Typeset paper on a famous theorem with numbered equations and a warning

Skies and limits

Granted, to unlock the full power of Markdown for the web, you’ll need to know some HTML and CSS, and to tweak PDF output to your impossibly high standards you need to grok LaTeX. All the more reason to create an ecosystem of scripts, themes and tutorials to ease the learning curve. Also, code editors should offer better support for more varieties of Markdown. I don’t mean cluttering up the editing experience with more distracting tool-tips, snippets etc. Just this: correct and efficient highlighting and proper outline support.

This was just a teaser of what is possible with Pandoc. Did I mention creating slide shows with reveal.js? Or how about doing some literate programming with entangled (see my previous post on that)? The documentation of Pandoc is excellent, so just go ahead and write all your content in Markdown!
