Outputting Markdown from Jekyll using hooks
Jekyll hooks provide enough flexibility to modify posts before output.
One of the things I most enjoy about Jekyll1 is writing my blog posts in Markdown. I love not worrying about HTML and just letting Jekyll generate it for me when a post is published. Using Liquid tags directly in Markdown is also helpful, as I can define sitewide or page-specific variables and then replace them during site generation. This is a really useful capability that I wanted to take advantage of to output Markdown for use in other sites. Places like Medium and dev.to allow you to post Markdown articles, so I thought repurposing the Markdown I used in Jekyll would make crossposting to those sites easier.
I assumed that there would be a property on the page
variable that would give me access to the rendered Markdown, but I was wrong. This began a long journey through relatively undocumented corners of the Jekyll plugin ecosystem. I’m sharing that journey here in the hopes that others won’t have to go through the same frustrating experience.
An introduction to Jekyll plugins
I was relatively unfamiliar with the Jekyll plugin system before trying to figure out how to get the rendered Markdown for a post. Jekyll supports a number of different plugin types2. These plugin types affect Jekyll directly:
- Generators - plugins that create files. While it’s not required that generators create files, this is the most frequent use case. Plugins like
jekyll-archives
use generators to create files that wouldn’t otherwise exist. - Converters - plugins that convert between text formats. Jekyll’s default of converting Markdown files into HTML is implemented using a converter. You can add support for other formats to be converted into HTML (or any other format) by creating your own converter.
- Hooks - plugins that listen for specific events in Jekyll and then perform some action in relation to the events.
There are also two plugin types that are used primarily with Liquid:
- Tags - create a new tag in the format
{% tagname %}
for use in your templates. - Filters - create a new filter that you can use to transform input, such as
{{ data | filter }}
There is also a command plugin type that allows you to create new commands to use with the jekyll
command line tool. The jekyll build
command is implemented using this plugin type.
Designing the solution
My goal was to get the Liquid-rendered Markdown content (so all data processing was complete) for each post into a page
property so that I could output that content into a JSON file on my server. That JSON file would then be fetched by an AWS Lambda function that crossposted that content to other locations. I didn’t need to generate any extra files or convert from one format to another, so it seemed clear that using a hook would be the best approach.
Hooks are basically event handlers in which you specify a container and the event to listen to. Jekyll passes you the relevant information for that event and you can then perform any action you’d like. There are four containers you can use with hooks:
:site
- thesite
object:page
- thepage
object for each non-collection page:post
- thepost
object for each blog post:document
- thedocument
object for each document in each collection (including blog posts and custom collections)
Because I wanted this solution to work for all collections in my site, I chose to use :document
as an easy way to make the same change for all collection types.
There were two events for :document
that immediately seemed relevant to my goal:
:pre_render
- fires before the document content is rendered:post_render
- fires after the document content is rendered but before the content is written to disk
It seemed clear that getting the Markdown content would require using the :pre_render
event, so I started by using this setup:
Jekyll::Hooks.register :documents, :pre_render do |doc, payload|
# code goes here
end
Each hook is passed its target container object, in this case doc
is a document, and a payload
object containing all of the relevant variables for working with the document (these are the variables available inside of a template when the document is rendered).
The :document, :prerender
hook is called just before each document is rendered, meaning you don’t need to worry about looping over collections manually.
The catch: Rendering doesn’t mean what you think it means
I figured that the content
property inside of a :document, :pre_render
hook would contain the Liquid-rendered Markdown instead of the final HTML, but I was only half correct. The content
property actually contains the unprocessed Markdown that still contains all of the Liquid variables. It turns out that “prerender” means something different than I thought.
The lifecycle of the content
property for a given document in Jekyll looks like this3:
- The
content
property contains the file content with the front matter removed (typically Markdown) :pre_render
hook fires- The
content
property is rewritten with Liquid tags rendered (this is what Jekyll internally refers to as rendering) - The
content
property is rewritten after being passed through a converter (this is what Jekyll internally refers to as converting) - The
:post_render
hook fires
While Jekyll internally separates rendering (which is apply Liquid) and converting (the Markdown to HTML conversion, for example), the exposed hooks don’t make this distinction. That means if I want Markdown content with Liquid variables replaced then I’ll need to get the prerendered Markdown content and render it myself.
The solution
At this point, my plan was to create a pre_render
hook that did the following:
- Retrieved the raw content for each document (contained in
doc.content
) - Render that content using Liquid
- Store the result in a new property called
unconverted_content
that would be accessible inside my templates
I started out with this basic idea of how things should look:
Jekyll::Hooks.register :documents, :pre_render do |doc, payload|
# get the raw content
raw_content = doc.content
# do something to that raw content
rendered_content = doSomethingTo(raw_content)
# store it back on the document
doc.rendered_content = rendered_content
end
Of course, I’m not very familiar with Ruby, so it turned out this wouldn’t work quite the way I thought.
First, doc
is an instance of a class, and you cannot arbitrarily add new properties to objects in Ruby. Jekyll provides a data
hash on the document object, however, that can be used to add new properties that are available in templates. So the last line needs to be rewritten:
Jekyll::Hooks.register :documents, :pre_render do |doc, payload|
# get the raw content
raw_content = doc.content
# do something to that raw content
rendered_content = doSomethingTo(raw_content)
# store it back on the document
doc.data['rendered_content'] = rendered_content
end
The last line ensures that page.rendered_content
will be available inside of templates later on (and remember, this is happening during pre_render
, so the templates haven’t yet been used).
The next step was to use Liquid to render the raw content. To figure out how to do this, I had to dig around in the Jekyll source4 as there wasn’t any documentation. Rendering Liquid the exact same way that Jekyll does by default requires a bit of setup and pulling in pieces of data from a couple different places. Here is the final code:
Jekyll::Hooks.register :documents, :pre_render do |doc, payload|
# make some local variables for convenience
site = doc.site
liquid_options = site.config["liquid"]
# create a template object
template = site.liquid_renderer.file(doc.path).parse(doc.content)
# the render method expects this information
info = {
:registers => { :site => site, :page => payload['page'] },
:strict_filters => liquid_options["strict_filters"],
:strict_variables => liquid_options["strict_variables"],
}
# render the content into a new property
doc.data['rendered_content'] = template.render!(payload, info)
end
The first step in this hook is to create a Liquid template object. While you can do this directly using Liquid::Template
, Jekyll caches Liquid templates internally when using site.liquid_renderer.file(doc.path)
, so it makes sense to use that keep the Jekyll build as fast as possible. The content is then parsed into a template object.
The template.render()
method needs not only the payload
object but also some additional information. The info
hash passes in registers
, which are local variables accessible inside of the template, and some options for how liquid should behave. With all of that data ready, the content is rendered into the new property.
This file then needs to be placed in the _plugins
directory of a Jekyll site to run each time the site is built.
Accessing rendered_content
With this plugin installed, the Markdown content is available through the rendered_content
property, like this:
{{ page.rendered_content }}
The only problem is that outputting page.rendered_content
into a Markdown page will cause all of that Markdown to be converted into HTML. (Remember, Jekyll internally renders Liquid first and then the result is converted into HTML.) So in order to output the raw Markdown, you’ll need to either apply a filter that prevents the Markdown-to-HTML conversion from happening, or use a file type that doesn’t convert automatically.
In my case, I’m storing the Markdown content in a JSON structure, so I’m using the jsonify
filter, like this:
---
layout: null
---
{
{% assign post = site.posts.first %}
"id": "{{ post.url | absolute_url | sha1 }}",
"title": {{ post.title | jsonify }},
"date_published": "{{ post.date | date_to_xmlschema }}",
"date_published_pretty": "{{ post.date | date: "%B %-d, %Y" }}",
"summary": {{ post.excerpt | strip_html | strip_newlines | jsonify }},
"content_markdown": {{ post.rendered_content | jsonify }},
"content_html": {{ post.content | jsonify }},
"tags": {{ post.tags | jsonify }},
"url": "{{ post.url | absolute_url }}"
}
Another option is to create a rendered_content.txt
file in the _includes
directory that just contains this:
{{ page.rendered_content }}
Then, you can include that file anywhere you want the unconverted Markdown content, like this:
{% include "rendered_content.txt" %}
Conclusion
Jekyll hooks are a useful feature that let you interact with Jekyll while it is generating your site, allowing you to intercept, modify, and add data along the way. While there aren’t a lot of examples in the wild, the concept is straightforward enough that with a few pointers, any programmer should be able to get something working. The biggest stumbling point for me was the lack of documentation on how to use Jekyll hooks, so I’m hoping that this writeup will help others who are trying to accomplish similar tasks in their Jekyll sites.
To date, I’ve found Jekyll to be extremely versatile and customizable. Being able to get the Liquid-rendered Markdown (even though it took a bit of work) has made my publishing workflow much more flexible, as I’m now more easily able to crosspost my writing on various other sites.