By Md. Sabuj Sarker | 1/16/2018 | General |Beginners

Custom Markdown Parsing With Mistune and Python

Markdown is the preferred way of writing rich documents for many people, and developers in particular like it. Stack Overflow uses it as the format for writing questions and answers. It is everywhere on GitHub, GitLab and many other online platforms, and it is the preferred way of writing content in static site generators. You can even use it with WordPress. Writing markdown is easy, fun and productive, and documents written in it are portable and can be converted to other formats, including HTML, PDF, DOC, etc.

So, as a developer, if you want to integrate markdown into a system you have built, it is often not enough to run a converter that turns markdown files or text into HTML or another document format. You may want to add custom styles or custom classes, or apply some other transformation. For example, you may want to turn the URLs written with markdown link notation into full URLs. For all of this, and for proper control over conversion and transformation in general, you need control over both parsing and document generation.

Mistune is a Python library that gives you control over both the parsing of markdown and the generation of HTML.

Prerequisites

This is not a beginner-level article for Python programmers, but it is a beginner-level article for mistune. To follow along you should already know markdown well and be comfortable using the command line on your system.

Preparing Your Environment

Make sure that you have Python installed on your system. Create or choose a directory for the Python scripts shown here, and create a file named mist_md.py in it. Then open the command line in that directory and run the following command to check that everything is working:

python mist_md.py

Or,

python3 mist_md.py

Installing Mistune

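Installing mistune is as easy as running a pip command. Note that the code in this article is written for the mistune 0.8 API. Mistune 2.0 and later replaced the Renderer class with HTMLRenderer and renamed several renderer methods (for example, header became heading), so the examples will not run unchanged on a current release. To follow along, install the last 0.8 release:

pip install mistune==0.8.4

or,

pip3 install mistune==0.8.4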

A Sample Markdown Formatted Document

To work with mistune we need a sample markdown document. Instead of keeping it in a file, we will keep it in a multi-line Python string. Let's try the following piece of markdown:

# This is a first level header
This is a short one line paragraph.

Here is a [link](http://example.com)

Here is an image: ![](image.png)

Converting the Markdown to HTML

To use mistune, import the library and then call the markdown() function from the mistune module. Put the following code in mist_md.py and run it:

import mistune

md_str = """
# This is a first level header
This is a short one line paragraph.

Here is a [link](http://example.com)

Here is an image: ![](image.png)
"""

html_str = mistune.markdown(md_str)

print(html_str)

It will provide the following output.

<h1>This is a first level header</h1>
<p>This is a short one line paragraph.</p>
<p>Here is a <a href="http://example.com">link</a></p>
<p>Here is an image: <img src="image.png" alt=""></p>

So it produces plain HTML with no extra information. What if we want to add extra styling to the h1 tag, for example through a style attribute or a CSS class? What if we want to make the image responsive with a Bootstrap class? What if we want to check whether each link starts with https:// and, if not, make it do so? What if we want to convert every relative URL to an absolute URL by adding a domain?

For all those things we need to intercept the HTML generation of markdown.

Creating Custom Renderer

In mistune, renderers are responsible for generating HTML. The default renderer produces that plain old boring HTML, and we want to add some spice to it. A custom renderer is created by subclassing the Renderer class from the mistune module. An instance of that class is passed to the Markdown class through the renderer keyword argument. The resulting Markdown object is callable: when you pass a markdown string to it, it returns the generated HTML. For now, let's give our class no functionality and leave its body as a pass statement.

import mistune

md_str = """
# This is a first level header
This is a short one line paragraph.

Here is a [link](http://example.com)

Here is an image: ![](image.png)
"""

class MyCustomRenderer(mistune.Renderer):
   pass



renderer = MyCustomRenderer()
markdown = mistune.Markdown(renderer=renderer)
html_str = markdown(md_str)

print(html_str)

Running it produces the same output as before, because no default behavior has been changed. Let's add some custom functionality.

To customize the rendering, you override methods of the renderer class. Markdown has block-level and span-level elements, and mistune's renderer has a method for each of them.

The block-level methods you can override are listed below:

block_code(code, language=None)
block_quote(text)
block_html(html)
header(text, level, raw=None)
hrule()
list(body, ordered=True)
list_item(text)
paragraph(text)
table(header, body)
table_row(content)
table_cell(content, **flags)

The span-level methods are:

autolink(link, is_email=False)
codespan(text)
double_emphasis(text)
emphasis(text)
image(src, title, alt_text)
linebreak()
newline()
link(link, title, content)
strikethrough(text)
text(text)
inline_html(text)

We do not need all of them. For this example we only need the following three:

header(text, level, raw=None)
image(src, title, alt_text)
link(link, title, content)

All these methods need to return an HTML string.
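Building those strings with plain % formatting is easy to get wrong when a value contains a quote. The sketch below shows one way to assemble a tag with escaped attribute values using only the standard library; make_tag is a hypothetical helper, not part of mistune, and a renderer method could simply return make_tag(...).

```python
from html import escape


def make_tag(name, content=None, **attrs):
    """Build an HTML tag string with escaped attribute values.

    - A trailing underscore is stripped from keyword names, so class_=
      produces a class attribute (class is a reserved word in Python).
    - Attributes whose value is None are skipped (e.g. a missing title).
    - content is inserted as-is, because renderer methods usually receive
      HTML that has already been rendered; pass None for void elements
      such as <img>.
    """
    parts = [name]
    for key, value in attrs.items():
        if value is None:
            continue
        parts.append('%s="%s"' % (key.rstrip("_"), escape(str(value), quote=True)))
    opening = "<%s>" % " ".join(parts)
    if content is None:
        return opening
    return "%s%s</%s>" % (opening, content, name)


print(make_tag("h1", "A header", class_="my-custom-header-cls"))
print(make_tag("img", src="image.png", alt='a "quoted" alt', class_="img-responsive"))
print(make_tag("a", "link", href="https://example.com", title=None))
```

The second call shows the point of escaping: the double quotes in the alt text become &quot; instead of closing the attribute early.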

Now let's write the code.

import mistune

md_str = """
# This is a first level header
This is a short one line paragraph.

Here is a non-HTTPS [link 1](http://example.com "Link title 1")
Here is a relative [link 2](index.html)

Here is an image: ![an image](image.png)
"""

class MyCustomRenderer(mistune.Renderer):
    def header(self, text, level, raw=None):
        # Add a custom CSS class to every header (h1 to h6).
        return "<h%s class='my-custom-header-cls'>%s</h%s>\n" % (level, text, level)

    def image(self, src, title, alt_text):
        # Add Bootstrap 3's img-responsive class. Escape the alt text so
        # that a quote inside it cannot break the attribute.
        alt_text = mistune.escape(alt_text, quote=True)
        return "<img src='%s' alt='%s' class='img-responsive'>" % (src, alt_text)

    def link(self, link, title, content):
        if link.lower().startswith('http://'):
            # Upgrade plain HTTP links to HTTPS.
            link = 'https://' + link[len('http://'):]
        elif not link.lower().startswith('https://'):
            # Treat everything else as a relative link and prefix the domain.
            # (This simple check also rewrites mailto: links and #fragments.)
            link = 'https://example.com/' + link.lstrip('/')
        # title is None when the markdown link has no title.
        title_attr = " title='%s'" % mistune.escape(title, quote=True) if title else ""
        return "<a href='%s'%s>%s</a>" % (link, title_attr, content)


renderer = MyCustomRenderer()
markdown = mistune.Markdown(renderer=renderer)
html_str = markdown(md_str)

print(html_str)

Note: the markdown in this example differs slightly from the earlier sample. It adds a link title, a second (relative) link and alt text for the image, so that each override has something to work on.

Output:

<h1 class='my-custom-header-cls'>This is a first level header</h1>
<p>This is a short one line paragraph.</p>
<p>Here is a non-HTTPS <a href='https://example.com' title='Link title 1'>link 1</a>
Here is a relative <a href='https://example.com/index.html'>link 2</a></p>
<p>Here is an image: <img src='image.png' alt='an image' class='img-responsive'></p>

Look at the header and see that our custom class called my-custom-header-cls has been added.

<h1 class='my-custom-header-cls'>This is a first level header</h1>

Our first link used plain http, and it has now been converted to https.

<a href='https://example.com' title='Link title 1'>link 1</a>

The second link is converted to an absolute URL.
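The prefix check in the link method treats anything that is not http:// or https:// as relative, so mailto: links and in-page #fragments would get the domain prepended too. A slightly more robust sketch, using only urllib.parse from the standard library, is shown below. normalize_link and BASE_URL are hypothetical names, and https://example.com/ stands in for your own site.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical site root used to resolve relative links; replace with your own.
BASE_URL = "https://example.com/"


def normalize_link(link, base=BASE_URL):
    """Return an absolute HTTPS URL for http:// and relative links.

    - #fragments (in-page anchors) are returned unchanged.
    - http:// links are upgraded to https://.
    - Links that already have some other scheme (https:, mailto:, ftp:, ...)
      are returned unchanged.
    - Everything else is resolved against base with urljoin, which also
      handles ../ segments and protocol-relative //host/path links.
    """
    if link.startswith("#"):
        return link
    parts = urlsplit(link)
    if parts.scheme == "http":
        return urlunsplit(("https",) + tuple(parts[1:]))
    if parts.scheme:
        return link
    return urljoin(base, link)


for url in ["http://example.com", "index.html", "/docs/a.html",
            "mailto:me@example.com", "#top"]:
    print(url, "->", normalize_link(url))
```

Inside the renderer, the link method could then reduce to link = normalize_link(link) before building the anchor tag. Note that urljoin resolves relative paths against the full base URL, so a base of https://example.com/blog/ would turn index.html into https://example.com/blog/index.html.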

The image got Bootstrap's img-responsive class to make it responsive.

<img src='image.png' alt='an image' class='img-responsive'>

Try out other methods yourself. If you face any difficulty let us know in the comments below and I will get back to you soon.
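As one starting point, the block_code(code, language=None) method could be overridden to add a CSS class for a syntax highlighter. The sketch below is written as a plain function so it runs without mistune installed; inside a Renderer subclass it would become def block_code(self, code, language=None) with the same body. The language-xxx class name is an assumption (it is the convention Prism and highlight.js look for), not mistune's default output.

```python
from html import escape


def block_code(code, language=None):
    """Sketch of what a block_code override could return.

    The code is HTML-escaped, since it is raw text rather than rendered HTML.
    """
    code = escape(code.rstrip("\n"), quote=False)
    if not language:
        return "<pre><code>%s</code></pre>\n" % code
    # Assumed class-name convention for client-side highlighters.
    return '<pre><code class="language-%s">%s</code></pre>\n' % (
        escape(language, quote=True),
        code,
    )


print(block_code("if a < b:\n    print('ok')\n", "python"), end="")
print(block_code("x = 1"), end="")
```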

Conclusion

Mistune is a great library, though it has some shortcomings that I hope will be addressed in future releases. If you are using Python to build your system and need customized markdown-to-HTML output, it is a great library to have in your toolbox.
