README.rst

Presentation

This is a parser for the MediaWiki (MW) syntax. Its goal is to transform wikitext into an abstract syntax tree (AST) and then render this AST into various formats, such as plain text and HTML.
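To picture what an AST of wikitext can look like, here is a minimal sketch. The `Node` class, its fields and the tag names (`page`, `bold`, `text`) are hypothetical and chosen for illustration only; they are not the classes this parser actually produces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a tag plus either child nodes or a text value."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    value: str = ""


def leaves(node: Node) -> List[str]:
    """Collect the text of every leaf, from left to right."""
    if not node.children:
        return [node.value]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Hand-built tree for the wikitext "'''Hello''' world"
tree = Node("page", [
    Node("bold", [Node("text", value="Hello")]),
    Node("text", value=" world"),
])

print("".join(leaves(tree)))  # Hello world
```

Rendering to plain text then amounts to joining the leaves; other formats wrap each leaf according to the tags of its ancestors.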

How it works

Two files, preprocessor.pijnu and mediawiki.pijnu, describe the MW syntax as grammars made of patterns. A separate Python tool, Pijnu, interprets those grammars and uses them to match the wikitext content and build the AST.
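The Pijnu grammar syntax is not reproduced here. As a rough, hypothetical illustration of the general idea of matching text against an ordered set of patterns, the sketch below uses Python's `re` module instead of Pijnu; the rule names and the `tokenize` function are invented for this example.

```python
import re
from typing import List, Tuple

# Hypothetical rules, tried in order at each position. This only imitates
# the idea of a pattern grammar; it is not Pijnu syntax.
RULES = [
    ("bold", re.compile(r"'''(.+?)'''")),
    ("italic", re.compile(r"''(.+?)''")),
    ("link", re.compile(r"\[\[(.+?)\]\]")),
    # Fallback: a run of ordinary characters, or a single ' or [
    ("text", re.compile(r"[^'\[]+|['\[]")),
]


def tokenize(wikitext: str) -> List[Tuple[str, str]]:
    """Match the rules against the input and return (rule, content) pairs."""
    pos = 0
    result: List[Tuple[str, str]] = []
    while pos < len(wikitext):
        for name, pattern in RULES:
            match = pattern.match(wikitext, pos)
            if match:
                # Keep the captured content when the rule has a group
                content = match.group(1) if match.groups() else match.group(0)
                result.append((name, content))
                pos = match.end()
                break
    return result


print(tokenize("'''Hello''' [[World]]"))
```

Unlike a real grammar, this sketch yields a flat list rather than a nested tree, and the fallback `text` rule guarantees that every position is consumed.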

Then, format-specific Python functions render the leaves of the AST into the desired format.
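One common way to organise such rendering functions is a table of functions per output format, keyed by node type. The sketch below assumes a hypothetical flat AST of `(node_type, content)` pairs and invented names (`TEXT_RENDERERS`, `HTML_RENDERERS`, `render`); it is not the parser's actual rendering code.

```python
import html
from typing import Callable, Dict, List, Tuple

# Hypothetical flat AST: a list of (node_type, content) pairs.
Ast = List[Tuple[str, str]]
Renderers = Dict[str, Callable[[str], str]]

# Plain text: formatting is simply dropped.
TEXT_RENDERERS: Renderers = {
    "text": lambda s: s,
    "bold": lambda s: s,
    "italic": lambda s: s,
}

# HTML: content is escaped, then wrapped in the matching tag.
HTML_RENDERERS: Renderers = {
    "text": html.escape,
    "bold": lambda s: "<b>" + html.escape(s) + "</b>",
    "italic": lambda s: "<i>" + html.escape(s) + "</i>",
}


def render(ast: Ast, renderers: Renderers) -> str:
    """Render each leaf with the function registered for its node type."""
    return "".join(renderers[kind](content) for kind, content in ast)


ast = [("bold", "Fish"), ("text", " & chips")]
print(render(ast, TEXT_RENDERERS))  # Fish & chips
print(render(ast, HTML_RENDERERS))  # <b>Fish</b> &amp; chips
```

Adding a new output format then only means adding a new table, without touching the parsing step.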

Two grammars are used because the templates in the wikitext are first substituted by a preprocessor, before the content of the page is actually parsed.
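To make the preprocessing step concrete, here is a simplified, hypothetical sketch of template substitution using regular expressions rather than the preprocessor.pijnu grammar. The `preprocess` function and the `templates` dict (template name mapped to its body) are assumptions for illustration; it handles only positional parameters such as `{{{1}}}` and does not expand templates nested inside other templates.

```python
import re
from typing import Dict

# Matches {{Name}} or {{Name|arg1|arg2}}, with no nested braces inside.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")
# Matches positional parameters such as {{{1}}} inside a template body.
PARAMETER = re.compile(r"\{\{\{(\d+)\}\}\}")


def preprocess(wikitext: str, templates: Dict[str, str]) -> str:
    """Replace every template call by its body, filling positional parameters."""

    def expand(call: re.Match) -> str:
        name = call.group(1).strip()
        # Drop the empty piece before the first '|'
        args = call.group(2).split("|")[1:]
        body = templates.get(name)
        if body is None:
            return call.group(0)  # unknown template: leave the call untouched

        def fill(param: re.Match) -> str:
            index = int(param.group(1))
            # A missing argument is left as-is, e.g. "{{{1}}}"
            return args[index - 1] if 1 <= index <= len(args) else param.group(0)

        return PARAMETER.sub(fill, body)

    return TEMPLATE_CALL.sub(expand, wikitext)


print(preprocess("Say {{Greet|World}}!", {"Greet": "Hello, {{{1}}}"}))
# Say Hello, World!
```

Only once this substitution is done does the main grammar see the page, which is why the two steps are kept separate.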

How to test

Currently, the simplest way to test the tool is to put some wikitext in the wikitext.txt file and then run:

If you want to distinguish between standard links, file inclusions and categories, list the namespaces of your wiki in the namespaces dict (e.g. {'Template': 10, 'Category': 14, 'File': 6}, where the numbers are the namespace codes used in MW).
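To illustrate why the namespace codes matter, the sketch below classifies internal links by the namespace prefix of their target, using the example dict above. The `INTERNAL_LINK` regex and the `classify` function are hypothetical and only show the idea; they are not how the parser itself consumes the namespaces dict. It relies on the standard MW codes 6 (File) and 14 (Category).

```python
import re
from typing import Dict, Tuple

# Example mapping from the text above: namespace name -> MW namespace code.
namespaces: Dict[str, int] = {'Template': 10, 'Category': 14, 'File': 6}

# [[target]] or [[target|options]]
INTERNAL_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def classify(link_target: str, namespaces: Dict[str, int]) -> Tuple[str, str]:
    """Return (kind, target), where kind is 'category', 'file' or 'link'."""
    prefix, sep, rest = link_target.partition(":")
    code = namespaces.get(prefix.strip()) if sep else None
    if code == 14:  # Category namespace
        return "category", rest.strip()
    if code == 6:  # File namespace
        return "file", rest.strip()
    return "link", link_target.strip()


text = "See [[Main Page]], [[File:Logo.png|thumb]] and [[Category:Help]]."
for match in INTERNAL_LINK.finditer(text):
    print(classify(match.group(1), namespaces))
```

Without the dict, `File:Logo.png` and `Category:Help` would be indistinguishable from ordinary page titles that happen to contain a colon.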

Example: rendering to text

To use this tool to render wikitext into plain text from a Python program, you can use the following lines: