Export wiki pages to LaTeX

Description

It would be very useful to export wiki pages to LaTeX. A wiki is a great tool for brainstorming and planning; LaTeX is better for definitive printed texts. It would be cool if I could use wiki pages as a starting point for LaTeX articles.

Change History (32)

I think this is an excellent idea, and I intend to work on it myself as a patch for a system we are using to allow students to manage group software projects at the University of Sydney. Basically, a lot of their documentation is on the Wiki, but they need to generate a final report: ideally in LaTeX, but some use Word, which we try to discourage. An export to LaTeX would be a very convincing reason to (a) use LaTeX but also (b) maintain hyperlinked documentation on their wikis…

A lot of the Wiki markup makes no sense in LaTeX. It does "stuff" with most things, though. I got it to a point where it works on trunk/trac/wiki/tests/wiki-tests.txt. That is, without breaking Trac or LaTeX (although one of the tests results in a 'too deeply nested' error, which you can just batchmode through). The result is ugly, but no uglier than the HTML that Trac makes by default.

I've done this against the SVN trunk, and the changes are very isolated, so I see little reason for this not to be bumped in.

This should be packaged as a plugin, and we would need to add
an interface extension point for exporting to alternate formats.

We should first discuss this, I think.

class IMIMETypeConverter:

    def get_supported_conversions():
        """Yield tuples corresponding to the supported export formats:

        Each tuple should be of the form `(key, name, in_mimetype, out_mimetype)`,
        e.g. ('latex', 'Wiki to LaTeX', 'text/x-trac-wiki', 'text/plain')
        """

    def convert(self, content, mimetype, key):
        """Perform the actual conversion of `content`.

        The actual MIME type is given in `mimetype` and the conversion mode
        is the chosen `key`.
        The result should be a `(converted_content, out_mimetype)` pair.
        """

With this, the !WikiModule could build a list of alternate download links
corresponding to the text/x-trac-wiki converters, and then perform
the conversion in a generic way.

Having this interface at the Mimeview level would make it possible to install
a similar mechanism for alternate download formats in the attachment view
and in the repository browser view.

I've made an implementation based on your concept and the one I added in #1468. This seems pretty clean to me; opinions?

I think this could supplant the existing IHTMLPreviewRenderer interface as well, either removing it entirely (not good for backwards compatibility) or adding an adaptor using the new interface (probably a better idea) for IMIMETypeConverters that convert to text/html.

This could also be used for adding CSV export for ticket data, which gets requested a bit on the IrcChannel and the MailingList; text/x-trac-ticket to text/csv.
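A sketch of how such a ticket-to-CSV converter could look under the same interface. The field names and the assumption that the input arrives as a list of dicts are purely illustrative, not Trac's actual ticket schema.

```python
# Hedged sketch of a text/x-trac-ticket -> text/csv converter under the
# proposed interface. Field names and input shape are assumptions.
import csv
import io

class TicketToCsvConverter:
    def get_supported_conversions(self):
        yield ('csv', 'Ticket to CSV', 'text/x-trac-ticket', 'text/csv')

    def convert(self, tickets, mimetype, key):
        # `tickets` is assumed to be a list of dicts for this illustration.
        fields = ['id', 'summary', 'owner', 'status']
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(tickets)
        return (buf.getvalue(), 'text/csv')
```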

I've done some refinements to wikilatex.py. To be honest, I am still essentially hacking; I'm not yet familiar with the inner workings of Trac. When I have some more time I'll delve into the Trac code, clean up this stuff, and see how this should fit in with the proposed IMIMETypeConverter.

Some comments on the LaTeX converter, as it currently stands (with reference to WikiFormatting.pdf):

I think the typeset stuff looks pretty good

but I'm biased — does anyone else have an opinion?

Keep in mind that the current idea is that each wiki page exported will probably be incorporated into some larger document

However, this should be an option flag to be passed somehow to the exporter

cgi request? LaTeX conversion staging page?

Hence:

Handling section/subsection/subsubsection

Currently the page name is put into the \section{} at the start, but convention might mean that this is not right

Maybe we should rely on the wiki page to have a =Heading= at the start, to be promoted to \section{}

Otherwise =h1= now maps to \subsection{}, ==h2== to \subsubsection{}, and ===h3=== and deeper to \subsubsection*{}
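The heading mapping just described can be sketched as a small function; the function name is a hypothetical illustration (the page-level \section{} would be emitted separately, as discussed above).

```python
# Sketch of the described heading mapping: =h1= -> \subsection{},
# ==h2== -> \subsubsection{}, ===h3=== and deeper -> \subsubsection*{}.
# Function name is hypothetical.
import re

LEVELS = {1: r'\subsection{%s}',
          2: r'\subsubsection{%s}',
          3: r'\subsubsection*{%s}'}

def heading_to_latex(line):
    # A heading line is '= text =' with matching runs of '=' on each side.
    m = re.match(r'^(=+)\s*(.*?)\s*\1\s*$', line)
    if not m:
        return line
    level = min(len(m.group(1)), 3)   # h3 and deeper are starred
    return LEVELS[level] % m.group(2)
```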

Handling hyperlinks

since the Wiki is inherently hyperlinked, it makes sense to carry this over to the PDF

hence, hyperref is now part of the preamble, with some sensible options set

pdfauthor could be set to the login ID, perhaps (pass in context..)

Wiki links obviously cannot be resolved until the all-in-one document is generated (so they come up as \S{}??), but otherwise work, and are clickable thanks to hyperref

To delineate links in the printed version, they are currently underlined (but this is easily changed by tweaking the \anchortext command in the preamble)

If we _know_ hyperref is going to be used (currently it does not rely on any specials in the hyperref package — it just overrides builtins to make links in the PDF), then the \anchortext{} command could be adjusted to accept an [optional] argument to make the actual anchortext an active link (but we should probably still do the footnote method for the printed version)

OK, so this is meant to be part of a larger document, so there should only be one preamble

But it's easy to delete a preamble (harder to make up your own), and this should help latex n00bs that just want a PDF

If this later goes the route required for #2207, individual preambles can probably be stripped automatically

line separation of list items

the default in LaTeX has quite a large spacing between list items, which is maybe not what people expect

this can be adjusted in the preamble..
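To tie the preamble points above together, here is a sketch of what such a preamble might contain. The hyperref options, the \anchortext definition, and the use of the enumitem package for list spacing are all illustrative assumptions, not the exporter's actual output.

```latex
% Illustrative preamble sketch (assumed, not the exporter's actual output)
\usepackage[pdfborder={0 0 0}]{hyperref}
\hypersetup{pdfauthor={...}}  % could be set to the login ID, passed in via context
% Underline anchor text in print; tweak this command to change the style
\newcommand{\anchortext}[1]{\underline{#1}}
% Tighten the default spacing between list items (assumes the enumitem package)
\usepackage{enumitem}
\setlist{itemsep=2pt,topsep=4pt}
```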

Tickets

I've written a nice way of showing tickets for Trac v0.8 (which is what is running on a legacy system for students at my university), but the way tickets are handled in the formatter changed quite drastically in v0.9

So this hasn't yet been implemented in the attached wikilatex.py (and actually causes some unicode freakout that I don't fully understand at this point)

Same goes for reports and revision logs

Images

obviously, the Image cannot be embedded in the LaTeX, but we can make a figure float for .{png,jpeg,gif,etc} links (not yet implemented)

this requires the image to be downloaded separately and put somewhere that pdflatex can find it (I wouldn't recommend /usr/bin/latex or dvipdfm, because they want images in EPS format, which would need conversion)

At some point there will need to be a 'LaTeX' flavour of the Image macro

Quotes

double quote characters (") are TeXified to `` or '', according to a regex
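A plausible regex for that TeXification (illustrative; the plugin's actual pattern may differ): a quote at the start of the text or after whitespace opens, everything else closes.

```python
# Hypothetical sketch of the double-quote TeXification described above.
import re

def texify_quotes(text):
    # An opening quote follows start-of-string or whitespace...
    text = re.sub(r'(^|(?<=\s))"', "``", text)
    # ...and any remaining double quote is treated as a closing quote.
    return text.replace('"', "''")
```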

Oneliners

not yet properly implemented, but probably similar to the method for Wiki/HTML

Tables

Tables in LaTeX are represented radically differently from HTML (and are crap, quite frankly)

e.g. you need a column description before the table starts, with the number of columns, etc.

Perhaps an opportunity to test the MIME detection/flavours for Macros..

But the suggested solution on TracHacks is not very clean at the moment
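To illustrate the column-description mismatch mentioned above: converting Trac-style table rows (||a||b||) to a LaTeX tabular means the column spec must be known before any row is emitted. The function name and the all-'l' column spec are assumptions; cells are not escaped for TeX specials in this sketch.

```python
# Sketch: Trac wiki table rows -> LaTeX tabular. The column description
# ('l' per column) has to be computed up front, unlike in HTML.
def wiki_table_to_latex(lines):
    rows = [[cell.strip() for cell in line.strip('|').split('||')]
            for line in lines]
    ncols = max(len(row) for row in rows)
    colspec = 'l' * ncols           # one left-aligned column per cell
    body = ' \\\\\n'.join(' & '.join(row) for row in rows)
    return '\\begin{tabular}{%s}\n%s \\\\\n\\end{tabular}' % (colspec, body)
```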

Testing

This probably needs a test suite, which would be markedly different from the one for the HTML Wiki, testing handling of LaTeX specials, etc.

Unicode

LaTeX does not like unicode; the preferred way to produce &eacute;, for example, is \'{e}, but there is no solid mapping between unicode and LaTeX escapes like this. Currently, if unicode is encountered, an exception is raised (not deliberately; some other bug is at work here), and the line is output as <bad unicode on this line> or similar
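One way around the lack of a solid mapping is a (necessarily incomplete) escape table with a visible fallback instead of an exception. The table entries and function name here are illustrative assumptions.

```python
# Sketch of a partial unicode -> LaTeX escape table, with a visible
# fallback marker for unmapped non-ASCII characters (names hypothetical).
LATEX_ESCAPES = {
    u'\u00e9': r"\'{e}",   # e-acute
    u'\u00e8': r"\`{e}",   # e-grave
    u'\u00fc': r'\"{u}',   # u-umlaut
    u'\u00df': r'{\ss}',   # sharp s
}

def latexify(text):
    return ''.join(
        LATEX_ESCAPES.get(c, c if ord(c) < 128 else '<?>')
        for c in text)
```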

Trent: I've looked at the WikiFormatting.pdf (despite the difficulty
of accessing it, because of the #2974 issue…) and it looks promising.
However, the current approach is a bit heavyweight, as you are forced to
reimplement most of the Formatter logic. It would be much better if
there were a cleaner separation of the parsing and formatting methods
within the Formatter class. E.g. the _xxx_formatter methods could be renamed
_parse_xxx and would call format_this or format_that as appropriate.

OK, I've implemented it as a plugin (which can probably migrate to trac-hacks), so this ticket can probably be closed. But there are some other issues that implementing this has highlighted. These issues may have arisen because this 'plugin' is still in essence a hack, and they arise in light of Christian's comment about the duplicated functionality in the LaTeX formatter versus the HTML formatter that comes with Trac.

The change from 0.8/0.9 to 0.10 that puts HTML formatting for tickets, changesets, etc. into the extension architecture (ExtensionPoint(IWikiSyntaxProvider) and friends) means that I can no longer hook into the regex bindings for these the way I could in the versions of wikilatex implemented for 0.8 and 0.9. So, for now, I have just disabled them (maybe this is not such an issue for LaTeX, but it would be nice to have a footnote with the ticket description or changeset comment, as I did for 0.8). That is, I am no longer using wiki.rules, but generating my own rules that don't include those inserted by extensions (which includes ticket, changeset, etc.):

If we want the Formatter logic to be reusable by extensions, then there might need to be a clean way of overriding these or hooking into the functionality (maybe there is and I'm missing it). In any case, it quickly becomes messy, because there is no way to anticipate which extensions have been overridden and will start trying to feed 'Element' objects to the processor rather than strings. For wikilatex to work reliably, all of these would have to be overridden. Also, as Christian points out, the _parsing_ logic in Formatter should be reusable. Maybe this should be a new ticket. On that note, after all this hacking, I am really starting to hate regexes. I would suggest a recursive descent parser to generate a nice parse tree that could be passed to these content converters, but I don't think the grammar is context-free, so this would have issues.

So where does this ticket stand?

With the current API, I don't think that Formatter can be reused, and for this plugin to remain maintainable, I feel as though I would have to write all the parsing logic from scratch. That's fair enough, I suppose: inheriting from Formatter is what makes this a hack rather than a plugin, since I don't think Formatter is an official API. So, perhaps if I run out of other work to do or feel like procrastinating, wikilatex will become the first attempt at implementing the wiki formatter as a recursive descent parser that could later be incorporated into the core, allowing the parse tree to be used by other output generators (but I wouldn't be so presumptuous as to suggest that it be used for the main HTML Formatter, which may be better suited to the regex implementation).
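As a toy illustration of the recursive-descent idea, here is a parser for a tiny subset of wiki markup ('''bold''' only) that builds a tree any backend (HTML, LaTeX) could walk. This is purely a sketch; names and the node representation are assumptions.

```python
# Toy recursive-descent parser for '''bold''' markup, producing a tree
# of str leaves and ('bold', children) tuples. Illustrative only.
def parse(text, i=0, closing=None):
    """Return (list_of_nodes, index_of_stop); recurses on '''...'''."""
    nodes, buf = [], []

    def flush():
        if buf:
            nodes.append(''.join(buf))
            del buf[:]

    while i < len(text):
        if text.startswith("'''", i):
            if closing == "'''":
                break                      # let the caller consume the close
            flush()
            children, i = parse(text, i + 3, "'''")
            nodes.append(('bold', children))
            i += 3                         # skip the closing '''
        else:
            buf.append(text[i])
            i += 1
    flush()
    return nodes, i
```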

Making the Wiki parser reusable by separating the parsing and formatting
steps, and using a recursive descent parser instead of a regexp-based engine,
are two different things. The former can be achieved without the latter,
and I have the feeling that getting away from regexps would be bad in terms
of performance (see Trac-Dev:316 and the follow-up DrProject blog entry),
and would make it less flexible for introducing new constructions
and less extensible by plugins.

Also, a Wiki engine is different from a parser for a programming language,
as it parses text meant to be read by humans ;)

It is available in binary form for Windows and Ubuntu Linux. Furthermore, the source code is available under the GPL 2. It works from the client side and does not require any changes to the installation on the servers. So you essentially have the requested feature now.

