Archive for the ‘Professional’ Category

Emacs, of course, has always described itself as self-documenting because it has an integrated documentation system. With my new creation, lenticular text (http://www.russet.org.uk/blog/3035), the world changes slightly. I can now write rich documentation embedded in the code base of my package, and can do so without compromising either the documentation or the coding environment.

For the 0.7 release of lentic (http://www.russet.org.uk/blog/3047), I added support for generating this documentation automatically on the fly from the source code. This can be viewed either internally in Emacs with EWW, or externally with a browser. In the latter case, I have used org-info.js (http://orgmode.org/worg/code/org-info-js/) which gives an info-like and navigable experience to the manual.

As yet, this process is not perfect, and the output is not ideal, but it still seems to me to be very nice. It is certainly something that I wanted to use for m-buffer (http://github.com/phillord/m-buffer-el). So I’ve extended lentic so that its documentation system can work for any package. Much to my amazement, it’s already in use by others (http://github.com/tumashu/chinese-pyim), which has to be a good sign after such a short time span.

The main issue in making lentic-doc generic was working out how to compile the completed org-mode files into HTML. I had already considered trying to use the “main” file for a package (so, lentic.el for lentic or m-buffer.el for m-buffer). But, as I discovered with lentic, this does not work that well; instead I use a single top-level file which imports everything else. To enable this, lentic-doc uses the following technique: first it looks for a variable called package-doc describing how to compile the package documentation; if that fails, it looks for a file called package-doc.org, which is used instead. The main entry points are two functions, lentic-doc-external-view and lentic-doc-eww-view, which show the documentation in the respective browser.
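The lookup can be sketched roughly as follows; this is an illustration rather than the actual lentic-doc code, and the helper name is hypothetical:

```elisp
;; Hypothetical sketch of the lentic-doc lookup strategy.
;; For a package named "foo", first try the variable `foo-doc';
;; failing that, look for a file "foo-doc.org" on the load path.
(defun my-doc-source (package)
  (let ((var (intern (concat package "-doc"))))
    (if (boundp var)
        (symbol-value var)
      ;; NOSUFFIX is non-nil, so locate-library searches for
      ;; exactly this file name rather than appending .el/.elc
      (locate-library (concat package "-doc.org") t))))
```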

There has been a lot of discussion about documentation format on the Emacs-devel mailing list recently. One common complaint was that texinfo 5 is rather too slow, taking a considerable time to compile for Emacs. Now, at first sight it would appear that moving to a lentic/org mode solution would make this worse, as the combination of the two will be much slower than even texinfo. However, lentic and org have one advantage over texinfo — they are entirely implemented in Emacs lisp, so anyone using Emacs also has the ability to generate the documentation from source; this means that the documentation can be built up lazily, on-the-fly as it is requested.

This works straightforwardly for a single package at a time — on request, Emacs generates the documentation for lentic or m-buffer and displays it. This still leaves the problem of cross-links between the different packages, of course. If I want to hyperlink from lentic to m-buffer, then I need to ensure that the m-buffer documentation is generated. Not so easy if the documentation is being viewed inside a browser.

Fortunately, Emacs provides a solution. Actually, two, as there are now two packages implementing a webserver for Emacs. By using an Emacs web server, I can handle any requests for package documentation, generating them on-the-fly if necessary. It turns out to be relatively simple too, as you can see here: https://github.com/phillord/lentic-server.
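As a sketch only (lentic-server may work quite differently, the elnode calls are from memory, and the documentation-generating function is a guess), serving generated documentation might look something like this:

```elisp
;; Illustrative only: a tiny elnode handler that could serve
;; package documentation, generating it on first request.
(require 'elnode)

(defun my-doc-handler (httpcon)
  (let ((package (elnode-http-pathinfo httpcon)))
    ;; here one would generate the HTML for this package if not
    ;; already done, e.g. with a lentic-doc entry point
    (elnode-http-start httpcon 200 '("Content-Type" . "text/html"))
    (elnode-http-return httpcon
                        (format "<p>Docs for %s</p>" package))))

(elnode-start 'my-doc-handler :port 8005 :host "localhost")
```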

Of course, lentic-server is just a work in progress and far from complete, but it is pleasing nonetheless. The use of lentic has allowed me to tie together Org and Emacs-Lisp, and to make a rich documentation environment from technology that already exists.


I have just released m-buffer version 0.10. A pretty minor point release, but I thought I would post about it, as I have not done so before.

Interacting with the text in an Emacs buffer can be a little bit painful at times. The main issue is that Emacs makes heavy use of global state in its functions, something which has rather gone out of fashion since Emacs was designed, and probably for good reason. I know that I am not the only one to find this painful (http://www.wilfred.me.uk/blog/2013/03/31/essential-elisp-libraries). Perhaps most obvious is that many functions work on the current-buffer, or at point. As a result, Emacs code tends to have lots of save-excursion and with-current-buffer calls sprinkled throughout.

We see the same thing in the regexp matching code. So for instance, the Emacs Lisp manual suggests the following:
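A minimal version of the manual’s suggestion, with an illustrative regexp and replacement:

```elisp
;; replace every match in the buffer, from point onwards
(while (re-search-forward "foo" nil t)
  (replace-match "bar"))
```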

to replace all matches in a buffer. As you should be able to see, the call to replace-match contains no reference to what match is to be replaced. This is passed through the use of the global match data. Again, this is made manageable through the use of macros to save and restore the global state, in this case save-match-data. So, you end up with:
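something along these lines, again with an illustrative regexp:

```elisp
;; protect any match data the caller cares about
(save-match-data
  (while (re-search-forward "foo" nil t)
    (replace-match "bar")))
```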

If you want to see this taken to extreme, have a look at the definition of m-buffer-match which is a good demonstration of the problem to which it is a solution.

The use of while is also less than ideal. It may not be obvious, but the code above contains a potential non-termination. For instance, the following should print out the position of every end-of-line in the buffer. Try it in an Emacs session you don’t mind losing.

(while (re-search-forward "$" nil t)
  (message "point at: %s" (point)))

It fails (and hangs) because the regexp $ is zero-width. So, the re-search-forward moves point to just in front of the first newline, reports point, and then never moves again. This problem is a real one, and happens in real code.
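For comparison, a sketch of the same thing with m-buffer; m-buffer-match returns all the matches at once, as markers, with no global state and no while loop to hang:

```elisp
;; one match (a pair of markers) per end-of-line in the buffer
(m-buffer-match (current-buffer) "$")
```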

m-buffer has been written for ease of use and not performance. So, by default, it uses markers everywhere, which can potentially cause slowdowns in Emacs. I have started to add some functions for dealing with markers which need explicit clearing; a traditional use for macros in Lisp, closing existing resources. So
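something like this (the matcher function name is my guess at the m-buffer API):

```elisp
;; bind the markers like let*, then null them on exit
(m-buffer-with-markers
    ((ends (m-buffer-match-line-end (current-buffer))))
  (message "line ends at: %s" ends))
```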

leaves the buffer free of stray markers. The m-buffer-with-markers macro behaves like let* with cleanup.

I am now working on stateless functions for point — so m-buffer-at-point does the same as point, but takes a buffer as an argument. Most of the code is trivial in this case; it just needs writing.
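Sketching the intent (the exact signature is assumed):

```elisp
;; stateful: point depends on which buffer is current
(with-current-buffer (get-buffer "*scratch*")
  (point))

;; stateless: the buffer is an explicit argument
(m-buffer-at-point (get-buffer "*scratch*"))
```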

I am rather biased, but I like m-buffer very much. It massively simplified the development of lentic (http://www.russet.org.uk/blog/3047). Writing that with embedded while loops would have been painful in the extreme. I think I saved more time in using m-buffer than it took to write it. I think it could simplify many parts of Emacs development significantly. I hope others think so too.


With a previous release of lentic (http://www.russet.org.uk/blog/3035), I got a couple of suggestions. One was a complaint that it was hard to get going, because lentic lacks documentation. This is a bit unfortunate. Lentic actually did have documentation, but it is hidden away as comments in the source code; although this is not specific to lentic, I wanted it to enable literate programming, and it uses itself to document itself.

Now, this makes perfect sense, but there is a problem. The end-user form of the documentation needs generating from the source code. This is true in general in Emacs land, although the standard form is texinfo. The usual solution to this problem is to generate the documentation during the packaging process.

This should work, but it doesn’t. Or, rather, it does not work in all cases. For an archive such as Marmalade, it is entirely possible. But for MELPA it fails. The problem is that MELPA works directly from a git checkout. My documentation, of course, is not source but generated. Now MELPA has support for the generation of info from texinfo. But my source is Emacs Lisp and I need to use Lentic to generate a readable form.

One solution would, of course, be not to use MELPA. Nic Ferrier recently argued on the emacs-devel mailing list that the idea was fundamentally broken — a package is something that the developers should generate and publish, as with Java or Clojure. He makes a good point. Moving to Marmalade would solve this problem; after Nic’s work it is largely stable, so this was definitely an option.

However, I like MELPA (although I have only used it since stable came out). It is nice that it uses what I am doing anyway (tagging, pushing, and so forth). And I like the download stats. So I talked with the MELPA folks but, entirely reasonably, they did not want to add specific support to MELPA for this. Nor support for, for example, downloading the source from somewhere else.

Other possibilities did raise themselves; I could just check in my documentation. But my documentation depends on my source, so pretty much every commit would also require a documentation commit. Not nice. I thought about adding the documentation as an independent package. Then my documentation commits would be in a repo with nothing else; but this hassles the user, even if it auto-installs. And I’d need different packages for MELPA and Marmalade. So, I was left with no good solution.

At the same time as all of this, I was working on the documentation, generating Org files from my Lisp documentation, then converting that to info. This sort of worked, but not nicely. A significant problem was that something in the toolchain did not like having multiple sections with the same names, and I have a lot of these (“Commentary”, “Header”, “Code”). I have not yet tracked down whether this is a problem with Org’s texinfo export, texinfo itself or info, nor am I sure it would be worth the effort.

Instead, I decided to try HTML output. This worked quite nicely; I use a single Org driver file (called lenticular.org) and import all the generated org files from there. I also found org-info, which I had not seen before — this is Javascript which gives an Info-like experience — next, previous, occur, search and so on. It’s imperfect, but pretty usable, and gives quite a nice documentation experience. It’s also possible to view in EWW, although there is no Info-like paging there.

Dropping info has one other big advantage — my toolchain for generating the documentation is now entirely in Emacs. So, my source code is now enough, because lentic can generate its own documentation on demand after installation. The first time the user requests the documentation, either in EWW or a browser, lentic generates org files from its own source and then the HTML (http://homepages.cs.ncl.ac.uk/phillip.lord/lentic/lenticular.html). The only limitation is that this forces the requirement for a recent Emacs version, since the org-mode exporter framework has only recently been updated; unfortunate, but acceptable for a 0.x release.

Not all problems disappear. Because my documentation fits into the Emacs-Lisp commenting standards, it is not structured ideally for info. For instance, the headers of all the lentic Emacs-Lisp files are included. I would also like to extend org-info so I can switch the “Code” sections on and off (embedded literate sources are useful, but not for everyone). This will need some work on lentic, and probably also the org-mode HTML exporter.

But then, neither is it that far off. It is good enough, and a big advance on the previous situation. Perhaps, too, it demonstrates a future for Emacs documentation in general, with Info replaced by HTML.

The new release (http://www.russet.org.uk/blog/3047) of Lentic is now available on Marmalade and MELPA, complete with documentation available from the menu. Please feel free to try both lentic and its documentation system out now.


Lentic is an Emacs mode which supports multiple views over the same text. This can be used for a form of literate programming. It has specific support for Clojure which it can combine with either LaTeX, Asciidoc or Org-Mode.

By default, two lentic buffers share content but are otherwise independent. Therefore, you can have two buffers open, each showing the content in different modes; to switch modes, you simply switch buffers. The content, location of point, and view are shared.

However, lentic also allows a bi-directional transformation between lentic buffers — the buffers can have different but related text. This allows, for example, one buffer to contain an Emacs-Lisp file, while the other contains the same text but with “;;” comment characters removed, leaving the content in org-mode, enabling a form of literate Emacs-Lisp programming with no change to either org-mode or Emacs-Lisp. Ready-made transformations are also available for Clojure, LaTeX and Asciidoc.
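Configuration is per-buffer; as a sketch (the initialization function name is from memory and may differ, so check the lentic documentation):

```elisp
;; choose the Emacs-Lisp <-> Org transform for this buffer;
;; in practice this is often set as a file-local variable
(setq lentic-init 'lentic-orgel-org-init)
```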

Lentic is both configurable and extensible, using the EIEIO object system.

Lentic was previously known as Linked-Buffers.

The 0.7 release adds an integrated documentation system, support for Haskell, LaTeX literate programming and best of all, a ROT-13 transformation.

Like many developers I often edit both code and documentation describing that code. There are many systems for doing this; these can be split into code-centric systems like Javadoc which allow commenting of code, or document-centric systems like Markdown which allow interspersing code in documentation. Both of these have the fundamental problem that by focusing on one task, they offer a poor environment for the other.

In this article, I introduce my solution which I call lenticular text. In essence, this provides two syntactic views over the same piece of text. One view is code-centric, one document-centric but neither has primacy over the other, and the author can edit either as they choose, even viewing both at the same time. I have now implemented a useful version of lenticular text for Emacs, but the idea is not specific to a particular editor and should work in general. Lenticular text provides a new approach to editing text and code simultaneously.

The need for Literate Programming

The idea of mixed documentation and code goes back a long way; probably the best known proponent has been Donald Knuth in the form of literate programming. The idea is that a single source document is written containing both the code and the documentation source; these two forms are then tangled to produce the real source code that is then compiled into a runnable application or readable documentation.

The key point from literate programming is the lack of primacy between either the documentation or the code; the two are both important, and can be viewed one beside the other. Tools like Javadoc are sometimes seen as examples of literate programming, but this is not really true. Javadoc is really nothing more than a docstring from Lisp (or Perl or Python), but using comments instead of strings. While Javadoc is a lovely tool (and one of the nicest parts of Java in my opinion) it is a code-centric view. It is possible to write long explanations in Javadoc, but people rarely do.

There are some good examples of projects using literate programming in the wild, including the mathematical software Axiom, or even a book on graphics. And there are a few languages which explicitly support it: TeX and LaTeX perhaps unsurprisingly do, and Haskell has two syntaxes, one which explicitly marks comments (like most languages) and the other which explicitly marks code.

These examples though are really the exception that proves the rule. In reality, literate programming has never really taken off. My belief is that one reason for this is the editing environments; it is this that I consider next.

Editing in Modes

When non-programmers hear about a text-editor they normally think of limited tools like Notepad; an application which does essentially nothing other than record what you type. For the programmer, though, a text-editor is a far cry from this. Whether the programmer prefers a small and sleek tool like Vim, the Swiss-Army knife that is Emacs, or a specialized IDE like Eclipse, these tools do a lot for their users.

The key feature of all modern editors is that they are syntax-aware; while they offer some general functions for just changing text, mostly they understand at least some part of the structure of that text. At a minimum, this allows syntax highlighting (I guess there are still a few programmers who claim that colours just confuse them, but I haven’t met one for years). In general, though, it also allows intelligent indentation, linting, error-checking, compiling, interaction with a REPL, debugging and so on.

Of course, this would result in massive complexity if all of these functions were available all of the time. So, to control this, most text editors are modal. Only the tools relevant to the current syntax are available at any time; so, tools for code are present when editing code, tools for documentation when editing documentation.

Now, this presents a problem for literate programming, because it is not clear which syntax we should support when these syntaxes are mixed. In general, most editors deal with mixed syntax by ignoring one syntax, or at most treating it conservatively. So, for example, AucTeX (the Emacs mode for LaTeX) supports embedded code snippets: in this case, the code is not syntax highlighted (i.e. the syntax is ignored) and the indentation function preserves any existing indentation (i.e. AucTeX does not indent code correctly, but at least does not break existing indentation). We could expand our editor to support both syntaxes at once, and there are examples of this. For instance, AucTeX allows both the code and the documentation to be edited with its DocTeX support — this is slightly cheating though, as both the code and the documentation are in the same language. Or, alternatively, Emacs org-mode will syntax highlight code snippets, and can extract them from the document-centric view, to edit them and then drop them back again.

The problem here is that supporting both syntaxes at once is difficult to do, particularly in a text editor which must also deal with partly written syntax, and may not always follow a grammar.

Literate Workflows

As well as the editor, most of the tools of the programmer are syntax aware also. Something has to read the code, compile the documentation and display the results. For a literate document, there are two approaches to dealing with this. The first is the original approach, which is to use a tool which splits the combined source form into the source for the code and the documentation. This has the most flexibility but it causes problems: if your code fails to compile, the line numbers in any errors come out in the wrong place. And source-level debuggers will generally work on the generated source code, rather than the real source.

The second approach is to use a single source form that all the tools can interpret. This works with Haskell, for instance: code is explicitly marked up, and the Haskell compiler ignores the rest as comments. This is fine for Haskell, of course, but not all languages support this. In my case, I wanted a literate form of Clojure and, while I tried hard, it is just not possible to add in any sensible way (http://www.russet.org.uk/blog/2979).

A final approach is to embed documentation in comments; Javadoc takes this approach. Or, for a freer form of documentation, there are tools like Marginalia. As a general approach it seems fine at first sight, although I had some specific issues with Marginalia (chiefly, that it is Markdown-specific and a relatively poor environment for writing documentation). But in use, it leads right back to the problem of modal editing: Emacs’ Clojure mode does not support it so, for example, refilling a marked-up list ignores the markup and the list items get forced into a single paragraph.

Lenticular Text

My new approach is to use lenticular text. This is named after lenticular printing, which produces images that change depending on the angle at which they are viewed. It combines approaches from all three of these literate workflows. We take a single piece of semantics and give it two different syntactic views. Consider the “Hello World” literate code snippet written in LaTeX and Clojure (omitting the LaTeX preamble for simplicity).
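The snippet is presumably along these lines: a LaTeX document with the Clojure inside a code environment.

```latex
This is the classic ``Hello World'', in Clojure:

\begin{code}
(println "hello world")
\end{code}
```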

Now, if this were Haskell, we could stop, as the code environment is one of those that the compiler recognises. But Clojure does not; we cannot load this file into Clojure validly. And as it is not real Clojure, any syntax-aware editor is unlikely to do sensible things. We could use a code-centric view instead.
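That is, the same text with every line outside the code environment commented out, presumably something like:

```clojure
;; This is the classic ``Hello World'', in Clojure:
;;
;; \begin{code}
(println "hello world")
;; \end{code}
```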

This is now valid Clojure, but now the LaTeX breaks. Actually, with a little provocation, it can be made valid LaTeX as well (http://www.russet.org.uk/blog/2979), but my toolchain does not fully understand LaTeX (nothing does other than LaTeX, I suspect), so again we are left with the editor not doing sensible things.

These two syntaxes, though, are very similar and there is a defined relationship between the two of them (comment/uncomment every line that is not inside a code environment). It is clearly possible to transform one into the other with a simple syntactic transformation. We could do this with a batch processing tool, but this would fail its purpose because one form or the other would maintain its primacy; as an author, I need to be able to interact with both. So, the conclusion is simple; we need to build the transformation tool into the editor, and we need this to be bi-directional, so I can edit either form as I choose.

This would give a totally different experience. I could edit the code-centric form when editing code blocks, and the document-centric form when editing comments. When in code blocks, I would use the code-centric form and the editor should work in a code-centric mode: auto-indentation, code evaluation, completion and so on. In the comment blocks, with a document-centric view, I should be able to reflow paragraphs, add sections, or cross-reference.

But does it work in practice? The proof of the pudding is in the programming.

Lenticular Text for Emacs

I now have a complete implementation of lenticular text for Emacs, available as the lentic package (http://github.com/phillord/lentic). The implementation took some thinking about and was a little tricky, but it is not enormously complex. In total, the core of the library is about 1000 lines long, with another 1000 lines of auxiliary code for user interaction.

My first thought was to use a model-view-controller architecture, which is the classic mechanism for maintaining two or more views over the same data. Unfortunately, the core Emacs data structure for representing text (the buffer) is defined in the C core of Emacs and is not easy to replace. Indirect buffers, for instance, an Emacs feature which lentic can mimic, are implemented directly in the core. Secondly, MVC does not really make sense here: with MVC, the views can only be as complex as the model, and I did not want lentic to be (heavily) syntax-aware, so that it can work with any syntax. So, we do not have much in the way of a data model beyond a set of characters.

Lentic instead uses a change percolation model. Emacs provides hooks before and after change; lentic listens to these in one buffer and percolates the changes to the other. My first implementation (then called linked-buffers) simply copied the entire contents of a buffer and then applied a transform (adding or removing comments, for instance). This is easy to implement, but inefficient, so it scaled only to 300-400 lines of code before it became laggy. The current implementation is incremental and percolates just the parts which have changed. To add a new transformation requires implementing two functions — one which “clones” changes from one buffer into the other, and one which “converts” a location in one buffer to the corresponding location in the other.
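The Emacs side of this is the pair of hook variables before-change-functions and after-change-functions. A minimal illustration of listening for changes in one buffer, unconnected to lentic's actual code:

```elisp
;; log every change in the current buffer; after-change functions
;; receive the start and end of the changed region, plus the
;; length of the text that was there before the change
(defun my-log-change (begin end length-before)
  (message "changed %d-%d (was %d chars)" begin end length-before))

;; nil APPEND, t LOCAL: listen in this buffer only
(add-hook 'after-change-functions #'my-log-change nil t)
```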

Re-implementing this for other editors would not be too hard. Many of the core algorithms could be shared — while they are not complex in the abstract, for incremental updates there are quite a few (syntax-dependent) edge cases. Similarly, I went through several iterations of the configuration options, specifying how a particular file should be transformed. This started off being declarative but ended up, as it should in a Lisp, with a function.

I now have transformations between Lisp (Clojure and Emacs) and LaTeX, org-mode and asciidoc. These use much of the same logic, as they demarcate code blocks and comment blocks with a start-of-line comment. This is essentially the same syntax that Haskell’s literate programming mode provides. The most complex transformation that I have attempted is between org-mode and emacs-lisp; the complexity here comes because emacs-lisp comments have a syntax which somewhat overlaps with org-mode, and I wanted them to work together rather than duplicate each other. For all of these implementations, my old netbook can cope with files 2000-3000 lines long before any lag starts to become noticeable. Lentic is fast enough for normal, everyday usage.

Experience

The experience of implementing lentic has been an interesting one. In use, the system works pretty much as I hoped it would. As an author I can switch freely between document- and code-centric views, and all of the commands work as I would expect. When I have a big screen, I can view both document- and code-centric views at the same time, side-by-side, “switching” only my eyes. I can now evaluate and unit-test the code in my Tawny-OWL (http://www.russet.org.uk/blog/3030) manual.

By building the transform into the editor, I can get immediate feedback on both parts of my literate code. And, my lenticular text is saved to file in both forms. As a result of this, no other tools have to change. Clojure gets a genuine Clojure file. LaTeX gets a genuine LaTeX file. And for the author, the two forms are (nearly) equivalent. We only care which is the “real” file at two points: when starting the two lentic views, and when versioning as we only want to commit one form. The decision as to which form is “primary” becomes a minor issue. The lentic source code, for example, is stored as an Emacs-Lisp file, which can transform into an Org-mode file, so that it can work with MELPA (as well as for bootstrap reasons). The manual I am working on is versioned in the LaTeX form but transforms into Clojure so that people cloning the repository will get the document first; I use Emacs in batch to generate Clojure during unit testing.

There are some issues remaining. Implementation wise, undo does not behave quite as I want, since it is sensitive to switching views. And, from a usability point-of-view, I find that I sometimes try capabilities from the code-centric view in the document-centric or vice versa.

Despite these issues, in practice, the experience is pretty much what I hoped; I generally work in one form or the other, but often switch rapidly backward and forward between the two while checking. I find it to be a rich and natural form of interaction. I think others will also.

Conclusions

I doubt that tooling is the only difficulty with literate programming, but the lack of a rich editing environment is certainly one of them. Lenticular text addresses this problem. So far I have used lenticular text to document the source code of lentic.el itself, and to edit the source of a manual. For myself, I will now turn to what it was intended for in the first place: literate ontologies allowing me to produce a rich description of a domain both in a human and computational language.

The notion of lenticular text is more general though, and the implementation is not that hard. It can be used in any system where a mixed syntax is required. This is a common problem for programmers; lenticular text offers a usable solution.


Less than one month after the release of Tawny-OWL 1.2.0 (http://www.russet.org.uk/blog/3018), I am pleased to announce the 1.3.0 release. This is a much smaller release than 1.2.0, but provides two useful changes.

First, I have now added support for axiom annotations as more extensively documented in my past post (http://www.russet.org.uk/blog/3028). While I expect these are still a minority requirement, they are used heavily by some people, and so they need supporting.

Second, I have reworked the support for patterns. This has been through the addition of three functions. p allows writing classes with optionality. So, for instance, consider this code:

(p o/owl-class
   o partition-name
   :comment comment
   :super super)

comment and super are optional here; they can be nil. In this case, the p function removes the nil values from the owl-class call. Moreover, if, as in this case, an entire frame has only nil values, it will be removed altogether. p returns the result of this call as a Clojure record containing the entity itself and the name that was used to create that entity. In a nice piece of serendipity, the support that I have added for annotations (http://www.russet.org.uk/blog/3028) also allows direct use of this record later in the pattern. So, for example, this form uses partition, which is the return value from above.

(map
 #(p o/owl-class o %
     :comment comment
     :super partition)
 values)

The reason for this record, though, is that it makes it relatively easy to build patterns in both function and macro form. Macros are never trivial, but all this one does is turn a set of symbols into their string equivalents.
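For illustration (the exact macro name and argument order are from memory of Tawny-OWL's pattern support, so check the source), a value partition might be written as:

```clojure
;; a hypothetical sketch of the macro form of the pattern
(defpartition Hydrophobicity
  [Hydrophobic Hydrophilic]
  :comment "The hydrophobicity of a molecule"
  :super PhysicoChemicalProperty)
```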

As well as generating the OWL API objects, this also binds the relevant vars, both those visible here (Hydrophobicity) and those generated (hasHydrophobicity).

Taken together, these are both important additions to Tawny-OWL. We can provide better provenance with the axiom annotations, and with patterns we can lift the level of abstraction at which we build our ontology, which was one of the original motivations for Tawny-OWL in the first place.