~ A blog about the complex relation between computers and history

Just a quick note today. My personal way of working is close to some kind of multitasking. As I personally have only one processor (“the brain”), things do come out sequentially, but I am quite convinced that the brain actually runs many parallel threads at the same time. Of course, most of these are independent, autonomous processes connected with bodily functions etc., but some threads are obviously some kind of subconscious analytical processes, to which the conscious mind passes pressing problems to be processed without any external disturbances. Then at some point, when the conscious level of the brain has reached a point in its progress to which these subconscious processes have some relevance, they send an interrupt and demand attention.

Practically, this means that ideas about what to do with the data often appear in the middle of the writing process itself, and force me to divert my attention from producing as much text as possible to performing some new analysis on the data. With the traditional single-core systems this used to mean a halt in the writing. No matter how nice you tell your GRASS module to be, it still seems to bog down the system while doing something with your ~30M cell raster. With the modern dual-core processors this does not happen any more: the GRASS module takes everything the second core has to offer, while I can still happily keep using the first one for all the other things.

This is especially nice on my work computer, as I run GRASS on Debian in a VirtualBox virtual machine. VirtualBox occupies the second core completely, while the rest is left to the actual OS (XP, in my case) to run my Emacs and other stuff. With a quad-core I would probably feel like I was wasting cores, but for my purposes a dual-core system is very nice indeed.

Just a quick note to record something I just found out while planning my first poster ever. Of course the first choice, as always when doing layout, is to see whether it could be done with LaTeX. I just hate the idea of using some WYSIWYG horror, where everything will be inconsistent anyway, no matter how hard you try to Set Things Right.

The natural choice for a poster seems to be the a0poster class, which is designed to set LaTeX up properly for making these huge pages, especially regarding the font and paper sizes. To my surprise, the paper size did not work with pdfLaTeX: the result was big text on an A4 page. From some source (lost the site already) I found out that the path LaTeX -> PS -> PDF should supposedly work, but I’d rather not go there; things are complicated enough when forced to work with XP.

Luckily, the geometry package saved me: just load geometry in the preamble of your document with the option a0paper, i.e. \usepackage[a0paper]{geometry}, and the resulting document is A0. Voilà!

This has nothing to do with the theme of the blog, but I just happened to find a wonderful window manager for my laptop. I generally use Gnome, since I like the looks, but I’ve never really liked Metacity, the default window manager in Gnome. On the other hand, I’ve lately been studying Haskell, a functional programming language. A wonderfully weird experience.

There just happens to be this new window manager written completely in Haskell, XMonad, which is a so-called tiling window manager: no empty spaces on your screen any more. I just had to try it, and believe me, it is very nice to use indeed. I wanted to integrate it with my Gnome system, and the panels especially, but that was not so easy. First I had to get a very recent version of XMonad (happily there are Debian packages, though not in the official repo), and then I had to figure out how to configure XMonad so that it

ignores the Gnome panels on my screen;

leaves empty space for the panels.

The instructions on the XMonad site were for the old versions and did not quite work. The installation goes according to the instructions on that page, but what the page lacks is a complete, working example of the configuration. Here’s one:

You’ll have to create the file ~/.xmonad/xmonad.hs and put your configuration there. A sample content for this file:
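A minimal sketch of such a file, assuming the XMonad 0.6 configuration record (in particular its defaultGaps field, which later releases may not provide) and the stock manageHook combinators; the name myManageHook is just an illustrative choice:

```haskell
-- ~/.xmonad/xmonad.hs
-- Sketch written against the XMonad 0.6 API (assumption); the
-- defaultGaps field used below may not exist in other releases.
import XMonad

main :: IO ()
main = xmonad $ defaultConfig
    { defaultGaps = [(24, 24, 0, 0)]  -- 24 pixels at the top and bottom
    , manageHook  = myManageHook <+> manageHook defaultConfig
    }

-- Window rules, combined with the default rules in main above.
myManageHook :: ManageHook
myManageHook = composeAll
    [ resource  =? "gnome-panel" --> doIgnore  -- leave the panels unmanaged
    , className =? "Gimp"        --> doFloat   -- floating windows
    , className =? "MPlayer"     --> doFloat
    ]
```

As far as I can tell, each tuple in defaultGaps is read as (top, bottom, left, right) in pixels, with one tuple per screen, so a second monitor would need a second entry.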

That gives you an empty space of 24 pixels at the top and bottom of the screen, and ignores gnome-panel. In addition, it lets Gimp and MPlayer float above other windows. With some fiddling with the session setup (see the link to the instructions) and this file in place, everything works fine, at least with XMonad 0.6. I haven’t tried the 0.5 series, so I cannot say whether this works with it, and the versions before 0.5 use a different kind of configuration scheme, so this does not work for those.

Update on 2008/05/01:

There is a page in the HaskellWiki on XMonad and Gnome, which contains all the information on this page plus much more. Go there!

For years now, I’ve been using a venerable old tool, rtf2latex, to convert documents from MS Word to LaTeX. For my purposes it has served well, the purpose being mostly to transform submitted articles into something that can then be made to conform to my LaTeX style for the journal whose layout I’m doing. In practice the needs are: keep the footnotes intact, and let italics, bold and underline survive the translation. Nothing else is needed, as I trust LaTeX to do the rest.

I’ve felt strangely uneasy about using rtf2latex lately, though. The fact that it is no longer available in Debian has made me look for alternatives. Also, I still had to convert the Word documents I received from .doc to RTF first. wvWare seemed the proper alternative, but it does not work with footnotes at all, and its web page says that its use for this purpose is deprecated in favour of Abiword.

Abiword I use occasionally; it is your typical Gnome program. Not too complicated, works well, and is nice to look at. But this conversion function I was never able to get working until today, when I realised that I also need to install abiword-plugins… how stupid of me. Now the conversion from MS .doc to LaTeX works well, although the resulting document is slightly too fancy for me. I’d be happy with something that preserves only the logic of the markup, and discards all of the funny spaces that are used to make it look like a Word document. (Why on earth would anyone want that?)

But I guess I can finally stop worrying about accidentally removing rtf2latex from my system. A replacement has been found! And although from the web page and release history it might seem that Abiword is a dead project, the traffic on the development mailing list demonstrates that the project is very much alive. I guess we will have version 2.6 someday; not that there’s really anything wrong with 2.4.6.

As I’m currently forced to work on a computer running Windows XP at a university, I have started to explore the possibilities that commercial programs might offer as replacements for the open source tools I’ve learned to use during the past years on Linux. Last week and today I’ve been studying how to do some cluster analysis with the software available on the university network. I had already installed my database on a PostgreSQL server running on my work computer (I can’t be bothered to even try things out with Access anymore, even though the current versions are probably much better than the horrible Access 97), so an important feature was the ability to access views and tables on the server.

First I tried Statistica, version 7. The data import worked well through the ODBC driver provided by PostgreSQL, and the user interface was surprisingly nice; not immediately accessible, but this seems to be a powerful tool. I never got to the clustering part, though, as this morning Statistica started to complain that it had expired and that I needed to fill in a new date code. That’s that, then; I’ll file a report with support unless it works again tomorrow.

So today I turned to SPSS (version 14). Data import was less intuitive, but worked well. The analysis methods are not as easy to use, and I was quite surprised when a simple hierarchical clustering of ca. 500 measurements, each with ca. 30 variables, locked the computer for almost 30 minutes. Seven years ago I wrote a C++ program on my old, old laptop for a similar purpose, with 3000 measurements of 10 variables each, and it took 20 minutes. And that was my first ever real computer program, so I had expected somewhat better performance from SPSS.

I decided to test the good old tools, and installed R on the XP machine. I had some trouble importing the data from PostgreSQL, until I realised that I just have to use the same ODBC interface as with the other programs. After that, everything went quite quickly: a hierarchical, agglomerative clustering with agnes took 4 seconds, but nicer results were produced with diana (in about the same time); both are from the package cluster. After this, I won’t be going back to SPSS, but I might still give Statistica a try, if it agrees to run one of these days.

Based on these quick tests, R is a much more efficient tool. The learning curve is probably much steeper, as using R is like shell programming, but once you learn how to use it, there’s no limit to what you can do. But don’t take my opinions at face value: I don’t really know how to use SPSS, so anyone who really knows his/her way around it is probably a better source. This is just a blog, anyway…

For a long time I’ve been looking for a tool to annotate PDF documents. The electronic versions of academic journals usually provide their articles in PDF format, and it would be nice to work with them as with the printed versions: underlining, making notes in the margin etc. So far, this has required buying the full version of Adobe Acrobat, since hardly anyone distributes their PDF files with the “comment” property enabled for the free Reader, if for no other reason than that enabling this property is not trivial.

A wonderful alternative seems to be PDF-XChange Viewer. It is as free as Adobe Reader, but seems to be faster and lighter on the computer. The display quality on the XP machine I use at work is as good, and the biggest bonus is that you can add your own annotations to the PDF file. These are saved with the file, and can be seen with any other PDF viewer, it seems.

You never know about the policies of individual companies, but it seems that Tracker Software is trying to do the same thing as Adobe, since the other versions of the program have more capabilities; they just offer more for nothing.

Downsides? Of course, there is no Linux version available, which is a pity. But for the time being, as I have to use XP at work anyway, I’ll be doing my readings (and annotations!) with this program.

Update on 2008-06-20:

I just tested PDF-XChange Viewer on Wine after updating my Debian testing distribution. It works well now, so even though there is no Linux version yet, the program can be used under Wine. No special setup was needed in my case.

I have been a happy user of LaTeX for a few years now, and a recurrent problem has been sharing my documents with other people. In the early days I was a happy latex2rtf user, and I even contributed some minor details to its development. Quite soon it became apparent, however, that the only reasonable solution for exporting my products is something that acts like a TeX processor.

In the cases where I need to export my texts to formats other than PDF for printing, the layout is of secondary importance. Of major importance, however, is that a certain “academic” structure gets through as well as possible:

The footnotes have to be footnotes in the end product as well.

The bibliography must come through as produced by jurabib.

Everything else is secondary, as the articles will be laid out by the journals anyway, but they want the reference system to work. Ever heard of a journal in the humanities giving out its LaTeX styles? Me neither. This means that I have to try to reach a citation format which fulfils the requirements of the journals by tweaking the options of jurabib.

So far, the only solution which actually seems to work is TeX4ht. This is a program that works by running a TeX processor, and it has quite a few output formats. The only bad thing is that the documentation is quite lousy, and most of the commands are not described at all.

But I get pretty OK OpenOffice output with oolatex, although ooxelatex is better if you want to use languages other than pretty plain English. Classical Greek works fine, though… I had some trouble getting this to function, for a long while in fact. It seems that TeX4ht did not like the hyperref package at all; once I dropped that from the preamble, everything went nice and smooth. The problem seemed to be related to jurabib, somehow. I should probably file a bug report some day.

A sad thing is that jurabib is unmaintained. Jens Berger, the guy who developed the package, cannot devote any more time to it, so the package is frozen until someone volunteers to take it over. I wish I had the time… A replacement, also pointed out by Jens, is biblatex. It seems to be quite a potent tool for the bibliographic needs of the humanities, but it is still beta-level and not officially released, so you can’t find it in any of the TeX distributions yet. It seems to include many of the good features of jurabib, like fields for gender, original languages and translations, all very necessary for a historian. To the surprise of many, the hegemony of English is nowhere near absolute in, for example, Classical Studies. French, German, Italian, even Spanish are still major languages, and a researcher unable to read any of these is bound to miss major contributions in the field; therefore, support for original language information of publications is important for people working in these fields.

But none of these really helps in getting over the main problem in humanities word processing with LaTeX: the incredible backwardness of BibTeX. In a world where almost everything is beginning to support Unicode, BibTeX is happy only with 7-bit ASCII. As the only decent BibTeX file editor is Emacs (IMHO), this is a major pain-in-the-ass. Who wants to keep up a bibliography when you cannot write Köln but have to type in K\"oln? Neither handy nor readable.
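The escaping itself is purely mechanical, which is part of what makes typing it by hand so tedious. A minimal sketch in Haskell, assuming a small hand-picked table of characters; the names texEscapes and toBibTeX are my own and do not come from any existing package:

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical table: a few accented characters and their TeX escapes.
-- Far from complete; extend it as needed.
texEscapes :: [(Char, String)]
texEscapes =
  [ ('ä', "\\\"a"), ('ö', "\\\"o"), ('ü', "\\\"u")
  , ('Ä', "\\\"A"), ('Ö', "\\\"O"), ('Ü', "\\\"U")
  , ('é', "\\'e"),  ('è', "\\`e"),  ('ß', "{\\ss}")
  ]

-- Replace every character found in the table by its escape;
-- anything else (including plain ASCII) passes through unchanged.
toBibTeX :: String -> String
toBibTeX = concatMap escape
  where
    escape c = fromMaybe [c] (lookup c texEscapes)
```

Loaded into GHCi, putStrLn (toBibTeX "Köln") prints K\"oln. A real converter would need a far larger table, so this only shows the shape of the problem.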

I’ve actually been running BibTeX on Unicode-encoded files happily for some time now; you just have to be very, very careful with the entry keys, where it is better to use plain ASCII. This is not supposed to work, but luckily it does. There are rumours (about five years old or so) of a new version of BibTeX, which might address some of the problems. Who knows, perhaps in ten or twenty years we’ll see the next version. I just think that unless it appears soon, there won’t be many who care about it anymore.