Category Archives: Software & Tools

Do you use Mendeley, Papers, Zotero or other bibliographic software that exports your bibliography to BibTeX .bib files for use in LaTeX with BibTeX or BibLaTeX? Do you hate the way such software generally exports a .bib file that can’t exclude certain fields, and so makes an ugly hash of your bibliography entries?

If so, then this Python script is for you! It deletes the extraneous fields in the .bib file leaving you with just the essential ones, like Author, Year, Title, Journal, etc. Never have a horrid bibliography entry with the article Abstract (!) copied into it ever again!
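The script itself isn’t reproduced in this post. As an illustration of the idea only, here is a minimal Python sketch (not the actual script) that rebuilds each entry and keeps only a whitelist of fields. The `KEEP` set and the function names are my own assumptions, and the parser is deliberately naive: no `@string` macros, no `#` concatenation, no escaped quotes.

```python
import re

# Fields to keep: my own guess at "the essential ones"; adjust to taste.
KEEP = {"author", "editor", "year", "title", "journal", "booktitle",
        "volume", "number", "pages", "publisher", "address"}

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")  # "@article{Key,"
FIELD_NAME = re.compile(r"\s*([A-Za-z][\w-]*)\s*=\s*")     # "title = "
BARE_VALUE = re.compile(r"[^,}\s]+")                        # e.g. year = 2014
SEPARATOR = re.compile(r"\s*,?")                            # whitespace + optional comma


def read_value(text, i):
    """Return (raw_value, next_index) for a field value starting at text[i]."""
    if text[i] == "{":
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i:j + 1], j + 1
        raise ValueError("unbalanced braces")
    if text[i] == '"':
        j = text.index('"', i + 1)  # no escaped-quote handling in this sketch
        return text[i:j + 1], j + 1
    m = BARE_VALUE.match(text, i)
    return m.group(0), m.end()


def strip_fields(bib, keep=KEEP):
    """Rebuild each @type{key, ...} entry keeping only whitelisted fields."""
    entries, pos = [], 0
    while (m := ENTRY_START.search(bib, pos)):
        kind, key, i, fields = m.group(1), m.group(2), m.end(), []
        while (f := FIELD_NAME.match(bib, i)):
            value, i = read_value(bib, f.end())
            if f.group(1).lower() in keep:
                fields.append(f"  {f.group(1)} = {value}")
            i = SEPARATOR.match(bib, i).end()
        pos = bib.index("}", i) + 1  # the entry's own closing brace
        entries.append(f"@{kind}{{{key},\n" + ",\n".join(fields) + "\n}")
    return "\n\n".join(entries) + "\n"
```

To run it over a real export you might read the file with `Path("library.bib").read_text(encoding="utf-8")` (from `pathlib`) and write the result to a separate file, so the original export stays untouched.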

But it turns out there’s a fair bit of work to do on the texts before they’re usable in a programmatic way. The format of the XML raises two questions for me. It’s always confused me that people talk about using “epidoc” (“Epigraphic documents in TEI XML”) to encode literary texts. Why is it used in this way, to encode documents it is apparently not designed to encode?

The second question follows on from this. I don’t know whether this is an artefact of using Epidoc or if it’s an artefact of the particular choices made to encode the CSEL. The standard numbering systems of the critical editions of these texts are effectively lost in the Epidoc versions of the text online, rendering them problematic for programmatic access to the data in the standard scholarly reference systems.

Different texts have different breakdowns, for example, Book/Poem/Line, Book/Line, Letter Number/Line, and so on depending on the particular text and the choices made by the editor of the critical edition. In the Perseus format (the “old” format?) the TEI documents have a header that tells my programs on De Commentariis the structure of the document breakdown, thus:

This gives the document a structured, hierarchical view of the content. Everything contained within the div1 element with the attributes type=“book” and n=“1” is part of Book 1; the div2 element inside that with type=“chapter” and n=“1” is 1.1; and inside that, the div3 with type=“section” and n=“1” is 1.1.1. The abstract document structure (according to the standardised referencing established by the critical edition) is encoded directly in the data structure. It’s an excellent XML structure that directly reflects the way the data is referenced, with enough flexibility to encode many different referencing schemes, as long as the scheme is laid out in the metadata and the relationship is hierarchical. It’s easily navigable with standard XML tools like XPath/XQuery or simple DOM (Document Object Model) manipulation.
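The example header didn’t survive in this archive, so here is a hypothetical, cut-down fragment instead, with a Python sketch of why nested divisions are so convenient. I’m assuming the header declares the units with a `refsDecl` containing `state` elements; the text, element layout and function names are illustrative, not copied from Perseus or from De Commentariis. A dotted reference such as 1.1.2 turns straight into an ElementTree path query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily cut-down TEI fragment in the spirit of the Perseus files.
# The refsDecl/state header layout is an assumption for illustration.
SAMPLE = """<TEI.2>
  <teiHeader><encodingDesc><refsDecl>
    <state unit="book"/><state unit="chapter"/><state unit="section"/>
  </refsDecl></encodingDesc></teiHeader>
  <text><body>
    <div1 type="book" n="1">
      <div2 type="chapter" n="1">
        <div3 type="section" n="1"><p>Text of 1.1.1</p></div3>
        <div3 type="section" n="2"><p>Text of 1.1.2</p></div3>
      </div2>
    </div1>
  </body></text>
</TEI.2>"""


def citation_units(root):
    """Read the ordered reference units (book, chapter, section) from the header."""
    return [state.get("unit") for state in root.iter("state")]


def lookup(root, ref):
    """Resolve a dotted reference like "1.1.2" to the innermost matching div."""
    units, parts = citation_units(root), ref.split(".")
    if len(parts) > len(units):
        raise ValueError(f"{ref!r} is deeper than the declared units {units}")
    # Level k of the hierarchy is a div<k> whose type and n match the reference
    steps = [f"div{level}[@type='{unit}'][@n='{n}']"
             for level, (unit, n) in enumerate(zip(units, parts), start=1)]
    return root.find("./text/body/" + "/".join(steps))


root = ET.fromstring(SAMPLE)
print("".join(lookup(root, "1.1.2").itertext()))  # Text of 1.1.2
```

Because the nesting carries the reference system, the lookup needs no knowledge of the text itself: a missing reference simply returns `None`.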

In this style of format, the presentation of the text (the original page it was scanned from) is confused with the data structure, and the critical data structure information is presented in the form of an annotation attached to a particular line (rather than enclosing all the lines which belong to chapter 1). This style of document is incredibly difficult to use with standard tools like xpath. This is highlighted if we go down just a little further into the text:

Where does chapter 3 start? Clearly not half-way through impenderent and most likely not at the word break in damnasse uideantur. Is it at the comma after impenderent? At the full stop after uideantur? A human, familiar with the original text, might be able to decide: a simple algorithm inside a computer program, probably not.
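To make the problem concrete, here is a hypothetical flat fragment, not taken from the CSEL files, where the chapter number is an empty TEI `<milestone/>` sitting in the middle of a word. I’m assuming `milestone` here; the real files may use a different element. The Python sketch does the only thing a simple program can do: assign text to a chapter according to where the marker happens to sit.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat encoding (NOT copied from the CSEL files): the chapter
# number is an empty <milestone/> dropped where it sat on the printed page,
# here deliberately in the middle of a word.
FLAT = '<p>impen<milestone unit="chapter" n="3"/>derent, damnasse uideantur.</p>'


def split_on_milestones(elem, unit="chapter"):
    """Naively assign text to chapters by where each milestone happens to sit."""
    chunks = {None: [elem.text or ""]}  # text before the first milestone
    current = None
    for child in elem:
        if child.tag == "milestone" and child.get("unit") == unit:
            current = child.get("n")
            chunks.setdefault(current, [])
        else:
            # Any other inline element: keep its text in the current chapter
            chunks[current].append("".join(child.itertext()))
        chunks[current].append(child.tail or "")
    return {n: "".join(parts) for n, parts in chunks.items()}


print(split_on_milestones(ET.fromstring(FLAT)))
# {None: 'impen', '3': 'derent, damnasse uideantur.'}
```

The split is mechanically correct and scholarly nonsense: impenderent ends up divided between two chapters, and nothing in the markup tells the program where the chapter really begins.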

I bring this up (I know it may seem churlish; after all, any open XML version of an ancient text has to be a good thing) because I feel that in the “official” digital classics circles there is a certain enthusiasm for recoding existing XML texts into the Epidoc format, and if this is the result, it’s a definite step backwards. Forgive me if I am wrong and this is merely the first step in getting from the presentation layer (the scan of the book) to the data layer (a properly structured XML version of the text). But I certainly hope this style of markup isn’t regarded as the standard way to proceed in future.

Coming up soon: features to help instructors in the classical languages create and manage commentaries, and to let their student cohorts engage with ancient texts through the commentaries they create. List of features planned.

The tool is about the creation of “crowd sourced” or “social” commentaries on ancient texts. I dislike both of those terms, hence the scare quotes (I don’t like buzzwords like that), but I can’t think of a better one. Taking the domain name literally, it’s “a network on commentaries”. What’s not to like? Click the link above and find out!

DeCommentariis.Net

De Commentariis uses data from the Perseus project’s online open-source data repository. Because of this, the number of texts (especially Greek ones) is severely limited at the moment, but I hope to add more as the texts in the Perseus repository improve and as I overcome my own technical limitations in extracting the data effectively. I’ve got some “suggested texts” linked from the home page, but you can list and view all the available texts (some that say they are available aren’t in a great state, though, so please be aware of that limitation! I’m a Livy scholar and there’s next to no Livy in it!).

DeCommentariis.Net example commentary.

In order to see the texts you must register an account.[fn1] You can sign up with Google, Twitter, or Facebook credentials (OAuth), or just register a simple account on the site by filling in the fields and choosing a good password. Either way, you’ll be sent an automatic email-address verification email containing a link. Click the link and you should be able to use the site.

After you register, send me an email, or just reply to the verification email, and I will add the “make commentary” permission to your account (on my to-do list: automate those permissions). Until I do that you can’t enter any commentary items. (Update: if you verify your email address, permissions are added automatically; Google+ social logins are verified automatically.)

The site is running on pretty limited resources at the moment so be a bit forgiving if it gets slow under all your eagerness to log on and check it out.

I would love to hear your feedback.

[fn1]: Your browser will warn you about an “insecure” security certificate. I have to use a “self-signed” certificate for the moment, because at this point I’m not about to pay a $200+ per year bribe to a signing authority for a signed security certificate. The alternative is to let you send your password unencrypted to my server, and that’s just silly. Therefore, there’s a self-signed certificate. Update: proper security cert installed.

Regular readers of my blog may already know of the struggles I’ve had with Papers, the bibliographic database and research tool. That last link goes to what is, by a huge margin, my single most popular blog page, because the Wikipedia entry for Papers links to it. But if you need verification of my negative appraisal of Papers in those posts, or in this one, just have a look at the comments. Anyway, a couple of weeks ago I decided to stop trying to make Papers work for me at all and to try another tool in its place. Ditching it and moving over to Mendeley was relatively straightforward.

The Mendeley web application interface

Mendeley comes in three major components.

The first part is the web application, where you sign up for an account. The account is free, and you get up to 2 gigabytes of storage space for your research database. If you need more, you can purchase a plan. I’ve got several hundred papers in my database and it uses 600 megabytes, less than half the allocated free space. Technically, all you actually need is the web application. The Mendeley web app also has a social networking aspect, but I think these features are, in general, rubbish.

The continual focus by Papers on extending its ‘social networking’, rather than fixing its serious data reliability issues and extending its core research and citation features, was one of the reasons I decided to cut it loose in the end. For this reason I don’t care for similar features in Mendeley. If I want a social network of academic research interests, there’s always academia.edu, besides regular networking at conferences and participating in relevant mailing lists. Mendeley uses the social network to retrieve articles that other people have uploaded into their databases, but the usefulness of this feature depends on how many Mendeley users there are in your research area.

The second part of Mendeley is “Mendeley Desktop”, a free download from the site. It’s your local application that you run as a native app on your Mac or PC. It downloads your research database from your online account. It uploads any new papers that you add to the desktop app to the database in the web application. There are Desktop versions for Windows (XP and later) and Linux as well as Mac OSX. The Desktop app can also import your Papers library into Mendeley if you are converting. It directly imports your Papers2 database. I am not sure if it can import a Papers3 database. Papers3 does its darnedest to hide the database from you: you may have to export your Papers3 database to a .bib file and allow Mendeley to import that. Mendeley Desktop also has some neat features and some drawbacks compared to Papers (see below).

The third part of Mendeley is an iOS app. The Mendeley iOS app is free, unlike the Papers iOS app: for Papers, you have to buy the iOS app as a separate item from the Mac or Windows app. The Mendeley iOS app, just like the Desktop program and basic levels of web storage, is completely free. Like the Desktop app, the Mendeley iOS app syncs to the central web-based data repository. Papers2 tries to cross-sync its iOS app to the desktop via your wifi network, an inferior solution. The Papers3 iOS app syncs to your desktop via Dropbox or iCloud. The Mendeley iOS app lets you carry your research database around on your iPhone or iPad or both. This is great for reading research articles on the train or bus, during lunch, or just sitting under a shady tree in beautiful Queensland weather (did I mention I go to the University with the most beautiful campus in all Australia?). The Papers iOS apps have this functionality too, of course, but at a cost. There is also the matter of the two different iOS apps, depending on whether you use Papers2 or Papers3.

There are some nice features that you gain from switching from Papers over to Mendeley:

Mendeley automatically syncs its database to a nominated .bib file for BibTeX or BibLaTeX so you can always have one up to date with your research data. This is important for people like me who use plain-text tools like Pandoc and LaTeX to create and edit their articles. Having to remember when I last performed a manual export of the .bib file from Papers was a pain in the neck.

Mendeley generates citation keys in a much nicer format. The default is a straight author-date format (Mcphee2014), so you don’t have to remember those awful random suffixes that Papers tacked onto the end of its cite keys. Nor does Mendeley put a colon between the author and the year (Papers would give Mcphee:2014zkwel). To convert from the Papers format to the one used by Mendeley, I had to do a bulk regular-expression search and replace on documents I had already created. That didn’t take long, because I use simple marked-up plain text as my main document format. Now it’s much nicer to insert references into my documents, as the citation key is easy to recall.
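My actual search and replace isn’t shown here, but a pattern along these lines would do the job. It assumes Papers keys always look like author, colon, four-digit year, then a run of lowercase letters, as in Mcphee:2014zkwel, and that citations are written Pandoc-style with a leading @. `convert_citekeys` is a name I’ve made up for this sketch.

```python
import re

# Papers-style citation: @Author:YYYY followed by random lowercase letters,
# e.g. @Mcphee:2014zkwel (format assumed from my own documents)
PAPERS_CITE = re.compile(r"@([A-Za-z][\w-]*):(\d{4})[a-z]+\b")


def convert_citekeys(text):
    """Rewrite @Author:YYYYxxxx citations as Mendeley-style @AuthorYYYY."""
    return PAPERS_CITE.sub(r"@\1\2", text)


print(convert_citekeys("As argued in [@Mcphee:2014zkwel, p. 1]."))
# As argued in [@Mcphee2014, p. 1].
```

If you have two works by the same author in the same year, both will collapse to the same key, so check for collisions by hand afterwards.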

It’s free if you have less than 2GB of PDFs (I mentioned this already but it bears repeating).

I feel that Mendeley’s duplicate paper detection and merge is superior to Papers’. However, Papers has author-merge and journal-merge features that Mendeley lacks. These are pretty neat when you end up with several variants of an author’s or journal’s name. In Mendeley you instead have to edit the offending documents one by one until all the relevant authors and journals match, which is not as nice as Papers’ way of dealing with duplicate authors and journals.

I far prefer the central-server sync scheme used by Mendeley to the Dropbox or iCloud style database file sharing, or inter-device wi-fi sync, that Papers uses. The Papers developers have clearly struggled with these latter mechanisms (and cross-sync can be hellish to do successfully at the best of times). Furthermore, Mendeley Desktop’s local configuration and data store is SQLite, a standard lightweight application database. This means standard tools exist that allow a geek like me to hack into my local Mendeley database if need be. I have found this useful for cleaning up the horrible citation keys that Mendeley imported from my Papers database. But if this last point sounds like gobbledegook to you, just remember that Mendeley stores your precious research data more reliably than Papers does.
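For the curious, here is roughly what that kind of clean-up looks like with Python’s built-in sqlite3 module. I’m assuming a table called `Documents` with a `citationKey` column; I haven’t confirmed Mendeley’s schema here, so inspect the real one first (e.g. `.schema` in the sqlite3 shell), quit Mendeley Desktop, and back up the database file before touching it. The demo runs against an in-memory database, not a real library.

```python
import re
import sqlite3

# Papers-style key, e.g. "Mcphee:2014zkwel" (format assumed from my own library)
PAPERS_KEY = re.compile(r"^([^:]+):(\d{4})[a-z]*$")


def clean_key(key):
    """Turn "Mcphee:2014zkwel" into "Mcphee2014"; leave anything else alone."""
    if key is None:
        return None
    m = PAPERS_KEY.match(key)
    return m.group(1) + m.group(2) if m else key


def clean_citation_keys(conn, table="Documents", column="citationKey"):
    """Rewrite Papers-style keys in place. Table/column names are ASSUMPTIONS."""
    conn.create_function("clean_key", 1, clean_key)  # make it callable from SQL
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            f"UPDATE {table} SET {column} = clean_key({column}) "
            f"WHERE {column} LIKE '%:%'"
        )
    return cur.rowcount  # rows touched by the UPDATE


# Demo on a throwaway in-memory database, not a real Mendeley file
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Documents (id INTEGER PRIMARY KEY, citationKey TEXT)")
demo.executemany("INSERT INTO Documents (citationKey) VALUES (?)",
                 [("Mcphee:2014zkwel",), ("Smith2010",)])
print(clean_citation_keys(demo))  # 1
print([row[0] for row in demo.execute("SELECT citationKey FROM Documents ORDER BY id")])
# ['Mcphee2014', 'Smith2010']
```

The table and column names are interpolated straight into the SQL, which is only acceptable because they are fixed constants in a personal script, never user input.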

What you do lose when you switch from Papers to Mendeley is the internal search hook into the online article databases (e.g. JSTOR, Web of Science, PubMed, arXiv, etc.). With Mendeley, you have to go to each database you use, one at a time, and use its web search facilities. Then you have to import each result into Mendeley with the supplied browser bookmarklet. For anyone used to Papers’ integrated search, this is an ugly throwback: Papers itself can search research databases and import the selected results directly, and Mendeley cannot. Yet the Mendeley website lists “Search across external databases” in its feature comparison matrix as “Almost there!” With luck, this important feature is one Mendeley won’t lack for too long.

Mendeley can auto-import PDFs that you save into a configurable directory. When I last checked this feature out a few years ago, it didn’t read the JSTOR metadata in the PDFs correctly, so you had to do a tedious clean-up of the resulting data by hand in the Desktop app. If this applies to you, it negates the feature and creates dispiriting extra manual work. Later versions may have fixed this defect, but I have not yet tried the current version. (Update 2014-09-09: I have tried this out, and it still imports PDFs from JSTOR terribly. Mendeley, please fix this defect!)

Mendeley does have a tool for searching papers that other users have imported into Mendeley. This is helpful if you are in a field with a lot of Mendeley users, but if you are not, you won’t find many results. I tried searching for something obvious in my field and got only two pages of results, most of which I already had in my library. Any one of the relevant online databases would have returned hundreds of results. So you need a large pool of researchers in your field for this to be a great feature.

There are also some other minor drawbacks to using Mendeley.

In the desktop application, online, and in the iOS app, the columns you can view and sort in your research database are limited. They are not at all flexible or in any way configurable. For example, you can’t view or sort by citation key. You can filter by publication or by author name using a side-bar on the left, and you can search your own collection, which is pretty flexible.

The reference manager, which inserts citations into documents and builds your bibliography automatically, is only available for Word. The documentation implies that OpenOffice and LaTeX options are also available, although Word was the only option on the menu that I saw. I don’t use Word or OpenOffice for my research publications (and you should not either, word processors suck!). I use Pandoc, so I guess I’m plumb out of luck. With Papers’ Citation.app, you can insert citations in several different portable flavours, including Pandoc and MultiMarkdown as well as Papers’ own format. Papers has more options for citations, if only I could have got it to be reliable. Both products use the CSL format for citation formatting (as does, for example, the citeproc tool which Pandoc relies on). Mendeley needs to add Pandoc and MultiMarkdown citation insertion support. But Mendeley’s sensible citation key generation, combined with Pandoc’s simple reference style, makes manual insertion of citations pretty easy: @Mcphee2014 p. 1.

In Mendeley, author names that have apostrophes in them, such as “O’Dwyer”, generate invalid citation keys in the .bib file (e.g. “O’Dwyer1999”). You have to edit the citation key manually in the desktop app to change it to something valid, e.g. “ODwyer1999”. This is a known bug in Mendeley; let’s hope they fix it soon.
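If you’d rather not fix each key by hand, a post-processing filter on the exported .bib file is one workaround. This Python sketch (my own, not a Mendeley feature) strips apostrophes, and any other character outside a conservative set of letters, digits and `_ . : -`, from each entry key while leaving field values alone. Since Mendeley regenerates the synced .bib file, you’d need to re-run it after each sync, and the citations in your documents have to use the cleaned key.

```python
import re

# The key of each entry: "@type{KEY," at the start of a line
ENTRY_KEY = re.compile(r"^(@\w+\s*\{)([^,\n]+),", re.MULTILINE)
# Conservative choice: anything outside letters, digits and _ . : - is dropped
UNSAFE = re.compile(r"[^A-Za-z0-9_.:-]")


def sanitize_keys(bib):
    """Strip apostrophes (straight or curly) and similar characters from entry keys only."""
    return ENTRY_KEY.sub(lambda m: m.group(1) + UNSAFE.sub("", m.group(2)) + ",", bib)


print(sanitize_keys("@article{O’Dwyer1999,\n  author = {O’Dwyer}\n}"))
```

The author field keeps its apostrophe, because only the text between `@type{` and the first comma is touched.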

However, putting up with those drawbacks beats losing research data to database corruption! Papers has many more interesting features than Mendeley, but it also has a large range of fatal data reliability bugs. Trust in your research database’s reliability has to be absolute for any researcher, and the Papers team have left their users in the lurch on this score. Promising to fix it in future updates just doesn’t cut it. Such fatal bugs should never be in a public release, and once detected post-release, an emergency patch should be available within hours. Anything else shows a fundamental misunderstanding of software engineering principles. Mendeley is not as flashy or as feature-rich as Papers, and lacks many advanced features, but it gets the basics right. Also, the Papers developers shut down public threads on their support site to keep negative comments from being visible. This is a terrible, closed way to approach support issues! Without a public forum, users can’t solve each other’s problems. They have to rely on official support channels only, which can take weeks to answer the simplest of queries, or turn to unreliable unofficial channels. Without public forums, the critical bug reports generated by the hasty release of a poor-quality beta version of Papers3 (which they had the gall to charge money for) overwhelmed the support staff. The desire to control what their users were saying about their product resulted in a major loss of reputation.

I will sum up with an analogy. Mendeley is like a basic model car that is unremarkable in features and gizmos and only comes in one color. But it gets you from A to B with pretty good fuel economy, and reliably too (imagine a 1980s Japanese sedan). In contrast, Papers is like a nice-looking car with tons of styling and loads of gizmos and advanced features as standard. But every third morning it won’t start without a complete oil change and full service, and once a year it tends to dump its gearbox on the freeway while you are driving it (imagine a 1970s Fiat, Rover or Leyland). So while I’d love to have a beautiful, stylish car with all the bells and whistles, the tow truck and mechanic’s bills (and the time wasted) are killing me, and they stop me getting to work on time, or sometimes at all. So … no. Mendeley it is.

I have been writing my thesis in Pandoc’s markup format, which is a flavour of Markdown. Recently I also switched to the Sublime Text plain text editor for this purpose. Sublime Text is fantastically extensible with easy-to-write Python 3.3 plugins, so on weeknights I’ve also been building some tools to help me with basic Pandoc/Markdown markup. In particular, that bane of everyone who writes a thesis of any sort: footnotes and references.

I wrote the plugin PandocReferencr. It’s available in the Package Control system for Sublime Text. To install it, press CMD-SHIFT-P (CTRL-SHIFT-P on Windows and Linux), type “Package”, select “Package Control: Install Package” from the drop-down menu, wait until it fetches the package list, then type “Pandoc Ref” and select the package to install.

There are two main commands. The first (check_footnotes) scans the current file and makes sure every footnote marker that has been inserted has a corresponding text entry somewhere else in the file. It also checks that every footnote text entry has a matching footnote marker.

This block of footnoted Pandoc/Markdown has two errors and one successful footnote.
This text has a successful footnote.[^footnote1] This bit of text has a footnote
that's broken because it doesn't have the matching text entry.[^notexist0]
[^footnote1]: this is the text of the successful footnote number 1.
[^missing1]: there is nowhere that inserts the 'missing1' footnote.
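The plugin’s own source isn’t reproduced here, but the core of a check like check_footnotes can be sketched in plain Python, outside Sublime Text, with two regular expressions: one for text entries (`[^id]:` at the start of a line) and one for every `[^id]` marker. The function name matches the command; the body is my own illustration, run here against the example block above.

```python
import re

# A footnote text entry: "[^id]:" at the very start of a line
DEFINITION = re.compile(r"^\[\^([^\]\s]+)\]:", re.MULTILINE)
# Any "[^id]" marker; this also matches text entries, which are filtered out below
MARKER = re.compile(r"\[\^([^\]\s]+)\]")


def check_footnotes(text):
    """Report markers with no text entry, and text entries with no marker."""
    definitions = list(DEFINITION.finditer(text))
    defined = {m.group(1) for m in definitions}
    definition_starts = {m.start() for m in definitions}
    referenced = {m.group(1) for m in MARKER.finditer(text)
                  if m.start() not in definition_starts}
    return {
        "missing_text": sorted(referenced - defined),
        "unused_text": sorted(defined - referenced),
    }


sample = """This text has a successful footnote.[^footnote1] This bit of text has a footnote
that's broken because it doesn't have the matching text entry.[^notexist0]
[^footnote1]: this is the text of the successful footnote number 1.
[^missing1]: there is nowhere that inserts the 'missing1' footnote.
"""
print(check_footnotes(sample))
# {'missing_text': ['notexist0'], 'unused_text': ['missing1']}
```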

The other useful command (insert_footnote) asks for a footnote id (if you have selected any text, it uses the selection as the default), and then inserts it at the end of the selection (if nothing is selected it acts as expected – inserts at the current cursor position). Then it asks for the footnote text data, and when that’s completed (i.e. you press enter) it puts the footnote text at the end of the current buffer (i.e. at the end of the file).
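In the plugin, this goes through Sublime Text’s API (input panels, and edits made with `view.insert()` inside a `TextCommand`). Stripped of the editor plumbing, the text transformation amounts to something like this hypothetical pure function, where `pos` stands in for the end of the selection or the cursor position.

```python
def insert_footnote(text, pos, footnote_id, note):
    """Return text with "[^id]" inserted at pos and "[^id]: note" appended at the end."""
    marker = f"[^{footnote_id}]"
    body = text[:pos] + marker + text[pos:]
    if not body.endswith("\n"):
        body += "\n"  # make sure the text entry starts on a fresh line
    # Separate the text entry from the preceding text with a blank line
    return body + f"\n{marker}: {note}\n"


print(insert_footnote("Some claim.", len("Some claim."), "fn1", "A source."))
```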

I’ve got some other ideas for functionality to add in the near future, like:

My latest support email to Mekentosj Papers support, who rarely answer support emails anyway. They don’t dare have a public forum like nearly every other software company, because it would be flooded with complaints. This particular complaint applies to Papers version 2, which I had to go back to after I found Papers 3 to be unusable junk. It turns out the same thing applies to Papers 2. I wonder how many other research articles it has made disappear? I’m sick of it. I want my money back. I’m going to sic a lawyer on them for the 70 euro refund, just on principle.

If your data integrity is shite then it doesn’t matter diddly squat how many dancing gizmos and nice icons your so-called beautiful interface has. I want a research database that doesn’t lose articles then insists the article is still in its database when it’s plainly not.

Today, in an oldish article written with cite keys, I found a citekey reference to an item that I clearly once had in my Papers database. I even found the article title in a manually compiled list of references in the article.

I looked for the article in the Papers 2 database. Nothing.

Then I searched for the article in JSTOR using Papers. It didn’t find it. Papers search is unreliable, terribly unreliable, so I went manually to JSTOR and found the article. Pasted the JSTOR URL into a new browser window of Papers. Imported OK. Then I decided the easiest thing would be to assign the original citekey to the item. Edited the article’s details.

Changed the citekey. Papers warns me there is an item with the same citekey in the database already!

Looked at the list of every item in the Papers 2 database, sorted by citekey. The citekey is not to be found in the database. Conclusion: the Papers database is corrupted again.

I just paid for an “upgrade” to Papers3, and that was unusable for other reasons (citations busted). I’ve paid for your product (Papers2) twice. I want my money back. Every time I open it, there’s some new random problem with it. You don’t answer your support calls. Everything about your product is either junk, corrupt or broken, or a black hole from which you never answer questions about serious problems.

It’s currently in “beta”, and they want you to pay $79 for this beta version. There is a 30-day demo. I just spent about two or three hours demoing it.

In that time I filed about seven support items against it. Three of them are data corruption issues relating to importing the Papers 2 library, although thankfully the Papers 2 library files remain uncorrupted and it’s just the new Papers 3 library that is junk. I get:

blank authors on some items

these blank authors are not deletable

book chapters do not import correctly (book title in “subtitle” field)

most of my periodicals are missing from the periodical list even though the papers are in the database

when you do a search, it doesn’t ask you for the ezproxy password if ezproxy is configured and just gives you no results instead

This is just what I found in three hours. I did not try actual citations or exporting the data to .bib files, but given what I’ve seen so far I would not be very confident of success. Do not buy this program until they report these issues fixed. This is quite apart from the outstanding Papers 2 issues, like the fact that my iOS/iPad database incorrectly swaps authors’ names when the data is imported on iOS. Oh look, an unfixed database corruption issue! What chance of Papers 3 being fixed? I rate it as none. Papers is a “data corruption ‘r’ us” special. For an application where data reliability is the key indicator of quality (it’s a research database!), this is very, very poor quality; indeed, an amateurish effort. The company is focussed on making a pretty interface and churning out major versions without fixing critical data reliability bugs.

This is a massive fail. 0.5 stars of 5.

If only Mendeley Desktop could search the online databases directly! Despite that limitation, I am thinking about switching from Papers 2 to Mendeley forthwith.

Can’t work out the root form of an irregular Latin conjugation? Confused as to whether it’s a 3rd declension neuter plural or a 1st declension feminine ablative... or even nominative? Is that a 1st/2nd pl. dative or ablative, or a 3rd m/f sing. genitive? Know how to parse the form, but don’t know the vocabulary?

Well, there’s an App for that!

There’s now a free iPad version available. You can read more about it, and get it from the App Store, via this link: http://inlustre.net/latinowl/.