The Latin Lexicon (nicknamed Numen) is an online Latin dictionary (a dictionary of the Latin language) and Latin grammar tool based on An Elementary Latin Dictionary (by Charlton T. Lewis). This online dictionary is different from any other you've ever used. It has been built from the ground up using AJAX technology to allow the fastest, most efficient and most useful user interface. Oh, and it includes macrons!

Saturday, December 26, 2009

Well, I've just run the word analysis tool on Livy Ab Urbe Condita Book 2. The important thing to note is that out of eighteen thousand words, only 20 weren't parsed and found in the dictionary. That's pretty much amazing.

How did this happen? Well, two things had to happen. First, I now ignore capitalized words that aren't found in the dictionary. Essentially, I'm ignoring proper names and place names. Second, I programmed Numen's ability to parse syncopated perfect verbs: laudasse (laudavisse), norat (noverat), et cetera.
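
To make the syncopation idea concrete, here is a minimal sketch in Python. It is not Numen's actual code: the KNOWN_FORMS set and both function names are hypothetical stand-ins, and it only covers the -avi-/-ave-, -evi-/-eve- and -ovi-/-ove- patterns (forms like audisse are left out).

```python
import re

# Hypothetical stand-in for the real dictionary of known inflected forms.
KNOWN_FORMS = {"laudavisse", "noverat", "laudaverunt", "amat"}


def expand_syncopated(word: str) -> list[str]:
    """Return candidate unsyncopated spellings of a perfect-system form."""
    candidates = []
    # Look for a stem vowel followed by ss, st or r, where -vi-/-ve- may have dropped out.
    for match in re.finditer(r"[aeo](?=ss|st|r)", word):
        i = match.end()
        # -vi- drops before s (laudasse < laudavisse),
        # -ve- drops before r (norat < noverat).
        infix = "vi" if word[i] == "s" else "ve"
        candidates.append(word[:i] + infix + word[i:])
    return candidates


def parse(word: str) -> list[str]:
    """Look the word up directly, then fall back to syncopation candidates."""
    if word in KNOWN_FORMS:
        return [word]
    return [c for c in expand_syncopated(word) if c in KNOWN_FORMS]
```

The candidates are only guesses; checking them against the dictionary is what keeps an ordinary word that happens to contain "ar" or "ass" from being misread.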

I still have a bit of testing to do to make sure I didn't break anything, but this was one of the few major hurdles that I needed to overcome to get a nearly perfect parsing engine!

Monday, December 21, 2009

This isn't the prettiest site in the world -- I admit. It's meant to be functional. Still, I sometimes think it could look better, so I spent a few minutes today sprucing it up. Hopefully I didn't break anything.

News is slow -- for most schools around the U.S., winter break is upon us. With students taking a break, the site is also slowing down. Hopefully when you all return from break there will be some interesting new features and bug fixes!

Saturday, November 28, 2009

After the server upgrade a few weeks ago, I busted the OpenID login system. I've spent a few hours fixing it and improving it, so if you use OpenID to log in then it should work now. Feedback is always welcome!

I like to keep the news relatively fresh, even if nothing major is happening. But even though this is a somewhat slow time for Numen development, there is still a constant stream of small updates. I'd like to keep everybody apprised of the situation.

One of the benefits of having students who actively use this dictionary is their feedback. One of the things they noticed was that an error message sometimes popped up saying, "AJAX has timed out", and it happened relatively often -- especially on the flashcard practice tool. So I dug into the code and found the issue: I had designed the site so that -- if any request took more than a few hundred milliseconds -- it would give an error. Such timeouts always involve a balance between giving up too soon and waiting too long, and I quickly realized that I had not struck that balance. So I bumped up the timeout duration to something reasonably middle-of-the-road and voila! Problem solved! Gratias vobis ago, discipuli.

More: My big project with Vergil and Livy is still paying off. I continue to correct dozens of tiny mistakes and errors in the data every week, and I was able to run some statistics. Excluding proper names and place names, the parsing engine can analyze and pin down around 98.5% of all the words in these two Augustan authors. So accuracy is definitely improving daily! I can't yet account for all false positives, but they seem to be less than a fraction of a percent (anecdotally).
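
For the curious, a statistic like this can be computed along the following lines. This is only a sketch under stated assumptions: the function name and the "known" set are hypothetical, and the real engine parses inflected forms rather than doing a plain set lookup.

```python
def coverage(tokens: list[str], known: set[str]) -> float:
    """Share of words the engine recognizes, skipping unknown capitalized words."""
    parsed = unparsed = 0
    for token in tokens:
        if token.lower() in known:
            parsed += 1
        elif token[:1].isupper():
            # Capitalized and not in the dictionary: treated as a proper
            # name or place name and left out of the statistic entirely.
            continue
        else:
            unparsed += 1
    total = parsed + unparsed
    return parsed / total if total else 0.0
```

Note that a capitalized word at the start of a sentence still counts as parsed if its lowercase form is in the dictionary.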

Speaking of accuracy, there is still room for improvement in three key areas:

proper and place names (which do not, for the most part, exist as a regular part of the Lewis Elementary dictionary).

I have been cogitating over solutions for all three issues, but it might take a while to implement them -- probably not until December or January (winter holidays, yay!).

In other news -- I guess I have more than I had first assumed -- I've almost got a word list feature finished. This is for people who prefer to work with formatted word lists as opposed to flashcard decks (which I understand are sometimes referred to as index cards). The only major problem I have with this process is that the Lewis Elementary dictionary does not provide a "core" definition for most words, so the word list would have extremely lengthy definitions. I think one of my best options is to import the data from Whitaker's Words, which has simpler, more core-like definitions. But this could be problematic, as a 1:1 mapping between Lewis' forms and Whitaker's forms would be difficult to achieve. Alas, I shall continue to think on this one.

Okay, so that's enough for now! Keep using it, and please keep reporting problems and errors! It may take a few days or weeks, but I eventually do fix all the errors!

Thursday, November 5, 2009

I fixed a small bug that affected flashcard decks in Internet Explorer 8 (and presumably earlier versions). If you couldn't create a flashcard deck in that browser, it should be fixed now!

As a side note, I've been working on a big project with Livy and Vergil. I've essentially been editing all the mistakes and unfound words in those authors. This is especially useful in Livy because we have a corpus of about 1 million words! So the accuracy of this dictionary is creeping up to the highest possible levels! With the exception of proper names and place names, I'll ballpark its accuracy with common classical authors at about 95%.

Also, thanks to the people who have been reporting errors and bugs! It's really helpful to have your feedback!

Wednesday, October 14, 2009

I mentioned a few weeks ago that I planned on making I's/J's and U's/V's look the same on the back-end, while preserving their traditional orthographies on the front-end. I've just completed this task!

My main motivation for making this update is that certain passages stored in The Latin Library reflect the older conventions of using J's for consonantal I's, or U's for both consonantal and vocalic V's. Numen's parsing engine was having trouble recognizing forms like jecit (iecit) and uiuus (vivus). So now -- after a bit of work -- the engine is updated and recognizes more possibilities than ever. Incidentally, internally J's are stored as I's and U's are stored as V's.
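
Here is a minimal sketch of that kind of normalization in Python. The function name is hypothetical and this is not the site's actual code; it also ignores macron-bearing vowels such as ū, which a real version would need to handle.

```python
# Map J -> I and U -> V, in both cases, to get the internal spelling.
_TO_INTERNAL = str.maketrans({"j": "i", "J": "I", "u": "v", "U": "V"})


def normalize(word: str) -> str:
    """Return the internal search key for a word, whatever its orthography."""
    return word.translate(_TO_INTERNAL)
```

With this, normalize("uiuus") and normalize("vivus") both give "vivvs", so the two spellings find the same entry. The key looks odd, but users never see it: the traditional spelling is stored separately for display.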

Another project I completed at the same time is an order-of-magnitude speed improvement for parsing. I was trying to figure out ways to make the engine faster and I discovered a shortcut that boosts speed tremendously. When parsing a word, the engine used to spend between 250ms and 500ms parsing each word! That was always disappointing to me, but I had gotten around the problem by caching the results. Now, however, word parsing takes about 25ms!
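
The caching workaround mentioned above can be sketched with Python's standard functools.lru_cache. The analyze function here is a hypothetical placeholder for the expensive parsing step, not the real engine.

```python
from functools import lru_cache


def analyze(word: str) -> tuple[str, ...]:
    """Hypothetical placeholder for the expensive morphological analysis."""
    return (word.lower(),)


@lru_cache(maxsize=100_000)
def parse_cached(word: str) -> tuple[str, ...]:
    """Run the analysis once per distinct word and reuse the result afterwards."""
    return analyze(word)
```

Calling parse_cached("amat") twice runs analyze only once; parse_cached.cache_info() reports the hits and misses. A cache hides slow parsing for repeated words, but the first lookup of each word is still slow, which is why speeding up the parse itself matters.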

Why bother improving the speed? Because soon I will be implementing word lists and frequency lists! A word list, of course, is just a "mini-lexicon" that defines only the words in your chosen passage, and a frequency list is a list of words in order of how often they appear in a passage. The word list will be helpful to quickly work on vocabulary for a passage, and a frequency list will help Latin students study more effectively by giving them the most frequent words first. I'm very excited about this feature, but I don't anticipate it will be done before January 10th (giving me the winter holiday to work on it).
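
As a rough illustration of the two kinds of list, here is a short Python sketch. The function names and the tiny lexicon in the usage note are hypothetical, and it counts surface forms as written; the real feature would presumably group inflected forms under their dictionary entries first.

```python
import re
from collections import Counter


def tokenize(passage: str) -> list[str]:
    """Split a passage into lowercase words (letters only, macrons included)."""
    return re.findall(r"[^\W\d_]+", passage.lower())


def frequency_list(passage: str) -> list[tuple[str, int]]:
    """Words in order of how often they appear, most frequent first."""
    return Counter(tokenize(passage)).most_common()


def word_list(passage: str, lexicon: dict[str, str]) -> dict[str, str]:
    """A mini-lexicon: only the words of this passage, in alphabetical order."""
    words = sorted(set(tokenize(passage)))
    return {w: lexicon[w] for w in words if w in lexicon}
```

For example, frequency_list("arma virumque cano arma") puts ("arma", 2) first, and word_list with a lexicon containing only arma and cano returns just those two entries.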

Thursday, October 1, 2009

So one problem with Numen is that it doesn't recognize the different possibilities when dealing with I's and J's and U's and V's. As you know, the J and the U were not Classical Latin letters. There has been a lot of back-and-forth over the past 200 years -- some editors prefer the originals and some prefer the modern versions.

But how should Numen deal with this issue? Internally, the computer is more precise and less forgiving than a human, and so in order to provide highly sensitive and accurate searches, the data needs to be "normalized". For example, I recently normalized verbs for consistency by changing all deponent verbs into their active forms and simply marking them as deponent with a data flag. Now, when you search for a deponent verb, the flashcard still shows something like sequor but internally it's stored as sequo. The reasoning here is simple: deponent verbs, regardless of their dictionary form and traditional morphology, still have active participles and their imperfect/pluperfect subjunctives are still formed from active infinitives.
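
A small Python sketch of the deponent flag idea, assuming a hypothetical VerbEntry record (the real data model is not shown here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEntry:
    stored_form: str        # normalized active first-person singular, e.g. "sequo"
    deponent: bool = False  # data flag: the verb is deponent

    @property
    def display_form(self) -> str:
        """What the user sees: deponents get their traditional -or form back."""
        return self.stored_form + "r" if self.deponent else self.stored_form
```

So VerbEntry("sequo", deponent=True).display_form gives "sequor", while an ordinary verb like VerbEntry("amo") is shown unchanged. The engine only ever has to deal with one shape of verb internally.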

But what about the I's and J's? Those are easy. Convert all the J's to I's, and most Latin readers won't have a problem -- this has been the convention for quite some time now. But then what about the V's and U's? Should I convert all the U's to V's? The opposite is true here: most Latinists would be mildly irritated by this form: uiuus (vivus).

The solution, which would be similar to the one for the deponent problem, would be to mark internally everything with I's and V's but then show the contemporary I's and U's and V's to the end users. That way, the computer can do accurate searches, but users get the information they are used to.

So, in the coming weeks, Numen will undergo this under-the-hood transformation. For the most part, users will never even notice -- except in one area. Searching for uiuus will be the same as searching for vivus!

Saturday, August 29, 2009

I apologize for taking down Numen for a few hours! But it was for a good reason. A couple of weeks ago (as I posted) the server died, and I had to jury-rig a (slow) home computer to be the server.

Well, just yesterday a bunch of new parts arrived via UPS and I quickly built, tested and verified the new server. I took down the old one about 11am, copied over all the data, and fired up this new one just a few minutes ago.

This new box is quite a bit faster than the home computer which was the temporary surrogate, but you'll be happy to hear that it's also zoom-zoom-zoom fast compared to the original server! It's somewhere in the range of 2 to 3 times faster overall.

The computer power users might be wondering what the new server consists of. It's a very simple configuration: a dual-core Athlon II 245 (?) 2.9GHz processor, an nVidia-based AM3 motherboard, 4GB of DDR-2 1066, an Antec EarthWatts 380W power supply and a WD Green 1TB hard drive. I was specifically trying to build a power-sipper here. My best guess -- because I forgot to bring a Kill-A-Watt to test it -- is that this server runs somewhere around 85W, and around 110W at peak. I wanted to get an Intel SSD (160GB Generation 2), but ... I felt that the performance gain from that wouldn't outweigh its cost (around $420). Now why does that matter? Because this entire computer, including shipping and a $20 mail-in-rebate, was about $292! Nice!

Okay, enough geeking-out for now. I hope you enjoy using Numen, and as always, feedback is welcome!

Friday, August 14, 2009

You might have noticed that this site was down for about a day. That's because the web server crashed! Well, that's a simple way of saying something more complicated. Don't worry, absolutely no data was lost. I did, however, have to load everything up on a smaller, older, slower server -- so what that means is that the site is going to be a bit slower until the replacement server arrives. It's under warranty so it won't be a problem to replace it, but it might take a few weeks.

Wednesday, July 8, 2009

As per the usual, I've been tinkering with the back-end from time to time. Things slowly and silently improve -- well, hopefully!

Right now I'm at the University of Kentucky participating in an intensive conversational Latin seminar. Intense is the appropriate word. Today is day 2 of the seminar itself, although I've been here since the 4th of July. We'll see how my Latin improves after 10 days of immersion.

Since I've been here at UKY I found a bug -- well, a performance issue. Some universities (like this one) don't put individual host names in DNS for their various thousands of connections. UNM happens to be one that does. Normally, my web server is set to perform a DNS lookup on every single connection (inefficient, I know). For a normal internet connection (say, a UNM student or a home Comcast user), that lookup takes microseconds. In the case of UKY, that lookup took 6 seconds ... to time out. That timeout happened every single time the browser connected. Needless to say, this site must seem incredibly slow at a university like this. I can verify that it did for the short time it was affecting me.

So, to all the UKY students out there ... I apologize for the slowness. I also apologize to all the users of this site who experienced the ... shall we call it a mismatched configuration? On a pleasant note, this was a truly easy situation to resolve. One # (hash) sign in the config file and all is well. C'est la vie? Oui. Or in Latin, perhaps Re vera est? Ita.

Incidentally, I clean up words on the back-end on a regular basis. Since I've been intensely studying here, I've been catching a few more errors than usual!

I also had a suggestion to incorporate the full Lewis and Short dictionary into this site. Great idea! I'm looking into it now.

The biggest improvements came in database queries. Some of the queries I was using were executing more slowly than I would have expected. In researching this problem I discovered something called prepared queries. I had no idea they would improve execution speed of certain queries by nearly 10x! On the back-end of things, that's a considerable improvement. On some pages it cut the server time for each page nearly in half -- to 35ms from 65ms! On the front-end, the site will probably feel a tiny bit snappier. Overall your average page load will drop from about 160ms to about 130ms (since it takes about 100ms for data to travel over the internet from your computer to the server and back). That may not seem like much on your end (a drop in latency of a bit under 20%) but on the server side it's quite dramatic (a drop of nearly 50%).

Monday, May 25, 2009

I know, I know! I've been bad about updating. But as usual, there's way more going on behind the scenes here than meets the eye.

I've been a busy beaver since the semester ended. I've got two main things going on in my life right now: my reading list and this web site. I read Lombardo's translation of the Aeneid and now I'm reading Ferry's Georgics. I'm also working through Discourse, Consciousness and Time by Wallace Chafe.

But I've also been working on this site! If you've tried to visit in the last week, you might have noticed that the site was a bit flakey from time to time. It's true, and I apologize, but it was all temporary and for a good cause.

First, I rewrote the flashcards feature entirely using AJAX technology. Check them out! They're completely awesome. They should work at the very least in IE8, Firefox 3, Chrome and Safari 3. That should cover 98% of the people out there. Maybe I'll test them in Opera later. There is one major feature missing: printing. But I added two super-awesome features: custom flashcard decks and practicing those decks online! Two other minor features are missing: timed slideshows when practicing and searching by tags. Those are minor additions that I'll get to later.

I also made the site more uniformly UTF8 compatible. This is a technical, backend feature that won't affect you at all, most likely. I used to send all the Latin characters to your browser in HTML entities, but now I'm sending them directly in UTF8 encodings. Surprisingly, that was a really easy feature to enable.
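
To show the difference between the two approaches, here is a tiny Python sketch (the function names are made up for the example):

```python
def as_entities(text: str) -> str:
    """Old approach: replace every non-ASCII character with an HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def as_utf8(text: str) -> bytes:
    """New approach: send the characters themselves, encoded as UTF-8."""
    return text.encode("utf-8")
```

For a word like Valēte, the entity version is Val&#275;te, while the UTF-8 version sends the ē directly as two bytes. The direct version only displays correctly if the page also tells the browser that its encoding is UTF-8.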

Another big improvement is the site security. I've been looking for holes and security breach-points. I discovered a big one: XSS (Cross Site Scripting). It's kind of an ugly loophole on websites, one which has been around for ages. Essentially I fixed my back-end library code to disallow these so-called XSS attacks. With a bit of luck and some salt thrown over the shoulder, I've hopefully closed all the loopholes.
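
The usual defense against XSS is to escape anything a user typed before putting it back into a page. Here is a minimal sketch with Python's standard html.escape; the function name and the page fragment are hypothetical, and this is not the site's actual library code.

```python
from html import escape


def render_search_heading(query: str) -> str:
    """Put the user's search text into the page with HTML special characters escaped."""
    return f"<h2>Results for {escape(query)}</h2>"
```

If somebody searches for <script>alert(1)</script>, the browser is handed &lt;script&gt;alert(1)&lt;/script&gt; and simply displays it as text instead of running it.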

As usual, I'll add a promise to try and update the news regularly. But if I don't, just remember that this site is continually improving behind the scenes.

I'm sad to see that it has gone missing from the web. Unfortunately, the Google Cache has also expired. Since this very important resource is in danger of extinction, I took the liberty of mirroring the Latin-English portion of the site (including sigla). I will keep the page posted until the original maintainer (Florus) can re-upload his version.

Friday, March 13, 2009

As I promised, I've been working on verb paradigms. As of now, they're up and running. Just search for any Latin word and click "see the complete paradigm".

Most verbs show up just fine, but of course some irregular verbs will show odd glitches. Therefore the data is "beta" but the paradigms should still prove helpful.

Here are some caveats:

certain irregular verbs will have weird forms, for instance, the participles for esse (which didn't exist until late antiquity).

deponent verbs will show active forms. Remember that deponent verbs do have active participles, and the imperfect subjunctive is formed from the "reconstructed" active infinitive. I'm trying to imagine a way to "gray-out" the unused active forms, but I haven't decided fully on that yet.

as a result of deponent verbs having "active" forms, they are now stored in the dictionary in their active forms, although on flashcards they will still show their deponent forms. So for instance sequor will be searchable under sequo.

unusual forms, such as dic, duc, and fac will show up as dice, duce, and face. I haven't implemented an "irregular forms" system yet, even though I've half mapped it out. UPDATE: It turns out that Plautus was fond of using forms like dice, duce, and face even though they were later rejected by Terence.

UPDATE: Some forms which are not known to exist (in other words, we don't have a record of them) but can logically be deduced will show up on the paradigm charts. For instance, the rare future active participle of volo (voliturus) shows up, and so does its non-extant future active infinitive voliturus esse. Many grammar books will not show these forms simply because we don't have a record of them. Nonetheless, it is logical to assume they existed or would have been known to exist during Roman times (at least in theory).
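
On the dic, duc and fac caveat above: one common way to build an "irregular forms" system is an override table that is checked before the regular rule is applied. The Python sketch below is only a guess at what such a system could look like; the table, the slot name and the function names are all hypothetical.

```python
# Hypothetical override table keyed by (lemma, grammatical slot).
IRREGULAR_FORMS = {
    ("dico", "imperative_sg"): "dic",
    ("duco", "imperative_sg"): "duc",
    ("facio", "imperative_sg"): "fac",
}


def regular_imperative_sg(present_infinitive: str) -> str:
    """Regular rule: drop -re from the present active infinitive (dicere -> dice)."""
    return present_infinitive[:-2]


def imperative_sg(lemma: str, present_infinitive: str) -> str:
    """Use the listed irregular form if there is one, else the regular rule."""
    override = IRREGULAR_FORMS.get((lemma, "imperative_sg"))
    return override if override is not None else regular_imperative_sg(present_infinitive)
```

With the table in place, imperative_sg("dico", "dicere") gives dic, while a regular verb such as amo still gets ama from the rule. The Plautine forms dice, duce and face remain available from the regular rule for anyone who wants them.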

Saturday, February 7, 2009

Based on the recent lack of news on this site, you might assume that the Latin Lexicon is dormant or stagnant. Yet nothing could be further from the truth! In fact there's been quite a bit of behind-the-scenes activity!

Let me start out by apologizing for not updating more often. This semester has turned out to be rather more packed with excitement than the last one. Since I've been short on time, I let blogging and site documentation slip.

In terms of back end coding, there hasn't been much activity. I've cleaned up a few bugs here and there. For instance, I cleaned up a UTF8 bug on the Word Study Tool.

But in more interesting news, I've been a busy beaver correcting words that appear in Lucretius' De Rerum Natura Book III. For the last 5 semesters our classes have focused on Augustan poets. Since the vocabulary is somewhat stock among those guys, I ended up doing very few corrections to the dictionary entries themselves. But since Lucretius uses a whole new set of vocabulary, the amount of "cleanup" is massive! This is a good thing, since words which contain errors get fixed and the Latin Lexicon slowly improves in quality.

Since the news/blog section of this site gives the first impression to new visitors, it might look bad if the front page isn't regularly updated. Good impressions are the best impressions, so I'll try to update more regularly despite the crazy-busy semester I'm having.

On that note, it's back to the grindstone for me. Valēte!

Update: To the parsing engine I added a couple of pronouns: quisquam and quidam, since Lucretius is so fond of them.