Does anyone know of a website where I could find, in full dictionary format, the 1400 words that are reputed to make up the core vocabulary for at least 80% of any Latin text?

I have a list of the words, but it is not in full dictionary format, which makes it far less useful: manus, for example, is listed without genitive, gender or meaning.

It would doubtless be good for the soul to fill out this information for myself but if it already exists, why reinvent the wheel??

I already subscribe to the Textkit vocab service but will follow up the other two.

You are welcome. And please note - MORE vocabulary lists are in the works, as well as Greek. I've got two additional books I'd like to extract the vocabulary from & post into the vocabulary service. One book from 1919 (?) gathered its words from a NY state exam list IIRC.

And I'm currently working on a military vocab list from N & H "Latin Prose Composition" but I am very slow about it.

Back in 1939, Paul B. Diederich compiled and submitted to the faculty of the University of Chicago, in partial fulfilment of the requirements for his master's or doctorate (I forget which), a book of 100 pages or so of Latin words with frequencies. He explained the method he used to compile it and, in the back, gave a selection of some 1400 words with translations, grouped by theme, as a beginner's vocabulary. If I remember right, he only gave the stem of the words for pedagogical reasons of his own.

All that goes by way of introduction to this, A Dual-Source Database of Word Frequencies in Latin compiled by James H. Dee. It integrates the results of his work and that of another man, and presents the results in the form of a plaintext or Excel document. However, to get a list of the most frequent, I think you'll have to do your own scraping, and for translation...well, you could see about dumping it through the Words program and capturing the output. If you can find a copy of the two sources for the database, you might do better to work with those; as I said, the Diederich one has a vocabulary in the back selected to cover some 85% of all word occurrences. (Well, I didn't give a number before, but I believe it is somewhere around there. After that, the percent increase per word learned goes down a bit too much.)
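For the "do your own scraping" part, here is a rough Python sketch of how one might pull the most frequent words out of a plaintext frequency list and stop once they cover a target share of all occurrences. The layout it assumes (one `word<TAB>count` pair per line) is only a guess for illustration, not the actual format of the Dee database, and the function names and sample counts are made up.

```python
def parse_frequency_lines(lines):
    """Parse 'word<TAB>count' lines into (word, count) pairs, skipping blank lines.

    The tab-separated layout is an assumption, not the real database format.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.split("\t")
        pairs.append((word, int(count)))
    return pairs


def core_vocabulary(pairs, coverage=0.85):
    """Return the most frequent words, in rank order, until their combined
    count reaches `coverage` (a fraction) of all occurrences."""
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    total = sum(count for _, count in ranked)
    core = []
    running = 0
    for word, count in ranked:
        if running >= coverage * total:
            break
        core.append(word)
        running += count
    return core


# Invented counts, purely to show the mechanics.
sample = ["et\t50", "manus\t10", "in\t30", "bellum\t6", "aqua\t4"]
print(core_vocabulary(parse_frequency_lines(sample), coverage=0.85))
```

With a real file you would pass the open file object to `parse_frequency_lines` instead of the sample list. Adding genitive, gender and meaning would still be a separate step, e.g. the Words program idea above.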

The Diederich one was compared to the College Board vocabulary list given for the Latin test they must have been administering at the time, so some copies of that may be floating about, as well.

dhaaz wrote: All that goes by way of introduction to this, A Dual-Source Database of Word Frequencies in Latin compiled by James H. Dee. It integrates the results of his work and that of another man, and presents the results in the form of a plaintext or Excel document.

Let's see if I can dredge up my knowledge of Intellectual Property from when I studied it at university. The principle in copyright law, in the US as well as in the UK, is that copyright is not available for the "mere sweat of the brow". In one case for instance, a company tried to copyright a telephone directory. It failed because the work must involve a minimal level of creativity:

Reading the report of that case it seems adding extra words to the list would be quite sensible.

I'm reading the De Bello Gallico anyway, and I've made a spreadsheet wordlist for that. I think it would be wise to conflate the two. Copyright lawsuits can be nasty. Did you know for instance that they are one of the few circumstances in which English courts will award punitive damages? Nasty stuff.

And here's another list to be getting on with...at least six weeks' work, I think...