I have been working on a completely offline dictionary built from the Wiktionary XML dumps. The dumps themselves are about 10 MB, but when I convert them into an index with a search-engine indexer (I use the Whoosh search engine in Python), the complete index comes to about 250 MB. I think that would be difficult to distribute: even zipped, it won't come anywhere near 10 MB. Indexing also takes about an hour on my system, so building the index while installing the software on a PC is tedious.

So I am looking for an alternative way of storing the words and meanings for the dictionary. Which solution is better for searching? Maybe some sort of database that produces lightweight DB files?

1 Answer

Have a look at the Directed Acyclic Word Graph data structure, which is designed to be a highly space-economical way to store dictionaries. They are commonly used on mobile phones, where economizing storage space is important.
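The answer doesn't point to a particular implementation, so here is a minimal pure-Python sketch of the idea. All names (`Dawg`, `DawgNode`, `index_of`, `node_count`) are hypothetical and not taken from any library. It assumes the word list is inserted in sorted order, which allows the incremental construction for sorted input described by Daciuk et al.: once a new word diverges from the previous one, the tail of the previous word can no longer change, so it is frozen and merged with any identical subgraph already seen. Because endings are shared, one final node can belong to many words, so definitions can't be stored on nodes; instead the sketch numbers each word by its position in sorted order and keeps the definitions in a separate list in that same order.

```python
class DawgNode:
    """One state of the word graph; identical nodes get shared after minimization."""
    _next_id = 0

    def __init__(self):
        self.id = DawgNode._next_id
        DawgNode._next_id += 1
        self.final = False  # True if a word ends at this node
        self.edges = {}     # char -> DawgNode

    def key(self):
        # Nodes are interchangeable if they agree on finality and point to the
        # same (already minimized) children under the same characters.
        return (self.final, tuple(sorted((ch, c.id) for ch, c in self.edges.items())))


class Dawg:
    """Hypothetical minimal DAWG; words must be inserted in sorted order."""

    def __init__(self):
        self.root = DawgNode()
        self.previous = ""
        self.unchecked = []  # (parent, char, child) along the last inserted word
        self.minimized = {}  # key -> canonical node
        self.counts = {}     # node id -> number of words reachable from that node

    def insert(self, word):
        if word <= self.previous:
            raise ValueError("insert words in strictly increasing order")
        common = 0
        for a, b in zip(word, self.previous):
            if a != b:
                break
            common += 1
        self._minimize(common)  # nodes past the shared prefix are now final
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            child = DawgNode()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True
        self.previous = word

    def _minimize(self, down_to):
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            k = child.key()
            if k in self.minimized:
                parent.edges[ch] = self.minimized[k]  # reuse identical subgraph
            else:
                self.minimized[k] = child

    def finish(self):
        self._minimize(0)
        self._count(self.root)

    def _count(self, node):
        if node.id not in self.counts:
            self.counts[node.id] = int(node.final) + sum(
                self._count(c) for c in node.edges.values())
        return self.counts[node.id]

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def index_of(self, word):
        """Position of `word` in sorted order, or None. Call after finish()."""
        node, index = self.root, 0
        for ch in word:
            if node.final:
                index += 1  # the shorter word ending here sorts first
            for other in sorted(node.edges):
                if other == ch:
                    break
                index += self.counts[node.edges[other].id]  # skip smaller branches
            node = node.edges.get(ch)
            if node is None:
                return None
        return index if node.final else None

    def node_count(self):
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node.id not in seen:
                seen.add(node.id)
                stack.extend(node.edges.values())
        return len(seen)


if __name__ == "__main__":
    entries = {w: f"<definition of {w}>" for w in ["tap", "taps", "top", "tops"]}
    words = sorted(entries)
    dawg = Dawg()
    for w in words:
        dawg.insert(w)
    dawg.finish()
    meanings = [entries[w] for w in words]  # stored separately, same order
    print(dawg.node_count())               # 5 (a plain trie needs 8)
    print(meanings[dawg.index_of("top")])  # <definition of top>
```

On this toy list the graph has 5 nodes versus 8 for a plain trie, because `tap`/`top` and `taps`/`tops` share their endings. How much a real Wiktionary word list shrinks is not measured here, and a graph of Python objects is itself memory-hungry, so for distribution it would still need to be serialized into a compact on-disk form.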

+1 for introducing me to a data structure that's obvious when you think about it but cool nonetheless. Just looking at the name I thought "Sounds like a trie except…ah, clever".
– Jon Purdy, Jun 19 '11 at 7:04

Any examples of Python libraries or implementations?
– user22662, Jun 19 '11 at 15:15