Much of the valuable linguistic data collected over the years is languishing in filing cabinets, where it is not readily available to linguists or to interested members of the public. We (Gray & Greenhill) are using this data to construct phylogenetic trees, with computational methods adopted from evolutionary biology, in order to test hypotheses about Pacific settlement. As part of this project we have “computerised” a large amount of lexical data and constructed a large-scale comparative database of this vocabulary. The database began with a set of Swadesh lists collected by Blust over the last 20 years, and has been supplemented with lists from many other linguists and from published resources. This Austronesian Basic Vocabulary Database is available on the internet at http://language.psy.auckland.ac.nz, and currently holds word lists from 481 languages, for a total of over 100,000 entries. We will describe some of the technologies required to build a repository such as this, and discuss the benefits of releasing data onto the internet for collaborative purposes. Finally, we will outline our plans for expansion and consolidation of the database and make a special plea for more data. A few results from our recent analyses will be presented along the way.