I have about 30 MB of textual data that is core to the algorithms I use in my web application.

On the one hand, the data is part of the algorithm and changes to the data can cause an entire algorithm to fail. This is why I keep the data in text files in my source control, and all changes are auto-tested (pre-commit). I currently have a good level of control. Distributing the data along with the source as we spawn more web instances is a non-issue because it tags along with the source. I currently have these problems:

I often develop special tools to manipulate the files, replicating database access tool functionality

I would like to give non-developers controlled web-access to this data.

On the other hand, it is data, and it "belongs" in a database. I wish I could place it in a database, but then I would have these problems:

How do I sync this database to the source? A release contains both code and data.

How do I ship it with the data as I spawn a new instance of the web server?

How do I manage pre-commit testing of the data?

Things I have considered thus-far:

Sqlite (does not solve the non-developer access)

Building an elaborate pre-production database, which data-users will edit to create "patches" to the "real" database, which developers will accept, test and commit. Sounds very complex.
I have not fully designed this yet and I sure hope I'm reinventing the wheel here and some SO user will show me the error of my ways...

BTW: I have a "regular" database as well, with things that are not algorithmic-data.
BTW2: I added the Python tag because I currently use Python, Django, Apache and Nginx, Linux (and some lame developers use Windows).

Thanks in advance!

UPDATE

Some examples of the data (the algorithms are natural language processing stuff):

World Cities and their alternate names

Names of currencies

Coordinates of hotels

The list goes on and on, but Imagine trying to parse the sentence Romantic hotel for 2 in Rome arriving in Italy next monday if someone changes the coordinates that teach me that Rome is in Italy, or if someone adds `romantic' as an alternate name for Las Vegas (yes, the example is lame, but I hope you get the drift).

I'd term this a resource, which is data that your application relies on, but not the data your application manages. Images, CSS, and templates are similar resources, and you keep them all version controlled.

In this case, you could split out your data into a separate package. In python distribution terms, use a separate egg that your application depends on; package deployment tools such as pip and buildout will pull in the dependency automatically. That way you can version it independently, you can have other tools depend on it, etc.

Last, choose a format that can be managed effectively by a source control system. That means a textual format. You can initiate parse that format on start-up, but by keeping it textual you can manage it properly through changes to it. This could be a SQL dump (CREATE TABLE and INSERT statements) to be loaded into a sqlite database on start-up, or some other python-based structure. You can also choose to load data on demand, caching it in-memory as needed.

Have a look at the pytz timezone database for a great example of how another resource project manages such structures.