
If you're interested in analyzing the data, you can run the same sort of queries in SQLite that you'd use in MySQL. I'm also working on making the script I used to import the data work with other databases, so keep an eye on the (unofficial) forum for updates.

If I were using PHP for the import, I'd use XMLReader rather than preg_match_all.
The XML files all have a similar structure. Each <row> element represents one record, and its attributes correspond to the fields of that record. Each attribute is present in all records. All you have to do is create tables whose column names exactly match the attribute names in the XML files, then write a simple script that puts each attribute's value into the column with the same name.
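As an illustration of that approach (in Python rather than the PHP script mentioned here), the sketch below streams `<row>` elements with the standard library's `xml.etree.ElementTree.iterparse`, which plays a role similar to XMLReader, and writes each row into SQLite. It assumes the target table already exists with one column per attribute name; the helper name `import_rows` and the table names are hypothetical, and column names are taken from the XML without validation.

```python
import sqlite3
import xml.etree.ElementTree as ET


def import_rows(conn, xml_source, table):
    """Stream <row> elements from xml_source into `table`.

    Hypothetical helper: assumes `table` already has a column for every
    attribute name that appears on the <row> elements.
    """
    count = 0
    # iterparse yields each element once its closing tag has been read,
    # so the whole file never has to be held in memory at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        names = list(elem.attrib)
        # Column names come straight from the attribute names.
        cols = ", ".join(f'"{n}"' for n in names)
        marks = ", ".join("?" for _ in names)
        conn.execute(
            f'INSERT INTO "{table}" ({cols}) VALUES ({marks})',
            [elem.attrib[n] for n in names],
        )
        elem.clear()  # release the element's data once it is stored
        count += 1
    conn.commit()
    return count
```

For example, after `CREATE TABLE votes (Id INTEGER, PostId INTEGER, VoteTypeId INTEGER)`, calling `import_rows(conn, "votes.xml", "votes")` would load a votes file (the file name is a placeholder). There is no error handling or schema checking, in keeping with the "just fiddling with the data" spirit of this answer.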

With MySQL Connector/Net and System.Xml.XmlReader, it should be possible to do almost exactly the same thing in C#/.NET.

I wouldn't put too much effort into error handling, schema testing, "but what if", etc. here. If you need something beyond "just fiddling with the data", there are other tools; you could even use Hibernate for that ;-)

VolkerK: How long does it take your code to write all the files to the database? I ask because I'm writing something in C# using MySQL Connector/Net and System.Xml.XmlReader, as you mention at the end of your post, but it takes hours to run: about 10 hours when writing one statement at a time, and somewhere between 3 and 4, I think, when writing 500 records per query (e.g. INSERT INTO table VALUES (rec1), (rec2), (rec3)). I'd like this to be faster, but I don't have a baseline for normal times. I've asked a question on this site about fast ways to do this, but haven't gotten much of a response.
– AgentConundrum, Sep 24 '09 at 7:30
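The multi-row `INSERT ... VALUES (rec1), (rec2), ...` batching described in the comment above can be sketched in Python against SQLite as follows; this is not the C#/MySQL code from the comment. The helper `insert_batched` and its default batch size are hypothetical. SQLite limits the number of `?` parameters per statement (999 by default before version 3.32.0), so batch size times column count has to stay below that limit.

```python
import sqlite3
from itertools import islice


def insert_batched(conn, table, columns, rows, batch_size=100):
    """Insert `rows` (tuples) using one multi-row INSERT per batch.

    Hypothetical helper. Each statement has the form
    INSERT INTO t ("a", "b") VALUES (?, ?), (?, ?), ...
    and all batches run inside a single transaction.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    it = iter(rows)
    total = 0
    with conn:  # commits once on success, rolls back on an exception
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            values = ", ".join([one_row] * len(batch))
            # Flatten the batch into one parameter list, row by row.
            params = [v for row in batch for v in row]
            conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES {values}', params)
            total += len(batch)
    return total
```

Grouping many rows into one statement and committing once reduces per-statement and per-commit overhead; the default of 100 rows is an arbitrary choice, not a tuned value.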

I just let the PHP script run, and it took under 20 minutes to import the "Sep 09" data dump on my BenQ S73G notebook (T2400, 1GB DDR2, everything on one HDD) with Windows XP, PHP 5.3.0, MySQL 5.1.37, MyISAM and no indices. Although this is only a dumb import, I think you should be able to beat both the execution time... and the hardware ;-) The votes table took the longest (8 minutes), followed by posts (7 minutes).
– VolkerK, Sep 25 '09 at 0:49