Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community.

Many times, when trying to come up with an efficient database design, the best course of action is to build two sample databases, fill them with data, and run some queries against them to see which one performs better.

Is there a tool that will generate (ideally straight into the database) large (~10,000 records) sets of test data relatively quickly? I'm looking for something that at least works with MySQL.

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Shopping list question - questions about which tool, library, product or resource you should use are off-topic here because they quickly become obsolete and often are just about the preferences of the answerer. If you have an issue with or a question about a specific tool, please revise your question to conform to that scope." – Aaron Bertrand

This might be a 'shopping list' question, but leaving it open is a useful resource, as these sorts of questions come up quite often. (just happened on StackOverflow-- stackoverflow.com/q/30427116/143791 )
– Joe, May 24 at 18:56

I typically generate my own, using some known data as input -- if it's too random, it's not always a good test; I need data that's going to be distributed similarly to my final product.

All of the larger databases that I have to tune are scientific in nature -- so I can usually take some other investigation as input, and rescale it and add jitter. (e.g., taking data that was at a 5 min cadence with millisecond precision, and turning it into a 10 sec cadence w/ millisecond precision, but with a +/- 100 ms jitter added to the times)
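The rescale-and-jitter idea above can be sketched in a few lines. This is a minimal illustration, not any particular tool: the function name and parameters are hypothetical, matching the numbers in the example (10-second cadence, +/- 100 ms jitter).

```python
import random


def resample_with_jitter(start_ts, n, cadence_s=10.0, jitter_ms=100.0):
    """Generate n timestamps at a fixed cadence (seconds), each nudged
    by a uniform +/- jitter_ms to mimic real acquisition noise.
    (Hypothetical helper for illustration only.)"""
    return [
        start_ts + i * cadence_s + random.uniform(-jitter_ms, jitter_ms) / 1000.0
        for i in range(n)
    ]


# Five samples at a 10 s cadence, each within 100 ms of its nominal time.
ts = resample_with_jitter(0.0, 5)
```

The same pattern extends to any measured column: take the real distribution as the base signal and perturb it, so the test data keeps the skew and clustering the optimizer will actually see.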

...

But another alternative, if you don't want to write your own, is to look at some of the benchmarking tools -- since they can repeat operations over and over based on a training set, you can use them to insert lots of records (and just ignore the reports on how fast it did it). You can then use that same tool to test how fast the database performs once it's populated.
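If you do end up rolling your own loader, a common trick is to batch rows into multi-row INSERT statements rather than issuing 10,000 single-row inserts. A minimal sketch (the table name and columns here are made up for illustration):

```python
import random
import string


def random_rows(n):
    """Yield n (name, score) tuples of plausible-looking test data."""
    for _ in range(n):
        name = "".join(random.choices(string.ascii_lowercase, k=8))
        yield name, random.randint(0, 100)


def bulk_insert_sql(table, rows, batch=1000):
    """Build multi-row INSERT statements, batched so each statement
    stays a reasonable size for the server to parse."""
    rows = list(rows)
    stmts = []
    for i in range(0, len(rows), batch):
        values = ",".join("('%s', %d)" % r for r in rows[i:i + batch])
        stmts.append(f"INSERT INTO {table} (name, score) VALUES {values};")
    return stmts


# ~10,000 rows as 10 batched statements of 1,000 rows each.
stmts = bulk_insert_sql("sample", random_rows(10000))
```

The statements can be piped straight into the `mysql` client or executed over a connector; batching keeps the load fast enough that regenerating the dataset for each design variant is cheap.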

For anyone looking for a different solution to this problem...
I wrote a test data generator project for Data Synchronisation Studio. It can generate large datasets, from a single row up to hundreds of millions of rows of realistic test data. Here is a blog post all about it: http://www.simego.com/Blog/2012/02/Test-Data-Generator-Download-for-Data-Sync It's free to use for 15 days (and once you have your test data, you have it).