Public Data Sets on AWS is a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS hosts the public data sets at no charge to the community, and as with all AWS services, users pay only for the compute and storage they use in their own applications.

I believe this would be a good way to provide the Stack Exchange data dump in an easily accessible form. My questions are:

Does anybody here have experience preparing and submitting a data set to Amazon for publication?

Does Amazon support publishers specifying their own license (such as the cc-wiki used here)? [I suspect the answer is yes, since there's a lot of Wikipedia data on there under the same license.]

How well does Amazon support a public data set that has updates every few months? What's the procedure for providing an update? Does the new data set replace the old one, or are multiple snapshots kept?

I'm willing to put in the work to make the data set available on AWS every time a new one is released. I asked the SE team about this and the response was basically "Interesting idea, ask on meta", so I'm now soliciting input.

My own situation is that I create a few things every time the data dump is released:

The data dump is getting fairly large and takes a while to download (I'm on a remote Pacific island with bandwidth limits and monthly data caps). I may also soon need more compute horsepower than my own hardware provides, since running through all the processing now takes a few days. I'd like to use EC2 to do the actual data crunching. I could of course just transfer the dump to AWS myself, but making it widely available there seems like a better idea.
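To make the bandwidth argument concrete, here's a rough back-of-envelope sketch. The dump size and link speed below are made-up placeholders for illustration, not actual figures for the Stack Exchange dump or my connection:

```python
# Back-of-envelope: why downloading the dump over a slow link hurts,
# and why processing it in-cloud next to an AWS-hosted copy is attractive.
# All numbers here are hypothetical placeholders.

def transfer_hours(size_gb: float, mbit_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes over a link of
    mbit_per_s megabits per second (decimal units, no overhead)."""
    size_megabits = size_gb * 1000 * 8  # GB -> megabits
    return size_megabits / mbit_per_s / 3600

# e.g. a hypothetical 20 GB dump over a 2 Mbit/s link:
print(round(transfer_hours(20, 2), 1))  # about 22.2 hours, ignoring overhead
```

At those (invented) numbers a single download eats most of a day and a large chunk of a monthly data cap, before any processing even starts; an EC2 instance reading an AWS-hosted copy skips that transfer entirely.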