If you want data that is not in the API, please let us know what you are looking for. We would prefer to provide a simple delivery mechanism rather than have you scrape data.

You can't have everything at once, however, because that is simply too much data. The cache table is just under 1GB and the logs are just over 8GB. We can probably provide the basics of caches and logs without descriptions and log text, as those are the largest parts of the data.

We are not in an easy position to provide the graphs in data/download form. The class we use doesn't cater for data output; it assumes we are always drawing a graph. This could be done, but the question comes back to effort versus return. If you can pull the raw data and perform your own calculations, you can avoid delays in the future when a new piece of data is required.

If you do want the data as you have asked in the OP, then you can use the API.

You can change the type to see what you want. It may be a little more effort to set up, but you then have a very powerful data provision tool for your analytics. You can then use the geocache codes it returns to drill down into the cache attributes and logs.
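As a rough sketch of that workflow, here is how a query URL with a type parameter and a field filter might be assembled before drilling down by geocache code. The endpoint and parameter names below are placeholders, not the actual API; check the real API documentation for the correct names.

```python
from urllib.parse import urlencode

# Hypothetical base endpoint -- substitute the real API URL.
BASE_URL = "https://example.org/api/caches"

def build_query(cache_type, fields=None):
    """Build a query URL for a given cache type, optionally limiting fields.

    'type' and 'fields' are assumed parameter names for illustration only.
    """
    params = {"type": cache_type}
    if fields:
        params["fields"] = ",".join(fields)
    return BASE_URL + "?" + urlencode(params)

url = build_query("moveable", fields=["code", "state", "status"])
print(url)
```

The same helper could then be pointed at a per-cache endpoint using the geocache codes returned by the first call.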

I doubt the cache description and log text are important to you for statistics purposes, so I'll see if I can add an "exclude" parameter to stop you having to download all of the text data for the caches and logs. Then you should be able to use the API for just the stats.

I was just thinking of making a caching dashboard, and I'd like to pull the counts etc. from queries rather than the list of the caches themselves.

eg... I want to write something that tells me every day "You have x caches left in your home state, there are x active moveables you haven't found based off your query etc.. there's x caches total.. etc etc etc"

Keep in mind that's me being a total novice at this sort of stuff, and for me it's a great way to learn. I agree though - the API is much better than a scraper.
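That daily summary could be computed with very little code once the cache data is in hand. A minimal sketch, assuming each cache comes back as a dict with `state`, `type`, `status`, and `found` fields (those names are assumptions, not the real schema):

```python
# Sketch of the daily dashboard summary described above.
# Field names (state, type, status, found) are assumed for illustration.
def daily_summary(caches, home_state):
    in_state_left = sum(
        1 for c in caches
        if c["state"] == home_state and c["status"] == "active" and not c["found"]
    )
    moveables_left = sum(
        1 for c in caches
        if c["type"] == "moveable" and c["status"] == "active" and not c["found"]
    )
    return (
        f"You have {in_state_left} caches left in your home state, "
        f"there are {moveables_left} active moveables you haven't found, "
        f"there's {len(caches)} caches total."
    )

caches = [
    {"code": "GA0001", "state": "VIC", "type": "traditional", "status": "active", "found": False},
    {"code": "GA0002", "state": "VIC", "type": "moveable", "status": "active", "found": False},
    {"code": "GA0003", "state": "NSW", "type": "moveable", "status": "archived", "found": True},
]
summary = daily_summary(caches, "VIC")
print(summary)
```

Run once a day against a fresh API pull and you have the "x caches left" message without ever downloading the full cache listings into a spreadsheet.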

There are a number of ways you can achieve the same outcome, some more painful than others.

We have been looking into having these sorts of user-generated statistical queries run against the database, but we always come back to a root issue: if someone screws up the way they construct the query, they send the DB into an endless search and retrieve. A poorly constructed query joining logs and caches could return up to 5,000,000,000,000 rows.
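To illustrate why a bad join is so dangerous: a query that joins two tables without a join condition pairs every row of one with every row of the other, so the result size is the product of the table sizes. The row counts below are made-up round numbers, not actual GCA figures:

```python
# Illustration of an unconstrained (cross) join exploding.
# Table sizes are hypothetical round numbers for the arithmetic only.
num_caches = 2_500_000
num_logs = 2_000_000

# With no join condition, every log is paired with every cache:
rows_returned = num_caches * num_logs
print(f"{rows_returned:,} rows")

# A properly keyed join (one cache per log) returns at most:
print(f"{num_logs:,} rows")
```

One missing `ON` clause turns a two-million-row result into a five-trillion-row one, which is exactly the failure mode that makes arbitrary user queries risky.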

Of course, that only covers the statistics we have precalculated. We can always add interesting statistics that can be downloaded by this method, but you will need to wait in the queue for the developers to build them.

We are also looking at providing two data sources: one of logs (without the log text) and one of geocaches (without short and long descriptions). If we provide these as zipped CSV files, they should come in at around 10MB or less each. I haven't checked what the size would be in JSON form, but I'm thinking CSV might be easier for the non-technical to manipulate in Excel. Access would probably be restricted to once a day (to avoid flooding) and to GCA geocaches only (all GC caches would be too much).
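If the export does end up as zipped CSV, it can be consumed directly without extracting to disk using only the Python standard library. This sketch builds a tiny zip in memory to stand in for the download (the file and column names are assumptions, not the real export schema):

```python
import csv
import io
import zipfile

# Stand-in for the downloaded export: a small zipped CSV built in memory.
# "caches.csv" and its columns are hypothetical, not the real schema.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("caches.csv", "code,state,status\nGA0001,VIC,active\nGA0002,NSW,archived\n")

# Read the CSV straight out of the zip, no temp files needed.
with zipfile.ZipFile(buf) as zf:
    with zf.open("caches.csv") as f:
        rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))

print(len(rows), rows[0]["state"])
```

The same `DictReader` rows feed naturally into the kind of count-based summaries discussed earlier, and Excel users can simply unzip and open the file directly.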

This might not address your requirements, though, as it will only cover GCA caches. We are not likely to ever provide access to the raw data for GC geocaches. That data is ours to use, but not ours to share.

If you are interested in exploring some more of the GCA statistics, please let me know what you would like and we'll see if we can make it happen.