Last I heard, the data sets were to be updated once per quarter, so you probably have to wait another month. Still, the prospect of live updates from a read-only copy sounds promising!
–
Martijn Pieters Sep 2 '12 at 9:14

1 Answer
1

The data is updated early every Monday morning, around 3:00 UTC.
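
If you want to estimate when the next refresh should land, here is a minimal sketch. It assumes the schedule is exactly Monday 03:00 UTC, which is only approximate ("around 3:00"), and `next_refresh` is a hypothetical helper name, not anything Data.SE exposes.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now: datetime) -> datetime:
    """Return the next Monday 03:00 UTC strictly after `now`.

    Assumption: the refresh runs at exactly 03:00 UTC on Mondays.
    `now` should be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    # Monday is weekday 0; days until the coming Monday (0 if today is Monday).
    days_ahead = (0 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=3, minute=0, second=0, microsecond=0
    )
    # If this week's slot has already passed (or is right now), use next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_refresh(datetime.now(timezone.utc)))
```

The actual import takes time to run, so treat the result as the earliest moment fresh data could appear, not a guarantee.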

Update 9 Nov 2012 - Done!

Since our datacenter move has been delayed, I was able to use the new server as soon as it arrived. Data.SE now has fresh data and the import is almost fully automated. I'll finish the automation on Monday and we'll have a weekly refresh of data going on. On Monday we'll also get the graduated sites that are missing in there - we haven't forgotten about them.

There may be issues with this new build that we're unaware of - it's a whole new setup. Please comment here if you notice any issues.

Update: we're just waiting on the new server to arrive (ETA 2012-11-09) in Oregon so I can get Data.SE on an auto-update. We should be able to get it up about a week after it comes in (it arrives the same weekend as our New York datacenter move, so it will likely have to sit a few days).

To clarify/set expectations: for data.SE we're talking about a weekly refresh (that's my goal, since it has to be down for a moment when we reload fresh data and it takes time to run). The SEDE George (Zypher) was talking about in chat was actually our internal copy of SEDE (which is just raw data, not transformed or anonymized - much easier to do "live").

We call that "LocalSEDE" internally and just "SEDE" for the public data.SE. So while it will be much fresher, it won't be instant. To get data there we have to transform and move it, and we can't logistically do that continuously, at least not in the foreseeable future.

Rebecca is the one taking lead on this as I'm working on the infrastructure side, but we'll update here in case we run into anything that makes weekly an issue.

We have uncovered a critical bug in our network setup paired with SQL Server 2012 clustering... We'll get to the data refresh when Stack Overflow and chat are no longer in danger of going offline. We're working with Microsoft on the issue now.

Data Explorer lives in Oregon, and a faster data refresh is predicated on getting a replica up and running out there for all databases. One currently exists for Stack Overflow but not for any other site. To get a data refresh going we need to:

- Get the bug fixed and deploy that fix across our network
- Get all other sites moved to the second SQL Server 2012 cluster
- Move chat to SQL Server 2012
- Re-task the chat database server as a 2012 instance that SEDE will run on
- Move SEDE to the new server
- Re-write the data import process for SEDE to be totally automated and pull from the replicas

We plan on doing all of this, but our primary concern is getting our architecture fixed and moved to support this (for other, more important reasons like moving data centers really). Once that's done then we'll get to SEDE data refresh. Since we are shuffling hardware here and SEDE is dependent on data, it would be a large amount of throw-away effort to get anything working before the above is complete.

The result of this will be no more manual imports, but (hopefully) something like a weekly data refresh to SEDE rather than the somewhat random intervals it's updated on currently. I apologize that things aren't going as planned with our timeline; trust me when I say we are far more frustrated with this than you are. Bear with us, the result will be worth it: SEDE's getting fresher, more reliable data refreshes and a beefier database server to run on.

Since it sounds like you're rewriting the data import process, would you be able to include deleted questions in the import? Statistics are always skewed because deleted questions are not included in Data.SE.
–
Rachel Sep 6 '12 at 19:26

3

@Rachel - They are intentionally excluded (not a bug)...I don't think there are any plans to change this.
–
Nick Craver♦ Sep 6 '12 at 23:41

Out of interest, why do you use MSSQL, not something like MySQL?
–
uınbɐɥs Sep 7 '12 at 1:37

@NickCraver - not to be a nag, but it would be greatly appreciated if there was an update to this? Not sure about SO DE, but SFF one still has the posts from the latest date of June 27 2012. I know SE was doing some datacenter work last week, was that related to your bullets?
–
DVK Oct 15 '12 at 1:14

@DVK We determined the chat server wasn't a good fit (old, no remote management). We'll be ordering a new one for SEDE, hopefully this week, then I'll be able to move things over to that box. The datacenter work is tangentially related... that's around our moving the NY datacenter, which is a huge company-wide effort that's eaten most of my time the last 2 months. I haven't forgotten about SEDE; it just has lots of prerequisites (which are mostly done) and needs some time, which we haven't had any to spare.
–
Nick Craver♦ Oct 15 '12 at 1:55

1

@NickCraver - 6 to 8 weeks, then :))) Good luck with the move - I know how fun that can be. Appreciate the update!
–
DVK Oct 15 '12 at 1:57

@DVK - Update: Server is now ordered, I'll get to work on SEDE as soon as it's up and running.
–
Nick Craver♦ Oct 15 '12 at 20:30

4

So data has been borked for 4 months and there isn't so much as a banner warning about the extremely outdated data? Any idea of a timeline to have it fixed (either updated the "old" way, or with the new system)?
–
Chris S Oct 19 '12 at 16:10

@DVK Nag him! and also Rebecca Chernoff... they both need much more nagging! I can't do it all myself!
–
Ward Oct 19 '12 at 17:26

@ChrisS The date is published beside every site, so that at least is pretty visible.
–
Anna Lear♦ Oct 21 '12 at 15:19

3

@AnnaLear Although people seem to overlook those dates somehow...I guess the resultsets could indicate somewhere that they're showing "Results as of..." or something, if that'd help any.
–
Tim Stone Oct 23 '12 at 13:07