
Unanswered: "Freshest" data

The setup:

A SQL Server database consisting of 252 tables, each of which contains several million records and will continue to grow at a rate of about 1 million per month. This data is captured for a particular year and month, and is captured for a specific entity. So a (really, really) simplified table structure is:
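(The structure listing from the original post didn't survive. A minimal sketch of what it likely looked like, going only by the complaints in the reply below — generic auto-increment id, generic code, year and month stored separately — with all names hypothetical:)

```sql
-- Hypothetical reconstruction; every name here is an assumption.
CREATE TABLE SomeCapture (
    id          INT IDENTITY(1,1),  -- generic auto-increment, as captured
    entity_code VARCHAR(20),        -- the entity (school) code
    cap_year    INT,                -- year of capture
    cap_month   INT,                -- month of capture
    -- ... many other columns, all stored as text in the real thing ...
);
```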

There are obviously many other fields but those are the ones that are used to sort or find information.

What I need to do is automate (as much as possible) a process of taking the last data captured per entity and putting it into a completely empty database with the same structure.

Entities may have last been captured in January 2012, so it's not as simple as just selecting the last rows by date and dumping them into a DB.

I'm happy to do this manually via SQL or stored procedures, and equally happy to use some awesome tool that does it. Maybe something that can synchronise the latest data using the criteria of code, year and month?
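One set-based way to do this in T-SQL is to rank each entity's rows by capture period and copy only the top-ranked period into the target database. This is a sketch; the table and column names are assumptions, since the real structure isn't shown:

```sql
-- Rank capture periods per entity; DENSE_RANK keeps *all* rows from
-- the latest (year, month) pair, not just one row per entity.
-- Table/column names (SomeCapture, entity_code, cap_year, cap_month)
-- and the TargetDb name are hypothetical.
;WITH Ranked AS (
    SELECT entity_code, cap_year, cap_month, -- ... other columns ...
           DENSE_RANK() OVER (
               PARTITION BY entity_code
               ORDER BY cap_year DESC, cap_month DESC
           ) AS rnk
    FROM dbo.SomeCapture
)
INSERT INTO TargetDb.dbo.SomeCapture (entity_code, cap_year, cap_month /* , ... */)
SELECT entity_code, cap_year, cap_month /* , ... */
FROM Ranked
WHERE rnk = 1;  -- only rows from each entity's most recent captured period
```

This handles the "January 2012" case above automatically, since the ranking is per entity rather than against a global cut-off date.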

Thanks for any help, much appreciated

Notes: I cannot change the structure of the database, I cannot change indexes, I really do need to have a separate database - not my choice but there it is

“If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Ernest Hemingway, A Farewell to Arms.

[quote]A SQL server database consisting of 252 tables. Each of which contains several million records [sic] and will continue to grow at a rate of about 1 mil per month. This data is captured for a particular year and month and is captured for a specific entity. So a (really, really) simplified table structure is:[/quote]

I know it is a skeleton, but rows are not records, columns are not fields, auto-increment table properties cannot ever be a key, there is no generic “id”, no generic “code”, and temporal values are not split. Nothing is right here. This narrative does not even have an amen! Let me guess that the GTIN is the “<generic, nothing in particular>_code”, so we can get an international standard, and that we can use the MySQL month-within-year temporal duration notation.

There are obviously many other fields [sic] but those are the ones that are used to sort or find information. What I need to do is automate (as much as possible) a process of taking the last data captured per entity and putting it into a completely empty database with the same structure.

Why? This mimics how we moved punch cards and mag tape records to new physical storage. A database is supposed to model a “universe of discourse”, so you want to create an entire, separate parallel new universe.

>> Entities may have last been captured in '2012-01-00' so it's not as simple as just selecting the last rows by date and dumping them into a DB. <<

What date?

Notes: I cannot change the structure of the database, I cannot change indexes, I really do need to have a separate database - not my choice but there it is

Can you explain why you must use a physically separate database? That kind of thing is usually a legal requirement, since it makes no sense from an RDBMS viewpoint.


Thanks for the assist so far. OK, here's the complete overview; perhaps it'll help to understand.

The people in question run a warehouse of data which contains all the information for every school in the country. When I say everything, I really do mean it - up to and including when someone was off sick and why as well as the stock movement for every piece of stock ever bought. So, the warehouse is fairly large.

The underlying application at every school in the country basically dumps unfiltered, unsanitized data into an Access database with some pretty wacky structures (hence the simplified table structure). There are now 401 tables, and they all have anywhere between 10 and 100 columns in them. There are no lookup tables, no foreign keys, etc.; everything is stored as text.

Each month, every school (±5,700 of them) sends its Access database to a central point, and it's dumped into the warehouse, increasing the number of rows per table by roughly 1 million per month.

Because the warehouse is badly designed (it's just an exact duplicate of the Access database, but with fewer constraints), indexing doesn't work and queries are slow, on the order of a couple of hours in some cases. Unfortunately, this is not something I can change or even contemplate changing until the political whackjobs in charge of it realise that *something* is drastically wrong here.

So that brings us to the solution required: a catalogue of "freshest data" per school. A school may have last submitted its data in March 2012, or March 2014, or even October 2008. Whatever its last submission is, that's considered its freshest data. The logic, from the company's point of view, is that this new database will be properly indexed and will result in much faster queries, since the dataset will be much, much smaller.

Obviously, I have defined useful views that give me data such as the last submission period per school code, but where it all falls over is that running something akin to the following pseudo-SQL for ~5,700 schools (given an array of schools) is insane:
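(The pseudo-SQL block is missing from the post, but the description implies a per-school loop. A hedged sketch of that loop, followed by a single set-based alternative that avoids ~5,700 separate passes; the view and column names are assumptions:)

```sql
-- Per-school loop as described (slow: one scan per school).
-- vw_LastSubmission is a hypothetical view: (school_code, last_year, last_month).
DECLARE @code VARCHAR(20), @y INT, @m INT;
DECLARE school_cur CURSOR FOR
    SELECT school_code, last_year, last_month FROM dbo.vw_LastSubmission;
OPEN school_cur;
FETCH NEXT FROM school_cur INTO @code, @y, @m;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO FreshDb.dbo.SomeTable
    SELECT * FROM dbo.SomeTable
    WHERE school_code = @code AND cap_year = @y AND cap_month = @m;
    FETCH NEXT FROM school_cur INTO @code, @y, @m;
END;
CLOSE school_cur;
DEALLOCATE school_cur;

-- Set-based alternative: one join, one pass over each source table.
INSERT INTO FreshDb.dbo.SomeTable
SELECT s.*
FROM dbo.SomeTable AS s
JOIN dbo.vw_LastSubmission AS v
  ON  s.school_code = v.school_code
  AND s.cap_year    = v.last_year
  AND s.cap_month   = v.last_month;
```

Since both databases have the same structure, `SELECT s.*` lines up in the sketch; the join version would still need to be generated once per table (401 of them), but that's a one-off script rather than 5,700 queries per table.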

So, suggestions? There are some pieces of software out there that seem to allow synchronisation, but they're usually only schema-based, or they don't allow constraints such as the ones I must use. I'm open to anything at this point.
