The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

How to design a position tracking DB?

Hi all,

Whats the best practice for designing a system to monitor and track changes in an ordered list?

That probably sounds a little obscure, but I'm thinking of the scenario where you monitor over time changes in league rank positions for example - i.e. storing top 100 products over time. For now its best to assume this is a seperate DB entity, storing only this 'league'.

It obviously makes no sense to store the entire league over and over and then compare across different dates, thus I assume you just want to store changes as they occur? New entrants, change in position/rank, changes in composition maybe (new title?).

I've attempted searching but not found anything, any suggestions on how to describe this problem?

I'm a little out of my depth with this. Any advice would be much appreciated.

By implication this would mean that for every date the ranks were recorded a reference to however many items that exist would be added to the Rank Table? I assumed this would be inefficient and therefore thought there must be a better model.

The example I mentioned above was just an example, in reality I'd be tracking the position of thousands of items, some of which might never change rank. I kind of envisaged a system where only changes were recorded, but maybe this isnt possible?

I've yet to really decide how many results to track, as many as is feasible. really. Some of the data returned by the webservice will change from week to week, others will be more constant - I'm not sure whether it makes sense to setup a system to log just the changes (in both content and rank position).

if you're obtaining data from external sources, the best approach is to store as much detail as you can, even stuff that is available but you're not sure if you can use it or not yet

after that, running queries to track stuff can take any number of directions

but if you pre-calculate stats from incoming sources, and store only that, then you can never get the missing stuff out of it

another consideration is complexity

take for example your items table -- what if an incoming source contains a new item you don't have yet? you have to have logic to add it, right? that can get in the way of efficient data capture or storage

in your case, extract means to get the data you want from the web, transform is where you decide which items to track, and load (which normally refers to loading clean data into a well designed database) would be your queries to display the results of your tracking

Just did a quick search on this, pulled up some really interesting articles thanks.

Originally Posted by r937

in your case, extract means to get the data you want from the web, transform is where you decide which items to track, and load (which normally refers to loading clean data into a well designed database) would be your queries to display the results of your tracking

Does this suggest that a full 'data load' should occur everytime I query the web service for fresh data? Then use the date of the insert to compare any changes that have occured when tracking over time? Would this approach cause performance issues? For example in this case a URL would represent a unique item, with different meta data associated, selecting all items with the same URL, then comparing the meta data over each strikes me as intensive.