Incremental Data Fetching in Data Abstract for OS X

Bugs 7, the new bug-tracking application I have been working on over the past month for internal use here at RemObjects, employs several different data access paradigms (all based on Data Abstract for OS X) to accommodate the different nature of the data in individual tables.

The most interesting one is the main Issues table that contains all the bugs and tasks that are logged in the system. That is because (a) it is a huge table (over 25,000 records as of now, about 11 MB compressed when transferred over the wire) and (b) by the very nature of a bug tracking system, this table changes frequently and needs to be updated on the clients often.

Rather than downloading the entire table anew every time, we opted for a solution that allowed us to incrementally fetch only those records that have changed, and integrate them with the local dataset. This way, only minimal traffic is incurred for the regular refresh (which our client, by default, does every two minutes).

In this post, I want to give you a quick glimpse at how this was accomplished, and how you can leverage the same technology in your own Data Abstract applications.

The Server

A couple things happen on the middle tier server (written using Data Abstract for .NET), to support the incremental refresh. Like all tables in the Bugs database, our Issues table has an UpdatedDate field, which gets automatically adjusted by the business logic code on the server. Every time a new issue is created, or an existing issue is updated, the server puts the current UTC time into the UpdatedDate field, clearly marking the order in which issues have been touched.

This is handled by a BeforeProcessChange event handler on the server, which simply adjusts the received delta.
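As a rough illustration of the idea only (in Python rather than the server's .NET code), the timestamping step could look like the sketch below. The function name `stamp_updated_date` and the use of a plain dict for one row of the delta are assumptions made for this example, not part of Data Abstract.

```python
from datetime import datetime, timezone

def stamp_updated_date(change, now=None):
    """Return a copy of an incoming change with UpdatedDate set to the current UTC time.

    `change` is a plain dict standing in for one row of the received delta.
    Any UpdatedDate value sent by the client is overwritten.
    """
    stamped = dict(change)
    stamped["UpdatedDate"] = now or datetime.now(timezone.utc)
    return stamped

# Inserts and updates get the same treatment, so every touched row
# ends up with a server-assigned timestamp.
new_issue = stamp_updated_date({"ID": 1, "Title": "Crash on launch"})
```

Assigning the time on the server, rather than trusting the client, means all timestamps come from a single clock, which is what makes them usable for ordering changes.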

(Of course the actual code in our server performs a lot more checks and changes, to enforce business logic for our database – but that’s beyond the scope of this post.)

Also, our Issues table does not permit deleting records (only closing issues, which sets their status accordingly, but does not remove the rows from the database). This spares us from having to worry about rows disappearing from the table altogether.

The Client

On the client, a bit more custom logic is necessary, to perform the incremental updating.

When the client application (“Bugs 7”) is first started, it checks whether a briefcase file with a local copy of the data exists from a previous run.

If not, the client will start a request to download the complete set of data from the server. This is a one-time process, and will download the entire table with its (currently) 11 MB across the wire. Once downloaded, it is stored in a briefcase file, so on next application start, the data can be loaded locally. After the download is finished, the application also takes note of the latest UpdatedDate value it can find in the table, for future reference. This is made easy by Cocoa’s KVC and Key Paths.
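Cocoa's KVC offers collection operators such as `@max` for this kind of lookup. The same computation, sketched in Python with rows modeled as plain dicts (the helper name `latest_updated_date` and the sample data are made up for this example), might look like this:

```python
from datetime import datetime, timezone

def latest_updated_date(rows):
    """Return the newest UpdatedDate among the rows, or None for an empty table."""
    return max((row["UpdatedDate"] for row in rows), default=None)

issues = [
    {"ID": 1, "UpdatedDate": datetime(2010, 3, 1, 9, 0, tzinfo=timezone.utc)},
    {"ID": 2, "UpdatedDate": datetime(2010, 3, 4, 17, 30, tzinfo=timezone.utc)},
    {"ID": 3, "UpdatedDate": datetime(2010, 3, 2, 12, 15, tzinfo=timezone.utc)},
]

# Remember this value; it marks how far the local copy is up to date.
max_date = latest_updated_date(issues)
```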

Whichever path was taken, the application now holds a local copy of the Issues table it can work with. The next step is to schedule the regular refreshes, and for that an NSTimer is configured to fire at regular intervals, on a background thread.

This NSTimer will trigger our beginRefreshBugs method, which uses asynchronous requests to start checking for new issues. It uses the previously stored maxDate and a feature of Data Abstract called DA SQL, to fetch only those issues that have changed since then.
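To make the shape of such a query concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the server. The table layout, the sample rows, and the helper `fetch_changed_issues` are assumptions for illustration; the actual DA SQL statement used by Bugs 7 is not shown here.

```python
import sqlite3

def fetch_changed_issues(conn, max_date):
    """Return only the rows whose UpdatedDate is newer than max_date."""
    cursor = conn.execute(
        "SELECT ID, Title, UpdatedDate FROM Issues "
        "WHERE UpdatedDate > ? ORDER BY UpdatedDate",
        (max_date,),
    )
    return cursor.fetchall()

# In-memory table standing in for the server-side Issues table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Issues (ID INTEGER PRIMARY KEY, Title TEXT, UpdatedDate TEXT)")
conn.executemany(
    "INSERT INTO Issues VALUES (?, ?, ?)",
    [
        (1, "Crash on launch", "2010-03-01T09:00:00Z"),
        (2, "Typo in dialog", "2010-03-04T17:30:00Z"),
        (3, "Slow refresh", "2010-03-05T08:45:00Z"),
    ],
)

# Timestamps are stored as ISO 8601 UTC strings in one fixed format,
# so string comparison matches chronological order.
changed = fetch_changed_issues(conn, "2010-03-02T00:00:00Z")
```

The sketch compares with `>`; using `>=` instead would re-fetch the newest known row on every refresh, which is harmless as long as the received rows are merged by primary key.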

The DAAsyncRequest, once started, will communicate with the server in a background thread, without blocking the caller. beginRefreshBugs will return right away, and not wait for the request to complete (or fail).
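The general pattern (start a request, return immediately, get called back later) can be sketched in Python with a plain thread. The classes `AsyncRequest` and `RecordingDelegate` and their method names are hypothetical and only mirror the idea; they are not the DAAsyncRequest API.

```python
import threading

class AsyncRequest:
    """Runs a fetch function on a background thread and reports to a delegate."""

    def __init__(self, fetch, delegate):
        self._fetch = fetch
        self._delegate = delegate
        self._thread = None

    def start(self):
        # Returns immediately; the work happens on the background thread.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def wait(self, timeout=None):
        # Only for demos and tests; a GUI client would not block like this.
        self._thread.join(timeout)

    def _run(self):
        try:
            table = self._fetch()
        except Exception as error:
            self._delegate.did_fail(self, error)
        else:
            self._delegate.did_receive_table(self, table)

class RecordingDelegate:
    """Collects whatever the request reports back."""

    def __init__(self):
        self.tables = []
        self.errors = []

    def did_receive_table(self, request, table):
        self.tables.append(table)

    def did_fail(self, request, error):
        self.errors.append(error)

delegate = RecordingDelegate()
request = AsyncRequest(lambda: [{"ID": 2, "Title": "Typo in dialog"}], delegate)
request.start()   # does not block
request.wait()    # block here only so the example can inspect the result
```

Note that in this sketch the delegate is invoked on the background thread; code that touches the user interface would need to hand the result over to the main thread first.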

Once the request completes, it calls back to a delegate method (in this case we assigned self as the delegate), called asyncRequest:didReceiveTable:. It is here that we handle integrating the received data back with our big issues table by sending it the mergeTable:withPrimaryKey: message. This will replace the data in any rows that have changed, as well as add any new rows to the table.
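Conceptually, a merge by primary key boils down to "replace if the key exists, append otherwise". The following Python sketch shows that logic with rows as plain dicts; the function `merge_table` is written for this example and says nothing about how mergeTable:withPrimaryKey: is implemented internally.

```python
def merge_table(local_rows, received_rows, primary_key):
    """Merge received rows into local_rows in place, matching on primary_key.

    Returns True if anything was added or changed.
    """
    # Map each primary key value to its position in the local table.
    index = {row[primary_key]: position for position, row in enumerate(local_rows)}
    changed = False
    for row in received_rows:
        position = index.get(row[primary_key])
        if position is None:
            # Unknown key: this is a new row.
            index[row[primary_key]] = len(local_rows)
            local_rows.append(dict(row))
            changed = True
        elif local_rows[position] != row:
            # Known key with different content: replace the row.
            local_rows[position] = dict(row)
            changed = True
    return changed

issues = [
    {"ID": 1, "Title": "Crash on launch", "Status": "Open"},
    {"ID": 2, "Title": "Typo in dialog", "Status": "Open"},
]
received = [
    {"ID": 2, "Title": "Typo in dialog", "Status": "Closed"},
    {"ID": 3, "Title": "Slow refresh", "Status": "Open"},
]
did_change = merge_table(issues, received, "ID")
```

Because rows are never deleted on the server, the merge does not need a "remove" case, which keeps it this simple.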

The last step after receiving new issues is to update any affected views. This happens more or less automatically, as every view that shows one or more issues (whether it’s the regular grid view of issues, a chart visualizing issue data, or an individual issue’s detail view) will have registered itself to observe DA_NOTIFICATION_TABLE_CHANGED notifications on issues. And like any other change to a data table, mergeTable:withPrimaryKey: will send such a notification if changes happened, allowing all views to update themselves.
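The observer pattern behind this can be sketched in a few lines of Python. The `NotificationCenter` class and the `TABLE_CHANGED` name below are simplified stand-ins invented for this example, not Cocoa's NSNotificationCenter or the Data Abstract constant.

```python
TABLE_CHANGED = "TableChanged"  # hypothetical name standing in for the real notification

class NotificationCenter:
    """Very small publish/subscribe hub keyed on (notification name, sender)."""

    def __init__(self):
        self._observers = {}

    def add_observer(self, name, sender, callback):
        self._observers.setdefault((name, id(sender)), []).append(callback)

    def post(self, name, sender):
        # Call every observer registered for this notification on this sender.
        for callback in list(self._observers.get((name, id(sender)), ())):
            callback(sender)

center = NotificationCenter()
issues = []
refreshed = []

# Each view registers once and then simply reacts to change notifications.
center.add_observer(TABLE_CHANGED, issues, lambda table: refreshed.append("grid"))
center.add_observer(TABLE_CHANGED, issues, lambda table: refreshed.append("chart"))

issues.append({"ID": 1, "Title": "Crash on launch"})
center.post(TABLE_CHANGED, issues)
```

The views never need to know where a change came from; a local edit and a background merge look the same to them.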

In Bugs, all of this happens in the background, so over time the view(s) presented to the user just seamlessly adjust themselves, as changes happen – new issues come into views; issues resolved by other users disappear on their own, etc.

This topic just touches on a very small aspect of Bugs 7, which itself is part of a much larger project, comprised of four different client applications (Mac and iPhone, based on DA/OS X, for Windows, based on DA/.NET and Gtk# and the Web) as well as a middle tier server. We will blog more about different aspects of this project over the next few months, and we’re also working on a bigger case study, to appear at bugsapp.com, soon. Stay tuned to this space for more.

9 responses to Incremental Data Fetching in Data Abstract for OS X

I look forward to the case study. Hoping to see some best practices when creating DA servers. The PCTrade sample doesn’t really show so much of what you can do in the server. The client samples are good though.

Will you provide a customer interface to the bugs database in the future? Newsgroups aren’t ideal for bugtracking :-).

Bjarte, yeah, i realize the PCTrade samples are pretty client-centric, as they are now. We’ll need to see what we can do about that – if you have any specific areas/features you would like to see covered better in the sample server(s), please let us know!

regarding the bug tracker, we are planning to open it up with a web interface eventually, yes. although it will probably remain read-only and (obviously) a subset of the full database. tbh we’re quite happy with how bug reporting works on the newsgroups, right now; i think it’s a great workflow, and it does help to keep dupes and otherwise bogus/unnecessary reports out of the database, as the devs doing the logging from the group (which is a one-click process) can do some pre-filtering – which is better than getting everything into the db and then closing lots of it as dupes or unnecessary requests.

but we’re always open to improvement there, as well. if we do something here (in the longer term), i could imagine a separate database where customers could log issues, which would then get synced into the main tracker. similar to how CodeGear’s QC and RAID systems are split, with customers logging into QC, and those reports getting promoted into RAID, if they are good.

A couple of things I could think of.
It would be great seeing a sample that:
– Shows off all the Schema methods, NewCommand, NewDatatable, GetDataReader, ExecuteCommand, etc..
– Uses DynamicWhere/Parameters to filter DataTables on the server (i.e. based on logged in user)
– Uses (Linq)LocalDataAdapter and LocalCommand
– Uses Roles to filter access to DataTables

Bug tracking:
I agree, bug reporting works very well.
But it’s a pain searching the ngs if you think you’ve found a bug and want to know if it’s a known one (because of all the other traffic in the ng). Also if you find a similar issue and it is given an issue number, there is no indication whether the issue is fixed.
Ideally there would be a read only (perhaps with commenting/voting) web interface where you could search for reports with status and (expected) fix version showing.

Bug tracking: yeah, that’s exactly what we’ll have, to start with. logging stays as it is, but there’ll be a web interface to view and search (public) issues. if you receive an ID based on something you post, chances are, it’ll be a public issue, and visible there.

and of course, we always had (and will continue to) list bug ids in the change logs.

Sounds good. I forgot to write something about also having easy access to issues you have logged yourself (private and public).
Ah, thanks for pointing out searching for ids in changelog, I meant to write it but I left it out for some reason :-).

If you plan on making the bug tracker available for us, please don’t follow CodeGear’s structure as it is, it has flaws. They have problems moving all information between the systems, and there is no good workflow toward the users.

I suggest you look at how Developer Express have done their bug reporting. That solution is very good.

Yes, we’re definitely not going to go the completely separated route of QC/RAID that CodeGear has, for bug reporting – i’m seeing more of a separate “queue” (with its own IDs) that external submissions go in, that will be part of the same database. Issues would get promoted from there into the main Issues table, and you’d get the real bug id (on our internal db) for further tracking.

i’m not familiar with DevExpress’ system, but will make sure to have a look.