Syncing large datasets over low bandwidth links

They needed to synchronise their ESRI GeoDatabase out to near on a hundred clients, over an unreliable 3G cellular network. We’re talking thousands of features, containing up to several hundred thousand vertices each. That’s when my phone rang.

“Lance,” they cried, “we’ve already tried the provided ESRI syncing frameworks, and people are shouting at us because we can’t get it to work. What can we do, and how soon can you get here?” A couple of days later saw me on the red-eye to the North Island to find out what we were dealing with…

“Number one,” they started, “we need on-demand syncing of near on a hundred mobile clients. Some clients will need data no less than five minutes old, while others may only sync once every several months.” Ahh, scratch ArcGIS delta files. That would fill up the server storage quick smart. Let’s see, at five-minute intervals, that’s 288 delta files per day, or 8,640 per month. Add to that the fact that they have to be applied in the correct order, and the whole scheme would be at best painful, at worst heart-attack inducing. Lastly we’d best remember the bean counters, who would have to fork out for an Enterprise GeoDatabase license for each client (at over $1,000 a pop).

“Number two,” they continued, “it would be great if we could deploy an entire dataset from scratch to any client, at any time.” Right, I thought. Out of interest, we’re now talking 105,120 delta files to store after the first year. Worse, we would have to keep every delta file forever if we wanted true on-demand syncing. This is getting better.
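The delta-file counts above fall straight out of the sync interval. A quick back-of-envelope check, assuming one delta file per five-minute interval, a 30-day month, and a 365-day year:

```python
# Back-of-envelope check on the delta-file counts quoted above.
MINUTES_PER_DAY = 24 * 60
interval = 5  # minutes between syncs, i.e. one delta file per interval

per_day = MINUTES_PER_DAY // interval  # 288 delta files per day
per_month = per_day * 30               # 8,640 per 30-day month
per_year = per_day * 365               # 105,120 after the first year

print(per_day, per_month, per_year)    # 288 8640 105120
```

And that’s per dataset, before you multiply by the number of projects.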

“Number three.” At this point, they were shuffling and looking down at their feet, while mumbling something about the universe conspiring against them. “It has to operate over 3G mobile networks, where coverage really isn’t very good.” Ahh, there it is: the real fly in the ointment, the one that would make this a great challenge and a significant project to work on. We could now also scratch ESRI’s online replication. While it’s fine in LAN environments, it’s useless if you don’t have a nice big fat pipe between client and server.

“Oh, just so you know, we’re talking about au aoafakjfak aiufa.” No, I didn’t understand either. I think they were mumbling in the hope that we could skim past the clarification. “Huh? I beg your pardon?” I politely enquired. A little louder this time, but also very fast, hoping, I think, that if it all fit into the space of one clarification, I wouldn’t notice there were several hiding in there. “We’re talking about thousands of features, containing up to several hundred thousand vertices each, and we need the system to support multiple projects, each having distinct data sets.” They stopped, and I looked at them, waiting for more, but that appeared to be all.

“OK, leave it with me to think about,” I told them. “I’ll see what I can come up with.” Now in the interests of openness, I’ve done a lot of work with synchronisation, specifically over low-bandwidth links, so I had a fair idea of the basic architecture that would work for them, but I wanted to be sure I’d thought the whole thing through rather than presenting a half-baked solution.

The flight home that night had my head gears grinding. Thinking back over previous projects, what lessons could I bring to bear on this problem? It’s odd how, on these occasions, a single diary page is never big enough for all the diagrams and scribbling, and you have to make use of last Sat & Sun, which are always empty. Nonetheless, by the time the wheels touched the ground, I had settled on an architecture; I just needed to do some research to prove that my hunches would in fact pay off.

And pay off they did! One of the great benefits (and nightmares) of ArcGIS is the ArcObjects framework: anything that needs to be done can be done in code, and the fact that the core products are built with the same tools developers have access to is a huge bonus. The solution consisted of three components: a custom ASP.NET web service, talking to an ArcObjects layer, which queried and retrieved data directly from the corporate GeoDatabase; and on the client side an ArcEngine process (ArcEngine was already installed), making connections to the web service, requesting synchronisation data, and in turn updating the client geodatabase. Because we had full control of the implementation, we could create custom compression and filtering tools to reduce the size of the transmitted data packets, and we let clients adjust how they sync based on the quality of the link they’re using at the time, striking the best balance between fast and reliable for each client individually.
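To make the client-side idea concrete, here’s a minimal Python sketch of that sync loop. The real system was .NET/ArcObjects, so every name here (`sync`, `fetch_batch`, `apply_features`, the batch sizes) is illustrative, not the actual API; the point is the shape of it: compressed packets, and smaller batches on poor links so a dropped connection loses less work.

```python
import zlib

# Illustrative only: features requested per round trip, by link quality.
# On a flaky 3G link, small batches mean a dropped connection wastes
# little; on a good link, big batches cut the round-trip overhead.
BATCH_SIZES = {"good": 500, "fair": 100, "poor": 20}

def compress_packet(payload: bytes, level: int = 9) -> bytes:
    """Server side: shrink a packet before it crosses the slow link."""
    return zlib.compress(payload, level)

def decompress_packet(packet: bytes) -> bytes:
    """Client side: restore the packet before applying it locally."""
    return zlib.decompress(packet)

def sync(link_quality: str, fetch_batch, apply_features) -> None:
    """Pull compressed batches until the server reports no more changes.

    fetch_batch(n) asks the server for up to n changed features and
    returns compressed bytes, or None when the client is up to date.
    apply_features(data) writes the decoded changes to the local store.
    """
    batch_size = BATCH_SIZES[link_quality]
    while True:
        packet = fetch_batch(batch_size)
        if packet is None:
            break  # client has caught up with the server
        apply_features(decompress_packet(packet))
```

Each batch is applied as it arrives, so the client’s local geodatabase converges on the server’s state even if the link dies partway through; the next sync simply picks up where the last one left off.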

Needless to say, the end result worked a treat. It’s been running for over a year now, and is constantly being expanded and rolled out to new projects without further effort, a tribute to getting the architecture right the first time.

Do you have a unique synchronisation problem? Talk to me, I can show you the right solution.