Google App Engine: Lazy Data Migration with Versions

If you have built an application on Google App Engine, you have probably already scratched your head over data migration. App Engine is not that great when it comes to Agile-style iterations of web development. In its own way, it pushes you to design your models up front instead of letting them evolve over time.

I have been developing a community application, 'Yes to Politics', for friends in Andhra Pradesh, India, to interact with politics in a somewhat unusual way. More about the app later; for now, let me share what I learned about data migration.

I have around 28,000 entities in one model and, trust me, I tried to cover all my needs in the design of the model up front. Well, software development doesn't work that way. I realized I needed another property in the model, and it needed a default value too.

When you add a new property to an existing model, App Engine doesn't fill in the default value for the entities already stored. So when I accessed the new property on existing entities, I was met with exceptions, because the property never existed for them. You can check whether the property exists before accessing it: Python's built-in hasattr() does that, and getattr() accepts a fallback value. Under the hood, though, both still amount to attempting the access and handling the failure. Not a cool way, but that's about the best we've got.
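For what it's worth, here is a minimal sketch of a guarded read in plain Python. The `Profile` class, the `reputation` property and its default of 0 are hypothetical stand-ins, not names from the actual app, and a plain object is used in place of a real datastore entity.

```python
class Profile(object):
    """Hypothetical stand-in for an entity saved before `reputation` existed."""

    def __init__(self, name):
        self.name = name


DEFAULT_REPUTATION = 0  # assumed default value for the new property


def read_reputation(entity):
    # getattr with a third argument returns the fallback instead of raising
    # AttributeError when the attribute is missing.
    return getattr(entity, "reputation", DEFAULT_REPUTATION)


old = Profile("ravi")   # stored before the property was added
new = Profile("sita")
new.reputation = 12     # stored after the property was added

print(read_reputation(old), read_reputation(new))
```

This only makes reads safe; it does nothing for queries, which still cannot see the old entities.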

You also cannot use the new property in queries that need to match the existing entities, because they carry no value for it. There is a way to fix that.

Define your new property, then loop through all your entities and set the default value. This is no easy job on App Engine. First, you need to set up a new URL and an HTTP handler to take care of this maintenance. Second, you can only update so many entities in one request before hitting the limits on time and processing power. So you need to split the task into smaller pieces: say, update 10 entities at a time, and have the handler's page auto-refresh every few seconds until all the updates are done. Then open that handler in the browser and wait until it finishes.
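The batching idea can be sketched roughly like this. It is not my actual handler: a list of dicts stands in for the datastore, the `reputation` property is hypothetical, and the `while` loop stands in for the browser refreshing the page, where each pass would really be one HTTP request carrying the next offset.

```python
def migrate_batch(entities, start, batch_size=10, default=0):
    """Set the new property on one batch.

    Returns the offset for the next call, or None when nothing is left.
    """
    batch = entities[start:start + batch_size]
    for entity in batch:
        # Only fill in the default where the property is missing.
        entity.setdefault("reputation", default)
    next_start = start + len(batch)
    return next_start if next_start < len(entities) else None


# In-memory stand-in for the stored entities (25 instead of 28,000).
entities = [{"name": "user%d" % i} for i in range(25)]

calls = 0
offset = 0
while offset is not None:
    offset = migrate_batch(entities, offset)
    calls += 1

print(calls)
```

In the real thing the pause between calls matters as much as the batch size, since both are what keep each request inside the limits.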

I have about 28,000 entities, which means calling that handler about 2,800 times (10 entities per call), with a pause between calls to stay within App Engine's limits. For my model, 5 seconds between calls worked fine; any quicker and App Engine threw an exception. It took about 25 minutes for me to finish the process.

I thought it would be a one-time task anyway, so I didn't regret waiting that long. But after all that was done and I had been happily using it for a couple of days, I found I had to add yet another property.

This time, instead of doing it the hard way, I decided to do something different. Before adding the property I had in mind, I added a simple integer that acts as a version number for the entity. I rolled it out the same way as above and waited another half hour for every entity to be updated. Then I added the property I actually wanted, but deferred filling in its default value until the entity is actually used. The lazy way. So I just need to update my queries with this version number logic, and I don't have to update all my data at once.
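Roughly, the upgrade-on-access logic looks like the sketch below. The names are illustrative assumptions rather than my real code: entities are plain dicts, `reputation` is a hypothetical new property, and `save` is whatever callable writes the entity back.

```python
CURRENT_VERSION = 2


def _v1_to_v2(entity):
    # Hypothetical upgrade step: give the new property its assumed default.
    entity["reputation"] = 0


# Each key is the version an entity is upgraded FROM.
UPGRADES = {1: _v1_to_v2}


def load(entity, save):
    """Upgrade a stale entity on first access, then persist it once."""
    version = entity.get("version", 1)
    if version < CURRENT_VERSION:
        while version < CURRENT_VERSION:
            UPGRADES[version](entity)
            version += 1
        entity["version"] = version
        save(entity)  # one write, only for entities that were out of date
    return entity


saved = []  # stand-in for datastore writes
old = {"name": "ravi", "version": 1}
load(old, saved.append)
load(old, saved.append)  # second access: already current, no write

print(old, len(saved))
```

Adding a later property then means bumping `CURRENT_VERSION` and registering one more step, with no bulk update over the whole data set.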

I realized later that it was an excellent move, since not all my entities really need the new property. Whichever entity needs it gets it when it is accessed for the first time. The data migration now scales with actual usage instead of with the size of the data set. Now I start every model with a version number, so that I never have to worry so much about data migration when I decide to change my models.

This is not without a downside. There is an additional property in the model and a little overhead from a version number comparison every time an entity is accessed. On the other hand, if you keep adding properties that not every entity needs, the storage space you save could easily outweigh the cost of this one extra property.

In the end, you decide whether the flexibility of scalable data migration is worth the weight and hassle.
