Five short links

DynamoDB - I can't tell you how excited I am to see Amazon's new hosted NoSQL service. I so wanted SimpleDB to be usable, but it was too hard to work with thanks to the way it required clients to deal with implementation details like servers and sharding. I ended up using S3 for some projects, but if I were starting Jetpac today I'd seriously consider going with DynamoDB rather than self-hosted Cassandra. The main drawback for our requirements is that we couldn't run Pig scripts across the data, but I'd imagine analytics will come.

Unsupervised Decomposition of a Document into Authorial Components - Uses machine-learning techniques to figure out the authorship of books of the Bible, in a way that matches the results of traditional biblical scholars. I love applications of computer science to the humanities, and I think there's a lot of ground for cross-fertilization.

The state of NoSQL in 2012 - An insightful look at the past and future of the new wave of database systems, from someone who's been in the trenches working with them for years.

Auto scaling in the Amazon cloud - Describes how Netflix keeps up with demand by automatically creating and destroying instances based on usage. The CPU utilization graphs show how effective they are at managing the process, but they also hint at the work that's required to build AMIs for all the services so that they can be spun up automatically.

Google Plus Scraper - There's no API yet for most of the interesting bits of Google+'s content, but a lot of it is publicly available to web crawlers. The information is delivered as a giant JSON array embedded within the page, so it's surprisingly easy to decode. The biggest pain is the space-saving variant of JSON that Google use, which leaves out explicit null values in arrays, so you get ['a',,'b',,,'c'] instead of ['a',null,'b',null,null,'c'].
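
To make the null-skipping quirk concrete, here's a rough Python sketch of one way to patch the missing values back in before handing the text to a standard JSON parser. This is my own illustration, not the scraper's actual code: the function name fill_sparse_nulls is made up, and it assumes the strings in the payload are double-quoted as in standard JSON (unlike the single-quoted shorthand in the example above). It also assumes a trailing comma before a closing bracket stands for one more omitted null, which may not match what Google intends.

```python
import json


def fill_sparse_nulls(text):
    """Insert explicit nulls where a sparse array leaves a slot empty.

    Hypothetical helper: assumes double-quoted strings, and treats a
    trailing comma before ']' as one more omitted null.
    """
    out = []
    in_string = False
    escaped = False
    prev = ""  # last non-whitespace character seen outside a string
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                prev = '"'
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
            continue
        if ch.isspace():
            out.append(ch)
            continue
        # A comma right after '[' or ',' means a value was left out.
        if ch == "," and prev in ("[", ","):
            out.append("null")
        # A ']' right after ',' means the last slot was left out.
        elif ch == "]" and prev == ",":
            out.append("null")
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    sparse = '["a",,"b",,,"c"]'
    print(json.loads(fill_sparse_nulls(sparse)))
```

Walking the text character by character, rather than running a blind search-and-replace on ",,", means commas that happen to sit inside a quoted string are left alone.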