LessThanDot


There’s been a lot of buzz about the cloud over the past few years, with much of that attention going to IaaS and SaaS platforms, but there’s a revolution (or re-revolution) of even greater importance: PaaS. PaaS gives us the ability to scale horizontally and treat CPU, memory, and storage as pools of resources that are as deep as our checkbooks allow.

Forget about virtual servers. Remember that 60 hour job with a 24 hour deadline? If it is built on a PaaS platform and you have a couple hundred dollars to spend, you won’t even be staying late today.

Today’s post is going to share highlights from a basic file processing application. Something you would find (hopefully better written) in any random enterprise IT shop or SaaS company. It offers a web page that lets you upload files, a button to process files, a basic and poorly written list of the processed and unprocessed files, and an unattended worker. The trick is that this application was written on top of Windows Azure, so I can play tricks with time just by twisting the dial from one file processor to twenty.

The Basic File Processor

The file processing in the sample application is intended to be a sample workload. It consists of reading files completely into memory and passing them around, spinning through them one character at a time, replacing each character in the line with its upper case variant. Very critical stuff, very performant.
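As a rough illustration of that workload, here is a minimal sketch in Python rather than the .NET code the sample uses. The function names `to_upper_slowly` and `process_file` are made up for this example and are not taken from the sample's source.

```python
def to_upper_slowly(text: str) -> str:
    """Upper-case text one character at a time, line by line.

    Deliberately naive: it mirrors the "spin through every character"
    workload described above instead of simply calling text.upper().
    """
    processed_lines = []
    for line in text.splitlines(keepends=True):
        chars = []
        for ch in line:
            chars.append(ch.upper())
        processed_lines.append("".join(chars))
    return "".join(processed_lines)


def process_file(path: str) -> str:
    """Read the whole file into memory, then process it (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        contents = handle.read()
    return to_upper_slowly(contents)
```

The point of the workload is only to burn time per file in a predictable way, which makes the effect of adding more processors easy to see.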

In addition to running the process via the website, I also need an unattended application that can run the same processing function. If I owned the server, this would be a scheduled task or service. As an Azure Worker the code will be remarkably similar.

Architecture of the Processor

The two front-ends access common logic in the Core library, which is responsible for both the processing logic and interacting with storage resources. This being sample code, it is certified as working on my machine and is definitely not production ready. That said, I wrote this in a few evenings, so building a production-ready service doesn’t have to take that long when it is done during normal workdays.

The Web Site

The website has a single MVC controller with 3 actions:

~/Home/Index: Displays the list of processed and unprocessed items and buttons for upload and processing

Windows Azure offers a number of storage options, each with its own benefits and constraints. For this process I decided to use table storage to track the summary-level information about each file processing job, blob storage to store the actual file, and the queue service for managing task execution.

The ItemStore class is built to interact with interfaces for each of these resources, using a single IStorageLocator interface to get instances of those resource interfaces. The class (and application) was driven by the small set of unit tests that helped me define how I wanted the process to work and interact with the resources above.
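To make the shape of that design concrete, here is a small Python sketch of the idea, assuming in-memory stand-ins for the three resources. Only the names ItemStore and the storage locator concept come from the text above; every method name (`add_file`, `process_next`, `upsert`, and so on) is an assumption for illustration and is not the sample's actual interface.

```python
from collections import deque
from typing import Dict, Optional


class InMemoryTable:
    """Stand-in for table storage: one summary row per file."""
    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def upsert(self, key: str, row: dict) -> None:
        self.rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self.rows.get(key)


class InMemoryBlobs:
    """Stand-in for blob storage: holds the file contents."""
    def __init__(self) -> None:
        self.blobs: Dict[str, str] = {}

    def put(self, name: str, content: str) -> None:
        self.blobs[name] = content

    def get(self, name: str) -> str:
        return self.blobs[name]


class InMemoryQueue:
    """Stand-in for the queue service: names of files waiting for work."""
    def __init__(self) -> None:
        self.messages = deque()

    def enqueue(self, message: str) -> None:
        self.messages.append(message)

    def dequeue(self) -> Optional[str]:
        return self.messages.popleft() if self.messages else None


class InMemoryStorageLocator:
    """Hands out the three resources, playing the role of the locator."""
    def __init__(self) -> None:
        self._table = InMemoryTable()
        self._blobs = InMemoryBlobs()
        self._queue = InMemoryQueue()

    def table(self) -> InMemoryTable:
        return self._table

    def blobs(self) -> InMemoryBlobs:
        return self._blobs

    def queue(self) -> InMemoryQueue:
        return self._queue


class ItemStore:
    """Coordinates table, blob, and queue access through the locator."""
    def __init__(self, locator: InMemoryStorageLocator) -> None:
        self.table = locator.table()
        self.blobs = locator.blobs()
        self.queue = locator.queue()

    def add_file(self, name: str, content: str) -> None:
        self.blobs.put(name, content)
        self.table.upsert(name, {"name": name, "status": "unprocessed"})
        self.queue.enqueue(name)

    def process_next(self) -> Optional[str]:
        """Process one queued file; return its name, or None if idle."""
        name = self.queue.dequeue()
        if name is None:
            return None
        self.blobs.put(name, self.blobs.get(name).upper())
        self.table.upsert(name, {"name": name, "status": "processed"})
        return name
```

Because ItemStore only talks to whatever the locator hands it, unit tests can run against stand-ins like these while a deployed instance would be given cloud-backed implementations.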

Configurations

With all of the pieces defined, we use a pair of configurations to tell Azure how we want to deploy everything.

The first configuration defines the services we intend to package and deploy as well as the instance size and any endpoints:

Note that I’m telling it to use the local development storage, which is supported by a local storage emulator. In a production configuration I would enter the service location and a generated token.

So Where’s the Magic?

So where’s the magic that makes this a distributed application instead of 3 days of overtime? It’s sprinkled throughout the system.

The architecture of this system would work just as well outside of Azure, provided I offered it stand-ins for the 3 storage resources and deployed the instances and any necessary settings accordingly. Instead of worrying about how to manage deployments and what to use for centralized queueing and storage, I can focus on building an application that simply assumes those resources are available. Is there headroom for performance improvements? Sure, but I can also choose to throw another $15/month server at it, push data to CDNs and blob storage, add caching, or even a SQL Azure instance.
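The central queue is what lets identical workers share the load without knowing about each other. The sketch below imitates that with Python threads pulling from one shared queue. Threads here are only a local stand-in for separate worker instances, and `run_workers` is a hypothetical name, not something from the sample.

```python
import queue
import threading


def run_workers(file_names, worker_count):
    """Drain one shared queue with several workers.

    Returns a list of (file_name, worker_id) pairs showing which
    worker picked up each file.
    """
    work = queue.Queue()
    for name in file_names:
        work.put(name)

    handled = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # queue is drained, this worker is done
            # Real processing would happen here; we only record who did it.
            with lock:
                handled.append((name, worker_id))

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Changing `worker_count` is the local equivalent of twisting the instance dial: no worker is assigned files up front, each one simply takes the next message until the queue is empty, and each file is handed to exactly one worker.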

This application may be fairly basic, but nothing stops us from following this same pattern for much larger applications. PaaS has removed some of the constraints we take for granted. Even applications that have to run in-house in order to standardize against a database can now consider uploading a subset of that lookup data to a table store, performing most of the heavy lifting in the cloud, then producing a few files to import back into the on-premises system. The total execution time, summed across all instances, would be longer, but being able to scale part of the job across numerous parallel instances means the elapsed time can be much shorter.

And it means when we have a 60 hour task that’s due in 24, it’s no longer an impossible situation.
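The arithmetic behind that claim is simple enough to sketch. The 60 hour job, 24 hour deadline, and twenty processors come from this post; the even split across instances and the `overhead_hours` parameter (time for uploads, imports, and so on) are simplifying assumptions, and real jobs rarely parallelize this cleanly.

```python
import math


def elapsed_hours(total_work_hours, instances, overhead_hours=0.0):
    """Wall-clock time if the work splits evenly across instances."""
    return total_work_hours / instances + overhead_hours


def instances_needed(total_work_hours, deadline_hours, overhead_hours=0.0):
    """Smallest instance count that finishes within the deadline."""
    usable_hours = deadline_hours - overhead_hours
    if usable_hours <= 0:
        raise ValueError("overhead alone exceeds the deadline")
    return math.ceil(total_work_hours / usable_hours)
```

Under those assumptions, `instances_needed(60, 24)` comes out to 3 instances, and `elapsed_hours(60, 20)` is 3.0 hours with twenty processors and no overhead.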

The source code is available on GitHub along with requirements and links for setting up the emulators locally.

About the Author

Eli delivers software and technology solutions for a living. His roles have included lone developer, accidental DBA, team lead, and even unintentional Solaris consultant once. With experience in ad hoc, Lean, and Agile environments across NSF grants, SaaS products, and in-house IT groups, he is just as willing to chat about the principles of Lean or Continuous Delivery as he is to dive into Azure, SQL Server, or the last ATDD project he created.