There are a couple of competing approaches to representing the RDF data model in JSON. The W3C seems to favor JSON-LD, but the alternative, RDF-JSON, refuses to go away. Until now, we had been following the W3C towards JSON-LD. In the last week or so we came to the conclusion that we were probably backing the wrong horse, and at some significant coding expense I have now converted our code almost entirely to RDF-JSON. We have retained one critical concept from JSON-LD, which we use when storing the data in the database. RDF-JSON works so nicely that you can think of it simply as a preferred programming interface to RDF data. JSON is usually described as a wire format, but its syntax also corresponds exactly to the literal syntax for dictionaries and arrays in modern programming languages like Python, Ruby and JavaScript. I think this is why JSON became so popular - it is a simple wire format and an elegant programming model all at once, at least if your language is Python, Ruby or JavaScript (and probably others I don't know). The way RDF-JSON represents RDF can be thought of as simply a very natural and convenient way of organizing RDF into dictionaries and arrays for programming, even if you are not interested in JSON itself.

For those who are interested, here is a little more detail on the difference between RDF-JSON and JSON-LD. The basic format of RDF-JSON is this:

{ "S" : { "P" : [ O ] } }

Where S is subject, P is predicate and O is object. In general there will be multiple "S"s, multiple "P"s for each "S", and multiple "O"s for each "P", so the whole thing looks like a tree. O is actually a structure too, so that you can record datatypes and other properties of the values. If you understand RDF, this is very attractive in its simplicity, and very convenient to code to. If you don't understand the RDF data model, it probably looks strange, since regular Object-Oriented folks expect to see only:

{"P" : ["O"]}

JSON-LD started with the idea that the structure should stay as close to the OO one as possible, since that is what most people know, so it takes a different approach to introducing the "S"s. If you have only one subject, you introduce it by adding a special property (a "P") - "@id" - whose value is the subject. Since you can have multiple subjects, you end up with this:

[ { "@id" : "S", "P" : [ "O" ] } ]

where there are many other "P"s for each "@id". The "@id" is a special value of "P" that is used to hold the subject.

It turns out that JSON-LD is a pain to code to, because whenever you want the triples for a subject, you have to loop to find the dict/hash/assoc-array with the matching value for "@id". In RDF-JSON, you just index the outer dict with your subject URL. However, the JSON-LD organization of data works better for query in the database, because there you do want the subject URLs to be values of some predicate - in databases you query on predicates, and you index predicates. JSON-LD has a lot of additional features we don't need, but we just ignore those. Our database format is not really JSON-LD compliant though, because JSON-LD managed to screw up the way it represents the "O", where RDF-JSON did it better, so that is what we use.
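To make the difference concrete, here is a small sketch in Python (the URLs and values are invented for illustration):

# RDF-JSON: subjects are the keys of the outer dict
rdf_json = {
    "http://example.org/people/martin": {
        "http://xmlns.com/foaf/0.1/name": [
            {"value": "Martin", "type": "literal"}
        ]
    }
}
# getting the triples for a subject is a single dict lookup
triples = rdf_json["http://example.org/people/martin"]

# JSON-LD: the subject is the value of the special "@id" property,
# so finding a subject means scanning the array
json_ld = [
    {
        "@id": "http://example.org/people/martin",
        "http://xmlns.com/foaf/0.1/name": ["Martin"]
    }
]
node = next(n for n in json_ld
            if n["@id"] == "http://example.org/people/martin")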

So where we ended up is this:

In memory, and on the wire, we use RDF-JSON, pretty much exactly the way it is in the spec.

In the database, we use the fundamental data organization from JSON-LD, but with the "O" part from RDF-JSON, and we ignore all the other JSON-LD complexities (contexts and so forth).
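Concretely, a stored document might look something like this (a sketch with invented URLs, not our actual schema):

db_doc = {
    "@id": "http://example.org/people/martin",
    "http://xmlns.com/foaf/0.1/name": [
        {"value": "Martin", "type": "literal"}
    ],
    "http://xmlns.com/foaf/0.1/knows": [
        {"value": "http://example.org/people/frank", "type": "uri"}
    ]
}
# the subject is introduced "JSON-LD style" via "@id"; each "O" is kept
# as an RDF-JSON value structure with its type information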

I actually think JSON-LD is doomed. It tries to appeal to OO people by pretending that they can stick to their OO view of the world and not learn the RDF data model. IMO, this won't work, because if they stick with their OO view, they don't need any of this stuff at all. My prediction is that JSON-LD will lose with the OO folks who will view it as a lot of complexity that does not make sense from their world view, and it will lose with the RDF folks who will see that RDF-JSON is a lot simpler and more straightforward for RDF, so JSON-LD will appeal to nobody. But I'm grateful to them for pointing out a basic data organization for RDF that fits well with the way JSON databases like to query - we may or may not have figured that out on our own.

The "new architecture" I described in my last post is holding up fairly well. The first application I converted was the simple "business cards" application. I am now converting Frank's "Lifecycle Concepts" application to the new style. We have made a couple of other changes along the way - I will blog about those separately.

Meanwhile Frank is off exploring the possibility of using different technologies for the site query and search engine. Our current query and search engine - LQE - is based on Jena. The good news with Jena is that its query language - SPARQL - is very powerful. It allows you to express complex graph queries. Philippe Mulet pointed out a blog post to me that notes the parallel between what graph queries allow and the sort of backward chaining you see in PROLOG. Many people are very nervous about Jena performance. I believe - without real evidence - that SPARQL engines will always be too slow, because we can't resist exploiting the power of the query language, and hence our queries will never run fast enough no matter what the implementation. Philippe showed me a SPARQL query he was working on, trying to speed it up. It is a union of a filter query with a 2-way, a 3-way and a 4-way "join" or "graph traversal". This is an example of the sort of thing that makes me pessimistic that SPARQL will work for us. Frank is currently exploring the possibility of using both MongoDB and Elastic Search as alternatives. Both of those will restrict us to simple filter queries, so the sort of complexity you see in the query Philippe was trying to help optimize will not be possible. For those sorts of applications, we would explore other options - perhaps Hadoop or one of its competitors. I think this may force us to think of those sorts of functions as being "batch analytics" and reset our expectation of how they are executed.
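As an illustration of what the "simple filter query" style looks like, here is a sketch using pymongo (the database and collection names are invented, and I have shortened the predicate names - MongoDB field names cannot contain dots the way full predicate URLs do):

from pymongo import MongoClient

docs = MongoClient()["site"]["resources"]  # hypothetical local MongoDB

# the subject URL is just the value of the "@id" field, so it can be
# indexed and matched like any other field
docs.create_index("@id")
person = docs.find_one({"@id": "http://example.org/people/martin"})

# a simple filter on a predicate; dot notation reaches into the array
# of RDF-JSON value structures
for doc in docs.find({"name.value": "Martin"}):
    print(doc["@id"])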

Dave Tropeano continues to explore the mobile world. I'll let him write his own blog post. The mobile world may have even more competing frameworks and approaches than the web world. We are still hoping for a sort of convergence where a single approach can cover both web and mobile presentation tiers. I have a lot to learn there still.

At the beginning of this project, I thought it was important that the top-level design [many software developers would say "the architecture", although this is ungrammatical] of our applications should support a strong separation between the presentation logic and the domain logic. Reasons include the fact that the user interface changes at a different rhythm and is often developed by a different team with different skills. I still believe this separation is important, but our interpretation of it has changed recently.

Our initial interpretation was that there needed to be a presentation server, separate from the domain logic server. The following diagram tries to illustrate this:

In addition to any philosophical objections, we have found at least two practical objections to this:

1) We have been inconsistent about what we implement in Javascript in the browser as opposed to Python/Ruby in the presentation server

2) We have tended to implement presentation servers to the exclusion of logic tier servers, since doing both is usually a pain, at least at the beginning and for small applications

We have decided we can simplify and improve this model by enforcing the following rules

Another way of looking at this is to say that we put the whole presentation tier in Javascript in the browser, and we get rid of the server-side presentation server completely. All we need on the server side is a server that serves up html/javascript/css to the browser - it does not require any logic of its own. The "pure" form of this would be to make all the pages static, in the sense that they are rendered directly from files without server-side processing. I believe this would work - I tried it - the Javascript logic on the page reads the URL from window.location, and starts the page processing by doing a GET to the logic server for the same URL, asking for application/json instead of text/html. However, there are some philosophical and practical downsides to this "pure" approach. The philosophical objection is that, in the theory of the web, the html is supposed to be a representation of the resource; in this approach it contains no data specific to the resource at all, and so fails a common-sense test of fit to that theory. The practical consequence is that these html pages are completely useless for indexing by standard search engines. Our solution to this problem is to add the data of the resource to its html representation on the server. This means that, on receipt of a request for the html presentation of a resource, our presentation server will perform the following steps:

1) Read the appropriate static html/javascript template for the resource

2) Send a request to the logic tier to retrieve the data representation of the same resource

3) Merge the data into the template

4) Return the result
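In code, the generic presentation server's GET path might look roughly like this (a sketch only - the logic-tier address, the template location and the merge mechanism are all invented for illustration; we were on Python 2 at the time, hence urllib2):

import urllib2

LOGIC_TIER = "http://localhost:3001"  # hypothetical logic-tier address

def get_html(path):
    # 1) read the static html/javascript template for the resource
    with open("templates/default.html") as f:
        template = f.read()
    # 2) ask the logic tier for the data representation of the same URL,
    #    using the Accept header to select json rather than html
    request = urllib2.Request(LOGIC_TIER + path,
                              headers={"Accept": "application/json"})
    data = urllib2.urlopen(request).read()
    # 3) merge the data into the template ...
    merged = template.replace("{{data}}", data)
    # 4) ... and return the result
    return merged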

As a reminder, we always use the same URL for both html representations and data representations of the same resource, which is part of what makes this work easily. Some resources don't have any backing data - e.g. a "new entity" form - in which case steps 2 and 3 are skipped. The result of this change in design is that we still have a sort of presentation server, but its behavior is completely standard - it does not vary from application to application. What programmers actually create are

1) html/javascript templates that will be merged with data, the result being loaded into the browser

2) Domain logic servers that store and retrieve data representations of resources, implement the domain logic rules that constrain them, and implement side-effects

This reverses the situation we had before where we tended to create a presentation server for each application and use a generic domain logic server. Now we use a generic presentation server and write a custom domain logic server. Our new diagram might look like this:

This might seem a bit academic, but for me it represents a fairly important shift in how I think about how we structure our applications.

We spent some time this week refining Julian's "Life-cycle Concepts" conceptual model. We ended up with some changes to what Julian had written. Some of these were simplifications to create something attainable for a first milestone, but in some places we simply liked our own ideas a bit better than what we saw in Julian's work. I also chose a slightly different way to express Julian's concept of a Role in UML - we are not changing Julian's design there, we are just using a different way of drawing it in UML which we hope is clearer. Our current UML model is included below. Frank has made some progress towards a working prototype of some significant parts of this.

Dave Brauneis is working on the foaf:Person implementation. We are following the standard W3C model for people and their URLs. In this model, each person (as well as other real-world entities) has a URL. I have mine already (http://martin-nally.name) but many folks don't yet. If you do a GET on the URL for a real-world entity, it should not return a document - that would imply that I am a document. Instead, it should 30x redirect to a different resource where you can find information about me. This is the pattern Dave is implementing. [I spent a few hours this week fixing the redirect for http://martin-nally.name, which had been broken for a long time since Cox stopped hosting personal web sites. You can try it now.]

In preparation for implementing our logon story, I did some cleanup and refactoring of the "site-server" - our reverse proxy that delegates to the applications that implement particular resources of our site. In particular, I changed the way the server loads its configuration to make it more modular, easier to understand and easier to tinker with for testing. Dave Tropeano is making progress on an implementation of a mobile presentation tier for this application.
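To make the redirect pattern concrete, here is a minimal sketch (WSGI style; the URLs and the mapping are invented, and I have used 303 See Other, the specific "30x" the W3C recommends for real-world entities - this is my illustration, not Dave's actual code):

PEOPLE = {"/people/martin": "http://example.org/about/martin"}

def application(environ, start_response):
    path = environ["PATH_INFO"]
    if path in PEOPLE:
        # a person is not a document, so don't return one; redirect to
        # a document *about* the person instead
        start_response("303 See Other", [("Location", PEOPLE[path])])
        return []
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return ["Not found"]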

The first week back was consumed with an internal IBM conference and organizing roles and staffing for the new year. The good news - from my perspective at least - is that we will be continuing this cloud exploration project even though I am back full-time at IBM.

A significant decision we took this week was to start to implement a more realistic application - or, more accurately, a suite of applications that together look like one web site. There were two obvious alternatives: pick something inspired by our own product needs, or pick an IBM customer problem. The latter would have many benefits, but it would take us longer to find a suitable customer organization and get their agreement to share a problem specification with us, so while this might be the preferable longer-term approach, we have decided to defer it. For now we will focus on a Rational product scenario. Our choice is to implement what we call "life-cycle concepts". Life-cycle concepts is a term Rational has coined for those concepts that are common to almost all software development tools, and specific to none. Life-cycle concepts are the things that need to be shared and understood by all the members of a development team, regardless of whether they are programmers, testers, requirements engineers, project managers or whatever. Examples are the concepts of projects, products, releases, milestones, team members, accounts and so on. Note that it is not enough to just share the concepts across the team - the actual instances need to be shared too. Because of this, these instances do not really belong in any particular tool (bugs, tests, requirements, code, ...), but need to be referenced by them all.

Working towards this goal, we started constructing a simple wire frame for a UI for these concepts. A design for the names of the concepts and their principal relationships was developed last year by Julian Jones et al and we are building on that work. Since we are programmers with little UI design skill we have been consulting others, most notably Kim Peter, one of Rational's lead UI designers, who has been generous with her time. Currently we are looking at implementing this function in 2 or 3 separate applications (or "services" if you prefer):

An application that manages accounts and their passwords and logon. Pretty much every site needs some sort of account management, even if it is just an implementation of a strategy to delegate to accounts implemented elsewhere (e.g. Google or Facebook).

An application that implements the URLs for people and the web documents that describe them. The most common practice is to conflate people and accounts, but this practice has some significant drawbacks in a more distributed world. The deluxe approach is for each person to identify themselves everywhere with a web-registered URL - mine is http://martin-nally.name and I pay a web registrar $10 or $20 a year for it. This "service" will define URLs for people who choose not to take the path I did.

An application that implements projects with their milestones and products with their releases. Defects are logged against releases and tasks contribute work to milestones.

Another common need for a site is to implement common navigation bars and other common UI elements. This week we implemented an initial version of a service that does this. It needs much more work, but we hope our initial version will set us in the right direction from both the provider and consumer perspective.

Lastly, we have made some progress towards a document that describes the high-level implementation design (or "architecture" if you like that word) we propose for our site, and we hope more generally for similar sites. We'll post a copy here when we are a little further along.

Many years ago I read a scientific report of a study of multi-lingual stroke victims. The study found a very peculiar phenomenon. A portion of the stroke victims had learned multiple languages (2 or more) from birth, while another portion had acquired their polyglot skills later in life. When the people who had learned multiple languages from birth had major strokes, they would often lose one of their languages completely, while their abilities in the other languages remained unimpaired. When the people who learned languages later in life had a major stroke, they would suffer impairment, but not total loss, in all their languages. The conclusion of the study was that the organization of language in the brains of the two groups was quite different. Children who learned multiple languages from birth seemed to partition up their brains, using different parts for different languages, while people who added languages later spread their languages throughout their brains.

At the time, I took this as being proof that very young children learn languages in a different way from older people. I have seen several studies since that tend to support this conclusion. For example, I have read that past the age of eight, it is very difficult or impossible to be truly native in a foreign language, and your ability to learn a language from a different language group from your own (say Chinese if you are a native English speaker) diminishes rapidly after the age of 3. There are some languages that are so difficult that they have never been successfully learned as a foreign language and are only spoken by native speakers who started at birth.

So what made me think about this? Here you are going to laugh at me. Over the last month or so I have been trying to learn and write in 3 different computer languages - Python, Ruby and Javascript. Learning each language has not been difficult; what has been more difficult is keeping them separate. As I switch from one to another, I often mistakenly attempt to apply an idiom I learned in one language to another. As I was thinking about that, I suddenly realized that I may have drawn the wrong conclusion from the stroke victim study. One difference between the two groups is that one group learned multiple languages when they were very young, and the other group learned them when they were older. But another difference is that the first group learned multiple languages simultaneously, while the other group probably learned their languages serially. Is it possible that putting different languages in different parts of the brain is the mind's solution to the problem of learning multiple languages simultaneously without mixing them up?

I think we need a followup study that tries to locate polyglot stroke victims who learned multiple foreign languages simultaneously later in life to see if they exhibit the same brain compartmentalization as those who learned simultaneously as infants. I wonder how many of those there are and how to find them. I imagine Santa Claus is one of them - presumably he speaks all the languages. Happy holidays to all of you.

I reimplemented my application pretty quickly with Sinatra. Sinatra is light and simple - you can even read and make sense of the source. Things went quickly and I started having fun again. How nice.

I found another benefit with Sinatra. Rails had generated a skeletal application for me and I had worked within that structure. When I moved to Sinatra, I started questioning whether that was the right structure. In the Rails application, a GET produced an HTML page that POSTed back to the Rails application for create, update and delete. The Rails application then turned around and made REST calls to a "business logic server". It occurred to me that this wasn't really ideal. Now my Sinatra application only handles GET. Javascript on the HTML pages in the browser now sends POST, PATCH and DELETE messages directly to the "business logic server", bypassing the Sinatra presentation server completely. One of the things that makes this possible is our use of a reverse proxy, so that the Javascript can do this without falling afoul of the browser's same-origin rules. Nice.

I also removed another pattern that I had inherited from Rails. Rails likes "model objects" that have instance variables and accessor methods for fields, and Rails takes whatever the original data was - usually relational data - and uses it to fluff up model objects, then throws the relational data away. In the other direction, it takes field values from the objects and reconstructs data for the database. Many programmers think of their objects as being the "true data", with the database only there for the purpose of storing their objects for the unfortunate case where the server crashes. This leads to the object-to-xxx-mapping category of software. I prefer the opposite view, where the data in the database (or other back-end) is the "true data", and the application's job is just to manipulate the data, not define it. So I changed my application to keep the original "back-end" data all the way through the program. My "back-end" data is JSON-LD, and instead of converting it to objects, I just wrap the JSON-LD in objects that provide some helpful methods for reading and writing the JSON-LD. I don't keep the JSON-LD in string format - I convert it to hashes, arrays, strings and numbers, but the representation is faithful. No more Object-to-XXX-mapping.
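The wrapper idea, sketched here in Python rather than the Ruby of my actual application (the predicate URL and method names are invented):

import json

class PersonDocument(object):
    # wraps parsed JSON-LD; the parsed JSON itself remains the single,
    # faithful copy of the data
    NAME = "http://xmlns.com/foaf/0.1/name"

    def __init__(self, parsed_json):
        self.data = parsed_json  # hashes, arrays, strings and numbers

    def name(self):
        return self.data.get(self.NAME, [None])[0]

    def set_name(self, value):
        self.data[self.NAME] = [value]

    def to_json(self):
        return json.dumps(self.data)  # nothing to "map" back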

I got the application working locally pretty quickly - about a day, including changing the overall design and implementing some Javascript code - my first experience with that language. I also got it deployed and partially running on EC2 in about half an hour. I can't get it running fully on EC2 because it depends on Frank's reverse proxy at OpenShift, which seems to be having some problems. I might have to wait till the New Year to finish it.

So now I have finally deployed my Rails application and got past the prerequisite problem using Bundler. Will it now run? Of course not. Now it is trying to connect to a relational database, even though I don't use one. For reasons I do not understand, running Bundler has changed the behavior of my application. It appears to be running some more aggressive start-up initialization that wants to connect to the database I don't have. I wasted many more hours on this and even tried to get help from the friendly crowd at Stack Overflow.

You might think this proves the value of a PaaS platform, since I did not have these problems with OpenShift, CloudFoundry or Heroku. Why did those work? My guess is that the reason they worked is that they load up a "kitchen sink" image with everything in it including the relational database interfaces I don't want or need. I'm guessing that if I loaded them up too, my app might actually work as it did before packaging. But I have a better solution .... I am done with Rails, hopefully forever. I am going to redo my application using one of the "micro-frameworks" that are much lighter than Rails, or perhaps with nothing more than a basic Rack interface, analogous to the WSGI approach we took with Python.
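For reference, the interface at that level really is tiny - a complete, runnable Python application at the bare WSGI level looks like this (a sketch; the port and message are arbitrary):

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["hello, no framework required\n"]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("", 8000, application).serve_forever()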

One positive side-effect of finally getting Git to work was that I got my bloated application moved quickly to the EC2 instance. Probably a more important benefit is that I now understand better what is actually happening at OpenShift. I had previously had the impression that when I deployed my application code to OpenShift, it was really going to OpenShift. What I now think is happening is that when I define an application at OpenShift, OpenShift creates an Amazon EC2 instance for me that will be the instance that runs my application. When I deploy code with Git, I'm not really deploying to OpenShift; I'm deploying directly to my own target "production" virtual machine, created for me by OpenShift. OpenShift defined the image my instance was made from, put on it the libraries for my selected programming environment and services, and added some scripts that run to configure it and to react when I deploy code. I also believe that OpenShift's strategy for dependencies is to build a rather large image that has "everything you could ever want" on it, rather than to tailor the image to my particular choices of programming libraries (gems, eggs, etc.) or even cartridges (MySQL, PostgreSQL, MongoDB, RabbitMQ etc.). I can't be sure of this, and I would not be surprised if they also have an "incremental add" capability, but it seems to fit what we have seen.

And my surprising conclusion? Learning to create my own Amazon EC2 instance, populate it with the software I needed, and configure it was somewhat painful, and I'm far from being really competent, but now that I feel able to do this, I am disinclined to go back to OpenShift, CloudFoundry and Heroku. I have the same aversion to those things that I have to frameworks like Rails. So long as everything is going smoothly, and the framework is working and meeting your needs, it feels good to be helped by the framework. As soon as you need something unforeseen by the framework, or as soon as anything goes wrong, you are immediately in a debugging hell as you struggle to understand the complexity of the framework and then fight it. I believe that I can now see how to create a set of relatively straightforward deployment and configuration scripts that would be usable for a whole class of similar applications within an organization. I would certainly be duplicating function that I can get from a PaaS, but the effort does not look huge to me, and the benefit is that I have independence and control. Most importantly, my solution can be only as complex as I need it to be, whereas frameworks always trend to a level of complexity that tries to cover the union of their users' needs. In my opinion, the biggest single problem in software is management and limitation of complexity, and here I think I can do better without a generic solution.

It is possible that as I learn more, I will see more value in these PaaS platforms. For now I am a skeptic.

I decided that since Rails/Bundler had now bloated my application to a size where it was difficult to deploy with a simple copy over ssh, I would see if Git could do it better. The answer is that Git is spectacularly better, even on the first deploy, and even more spectacularly better on redeploy. Unfortunately, getting it to work was somewhat painful. One problem is that you can only easily "push" code into a "bare" Git repository that does not have an associated "working tree", so you can't actually see the files. There are some discussions on Google of how you can have your cake and eat it too, but they were beyond my comprehension. The simple workaround I used was to push into a bare repository, and then clone the repository locally to see the files. This sounds and feels a bit hokey, but it works fine. The bigger problem was to make Git actually do the push over ssh. As is often the case, the solution is not too hard once you figure out what is really happening, but it took me the best part of a frustrating day to figure it out. Here is my current understanding of how it works and what you have to do.

Git works over the ssh (secure shell) protocol, which uses public/private keys instead of userid/password for security. Git also knows how to work over https, which is how we use it with GitHub, but for working with Amazon EC2 virtual machines, ssh seems to be the way to do it. Git on Linux uses the standard ssh software, for which Windows has no equivalent, so Git on Windows provides its own. We had already used ssh, for example for deploying with Git to OpenShift and for accessing our application on OpenShift (which is really on EC2). When deploying to OpenShift, we use Git, which uses its own ssh software, but when accessing our application, we were using an open-source Windows ssh utility called PuTTY. (I know some of you are very familiar with all this.) Although Git and PuTTY use different ssh software, they use the same keys, which are by default in your user/.ssh directory. As far as I can tell, for Git on Windows, your key must be in this directory, and further it must be called id_rsa.ppk. If you are on Linux (and also Mac, I think), there are simple techniques for storing multiple keys, putting them in different places and having the standard ssh software used by Git pick them up. On Windows, however, you cannot do this, and since I was already using id_rsa for OpenShift, I was stuck. Actually, the real reason I was stuck was that I didn't understand what was going on and how things were supposed to work - once I figured that out, finding a solution on the web wasn't so hard. The fix has two parts. The first part is to tell Git to stop using its own ssh client and to use the one bundled with PuTTY, called plink. Once Git is using plink, you can use another PuTTY application called Pageant to manage multiple keys, and since plink knows how to talk to Pageant, it all works. Simple, eh?

It is perhaps worth noting that we would have come across this problem with PaaS platforms too - we got lucky with OpenShift because it was the first and was therefore able to take ownership of the id_rsa key.

All my Rails application does is implement some html pages for managing a rolodex-style application, with storage being through HTTP REST calls to another server. It is about as trivial as you can imagine. Thanks to Rails, my application is a complex beast requiring a dozen or more gems composed of over 4000 files and taking up almost 50 megabytes. If I'd just written the application from scratch, its size would be measured in kilobytes, not megabytes. I believe that as my application grows, it will likely benefit from more of the heft that Rails brings, but still. And in the meantime, a side-effect of all that Rails complexity that is beyond my skills to manage is that I can't get my application deployed. I really don't like frameworks.

Part of the instructions for deploying a Rails app to OpenShift and CloudFoundry is to run Bundler. The basic idea of Bundler seems to be to gather up all the libraries (in Ruby they are called gems) your application depends on and to deliver them with the application itself. This seems like a good idea - it avoids having applications co-dependent on shared libraries, and ensures that an application runs with exactly the libraries it was developed and tested with. Unfortunately, I find Bundler hard to understand and use, even though its commands are simple. The initial symptom was that my Rails app would not run because Bundler had recorded the fact that my application used rake 10.0.2, while 10.0.3 was the one installed as a shared library on my virtual machine. Further, there was no available Ubuntu package for rake 10.0.2. It was annoying that my application would not run, because rake 10.0.2 was in fact in the directory structure of my application, in vendor/cache/, carefully put there by the "bundle install" command when it recorded the fact that I needed it. I believe that Bundler may be working as designed here, but I have not yet figured out how the whole thing works. What I think may be the case is that bundle install's job is only to copy the required gems into the cache directory so they will flow to the target on deployment. A further step is then required on the target to assemble the bundle from the cache. Just guessing. I then discovered that there is an option of the "bundle install" command (--deployment) that might fix my problem, so I ran that locally. [With hindsight, it might have been better to do it on the target.] Bundler then created a new structure in vendor/bundle that was rather large. When I discovered that this structure was now going to take over an hour to copy to the target with (p)scp, I decided I needed a better solution. That sent me on a whole new adventure, which can be the subject of another blog post.

Having completed our little multi-server application and deployed the pieces on various PaaS platforms (OpenShift, CloudFoundry and Heroku), we decided it would be educational and character-building to try to deploy some of these pieces onto virtual machines on Amazon's EC2 cloud. We figured that by doing it directly ourselves, we would gain a greater appreciation of the value that PaaS platforms like OpenShift are bringing us. We probably spent over a week doing this, and I won't deny that it was sometimes incredibly frustrating. Nevertheless, I think it was a very useful thing to do, and it led me to a surprising conclusion. There were some days of pure frustration during this effort where nothing I tried worked, and every attempt I made to get around a problem led to a new problem. Here are some examples. I never did manage to install Rails successfully on one of Amazon's own base AMIs. I cannot remember now what the problems were, but with my very limited unix skills it was tough and I gave up. I then found a site with clear instructions on how to do this for an Ubuntu release that seemed to closely match one that I found on an Amazon AMI. [There are plenty of Amazon AMIs available that already have the full Rails stack on them, and if I had picked one of those, I would probably have had a much easier time, but that would have been cheating - I wanted to learn what it takes to build up "from scratch".] Even following these instructions proved painful, because they did not entirely work. I'm not sure if that is because of subtle differences between my starting image and theirs, or because the instructions were wrong, or perhaps because the instructions glossed over details that someone more experienced than me might have been able to overcome quickly. Nevertheless, after a day or two of struggling I was able to build a Rails stack on top of a base Ubuntu AMI, and then build a script that would reproduce the result whenever I wanted. One thing I learned is that this is not a very quick process - it takes an hour or two for the full script to run, so you wouldn't want to build from scratch every time you wanted to launch an instance - you would have to freeze your instance as a new AMI, or pick one that someone else had built. The software I ended up installing included RVM (manages Ruby installs), Ruby 1.9.3 (the base image had an old Ruby level on it), Nginx (an alternative to Apache), Passenger (links Nginx to Ruby), Git, and a few smaller things that were necessary for one reason or another. I also had to download my code, configure the web server and start it.

Sadly, it turned out that this was the beginning of my troubles, not the end. More on that later, as well as my surprising conclusion.

It took a few days to do it, but we did finally figure out how to write a Rails application that did REST access to its back-end instead of RDBMS access. Once it was written and working locally, deployment to OpenShift was fairly straightforward. Deployment to CloudFoundry took a bit longer, requiring us to learn how CloudFoundry worked and to work around some Ruby version mismatch issues, but we got through it in a day or so. Dave Brauneis deployed the app on Heroku, also without difficulty. OpenShift was a nicer experience in general than CloudFoundry. OpenShift has a nice web console where you can go and look at things, which CloudFoundry.com lacks - with CloudFoundry, the command-line tools are your only option. Also, OpenShift ran a sort of build and packaging script automatically for our Rails app, while CloudFoundry gave us a list of instructions of things it wanted us to run locally before uploading. For a Rails application, at least, the experience was not quite as easy and slick as they advertise in the brochure, but it's something you can work through in a day or so.

The obvious upside of PaaS is that you do not have to take responsibility for constructing virtual machine images with all your system prerequisites on them, adding your applications and then launching the images. A downside we have experienced is that the vendors' choices of prerequisites can cause problems. We initially developed a couple of servers using Python 2.7. When we came to deploy, we first picked OpenShift, which it turns out supports only Python 2.6. We then learned by trial and error which of the libraries we had used are missing in Python 2.6 (OrderedDictionary and json were two). We then developed a Ruby application using Ruby 1.9.3, which happens to be the level supported by OpenShift, so we did not have problems deploying there. Sadly, when we deployed the same application to CloudFoundry.com, we discovered that the highest version of Ruby they support is 1.9.2, which does not have one of the classes we were using (Net::HTTP::Patch, in case you are curious). You might remember a previous post where I mentioned that neither CloudFoundry nor OpenShift supports either replication or sharding with MongoDB, so if those are important to you, you could add them to the "list of downsides" of PaaS. One of the things we plan to do is to make our own virtual machine images for our applications and deploy them on AWS ourselves. If nothing else, the experience will give us a better appreciation for the value we are getting from the PaaS vendors.