Rapid7 Blog

PaaSt Times

Boy, was I naive. When I first learned about the term PaaS, I quickly threw it into a metaphorical garbage can. Of course you need your servers, I thought. That was about five years ago. Now I realize how silly that was.

While IaaS is still the primary backend for applications, many applications out there run on a combination of PaaS and IaaS, and a rare handful are 100% PaaS. In both of these scenarios, log management and analysis remain just as important.

PaaS is an abstraction of the server: a services layer that can run independently of any IaaS and provide specific functionality such as a NoSQL backend, MySQL, workers, etc. The nice thing about PaaS is that all infrastructure considerations are up to the vendor. In fact, the infrastructure your PaaS services run on probably changes frequently.

The downside to PaaS is that if you need server-level access for special patches, add-ons, configurations, etc., you can’t get it. So whatever PaaS solution you use will need to be vanilla, unless you have the resources and breadth to build an internal PaaS solution, which is also possible.

There are applications, usually small ones, that are 100% PaaS. For example, a stack could be:

Azure Websites – Application Layer

GitHub – Source Repository

MongoLab – NoSQL backend

ClearDB – MySQL backend

Azure Cache – in-memory key/value store

Azure Machine Learning – data manipulation

Visual Studio Online – release automation

With the above stack you will never touch a VM. In fact, you cannot.

A lot of tooling in the market is also PaaS. For example, for the end user, Logentries is a PaaS solution. It would more commonly be considered SaaS, but because it has an API for log ingestion and consumption, it is also “PaaS.”

A hybrid scenario is one where a stack similar to the above has some components on IaaS, such as AWS compute running a manually configured instance of MySQL that has to integrate with all the other PaaS services.

In IaaS it is obvious how to set up your log analysis system and what to log in it. You get admin-level access to your VM, you install your agents, and then you pick your data sources. With PaaS you cannot do this. That does not mean you should not log your data, however.

Logging matters especially in the hybrid scenario, where you need to correlate data from the other services with your infrastructure data. It matters just as much in 100% PaaS solutions, where you need to know what is going on with data you may not necessarily have access to. But how do you get PaaS into your log analysis system?

The biggest challenge in getting logs out of your PaaS solutions is that even in the ideal implementation, where the solution provider gives you the logs directly, the setup is unique for each provider. And you are walking on thin ice with vendor changes.

For example, it is cool that Microsoft provides you logs for Azure Websites. You have to set up and retrieve the data in Visual Studio, but it is all there and pretty easy to consume. In this case you would most likely write a separate application, on your backend, to periodically download these logs and store them in your log analysis service. This is similar to what the agents do for IaaS, except you are writing the agent. If you want to be really tricky, you could set up a worker in an Azure Worker Role, or as a background task in AWS Elastic Beanstalk. PaaS to log PaaS.
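As a rough sketch of that "write your own agent" idea, the loop below polls for log files it has not shipped yet. Everything vendor-specific is an assumption and is passed in as a callable: `list_logs`, `download`, and `store` are hypothetical stand-ins for whatever your PaaS provider and your log analysis service actually expose.

```python
import time
from typing import Callable, Iterable, Set


def ship_new_logs(
    list_logs: Callable[[], Iterable[str]],
    download: Callable[[str], str],
    store: Callable[[str, str], None],
    already_shipped: Set[str],
) -> int:
    """Download and store every log file not shipped yet; return how many."""
    shipped = 0
    for name in list_logs():
        if name in already_shipped:
            continue  # this file was handled on an earlier pass
        store(name, download(name))
        already_shipped.add(name)
        shipped += 1
    return shipped


def run_forever(list_logs, download, store, interval_seconds: float = 300.0) -> None:
    """Poll on a fixed interval, much like an IaaS agent would."""
    already_shipped: Set[str] = set()
    while True:
        ship_new_logs(list_logs, download, store, already_shipped)
        time.sleep(interval_seconds)
```

The set of shipped names lives in memory here, so a restart would re-ship everything. A real worker would probably persist that cursor somewhere.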

With other solutions, especially DBs, you can download all transaction logs over a short period of time, in XML or JSON format, via a REST call. Here you would make two calls: one to download the logs and one to store them in your log system.
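Here is a minimal sketch of that download-then-store pair using only the Python standard library. It assumes the provider's export endpoint returns a JSON array of records and that your log service accepts one JSON document per POST. Both URLs and both of those behaviors are assumptions, not any particular vendor's API.

```python
import json
import urllib.request
from typing import Callable


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw response body from a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def http_post(url: str, body: bytes, timeout: float = 10.0) -> None:
    """POST a JSON body to a URL and discard the response."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def forward_transaction_logs(
    export_url: str,
    ingest_url: str,
    get: Callable[[str], bytes] = http_get,
    post: Callable[[str, bytes], None] = http_post,
) -> int:
    """Download transaction records, forward each one, return the count."""
    records = json.loads(get(export_url))  # assumed: a JSON array of records
    for record in records:
        post(ingest_url, json.dumps(record).encode("utf-8"))
    return len(records)
```

The `get` and `post` parameters are there so the transport can be swapped out, for example to add authentication headers or to test without a network.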

This is a lot of work, and there is an easier way: create your own wrapper for PaaS logging. This is a very nice way to log every call to, or event received from, your PaaS solution, and it lives within the application itself. You can use a log variable that always takes the result of every call to the PaaS solution, or, more comprehensively, a function that takes the call and logs all actions around it, even start and end times.
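One way such a wrapper function could look, as a sketch rather than a finished design: it takes the call, runs it, and logs a start line, an end line with the duration, and any exception. The logger name `paas` and the `paas_call_*` message format are made up for illustration.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("paas")  # hypothetical logger name


def logged_call(service: str, call: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run one PaaS call and log its start, end, duration, and any failure."""
    name = getattr(call, "__name__", repr(call))
    started = time.perf_counter()
    logger.info("paas_call_start service=%s call=%s", service, name)
    try:
        result = call(*args, **kwargs)
    except Exception:
        elapsed_ms = (time.perf_counter() - started) * 1000
        # logger.exception also records the traceback
        logger.exception(
            "paas_call_error service=%s call=%s duration_ms=%.1f",
            service, name, elapsed_ms,
        )
        raise
    elapsed_ms = (time.perf_counter() - started) * 1000
    logger.info(
        "paas_call_end service=%s call=%s duration_ms=%.1f",
        service, name, elapsed_ms,
    )
    return result
```

Usage would be something like `logged_call("cache", cache_client.get, "session:42")`, where `cache_client` is whatever client object your application already uses. The start and end timestamps come from the logging records themselves once a handler is configured, for instance with `logging.basicConfig(level=logging.INFO)`.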

If you plan ahead, you can actually extend all the functions for your PaaS components to do this automatically, so that after it’s set up the developer doesn’t even need to think about it. The hardest part is making sure it’s in your standard design pattern or corporate development standards, and making sure IT understands that a developer is going to need to set this up.
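In Python, one way to get that "automatic" behavior is a decorator, sketched below. `paas_logged` wraps a single function, and `log_all_methods` applies it to every public method of a client class. `CacheClient` is a hypothetical stand-in for a real PaaS client, and the log format is again just an illustration.

```python
import functools
import inspect
import logging
import time

logger = logging.getLogger("paas")  # hypothetical logger name


def paas_logged(service):
    """Decorator: log outcome and duration of every call to the function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            outcome = "error"  # overwritten only if the call returns normally
            try:
                result = func(*args, **kwargs)
                outcome = "ok"
                return result
            finally:
                elapsed_ms = (time.perf_counter() - started) * 1000
                logger.info(
                    "paas_call service=%s call=%s outcome=%s duration_ms=%.1f",
                    service, func.__name__, outcome, elapsed_ms,
                )
        return wrapper
    return decorate


def log_all_methods(service):
    """Class decorator: wrap every public plain method defined on the class."""
    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            if inspect.isfunction(attr) and not name.startswith("_"):
                setattr(cls, name, paas_logged(service)(attr))
        return cls
    return decorate


@log_all_methods("cache")
class CacheClient:
    """Hypothetical client; a real one would call the PaaS service."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]
```

Once the class is decorated, callers just write `client.get("k")` and the log line appears without any extra code at the call site, which is the point of baking this into your development standards.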

One of the biggest mistakes a PaaS-consuming organization can make is to ignore the log data from their PaaS system.

If you do, your log analysis service is standing on one leg, and you can get misleading information about an outage. For example, say a call to your PaaS provider fails because of lost packets or a request timeout and causes an unhandled exception. Without the proper logs this will look like the IaaS server receiving the result had an issue, when really it was your PaaS provider.
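To make that distinction visible in the logs, a wrapper can tag each failure with where it most likely came from. The sketch below treats timeouts and connection errors as originating on the PaaS side of the call and everything else as an application problem. That split is an assumption: a network error can also be caused by your own side, so treat the tag as a hint, not a verdict.

```python
import logging
import socket
import urllib.error

logger = logging.getLogger("paas")  # hypothetical logger name

# Assumed heuristic: these usually mean the remote service or the network
# path to it failed, not the code that consumes the result.
NETWORK_ERRORS = (TimeoutError, ConnectionError, socket.timeout, urllib.error.URLError)


def call_paas(service, call, *args, **kwargs):
    """Run a PaaS call and tag any failure with its likely origin."""
    try:
        return call(*args, **kwargs)
    except NETWORK_ERRORS as exc:
        logger.error(
            "origin=paas service=%s error=%s detail=%s",
            service, type(exc).__name__, exc,
        )
        raise
    except Exception as exc:
        logger.error(
            "origin=application service=%s error=%s detail=%s",
            service, type(exc).__name__, exc,
        )
        raise
```

With lines like `origin=paas service=db error=TimeoutError` sitting next to your infrastructure logs, a timeout no longer looks like a fault on the server that was waiting for the result.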

As the trend of hybrid applications, part PaaS and part IaaS, grows, this all becomes even more critical. Instead of waiting for it to bite you, build a wrapper around your PaaS services so they can be logged along with everything else.