Ok boys, slow down a bit there. I know we are all in a hurry to capitalize on the new industry buzz around the “Clouds”. However, after taking a brief look at Hyperic’s new CloudStatus, it looks like they are riding on all the “hype” associated with the cloud rather than looking for the real business impact that cloud monitoring could deliver. It looks like CloudStatus provides non-customer-specific synthetic monitors for the five major components of Amazon Web Services (AWS):

Elastic Compute Cloud (EC2)

Simple Storage Service (S3)

Simple Queue Service (SQS)

SimpleDB (SDB)

Flexible Payment Service (FPS)

The first issue I have with the CloudStatus offering is that it doesn’t appear to be specific to any one client’s AWS implementation. It looks like they are doing synthetic monitoring of the generic AWS services from their own public and private servers. This might be very interesting to the blog-o-sphere for discussing outages at Amazon; however, it says little about the impact on a specific customer’s business or service. Generic snapshots of parts of Amazon’s infrastructure will not give any one customer a bird’s eye view of how their services are impacted. We have already seen cases where AWS services were unavailable in some parts of Amazon’s network and available in other areas. I am not sure how the synthetic and isolated nature of CloudStatus latency aggregation will be useful to any one particular customer’s service. Let’s take a look at each of the specific monitored areas provided by Hyperic’s new CloudStatus.

EC2 Health

For EC2 Health, CloudStatus measures the time it takes to start a small EC2 instance. They measure from the time the instance is started until it is available. This clearly is a rookie mistake. What do they mean by available? An instance that is available is not a service that is available. Here again I go back to my point: unless you are measuring from the customer’s perspective, this measurement adds little more than “Hype”. The real measure should be when the service is available to the customer. Also, a service is usually not made up of just one instance. It is made up of tiers of servers and clusters. Yes Virginia, even in the clouds. The lag time between an instance being registered as up and/or being ping-able and a customer service being available can be a lifetime. Telling a customer when a service made up of multiple configured systems is up… now that is something worth monitoring.
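To make the difference between "instance available" and "service available" concrete, here is a minimal Python sketch. It is not anything CloudStatus or Hyperic HQ provides; the function name, the tier names, and the health-check callables are all hypothetical. The idea is simply to poll every tier the service depends on and report the service as up only when the slowest tier is up.

```python
import time
from typing import Callable, Dict


def time_until_service_ready(
    tier_checks: Dict[str, Callable[[], bool]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Return seconds until each tier first passed its check, plus 'service'.

    tier_checks maps a hypothetical tier name (e.g. "web", "db") to a
    zero-argument callable that returns True once that tier can serve requests.
    """
    start = clock()
    ready_at: Dict[str, float] = {}
    while len(ready_at) < len(tier_checks):
        if clock() - start > timeout_s:
            pending = sorted(set(tier_checks) - set(ready_at))
            raise TimeoutError(f"tiers still not ready: {pending}")
        for name, check in tier_checks.items():
            if name not in ready_at and check():
                ready_at[name] = clock() - start
        if len(ready_at) < len(tier_checks):
            sleep(poll_interval_s)
    # The service is only up once its slowest tier is up.
    ready_at["service"] = max(ready_at.values(), default=0.0)
    return ready_at
```

In practice each check would be something like an HTTP request against the web tier or a test query against the database tier. The gap between the "instance" number and the "service" number is the lag this post is complaining about.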

S3 Health

For S3 Health, CloudStatus simulates puts and gets to Amazon’s S3 from within EC2 instances. They simulate from both the US and Europe. Since most of the AWS services are housed on the East Coast, it is no surprise that the EU times are significantly higher. What does that tell us? I am not privy to the architecture of Amazon’s S3 topology; however, I would think that picking one or two points within their network and checking latency against a specific S3 bucket would not yield really meaningful information. It is similar to reading and writing to one SAN in a very large enterprise infrastructure and using that as a metric for the health of the whole IT infrastructure. Would this simulation give you reasonable results to base business decisions upon? Also, wouldn’t competing resources on the requesting application server produce varied results? Unless this kind of monitoring is done from within a specific customer’s infrastructure, I am not sure how much it can be relied upon to show business impact.

SQS Health

SQS Health simulates the round trip time it takes to put and get messages to and from the AWS SQS queues. A much better implementation for monitoring SQS would have been to monitor a real queue manager from a customer’s perspective. This is something SmugMug has done for their own queue manager (they use EC2/S3 but not SQS); they monitor their queues, and that kind of monitoring provides real business value. If a vendor is going to monitor SQS, they should look at how other vendors monitor queue managers. When I first heard about Hyperic’s CloudStatus announcement, I was hoping they would include monitoring that gives customers this kind of information: How many jobs are backing up on the queue? Are they in a storm? Are there any dead queues? … For all the reasons listed above, I believe Hyperic’s SQS Health is a novelty at best unless they can provide true business level feedback.
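As a rough illustration of the queue questions above (backing up, storm, dead queue), here is a small Python sketch. Everything in it is an assumption made for illustration: the `QueueSample` fields, the `assess_queue` name, and the storm threshold are hypothetical, and the samples would have to come from whatever queue manager the customer actually runs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueueSample:
    timestamp_s: float  # when the sample was taken
    depth: int          # jobs waiting on the queue
    enqueued: int       # cumulative jobs put on the queue
    dequeued: int       # cumulative jobs taken off the queue


def assess_queue(
    samples: List[QueueSample], storm_rate_per_s: float = 50.0
) -> Dict[str, object]:
    """Compare the first and last sample in a window and flag trouble."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    window_s = last.timestamp_s - first.timestamp_s
    if window_s <= 0:
        raise ValueError("samples must span a positive time window")
    arrival_rate = (last.enqueued - first.enqueued) / window_s
    drain_rate = (last.dequeued - first.dequeued) / window_s
    return {
        "depth": last.depth,
        "arrival_rate_per_s": arrival_rate,
        "drain_rate_per_s": drain_rate,
        # Backing up: more work waiting, arriving faster than it is consumed.
        "backing_up": last.depth > first.depth and arrival_rate > drain_rate,
        # Storm: arrivals at or above an (assumed) threshold.
        "storm": arrival_rate >= storm_rate_per_s,
        # Dead: jobs are waiting but nothing consumed any in the window.
        "dead": last.depth > 0 and drain_rate == 0,
    }
```

A synthetic round trip time answers none of these three questions; a couple of counters from the customer's own queue answer all of them.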

SDB Health

SDB Health simulates gets and puts to Amazon’s SimpleDB. I really shouldn’t have to explain why this is pure “Hype”. However, here is my spin… This is the equivalent of doing a select him,her,them from youseguysDB and saying you are monitoring Oracle. Ignoring the fact that the adoption rates of SimpleDB are low, I just don’t see how this adds any value to the cloud conversation. The first vendors who figure out how to analyze applications like SimpleDB, CouchDB and BigTable the way vendors monitor Oracle and DB2 are the ones who are going to lead the pack in cloud monitoring.

FPS Health

This is outright silly. No comment… except, did you notice all the zeros?

Summary

I will use the analogy of the movie Get Smart for this one. I went to see Get Smart this weekend with my boys, and I had very high expectations for this movie. The movie turned out to be funny and basically amusing; however, I was very disappointed because it wasn’t great. That is how I feel about Hyperic’s CloudStatus announcement. Hyperic and Zenoss, IMHO, have been leading the non-hype charge for open source monitoring, and until this announcement Hyperic has been really walking the walk when it comes to aligning business activities with IT management. When I heard about the Hyperic announcement last week, I was really excited. I knew they could not solve all the “Cloud” problems in a first offering, but I was disappointed in how they went for the quick marketing win (did you see their nonsense video?). CloudStatus is free and that is a good thing; however, I wish they had taken a more serious approach to this endeavor. I was hoping to see more of an emerging open source type project where they would lay out a few gems and then try to get industry input and involvement to grow the space. Instead they opted for the quick win and used a massive PR machine to add a tremendous amount of hype to make this service look like it has some real business value. And this is in an already overhyped space (“The Cloud”).

James, you should have at least looked at Amazon’s status page before posting a comment. I hope you don’t offer such a crappy service to your clients/customers. Don’t you know that green/red up/down lights are a historical thing?

As for the blogger, you amused me with your analysis. I did not expect you would go to such a level to save your other company, Mr. James. I totally understand if you missed the issue cloudstatus.com was showing in real time when Amazon’s SQS service fell apart this evening around 5 pm. Anyway, hope you learn something from here.

Amazon to monitor Amazon? They definitely have a role to play in providing data/capabilities to management tools. In that capacity, APIs are better than their status web site, but I have nothing against the site. In any case they can’t be your one trusted provider for availability and performance monitoring. Well, they can be but they shouldn’t be.

They need to be (i.e., Amazon). Basically we usually do rely on the vendors to provide us the guts. Perfmon from MS, AIX Perlib for IBM, SNMP from hardware vendors…

I think you are spot on about Amazon beefing up its APIs. In all honesty that is what I thought Hyperic was going to do w/CloudStatus. I was hoping they would have started an open project w/Amazon to start getting the cloud vendors to think about IT management for the clouds. Today each of the vendors (e.g., Rightscale, Elastra, 3Tera, …) has its own secret sauce for monitoring their clients’ cloud infrastructure.

I am sorry you didn’t get the full picture from the news yesterday. http://www.cloudstatus.com is a reference tool for developers and users to look at when they notice their performance is degrading in the cloud.

The genesis of the new product came of course from our existing install base. Hyperic HQ, our flagship product, has an ever-increasing number of open source users and enterprise customers running in the cloud. When our customers suffered some degradation in performance – they basically had no easy way to triage if the problem was their application or the cloud itself. CloudStatus fixes that.

The idea is to start with this free service, and then to expand it (we’re looking for forum conversation as well as working with Amazon and other cloud providers directly). The goal will be to have personalized versions of this that will work with your Hyperic HQ deployment in the cloud. This will line up your application with the cloud infrastructure it’s deployed against. That said, I do think that CloudStatus as a free service will continue to be extremely relevant for developers and users beginning in the cloud to benchmark their performance with independent relative trending of the services they are using. The real monitor of their application, soup to nuts, will take Hyperic HQ and the future CloudStatus service.

Also, a disclaimer on FPS: you should note it is actually the sandbox. It’s noted in the service. This is because we are ramping up various locations and committing multiple transactions of $.01 each a minute. The bank constantly shuts this down suspecting fraud. We are trying to reason with them… but it may take time. This does bring up a great point though: many developers would probably like to see these trend reports in both the sandbox and production, since they have various performance differences and it would be good to have the relative information for future capacity planning. You’ll also see that there was an outage in the sandbox on Friday, which definitely worried several developers that their tests weren’t working!

I hope that gives you a better picture of where this service is going. If you have any more questions, you know where to reach us!

Thanks for the update. First off I was disappointed to see that you guys yanked my comment on your blog. That doesn’t seem real open.

My point stands…

I do not see how looking at single synthetic points in a huge network like AWS can give you meaningful information to make business decisions on. In your PR yesterday Hyperic suggests that CloudStatus does a lot more than what you have described in this comment (e.g., your video and other business references). If your point was to satisfy developers and the blog-o-sphere then you have created a good tool. However, your PR, IMHO, suggests a lot more than that.

I look forward to seeing any future customer solutions that you guys add to the clouds. I have been and am a Hyperic fan and will be the first one to point out (as I have done before) when you guys do something not Hype[PR]ic.

Sorry about the comments – there were 6 in there for moderation that I simply hadn’t gotten to since I am on the showfloor. It was actually bad communication on my part to offload that to someone else on my team while I am using my PC as a demo computer here at Velocity. Your comment is published now – I never deny anyone except spammers selling blue pills or printer ink or whatever.

As for how directly useful it is to your exact implementation, of course it can’t cover every corner, but we are ramping up and destroying instances all over AWS every minute and running a battery of tests originating both inside and outside AWS. While we can’t cover the whole thing, it is, as the product page says, a reasonable picture of the general service levels.

I’ll address your EC2 comment first, “They measure from the time the instance is started until it is available.”

Well, this is an important metric that even Amazon has numbers on. If your application requires 30 instances up in order to be available, or your app just needs 1 instance + a ton of software, this number is still useful.

What we tell you (in the case of EC2) is how much time Amazon needs to process your request and give control of the instance to you. This is important! If that number were 20 minutes instead of 60 seconds, you’d be outraged! “How can Amazon claim elasticity when they are bringing up instances as fast as RackSpace can rack them?” “How can I respond to a surge in traffic if bringing up an instance takes an hour?”

And I think that’s where you missed the boat — CloudStatus provides normalized metrics that are applicable to everyone using the services.

We’ve distributed the monitoring of Amazon through many instances so we can provide many datapoints — not just a ‘single synthetic’ one.

Let’s move on to SQS. Say you’re writing an application and considering SQS. What kind of optimal performance can you expect? Well, if you look at CloudStatus, it’s readily obvious that you’re not going to get faster than a 3 second latency in your queue — and that’s something you can make business and application architecture decisions on.

Again, these are metrics which provide information about what you can expect from Amazon’s service.

If you want specific information about when your complex 30 instance application becomes available, then you need a different tool … might I suggest Hyperic HQ?

I think I am the only one out there who has questioned the value of this service. I think that is a good thing. In fact it has created this debate. Would you prefer that in the future I just oooh and ahhh every time you guys make an announcement? I am not sure if you have followed my blog. I have on a number of occasions praised Hyperic innovations. I was disappointed in this service because I expected more, and I am just taking a pass on this one.

CloudStatus is monitoring Amazon’s service, not your application.

I understand what you do and in fact you restated it in this comment. Not monitoring what you call the application (I prefer to call it the service) is why, IMO, you add very little business value to anyone other than developers and the blog-o-sphere.

Well, this is an important metric that even Amazon has numbers on.

Of course they do, however, they are not a monitoring company. You are.

If your application requires 30 instances up in order to be available, or your app just needs 1 instance + a ton of software, this number is still useful.

Again you are making my point…. Just measuring the start of the instance is of little value to the business service. Take your second example “1 instance + a ton of software”. If you are telling me that Cloudstatus can tell me when a “ton of software” is loaded, configured, and available to the service then I am all ears. What a business needs to know is when the service is available and how degradation in service availability affects the business. Take the LAMP stack. What the business needs to know is how long it takes a new server to be started and configured in relation to the stack.

In the case of EC2 queuing models, a business needs to know how the number of jobs in the queue relates to the number of instances that are available. Stacey claims that your CloudStatus grew out of customer requirements. Since most of the successful EC2 business models today seem to revolve around batch processing (e.g., Smugmug, Animoto, NY Times), I am not sure how you guys missed this one. Make me king for a day and I would have found a way to monitor a service’s queue requests and then relate that to instances.
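Here is a back-of-the-envelope sketch, in Python, of relating queue requests to instances. The function names and the throughput and drain-time parameters are my own assumptions for illustration; a real batch shop would plug in its measured jobs per instance per minute.

```python
import math


def instances_needed(
    queue_depth: int,
    jobs_per_instance_per_min: float,
    target_drain_min: float,
) -> int:
    """Instances required to work off the current backlog within the target time."""
    if queue_depth < 0 or jobs_per_instance_per_min <= 0 or target_drain_min <= 0:
        raise ValueError("depth must be >= 0 and rates/targets must be > 0")
    capacity_per_instance = jobs_per_instance_per_min * target_drain_min
    return math.ceil(queue_depth / capacity_per_instance)


def scaling_gap(
    queue_depth: int,
    running_instances: int,
    jobs_per_instance_per_min: float,
    target_drain_min: float,
) -> int:
    """Positive: instances to add. Negative: instances that could be released."""
    needed = instances_needed(queue_depth, jobs_per_instance_per_min, target_drain_min)
    return needed - running_instances
```

For example, with 1,200 jobs waiting, 4 jobs per instance per minute, and a 30 minute target, the sketch asks for 10 instances; if 6 are running, the gap is 4. That single number says more about the business than an instance startup time does.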

Also, I am not clear that a few synthetic applications starting EC2 instances and measuring the startup time relates to other applications in the AWS network. Are all EC2 instance creations the same, or does mileage vary depending on network routes, EC2 zones, and a number of other factors? Other than ballpark numbers, will the CloudStatus numbers give me meaningful data?

What we tell you (in the case of EC2) is how much time Amazon needs to process your request and give control of the instance to you. This is important! If that number were 20 minutes instead of 60 seconds, you’d be outraged!

I am going to go out on a limb here and say that taking 20 minutes to start an EC2 image is probably an outage, and I am not sure a business needs CloudStatus to tell it that.

“How can Amazon claim elasticity when they are bringing up instances as fast as RackSpace can rack them?” “How can I respond to a surge in traffic if bringing up an instance takes an hour?”

Now you are really zoning in on my point. Elasticity is not just starting an instance. It is using autonomics. Autonomics ties monitoring to provisioning. If I were a monitoring company and I wanted to add monitoring value to the cloud, I would look at what companies like 3Tera, Elastra, and Rightscale are doing to provide elasticity. In fact the open source Scalr project also includes this kind of elasticity. These vendors use monitoring metrics to autonomically provision servers. They use simple metrics like loadavg, number of users, and network traffic. Elasticity in the clouds is the ability to dynamically provision instances when you need them and destroy them when you don’t. Static AJAX charts or emails cannot aid in a customer’s need for elasticity. I take it that your assumption is that users of the cloud will have to go to cloudstatus.com to find out how their business is running. That doesn’t seem a likely scenario. What the cloud industry needs is common open source, non-hype monitoring tools that understand this and can tie those metrics to, or make them available to, the specific cloud APIs. These were some of the things I thought Hyperic was going to provide when I heard about the CloudStatus announcement.
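To show what tying monitoring to provisioning can look like, here is a deliberately simple Python sketch of a threshold rule driven by loadavg. It is not how 3Tera, Elastra, Rightscale, or Scalr actually implement it; the function name, thresholds, and instance limits are assumptions picked for illustration.

```python
from typing import Sequence


def desired_instance_count(
    load_per_instance: Sequence[float],
    scale_up_at: float = 0.75,
    scale_down_at: float = 0.25,
    min_instances: int = 1,
    max_instances: int = 20,
) -> int:
    """Decide how many instances should be running from per-instance loadavg.

    One load reading per running instance is expected. The rule adds one
    instance when the average load is high and removes one when it is low.
    """
    current = len(load_per_instance)
    if current == 0:
        return min_instances
    avg_load = sum(load_per_instance) / current
    if avg_load > scale_up_at:
        return min(current + 1, max_instances)
    if avg_load < scale_down_at:
        return max(current - 1, min_instances)
    return current
```

The returned count would then be handed to whatever provisioning API the cloud vendor exposes, which is not shown here. The point is that the metric feeds a decision automatically, rather than sitting on a chart waiting for someone to look at it.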

Also, can Rackspace dynamically add a server to a customer’s application in twenty minutes with a dynamic on-demand request?

CloudStatus provides normalized metrics that are applicable to everyone using the services.

Why not ping google.com while you’re at it…

We’ve distributed the monitoring of Amazon through many instances so we can provide many datapoints — not just a ’single synthetic’ one.

How many? What locations? Are you using different Zones? How often?

Well, if you look at CloudStatus, it’s readily obvious that you’re not going to get faster than a 3 second latency in your queue — and that’s something you can make business and application architecture decisions on.

Again you are hitting the nail on the head. Most people know through the blog-o-sphere that it takes on average about 60 seconds to start an EC2 instance. I have to admit I had not seen anyone specifically measure SQS and show metrics, and now we know. These numbers are useful to developers; however, traditionally monitoring is not used by developers. It is mostly used after the application is deployed.

If you want specific information about when your complex 30 instance application becomes available, then you need a different tool … might I suggest Hyperic HQ?

From the Hyperic PR campaign this is what it looked like Cloudstatus was going to provide. Now that you have confirmed and cleared the “Hype” behind the video and the other viral articles we now have a clearer view of what Cloudstatus is and is not.

In the end the real value CloudStatus adds will be determined by the users of EC2 and not bloggers like me or bloggers that have gone through the Hyperic PR “Junket”.

Thanks for agreeing on one of my points. For the other, don’t tell me you don’t like a flashy GUI (probably not, I am looking at Zenoss screenshots). I recently started using Django and am learning Python while doing so; I love their first Zen of “Beautiful is better than ugly.”

Can’t comment about scale as my clients are small/mid size deployments. I agree that there are many open bugs with Hyperic and wish they would fix all of them tomorrow, but I won’t stop using it if they can’t.

[...] We heard a lot about cloud computing at the Gartner show this week. You can read a bit about their take on it here. While we’ve been musing on the different ways we monitor cloud computing resources, Hyperic is already announcing their solution to monitor Amazon’s cloud computing availability. Hyperic believes that “making use of cloud resources would be more popular if the customers had an independent means to monitor cloud services.” They plan to offer the monitoring service to other cloud companies this year. However, John Willis questions the hype of Hyperic. [...]
