CloudWatch Is of the Devil, but I Must Use It

Let’s discuss Amazon CloudWatch.

For those fortunate enough not to be stuck in the weeds of Amazon Web Services (AWS), CloudWatch is, and I quote from the official AWS description, “a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers.” This is all well and good, except for the part where there isn’t a single named constituency who enjoys working with the product. Allow me to dispense some monitoring heresy.

Better, let me describe this in the context of the 14 Amazon Leadership Principles that reportedly guide every decision Amazon makes. When you take a hard look at CloudWatch’s complete failure across all 14 Leadership Principles, you wonder how this product ever made it out the door in its current state.

“Frugality”

I’ll start with billing. Normally left for the tail end of articles like this, the CloudWatch billing paradigm is so terrible, I’m leading with it instead. You get billed per metric, per month. You get billed per thousand metrics you request to view via the API. You get billed per dashboard per month. You get billed per alarm per month. You get charged for logs based upon data volume ingested, data volume stored and “vended logs” that get published natively by AWS services on behalf of the customer. And, you get billed per custom event. All of this can be summed up best as “nobody on the planet understands how your CloudWatch metrics and logs get billed”, and it leads to scenarios where monitoring vendors can inadvertently cost you thousands of dollars by polling CloudWatch too frequently. When the AWS charges are larger than what you’re paying your monitoring vendor, it’s not a wonderful feeling.
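To make the polling problem concrete, here is a rough Python sketch of a cost estimator over a few of the billing dimensions listed above. Every unit price in it is a placeholder I made up for illustration, not a real AWS rate, and the names (ASSUMED_PRICES, Usage, estimate_monthly_cost) are hypothetical. Vended logs and custom events are left out to keep it short.

```python
from dataclasses import dataclass

# Placeholder unit prices in dollars. These are NOT real AWS rates; swap in
# the current figures for your region before trusting any number this prints.
ASSUMED_PRICES = {
    "metric_month": 0.30,
    "api_per_1000_metrics": 0.01,
    "dashboard_month": 3.00,
    "alarm_month": 0.10,
    "log_gb_ingested": 0.50,
    "log_gb_stored": 0.03,
}


@dataclass
class Usage:
    """One month of usage across the dimensions the article lists."""
    custom_metrics: int = 0
    metrics_requested_via_api: int = 0
    dashboards: int = 0
    alarms: int = 0
    log_gb_ingested: float = 0.0
    log_gb_stored: float = 0.0


def estimate_monthly_cost(usage, prices=ASSUMED_PRICES):
    """Return a per-dimension cost breakdown plus a total, in dollars."""
    breakdown = {
        "metrics": usage.custom_metrics * prices["metric_month"],
        "api_requests": (usage.metrics_requested_via_api / 1000)
        * prices["api_per_1000_metrics"],
        "dashboards": usage.dashboards * prices["dashboard_month"],
        "alarms": usage.alarms * prices["alarm_month"],
        "log_ingest": usage.log_gb_ingested * prices["log_gb_ingested"],
        "log_storage": usage.log_gb_stored * prices["log_gb_stored"],
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # A vendor polling 500 metrics once a minute for a 30-day month.
    polled = 500 * 60 * 24 * 30
    for name, dollars in estimate_monthly_cost(
        Usage(metrics_requested_via_api=polled)
    ).items():
        print(f"{name}: ${dollars:,.2f}")
```

A vendor polling 500 metrics once a minute for 30 days requests 21.6 million metrics, and the API line alone scales linearly with polling frequency. That is how an integration you never think about ends up on the bill.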

“Invent and Simplify”

CloudWatch Logs, CloudWatch Events, Custom Metrics, Vended Logs and Custom Dashboards all mean different things internally to CloudWatch from what you’d expect, compared to metrics solutions that actually make some fathomable level of sense. There are, thus, multiple services that do very different things, all operating under the “CloudWatch” moniker. For example, it’s not particularly intuitive to most people that scheduling a Lambda function to invoke once an hour requires a custom CloudWatch Event. It feels overly complicated, incredibly confusing, and very quickly, you find yourself in a situation where you’re having to build complex relationships to monitor things that are themselves far simpler.
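For the hourly Lambda example, here is roughly what the wiring looks like with boto3. This is a sketch under assumptions, not a recommended setup: the rule name, function name and ARN are hypothetical, and the clients are passed in (for instance boto3.client("events") and boto3.client("lambda")) so the block itself has no AWS dependency.

```python
def hourly_schedule_requests(rule_name, function_arn):
    """Build the request parameters for an hourly schedule rule and its target."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": "rate(1 hour)",
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{"Id": "1", "Arn": function_arn}],
    }
    return rule, targets


def apply_schedule(events_client, lambda_client, rule_name, function_name, function_arn):
    """Create the rule, allow it to invoke the function, then attach the target."""
    rule, targets = hourly_schedule_requests(rule_name, function_arn)
    rule_arn = events_client.put_rule(**rule)["RuleArn"]
    # Without this permission the rule fires but the invocation is denied.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events_client.put_targets(**targets)
    return rule_arn
```

That is three API calls across two services to express “run this once an hour”, which is the relationship-building the paragraph above is complaining about.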

“Think Big”

All business people, when asked what they want from a monitoring platform, will respond with something that resembles “a dashboard” or “a single pane of glass view”. CloudWatch offers minutiae up the wazoo, but it categorically offers no global view, no green/yellow/red status indicator that gives you even a glimmer of the overall health of your site. Want a graph of every core in your instance’s CPU for the past 30 seconds? Easy! Want to know if your entire company should be putting out the burning fire that is the current production state of your website? Keep looking: CloudWatch has nothing to offer you.

“Insist on the Highest Standards”

By its very nature, CloudWatch feels like small thinking. The entire experience, start to finish, smacks of “what’s the absolute least we could do and get away with it?” They built their MVP, and then just sorta…stopped, frozen in amber. They created a set of building blocks, except they didn’t solve the problem of “how do I monitor my AWS resources?” Instead, it feels like the entire team phoned it in and let a large market of monitoring vendors develop as a result. None of those vendors have the level of access to the raw data that CloudWatch does; all of them have built better products. You’d think the CloudWatch team would take a clue from the innovation that’s rapidly happening in this space, but that’d require someone to Learn and Be Curious.

“Are Right, a Lot”

Recent data is “eventually consistent”, so you always get graphs like the one shown in Figure 1.

Figure 1. Example CloudWatch Graph

In reality, that’s a terrifying thing to see on an accurate dashboard: something is clearly very wrong with your site! For better or worse, the “accurate” description doesn’t apply to CloudWatch, and that’s just how your graphs always look. “Your metrics will be eventually consistent” is very nearly the last thing you want to hear about your monitoring platform, second only to “what metrics?” This ties directly to…

“Earn Trust”

Let me be very clear here: the actual issue isn’t the ingestion problem. Absolutely every vendor on the planet has the same issue; you can’t display data you don’t have. Where CloudWatch drops the ball is in exposing this behavior to the end user without explanation as to what’s happening. Thus, until you grow accustomed to it, you have a heart-stopping moment of “what the hell just happened to the site” whenever you look at a dashboard. This conditions you to be entirely too calm when looking at sensible dashboards when a disaster just occurred. If you trust what the CloudWatch dashboards show you, you’re making a terrible mistake.

“Dive Deep”

If you’re using Lambda or Fargate, you have no choice but to use CloudWatch Logs, wherein searching for anything is fundamentally terrible. If you’re using CloudWatch Logs to diagnose anything, congratulations: you’re diving so deep, you may drown before making it back to the surface. For example, if I have a Lambda function that throws an error, in order to diagnose the problem, I must:

Find the fact that it encountered an error in the first place by looking at the invocation error CloudWatch dashboard. I also could set up a filter to run a continuous query on the logs and alert when something shows up, except that’s not natively supported; I need a third-party tool for that (such as PagerDuty).

Go diving into a variety of CloudWatch log groups and find the one named after the specific erroring function.

Scroll manually through the many, many, many pages of log groups to find the specific invocation that threw an error.

Realize that the JSON object that’s retained isn’t enough to troubleshoot with, cry in despair, and go write an article just like this one.

Do some quick math and realize I’m paying an uncomfortable percentage of my AWS bill for a service that’s only of somewhat marginal utility at best.
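The scrolling in the steps above can be partly scripted. What follows is a minimal Python sketch, assuming a boto3 CloudWatch Logs client is passed in (for instance boto3.client("logs")) and that events have the dict shape the filter_log_events call returns. Lambda does write to a log group named /aws/lambda/ followed by the function name; the substrings used to spot errors are my own illustrative guesses, and the helper names are hypothetical.

```python
def lambda_log_group(function_name):
    """Lambda writes each function's logs to /aws/lambda/<function-name>."""
    return f"/aws/lambda/{function_name}"


def fetch_recent_events(logs_client, function_name, start_time_ms):
    """Yield every log event for the function since start_time_ms (epoch millis)."""
    paginator = logs_client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=lambda_log_group(function_name),
        startTime=start_time_ms,
    )
    for page in pages:
        yield from page.get("events", [])


def find_error_events(events, needles=("ERROR", "Traceback", "Task timed out")):
    """Keep events whose message contains any of the (assumed) error markers."""
    return [
        event
        for event in events
        if any(needle in event.get("message", "") for needle in needles)
    ]
```

Something like find_error_events(fetch_recent_events(client, "my-function", start_ms)) gets you to the failing invocations without paging through the console. It does nothing about the retained output being too thin to troubleshoot with.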

“Deliver Results”

All of your metrics, all of your logs: they’re locked away inside CloudWatch’s various components. You’re not going to find a “page me when this threshold is exceeded” option in CloudWatch; your options are relegated to “design an alert delivery pipeline with baling wire and SNS” or pay a non-AWS vendor for another monitoring product.

“Customer Obsession”

CloudWatch keeps all of your metrics. It keeps your logs. It lets you build custom dashboards to view your metrics all in one place. The building blocks of a great service are already here; it’s the expression of that utility that falls short, sometimes drastically. The fact that large monitoring vendors are premier sponsors of AWS events would be laughable if CloudWatch ever were to get its act together. You’d not need a third party to make sense of a pure AWS environment, and many of them would starve to death as they grow too weak to interrupt your conversation to ask if they can scan your badge. Choosing to use CloudWatch vs. literally anything else is like buying a car. “Why yes, I want to buy the Yugo instead of the Honda. After all, it checks all the boxes of technically being a car, so it’s fine, right?”

“Disagree and Commit”

It may be that the root cause of many of CloudWatch’s failings comes from the product engineers who built it misunderstanding this (admittedly slippery!) Leadership Principle. It’s envisioned as passionately expressing your reservations about a decision, but once a decision is reached, you commit to the decision that was made. Unfortunately, it appears that the engineering teams responsible for CloudWatch decided to “Disagree in Commits” and inflict their arguments upon the world in the form of the product.

“Ownership”

If I were to go on the internet and post about how terrible nearly any other AWS service was, people would rally to that service’s defense. It’s the internet; people will do that. But when these and many more similar comments about CloudWatch appear, and nobody from AWS pipes in to say “wow, I’m sorry, why do you feel that way?”, it’s abundantly clear that if any people on the CloudWatch team really care about the product, they’ve been locked in a malfunctioning bathroom stall for the better part of a decade. These comments go back at least that far, but Amazon is totally on it, rocking the company’s “Bias for Action” principle.

“Hire and Develop the Best”

The people who build CloudWatch aren’t terrible at their jobs; I genuinely believe they don’t quite grasp how their product is perceived. Given that it’s poor form to write a rant like this and not offer suggestions for positive improvement, here are some product enhancements I’d like to see:

Give me the option to rate-limit API calls at arbitrary levels rather than being surprised at month end by a bill that’s roughly Zanzibar’s GDP.

“Here’s an error that your Lambda function threw, here’s the log output from that specific function” should be at most two clicks away, not 30.

If your dog has a litter of 14 puppies, perhaps you don’t need to name all of them subtle variations of the term “CloudWatch”. The proliferation of services and companies that all start with the word “Cloud” is the subject of an entirely separate rant.
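On the first of those requests: until something like it exists on the AWS side, the closest stand-in is throttling your own callers. Here is a minimal token-bucket sketch in Python. It is a client-side workaround, not the service-level limit being asked for, and the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` calls in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so it can be tested
        self.last = clock()

    def try_acquire(self, cost=1):
        """Return True and spend `cost` tokens if the call may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Wrapping each metrics request in a try_acquire() check, and skipping or delaying the call when it returns False, puts a ceiling on how many billable requests one process can make. It only covers code you control, so a vendor’s poller is still free to surprise you.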

Please don’t misunderstand me. I use, enjoy and promote AWS services, and I’m considered to be “an authentic voice” largely because in addition to praising things that are wonderful, I’ll call out things that aren’t, as I’ve just done. I’ve built my career and business on working within that ecosystem. I find AWS employees to be intelligent and well-intentioned, and most of their services quite good. CloudWatch could get there with some work, but it’s got a bunch of very painful usability problems that keep it from being good, let alone great.