
The subject matter of this post is a startup of sorts, and was triggered by a conversation I had with an industry veteran a few months back. By veteran, of course, I mean an old bugger! 😉

It is an entity which begins its journey by identifying a target market in the tech industry, and spends day and night pursuing that market to the best of its ability.

It brings in resources to help meet the key requirements of the target market; some of those resources are costly, and others not so much.

Occasionally it hits a bump in the road with funding and needs to find other sources of investment, and it may go through several rounds of funding over a number of years. Eventually it gets to a point where the product is of decent quality and market value.

Then it does a market analysis and discovers that the market has shifted, and that if it does not pivot, or indeed re-skill, it will become irrelevant within a few short years.

Eh?

I am of course talking about the career of an IT professional.

Though I may be slightly exaggerating how quickly we would become irrelevant, we have all chosen a career in one of the fastest-moving industries on the planet. We have no choice but to keep developing and maintaining our knowledge in order to keep driving our careers forward.

As a self-confessed virtual server hugger with a penchant for maintaining a pretty reasonable home lab, I enjoy understanding the detailed elements of a technology, how they interact, and where the potential pitfalls are. The cloud, however, is largely obfuscated in this respect, to the point where many cloud companies will not even divulge the location of their data centres, never mind the equipment inside them and its configuration!


That said, those of you with a keen eye may have noticed a shift in my Twitter stream in the past year or so, with subjects tending towards a more public cloudy outlook… Talking to a huge range of customers in various verticals on a regular basis, I get the feeling that a great many organisations are right at the tipping point between their current on-premises / dedicated managed services deployment models and full public cloud adoption (or at the very least hybrid!).

It’s hard to believe that companies like AWS have been living and breathing public cloud for over ten years already; that’s almost as long as my entire career! In that time AWS has grown from a niche player selling a bit of object storage to the Behemoth-aaS it is today. To a greater or lesser extent (and for better or worse!), it is now the yardstick by which many cloud and non-cloud services are measured. This is particularly true when it comes to cost, much to the chagrin of many across the industry!

To me, this feels like the optimum time for engineers and architects across our industry (most definitely including myself) to fully embrace public and hybrid cloud design patterns. My own development has pivoted predominantly towards technologies which are either native to, or which support, public cloud solutions. Between family commitments, work, etc., we have precious little time for personal development, so we need to spend it where we think we will get the most ROI!

So what have I been doing?

Instead of messing about with my vSphere lab of an evening, I have spent recent months working towards certified status in AWS, Azure, and soon, GCP. This has been a real eye-opener for me regarding the designs which can be achieved on the current public cloud platforms, never mind the huge quantity of features these players are likely to release in the coming 12 months, or the many more after that.

Don’t get me wrong, of course; not everything is perfect in the land of milk and honey! I have learned as much in these past months about workloads and solutions which are NOT appropriate for the public cloud as I have about solutions which are! Indeed, I have recently produced a series of posts covering some of the more interesting AWS gotchas, and some potential workarounds for them. I will be following up with something similar for Azure in the coming months.

Taking AWS as an example, something which strikes me is that many of the services are not 100% perfect and don’t have every option and nerd knob under the sun available. Most seem to have been designed around the 80/20 rule and are more than adequate for the majority of design requirements. If you need to meet a corner case or a very specific requirement, then you may need to go beyond native public cloud tooling.

Anyhow, that’s enough rambling from me… By no means does this kind of pivot imply that everything we as infrastructure folks have learned to date has been wasted. Indeed, I personally have no intention of dropping “on premises” skills or of no longer designing managed dedicated solutions. For the foreseeable future there will likely be a huge number of appropriate use cases for them, but in many, if not most, cases I am now being engaged to look at new solutions with a publicly cloudy mindset!

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below is another tip for designing and implementing solutions on Amazon AWS. In this case, Scale-Up Patching of Auto-Scaling Groups (ASGs) and a couple of wee bonuses about Dark Launch techniques.

AWS Tips and Gotchas – Part 9 – Scale-Up Patching in ASGs

Very quick tip on Auto Scaling Groups this week, courtesy of an awesome session I attended at the AWS User Group UK (London) last week on DevOps, presented by Chris Turvil from The Trainline.

Assuming you just need to do a code release to an existing farm of servers running in an ASG, and you aren’t planning anything complex such as a DB schema update, you can use a technique called “Scale-Up Patching”. I hadn’t heard the term before, but it’s incredibly simple and very effective! There are a couple of methods you might use, depending on how you deliver your code, but the technique is the same: make your new code or image live, double the minimum size of your ASG, then halve it! Job done!

So how does this work?

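If you have looked into the detail of ASGs, you will know that when an ASG scales in, the default termination policy first picks the AZ with the most instances, and then, within that AZ, terminates the instance with the oldest launch configuration. So, assuming your instances are spread roughly evenly across multiple AZs, the instances running the old release are killed first. If your new code doesn’t come with a new launch configuration (for example, because it is pulled at boot by a user-data script), you can set the OldestInstance termination policy on the group so that instance age decides instead. For more detail on the exact rules, see the AWS documentation on Auto Scaling termination policies.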

If you double your current number of instances, all of the new instances will be deployed with your new code version. This leaves you with a farm of 50% vOld and 50% vNew. When you then tell the ASG to scale back to its original size, it will kill off all of the vOld instances, leaving your entire farm upgraded. If you find an issue and have to roll back, you simply rinse and repeat the same exercise! How brilliant is that?!
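A quick way to convince yourself of the arithmetic is a toy simulation. The Python below is a simplified model, assuming the ASG launches each new instance into the emptiest AZ and, on scale-in, picks the busiest AZ and then the instance with the oldest launch configuration; AWS’s further tie-breakers are replaced here by launch order. The ToyASG class and its names are hypothetical, and this is not how AWS implements it.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    version: int   # launch configuration generation, i.e. which code release
    launched: int  # launch order, standing in for launch time


class ToyASG:
    """Toy model of an Auto Scaling group (not the AWS implementation)."""

    def __init__(self, azs, version):
        self.azs = list(azs)
        self.version = version  # code version that new launches will get
        self.instances = []
        self._clock = count()

    def _az_counts(self):
        counts = Counter({az: 0 for az in self.azs})
        counts.update(i.az for i in self.instances)
        return counts

    def set_desired(self, desired):
        # Scale out: launch into the AZ with the fewest instances.
        while len(self.instances) < desired:
            counts = self._az_counts()
            az = min(self.azs, key=lambda a: counts[a])
            n = next(self._clock)
            self.instances.append(Instance(f"i-{n:04d}", az, self.version, n))
        # Scale in: busiest AZ first, then oldest version, then earliest launch.
        while len(self.instances) > desired:
            counts = self._az_counts()
            busiest = max(self.azs, key=lambda a: counts[a])
            candidates = [i for i in self.instances if i.az == busiest]
            victim = min(candidates, key=lambda i: (i.version, i.launched))
            self.instances.remove(victim)

    def versions(self):
        return Counter(i.version for i in self.instances)


if __name__ == "__main__":
    asg = ToyASG(["eu-west-1a", "eu-west-1b", "eu-west-1c"], version=1)
    asg.set_desired(6)
    asg.version = 2        # make the new code live for new launches
    asg.set_desired(12)    # double: six vOld plus six vNew
    asg.set_desired(6)     # halve: scale-in removes the vOld instances
    print(asg.versions())  # Counter({2: 6})
```

A rollback in this model is just another generation (say version 3 carrying the old code) followed by the same double-then-halve.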

This process will work exactly the same regardless of whether you deploy your code via updated AMIs each time, or simply post-boot using a user-data script which pulls your source from a bucket, repo, or similar. Either way, the result is the same and infinitely repeatable!
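To make the two steps concrete, here is a minimal sketch in Python using the boto3 Auto Scaling client, assuming your new code is already live for newly launched instances (a new AMI in the launch configuration, or a user-data script that pulls the new build). The function names, polling interval and timeout are illustrative assumptions. It doubles the desired capacity, raising MinSize to match so nothing scales back in mid-rollout, rather than only doubling the minimum, which also covers groups whose minimum is below their current size. The wait step only checks that the ASG reports the new instances as InService and Healthy; a real rollout would probably also check the load balancer and the application itself.

```python
import time


def _describe_group(client, group_name):
    """Return the single Auto Scaling group description for group_name."""
    resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    return resp["AutoScalingGroups"][0]


def wait_for_in_service(client, group_name, expected, poll_seconds=15,
                        timeout=900, sleep=time.sleep):
    """Poll until at least `expected` instances are InService and Healthy."""
    waited = 0
    while waited <= timeout:
        group = _describe_group(client, group_name)
        ready = [
            i for i in group["Instances"]
            if i["LifecycleState"] == "InService" and i["HealthStatus"] == "Healthy"
        ]
        if len(ready) >= expected:
            return
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"{group_name}: fewer than {expected} healthy instances")


def scale_up_patch(client, group_name, poll_seconds=15, timeout=900, sleep=time.sleep):
    """Double the group, wait for the new instances, then restore its size.

    `client` is expected to behave like boto3.client("autoscaling").
    """
    group = _describe_group(client, group_name)
    orig_min = group["MinSize"]
    orig_max = group["MaxSize"]
    orig_desired = group["DesiredCapacity"]
    doubled = orig_desired * 2

    # Step 1: double the group; every new instance launches with the new code.
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=doubled,
        MaxSize=max(orig_max, doubled),
        DesiredCapacity=doubled,
    )
    wait_for_in_service(client, group_name, doubled, poll_seconds, timeout, sleep)

    # Step 2: restore the original size. The termination policy decides which
    # instances go; with a new launch configuration (or the OldestInstance
    # policy) that should be the old-release instances.
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=orig_min,
        MaxSize=orig_max,
        DesiredCapacity=orig_desired,
    )
```

Usage would be something like scale_up_patch(boto3.client("autoscaling"), "web-asg"), where "web-asg" is a placeholder group name. Rolling back is the same call after making the previous release live again.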

The one counterpoint, which a colleague of mine raised, is that you are explicitly depending on a specific feature of AWS always functioning in the same way and not changing in the future. An alternative might be to deploy in a blue/green setup with independent ELBs and instances. You then simply fail over using Route53, either all in one go or using weighted routing for a canary release process. Funnily enough, AWS released a whitepaper on exactly that subject a couple of months ago: Blue/Green Deployments on AWS Whitepaper

They also cover the scale-up patching method in detail from page 17 of the whitepaper.
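For the blue/green alternative, the canary step boils down to a pair of weighted Route53 records pointing at the blue and green ELBs. Below is a minimal Python sketch, assuming CNAME records and a boto3 Route53 client; the record name, ELB DNS names and hosted zone ID are placeholders, and in practice you might well prefer alias records. Bear in mind that DNS caching means clients won’t all move the moment the weights change.

```python
def weighted_change_batch(record_name, blue_dns, green_dns, green_weight, ttl=60):
    """Build a Route53 ChangeBatch splitting traffic between blue and green.

    green_weight is a percentage (0-100); blue gets the remainder.
    """
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")

    def weighted_record(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes records sharing a name
                "Weight": weight,         # relative share of DNS responses
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"blue/green shift: {green_weight}% to green",
        "Changes": [
            weighted_record("blue", blue_dns, 100 - green_weight),
            weighted_record("green", green_dns, green_weight),
        ],
    }


def shift_traffic(client, hosted_zone_id, batch):
    """Apply the batch; `client` is expected to behave like boto3.client("route53")."""
    return client.change_resource_record_sets(HostedZoneId=hosted_zone_id, ChangeBatch=batch)
```

Stepping green_weight through, say, 10, 50 and 100 gives a canary release; jumping straight to 100 is the all-in-one-go failover, and setting it back to 0 is the rollback.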

Brucie Bonus One – Deployment Dictionary

Incidentally, you can deploy said code without it going live immediately, by using what are called “Dark Launch Techniques”. As the name suggests, these separate code deployment from feature launch. You pre-release your code into production, but simply don’t toggle it on for anyone (or at least not for everyone) at first. You can then toggle it on for everyone or, even better, for smaller canary groups first. Web-scale companies such as Netflix, Facebook and Google have been doing this for many years!

This approach avoids the panic-inducing experience of deploying a large new code release and having that code go live and ramp up utilisation at the same time!
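In code, dark launching usually comes down to a feature toggle checked at runtime. As a minimal sketch of that generic technique (not any particular company’s implementation), the Python below keeps a flag off by default, allows an explicit list of canary users, and rolls out to a percentage of users by hashing the flag name and user ID into a stable bucket from 0 to 99, so the same user gets the same answer on every request. The FeatureFlag class and its fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def _bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


@dataclass
class FeatureFlag:
    name: str
    enabled_for_all: bool = False                 # full launch
    rollout_percent: int = 0                      # percentage-based canary group
    allow_list: set = field(default_factory=set)  # e.g. internal testers

    def is_on(self, user_id: str) -> bool:
        if self.enabled_for_all or user_id in self.allow_list:
            return True
        return _bucket(self.name, user_id) < self.rollout_percent


if __name__ == "__main__":
    flag = FeatureFlag("new-checkout")  # code is deployed, but dark
    print(flag.is_on("user-42"))        # False
    flag.allow_list.add("qa-tester")    # canary: internal users first
    flag.rollout_percent = 5            # then roughly 5% of real users
    flag.enabled_for_all = True         # finally, everyone
    print(flag.is_on("user-42"))        # True
```

Because the bucket is stable, raising rollout_percent only ever adds users to the canary group; nobody flips back and forth between old and new behaviour as the rollout widens.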

Combining dark launch methods with scale-up patching or blue/green deployments should lead to fewer grey hairs in the long run, that’s for sure!

Brucie Bonus Two – Environment Manager

Lastly, a bit of interesting news which also came from The Trainline: they have open sourced their own internal deployment tool, which they call Environment Manager.

With an AngularJS front end and a Node.js back end, it’s a home-grown continuous deployment tool which includes a self-service portal, REST APIs, and a number of operational governance features. The governance elements include a feature which prevents rogue developers from deploying anything which hasn’t already been defined in the central service catalogue.
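The following is not Environment Manager’s implementation or API; it is a minimal Python sketch of the general idea behind that kind of governance check: refuse any deployment whose service, or target environment, isn’t registered in a central catalogue. The catalogue structure, class and function names are all hypothetical.

```python
from dataclasses import dataclass


class CatalogueError(Exception):
    """Raised when a deployment isn't backed by a service catalogue entry."""


@dataclass(frozen=True)
class DeploymentRequest:
    service: str
    version: str
    environment: str


def check_against_catalogue(request, catalogue):
    """Reject requests for services/environments not defined in `catalogue`.

    `catalogue` maps a service name to the set of environments it may use.
    """
    allowed_envs = catalogue.get(request.service)
    if allowed_envs is None:
        raise CatalogueError(f"{request.service!r} is not in the service catalogue")
    if request.environment not in allowed_envs:
        raise CatalogueError(
            f"{request.service!r} is not registered for {request.environment!r}"
        )
    return request
```

Running a check like this in the deployment pipeline, before anything touches an ASG, is what makes the catalogue the single gate for what can reach each environment.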