Amazon AWS Tips and Gotchas – Part 5 – Managing Multiple VPCs

Continuing in this series of blog posts taking a bit of a “warts and all” view of a few Amazon AWS features, below are a handful more tips and gotchas when designing and implementing solutions on Amazon AWS, based around VPCs and VPC design.

AWS Tips and Gotchas – Part 5

11. Managing Multiple VPCs & Accounts

Following on from the previous post, let us assume that instead of just talking about public service endpoints (e.g. S3, Glacier, etc.), we are talking about environments with multiple VPCs, possibly multiple accounts, and the potential addition of Direct Connect on top.

Why would you do this? Well, there are numerous reasons for logically separating things such as your dev/test and production environments from a security and compliance perspective. The one that people sometimes get hung up on is why would I want more than one account? As it goes, some AWS customers run many tens or even hundreds of accounts! Here are a few examples:

The simplest answer to this is so that you can avoid being “CodeSpaced” by keeping copies of your data / backups in a second account with separate credentials!

Separation of applications which have no direct interaction, or perhaps minimal dependencies, to improve security.

Running separate applications for different business units in their own accounts to make for easier LoB billing.

Allowing different development teams to securely work on their own applications without risking impact to any other applications or data.

With the mergers and acquisitions growth strategy which many companies adopt, it is fairly common these days for companies to be picked up and bring their AWS accounts and resources with them.

Lastly, a very common design pattern for compliance is to use a separate account to gather all of your CloudTrail and other audit logs in a single place, inaccessible to anyone except your security team, and therefore secure from tampering.

The great thing is that with consolidated billing, you can have as many accounts as you like whilst still receiving a single monthly bill for your organisation!

We will now look at a few examples of ways to hang together your VPCs and accounts. In the majority of cases, you can effectively consider the two as interchangeable as far as the scope of this post is concerned.

Scenario A – Lots of Random VPC Peering and a Services VPC

This option is ok for small solutions but definitely does NOT scale and is also against best practice recommendations from AWS. As mentioned in the previous section, transitive peering is also not possible unless you are somehow proxying the connections, so if you are looking to add Direct Connect to this configuration, this just simply isn’t going to fly.
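To make the transitive peering point concrete, here is a minimal sketch in Python. It is a toy model rather than anything AWS provides: the VPC names and the `can_route` helper are made up for illustration. The only rule it encodes is that a peering connection carries traffic between its two ends and nothing else.

```python
# Toy model of VPC peering: each peering is an unordered pair of VPCs.
# The names below are hypothetical and used only for illustration.
peerings = {
    frozenset({"vpc-a", "vpc-b"}),
    frozenset({"vpc-b", "vpc-c"}),
}


def can_route(src, dst, peerings):
    """Return True only if src and dst share a direct peering.

    There is deliberately no graph search here: traffic cannot
    transit through a third VPC, so only direct links count.
    """
    return frozenset({src, dst}) in peerings


print(can_route("vpc-a", "vpc-b", peerings))  # True: directly peered
print(can_route("vpc-a", "vpc-c", peerings))  # False: vpc-b will not forward it
```

The same logic is why a Direct Connect link landing in one VPC does not automatically give you a path into the VPCs peered with it.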

Imagine that all of the blue dotted arrows in the following diagram were VPC peering connections! Aaaaargh!

Scenario B – Bastion Server in Services VPC

If each of your VPCs is independent, and you only need to manage them remotely (i.e. you are not passing significant traffic between many different VPCs, or from AWS to your MPLS), then a services VPC with a bastion server may be a reasonable option (hub and spoke):

In this example, you could push a Direct Connect VIF into VPC A and via your bastion server, manage servers in each of your other VPCs. This would NOT be appropriate if your other servers / clients on premises wanted to access those resources directly, however, and is more likely in the scenario where each VPC hosts some form of production or dev/test platform which is internet facing, and this is effectively your management connection in the back door.

You might also potentially aggregate all of your security logs etc into the bastion VPC.

Scenario C – Full Mesh

This is like a neater version of Scenario A. Holy moly! Can you imagine trying to manage, support or troubleshoot this?
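A quick back-of-the-envelope sketch shows why this gets out of hand. The function names are my own, and the route count assumes just one route table per VPC, which is optimistic for most real environments.

```python
def full_mesh_peerings(n):
    """Peering connections needed so every one of n VPCs peers with every other."""
    return n * (n - 1) // 2


def hub_and_spoke_peerings(n):
    """Peering connections needed when n - 1 spoke VPCs each peer with one hub."""
    return max(n - 1, 0)


def full_mesh_routes(n):
    """Peering routes to maintain, assuming a single route table per VPC."""
    return n * (n - 1)


for n in (3, 5, 10, 20):
    print(n, full_mesh_peerings(n), hub_and_spoke_peerings(n), full_mesh_routes(n))
```

At 10 VPCs that is 45 peering connections for the mesh against 9 for a hub and spoke, and at 20 VPCs it is 190 against 19.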

Even something as simple as managing your subnets and route tables would become a living, breathing nightmare! Then what happens every time you want to add another VPC? shudder
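One part of the subnet headache is that VPCs with overlapping CIDR blocks cannot be peered at all, which is easy to trip over when an acquired company brings its own address plan with it. Below is a rough sketch of a pre-flight check using Python's standard `ipaddress` module. The VPC names, CIDR ranges and the `overlapping_cidrs` helper are invented for the example.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_cidrs(vpcs):
    """Return name pairs whose CIDR blocks overlap and so could not be peered."""
    nets = {name: ip_network(cidr) for name, cidr in vpcs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical address plan: "acquired" sits inside the "prod" range.
vpcs = {
    "prod": "10.0.0.0/16",
    "dev": "10.1.0.0/16",
    "acquired": "10.0.128.0/20",
}

print(overlapping_cidrs(vpcs))  # [('acquired', 'prod')]
```

Running something like this before adding VPC number eleven to the mesh is a lot cheaper than discovering the clash halfway through the change window.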

If you require this level of inter-VPC communication, then my first question would be why are you splitting the workloads across so many dependent VPCs, and where is the business benefit to doing so? Better to look at rationalising your architecture than try to maintain something like this.

Scenario D – Lollipop Routing

If you absolutely must allow every VPC to talk to most or even every other VPC, and the quantity of VPCs is significant, then it may be worthwhile looking at something more scalable and easier to manage.

This one is more scalable from a management perspective, but if I am honest, I am not massively keen on it! It feels a bit like AWS absolving themselves of all responsibility when it comes to designing and supporting more complex network configurations. It could potentially also work out rather expensive as you could end up needing a fairly hefty amount of Direct Connect bandwidth to support the potential quantity of traffic at this scale, as well as adding a load of unnecessary latency.

I would prefer that AWS simply allowed some form of auto configured mesh with a simple tag/label assigned to each VPC to allow traffic to route automatically. If only such a technology existed or could be used as a design template!?! (sarcasm mode off – MPLS anyone?)

I am confident that at the rate AWS are developing new services, providing automation of VPC peering won’t be miles off, as suggested by the word “presently” in the following slide from an AWS presentation available on SlideShare from last July (2015):

In the meantime, we are left with something that looks a bit like this:

When reaching this kind of scale, there are also a few limitations you want to be aware of:

And Finally… NOTE: Direct Connect is per-Region

When you procure a Direct Connect, you are not procuring a connection to “AWS”, you are procuring a connection to a specific region. If you want to be connected to multiple AWS regions, you will need to procure connections to each region individually.
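As a simple planning aid, and assuming the per-region behaviour described above, you could sketch the gap between where your workloads run and where you have Direct Connect like this. The helper name and the region lists are purely illustrative.

```python
def regions_without_direct_connect(workload_regions, dx_regions):
    """Return the workload regions that have no Direct Connect of their own."""
    return sorted(set(workload_regions) - set(dx_regions))


# Hypothetical estate: workloads in two regions, Direct Connect in only one.
workloads = ["eu-west-1", "us-east-1"]
direct_connects = ["eu-west-1"]

print(regions_without_direct_connect(workloads, direct_connects))  # ['us-east-1']
```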

To an extent I can see that this makes some logical sense. Let’s say they allowed access through one region to others: if you had connections to only a single region and that region had a major issue, you could end up losing access to all regions.

What would be good though would be the ability to connect to two regions, which would then provide you with region resilient access to the entire AWS network of regions. Whether this will become a reality is yet to be seen, but I have heard rumblings that there may be some movement on this in the future.

Wrapping Things Up

As you can see, getting your VPC peering and Direct Connect working appropriately, especially at scale, is a bit of a minefield.

I would suggest that if you are seriously looking at using Direct Connect and need some guidance, you could do worse than have a chat with your ISP, MSP or hosting provider of choice. They can help you to work out a solution which is best for your business’s requirements!