AWSCLI S3 Backup via AWS Direct Connect

By Ken Lassey, Cornell EMCS / IPP

Problem

AWSCLI tools default to using the Internet for its connection when reaching out to services like S3. This is due to the fact that S3 only provides public endpoints for network access to the service. This is an issue if your device is in Cornell 10-space as it cannot get on the Internet. It also could be a security concern depending on the data, although AWSCLI S3 commands do use HTTPS for their connections.

Another concern is the potential data egress charges for transferring large amounts of data from AWS back to campus. If you need to restore an on-premise system directly from S3, this transfer would accrue egress charges.

Solution

Assuming you have worked with Cloudification to enable Cornell’s AWS Direct Connect service in your VPC, you can force the connection through Direct Connect with a free tier eligible t2.micro Linux instance. By running the AWSCLI commands from this EC2 instance, it can transfer data from your on-premise system, through Direct Connect and drop the data in S3 all in one go.

Note that if your on-premise systems have only public routable IP Addresses, and no 10-Space addresses, your on-premise systems will not be able to route over Direct Connect (unless Cloudification has worked with you to enable special routing). In most campus accounts, VPC’s are only configured to route 10-Space traffic over Direct Connect. If the systems you want to back up have a 10-Space address, you are good to go!

Example Cases

You have several servers that backup to a local storage server. You want to copy the backup data into AWS S3 buckets for disaster recovery.

You want to backup several servers directly into AWS S3 buckets.

In either case, you do not want your data going in or out over the internet, or it cannot due to the servers being in Cornell 10-space.

awscli s3 sync <backupfolder> <s3bucket>

By running the awscli command from the backup server or individual server the data goes out over the internet.

Example Solution

By utilizing a free tier eligible t2.micro Linux server you can force the traffic over Cornell direct connect to AWS.

Running the awscli commands from the t2.micro instance you force the data to utilize AWS direct connect.

On your local Windows server (backup or individual) you need to ‘share’ the folder you want to copy to S3. You should be using a service account and not your own NetID. You can alternatively enable NFS on your folder if you are copying files from a Linux server, or even just tunnel through SSH. The following examples are copying from a Windows share.

In AWS:

Create an s3 bucket to hold your data

On your systems:

Share the folders to be backed up

Create a service account for the backup process

Give the service account at least read access to the shared backup folder

If needed allow access through the Managed Firewall from your AWS VPC to the server(s) or subnet where the servers reside