Using AWS Storage Gateway Virtual Tape Library in Linux

If you are around my age or younger you probably didn’t have much exposure to tape backup technologies. Tapes are sooo 90’s, right?! I definitely didn’t expect that as an AWS Consultant I would have to learn about tapes. But I did! One of our customers wanted to use AWS Storage Gateway (SGW) in the Virtual Tape Library (VTL) mode and use it for backups with the Veeam Backup suite. Veeam seems to “just work” with SGW VTL but I like to understand how things work under the hood, so I decided to back up my Linux test system to VTL using just the low-level Linux tools. That’s the best way to learn.

AWS Storage Gateway Virtual Tape Library in a nutshell

AWS SGW is a preconfigured virtual machine that can be deployed either on-premises as a VM in VMware ESXi or Microsoft Hyper-V, or in AWS as an EC2 instance. The Storage Gateway is fully managed through the AWS Console and there is very little you can do on the VM or EC2 instance itself. Essentially you can only set up networking so it can connect to AWS. SGW can be configured in one of 3 modes – File gateway (NFS), Volume gateway (iSCSI) or Tape gateway (Virtual Tape Library – VTL). We will discuss the last one.

SGW VTL provides 10 virtual tape drives and we can create 1 or more virtual tapes to use in those drives. Each tape has an ID (Barcode), e.g. AB123456, XYZ98765, etc. SGW VTL also provides a virtual media changer that can move the tapes into and out of the drives. Data written to a tape in a drive is stored locally on the SGW in its Cache disk and also immediately sent to S3.

In our customer’s case we’ve got 10 drives and 10 tapes (1TB each) and Veeam backs up to one tape after another. Once a tape is full it moves on to the next drive loaded with an empty tape and continues the backup there. And so on, until the backup cycle is finished. The next week it automatically swaps the week’s tape set for a new tape set. That’s all Veeam’s job; AWS SGW VTL only handles the drives and tapes and backing them up to S3.

Now the interesting part. Interesting to me at least. VTL stores the tapes in Storage elements (slots) and there are 3200 of them! Half of them are “normal” slots (ID 1 ~ 1600) where we can store unused tapes and move them to and from the drives when needed. Tapes in these slots (as well as those in the drives) are backed by S3 and are ready to use.

The other half are “Import/Export Storage elements” (slots ID 1601 ~ 3200). Newly created virtual tapes pop up there and can be transferred to normal slots or loaded into virtual drives.

However, when a tape is moved into an Import/Export slot it is immediately archived to AWS Glacier and is no longer available for use. From Glacier it can be retrieved in read-only mode for backup recovery, or deleted permanently.

That’s in a nutshell what AWS SGW VTL does. The rest is your backup software job.

Now we will go step by step from installing the gateway through using the media changer to actually backing up and restoring data.

Step 1 – Install AWS Storage Gateway – Virtual Tape Library

There’s no magic to this step. Just follow the prompts in the AWS Console and deploy SGW either on EC2 or in your on-prem VMware or Hyper-V. I also tried to stand it up in VirtualBox and it worked just fine. For this demo I deployed SGW on an EC2 instance in the Oregon region and attached two 150GB data disks – one for the Cache and one for the Upload buffer. In production you may want bigger disks, depending on your daily / weekly backup size and internet link speed. Note that you will need HTTPS access to the SGW instance in order to complete the installation – make sure the security group permits that.

Once the Storage Gateway is installed we create 5 new tapes, 100GiB each with a “Barcode” prefix “DEMO”. Yes I know that 5x 100GiB is more than the disk space allocated to SGW but it doesn’t matter. The primary place where the tapes are stored is S3 and the Cache and Upload Buffer disks are only used to .. well .. cache the data from S3 locally on the SGW and buffer the uploads to S3. We can just as well create 10x 2.5TB tapes and it will still work, albeit slower.

We can also create another EC2 instance with Amazon Linux for testing client access. We will need the iscsi-initiator-utils, mt-st and mtx packages. The latter two come from the Fedora 27 Rawhide repository as they are not available in the Amazon Linux repo.

It may also be a good idea to attach another large disk with some sample data to test the backups. I created 200GB filesystem under /data200, downloaded lots and lots of Linux kernel tarballs from a nearby kernel.org mirror and unpacked them side by side. Those 200GB filled up pretty quickly 🙂

Step 2 – Connect VTL devices to Linux

All communication between the Linux client and VTL is over iSCSI protocol, which means TCP traffic on port 3260. Make sure your Firewalls or Security Groups permit traffic from the Linux client to the Storage Gateway IP address.

First of all we have to discover and attach all the iSCSI targets that SGW VTL offers. Here 172.31.15.7 is my Linux box and 172.31.7.216 is the AWS Storage Gateway.
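With open-iscsi (iscsi-initiator-utils) installed, the discovery and login can look something like this; the portal IP is from my demo setup and will differ on yours:

```shell
# Discover all iSCSI targets the gateway offers
# (10 tape drives plus 1 media changer)
sudo iscsiadm --mode discovery --type sendtargets --portal 172.31.7.216:3260

# Log in to all the discovered targets on that portal at once
sudo iscsiadm --mode node --portal 172.31.7.216:3260 --login
```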

Now we’ve got all the remote tapes accessible as local devices /dev/st0 ~ /dev/st9 for the tapes and /dev/sgX for the media changer. The assignment of /dev/stX indexes to the tape drive IDs is a bit chaotic and I’m afraid it’s not even reboot-proof, i.e. the device names may be different next time the system reboots. Likewise media changer /dev/sgX index is a little unpredictable.

Fortunately there are stable symlinks for the stX and sgX names under /dev/tape/by-id and /dev/tape/by-path:
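For example (the SCSI IDs and IQNs in the output are of course specific to each gateway):

```shell
# Stable symlinks pointing to the stX / sgX devices
ls -l /dev/tape/by-id/
ls -l /dev/tape/by-path/
```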

That’s great, now we can refer to the media changer as /dev/tape/by-id/scsi-2414d14236 (and we could make this a symlink to /dev/changer as well) and the tape drive 05 will always be /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-05-lun-0 for example. Nice, stable, descriptive names, although a bit long. Never mind, we’ll live with that.

Step 3 – Using the media changer

The media changer is controlled by the mtx program. Let’s see if we can gather some info:
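A status query and loading the first tape might look like this; I’m assuming here that the first DEMO tape showed up in Import/Export slot 1601, and I’m using the media changer symlink from my demo gateway:

```shell
# Show the drives, the slots and the tapes (barcodes) in them
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 status

# Load the tape from Import/Export slot 1601 into drive 0
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 load 1601 0
```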

Now we’ve got 1 tape in the first drive ready to use. The other 4 tapes are still in their Import slots.

We can also “unload” tapes from drives back to the slots and “transfer” between slots. If the tape is unloaded / transferred to a “Normal” slot (slot id 1 ~ 1600) it will stay there ready for another use. If however the tape is unloaded to an “Export” slot (slot id 1601 ~ 3200) it will disappear and will be immediately archived to Glacier and no longer available for loading back to a drive.
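For example, again with the media changer device from my demo:

```shell
# Unload the tape from drive 0 back into "normal" slot 1
# - it stays there ready for another use
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 unload 1 0

# Move a tape from slot 1 to slot 2
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 transfer 1 2
```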

See man mtx for details on load, unload, transfer and other commands.

Step 4 – Backing up data

Now with a tape in a drive we are finally in a position to write something to the tape. We will use the classic tar – the Tape ARchiver – and also the mt tool to find out some info about the tape drive that we will need.
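For example, using the stable by-path name of tapedrive-01 (the IQN is from my demo gateway):

```shell
# Query the drive status - is a tape loaded, at what position, what block size?
sudo mt -f /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0 status

# Write an archive to the tape in tapedrive-01
sudo tar -c -f /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0 <file>
```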

That’s clear and descriptive but long. These long names are actually symbolic links to the actual Linux device names like /dev/st0. Of course we can use those instead if we want to save some typing. In the directory listing above you can see that tapedrive-01 is a symlink to /dev/st4.

tar -c -f /dev/st4 <file>

This command is exactly equivalent to the previous one. It’s shorter but not obvious which tape drive we are using.

To confuse things even more mtx numbers the drives 0 ~ 9 while the iSCSI target names are tapedrive-01 to tapedrive-10 and the corresponding to /dev/stX numbers are mixed up in no particular order. Phew, what a mess!

Now that we know the device name let’s try to backup a cloned Linux kernel GIT repository onto the virtual tape.
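Something like this, assuming the repository was cloned to /data200/linux:

```shell
# Back up the kernel repository to the tape in tapedrive-01 (/dev/st4)
cd /data200
time sudo tar -c -f /dev/st4 linux/
```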

/dev/stX vs /dev/nstX

You may have noticed that for each virtual tape drive we’ve got two devices – for example /dev/st4 and /dev/nst4. What’s the difference? Those /dev/stX devices are rewinding and the /dev/nstX devices are non-rewinding. What does that mean?

After we finish writing our archive to /dev/st4 or to /dev/tape/by-path/ip-...-tapedrive-01-lun-0 the driver automatically rewinds the virtual tape, positioning the virtual “head” back at the beginning of the tape. The next write will start from the beginning and overwrite the previous archive. Or the next read will read the archive that we just wrote.

On the other hand, when we finish writing to /dev/nst4 or /dev/tape/by-path/ip-...-tapedrive-01-lun-0-nst the head stays at that position on the tape, ready to write the next archive. This way we can write multiple archives on a single tape, one after another. The next read will complain that we are at the end of the tape 🙂
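A quick sketch of writing and then reading back two archives with the non-rewinding device (the directory names are just examples):

```shell
# Write two archives back to back - no rewind in between
sudo tar -c -f /dev/nst4 dir-one/
sudo tar -c -f /dev/nst4 dir-two/

# Rewind and list the contents of the first archive
sudo mt -f /dev/nst4 rewind
sudo tar -t -f /dev/nst4

# Skip forward over one filemark to the second archive and list it
sudo mt -f /dev/nst4 fsf 1
sudo tar -t -f /dev/nst4
```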

Step 7 – Archiving the tapes to Glacier

Many organisations require backup tapes to be stored off-site for a long time for compliance reasons. This is where VTL tape archiving comes into play.

I have backed up some 58 GiB of kernel source files onto our virtual tape and decided to preserve this precious collection for future generations. Note that at the moment the tape is in “Available” state.

To archive it, all I need to do is unload it from the tape drive into one of the Import/Export slots with IDs 1601 ~ 3200. Let’s unload it to slot 3200. Note that it will no longer appear in the list of tapes:
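Again with the media changer device from my demo, and assuming the tape is sitting in drive 0:

```shell
# Unload the tape from drive 0 into Import/Export slot 3200
# - this triggers archiving to Glacier
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 unload 3200 0

# The DEMO tape no longer shows up in the status listing
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 status
```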

At the same time in the AWS Storage Gateway console the tape status will change from Available to Archived. To use it again we will have to “Retrieve” it from Glacier. Note that you can retrieve it to the same or to a different AWS SGW VTL than the one used to create it.

Once it is Retrieved it will pop up again in the Import slot in read-only mode. Then we can load it back to one of the virtual tape drives and restore any data we need from it.
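The restore itself is just the media changer and tar again; a sketch assuming the retrieved tape popped up in Import slot 1601 and tapedrive-01 is still /dev/st4:

```shell
# Load the retrieved (read-only) tape into drive 0
sudo mtx -f /dev/tape/by-id/scsi-2414d14236 load 1601 0

# Extract the archive into /restore
sudo mkdir -p /restore
sudo tar -x -f /dev/st4 -C /restore
```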

Step 8 – Delete it all

There are a few steps to delete the AWS Storage Gateway – Virtual Tape Library:

Delete all Available and Archived tapes using the AWS Console or AWS CLI.

Delete the Storage Gateway sgw-abcd1234 using the AWS Console or AWS CLI.

Shut down and delete the SGW EC2 instance or VMware VM.
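With the AWS CLI the cleanup can be scripted; the ARNs below use a placeholder account ID, so substitute your own:

```shell
# List the tapes and gateways to find their ARNs
aws storagegateway list-tapes
aws storagegateway list-gateways

# Delete an Available tape (still held by the gateway)
aws storagegateway delete-tape \
    --gateway-arn arn:aws:storagegateway:us-west-2:123456789012:gateway/sgw-AB6587C2 \
    --tape-arn arn:aws:storagegateway:us-west-2:123456789012:tape/DEMO18E3BD

# Delete an Archived tape (held in Glacier)
aws storagegateway delete-tape-archive \
    --tape-arn arn:aws:storagegateway:us-west-2:123456789012:tape/DEMO19E3BC

# Finally delete the gateway itself
aws storagegateway delete-gateway \
    --gateway-arn arn:aws:storagegateway:us-west-2:123456789012:gateway/sgw-AB6587C2
```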

If you don’t follow these steps you may end up with messages like:

Gateways that failed deletion: sgw-AB6587C2

Tapes that failed deletion: DEMO18E3BD DEMO19E3BC

Cannot delete resources due to one or more resources’ status such as archiving or retrieving.

If you get any of these follow the steps above and try again 🙂

Backup schedules

As for backup schedules, we were deciding between 2 options.

One scenario is to have e.g. 4 sets of tapes (4x 10 tapes, e.g. labeled AAxxxxxx, BBxxxxxx, CCxxxxxx, DDxxxxxx), one set per week, and rotate them weekly between the drives and the “normal” slots. Week 1 backups go to AA tapes, Week 2 backups to BB tapes, etc. Week 5 goes to AA tapes again. These will never be stored away in Glacier and will give you 3 recent weeks of backups plus the current week. That should be enough for most users and is simple to set up and manage.

Another scenario is to create a new tape set every week. Move the tapes from the Import slots to the drives, run the backups and at the end of the week move them to the Export slots for storing in Glacier. It is more work as it requires creating and deleting tapes every week, but it can be automated of course for example through Lambda functions. This is more suitable for customers who want to keep the backups for a very long time, perhaps for compliance purposes.

For our customer we decided to implement the first scenario with 4 tape sets and no Glacier archiving.

Troubleshooting

Being new to SGW VTL and to tape archiving in general I encountered a number of problems, and often spent quite a bit of time trying to figure out what was wrong. For future reference here are some common problems:

Cannot write to, erase or overwrite a tape

Most likely the tape in the drive was “Retrieved” from Archive (Glacier) and is therefore Read-Only. Retrieved tapes can’t be modified, erased or overwritten.

No devices show up under /dev/tape/by-path

Either you are not logged in to the iSCSI targets, in which case follow the steps in Step 2 above.
Or you don’t have the st kernel module loaded. Run: ~ # modprobe st

SGW crashing out of memory

SGW will not complain if the VM or EC2 instance has less than 16GB RAM assigned but it may occasionally crash due to Out Of Memory errors. You can see these messages on the VM console in such a case. Give it 16GB RAM and it will work.

Problems deleting tapes or the storage gateway

Refer to Step 8 above. The tapes must be ejected (unloaded) before they can be deleted and all tapes must be deleted before the storage gateway can be deleted.

That’s all I’ve got to write about AWS Storage Gateway—Virtual Tape Library and Linux. Let me know in the comments if you found it useful 🙂