Category Archives: Storage

The AWS Elastic File System (EFS) gives you an NFSv4-mountable file system with almost unlimited storage capacity. The file system I just created to write this article reports 9,007,199,254,739,968 1K blocks free. In human-readable format df -kh reports 8.0E (Exabytes) of available disk space. In the year 2019, that’s a lot of storage space.
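As a quick sanity check on those two figures, here is a small sketch of the arithmetic. It assumes the large number is a count of 1 KiB blocks (the unit df -k uses), which is the reading under which it agrees with the 8.0E figure. The variable names are mine.

```shell
# Free space as reported by df -k, assumed to be a count of 1 KiB blocks.
blocks=9007199254739968

# Convert the block count to bytes.
bytes=$(( blocks * 1024 ))

# 8 EiB is 2^63 bytes, or 2^53 KiB. Work out how far short of that we are,
# in KiB, so the arithmetic stays inside signed 64-bit range.
short_kib=$(( (1 << 53) - blocks ))

echo "bytes: $bytes"
echo "KiB short of 8 EiB: $short_kib"
```

In other words the reported capacity is 8 EiB minus 1 MiB, which df rounds to 8.0E.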

In past articles I’ve shown how to create EFS resources manually, but this week I wanted to create EFS resources programmatically with Terraform so that I could easily create, test, and tear down EFS and VM resources on AWS.

I also wanted to make sure that my EFS resources are secure: only VMs within my Virtual Private Cloud (VPC) should be able to access the EFS data, and no one outside of my VPC should be able to mount or otherwise access it.

The file_system_id is automatically set to the efs-example resource’s ID, which ties the mount target to the EFS file system.

The subnet_id for subnet-efs is a separate /24 subnet I created from my VPC just for EFS. The ingress-efs security group is a separate security group I created for EFS. Let’s cover each one of these separately.
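A minimal sketch of what a mount target resource along these lines might look like, written out from the shell. The names efs-example, subnet-efs, and ingress-efs come from the text; the resource label efs-mt-example, the file name, and the scratch directory are placeholders of mine. It uses Terraform 0.12+ reference syntax and assumes the three referenced resources are defined elsewhere in the same configuration.

```shell
# Scratch directory for the sketch; in practice this would be your
# Terraform project directory.
tf_dir=$(mktemp -d)

# "efs-mt-example" is a hypothetical resource label.
cat > "$tf_dir/efs-mount-target.tf" <<'EOF'
resource "aws_efs_mount_target" "efs-mt-example" {
  # Ties the mount target to the EFS file system.
  file_system_id  = aws_efs_file_system.efs-example.id
  # The dedicated /24 subnet for EFS.
  subnet_id       = aws_subnet.subnet-efs.id
  # The security group that guards NFS access to the mount target.
  security_groups = [aws_security_group.ingress-efs.id]
}
EOF

echo "wrote $tf_dir/efs-mount-target.tf"
```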

A separate EFS subnet

First off, I’ve allocated a /16 CIDR block for my VPC and I carve out individual /24 subnets from it for each cluster of VMs and/or EFS resources that I add to an AWS availability zone. Here’s how I’ve defined my test environment VPC and EFS subnet:
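As a sketch only, a VPC and EFS subnet of this shape could be declared as below. The name subnet-efs comes from the text; the VPC label test-env, both CIDR values, the availability zone, the file name, and the scratch directory are illustrative assumptions, not the actual values from my environment.

```shell
# Scratch directory for the sketch.
tf_dir=$(mktemp -d)

cat > "$tf_dir/network.tf" <<'EOF'
# A /16 VPC: room for 256 separate /24 subnets.
resource "aws_vpc" "test-env" {
  cidr_block           = "10.0.0.0/16"   # assumed address range
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# One /24 carved out of the VPC just for EFS.
resource "aws_subnet" "subnet-efs" {
  vpc_id            = aws_vpc.test-env.id
  cidr_block        = "10.0.8.0/24"      # assumed; any unused /24 works
  availability_zone = "us-east-2a"       # assumed availability zone
}
EOF

echo "wrote $tf_dir/network.tf"
```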

The EFS security group

Finally, I need a security group that only allows traffic between my test environment VMs and my test environment EFS volume. I already have a security group called ingress-test-env that is used to control security for my VMs. For EFS I create another security group that allows inbound traffic on port 2049 (the NFSv4 port) and allows egress traffic on any port.

Setting the ingress-efs-test resource’s security_groups attribute to ingress-test-env means that only VMs in the ingress-test-env security group can exchange network traffic with the EFS volume. If you use security_groups like this, you really lock down the EFS volume and you don’t need to set the cidr_blocks attribute at all.
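A sketch of a security group matching that description, using the resource names ingress-efs-test and ingress-test-env from the text. The group name string, the VPC label test-env, the file name, and the scratch directory are assumptions for illustration, and ingress-test-env is assumed to be defined elsewhere in the configuration.

```shell
# Scratch directory for the sketch.
tf_dir=$(mktemp -d)

cat > "$tf_dir/efs-security-group.tf" <<'EOF'
resource "aws_security_group" "ingress-efs-test" {
  name   = "ingress-efs-test-sg"          # assumed display name
  vpc_id = aws_vpc.test-env.id            # assumed VPC label

  # Inbound NFS (TCP 2049) only from members of ingress-test-env.
  ingress {
    security_groups = [aws_security_group.ingress-test-env.id]
    from_port       = 2049
    to_port         = 2049
    protocol        = "tcp"
  }

  # Outbound traffic on any port, again only to ingress-test-env members.
  egress {
    security_groups = [aws_security_group.ingress-test-env.id]
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
  }
}
EOF

echo "wrote $tf_dir/efs-security-group.tf"
```

Because both rules name a security group rather than an address range, no cidr_blocks entry appears anywhere in the resource.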

Ceph is a distributed storage system that provides object, file, and block storage. On each storage node you’ll find a file system where Ceph stores objects and a Ceph OSD (Object storage daemon) process. On a Ceph cluster you’ll also find Ceph MON (monitoring) daemons, which ensure that the Ceph cluster remains highly available.

Rook acts as a Kubernetes orchestration layer for Ceph, deploying the OSD and MON processes as pod replica sets. From the Rook README file:

Rook deploys the pods in two namespaces, rook-ceph-system and rook-ceph. On my cluster it took about 2 minutes for the pods to deploy, initialize, and get to a running state. While I was waiting for everything to finish I checked the pod status with:
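One way to check on both namespaces, as a sketch. The function name is mine, and it assumes kubectl is already configured to talk to the cluster. It is wrapped in a function so that defining it has no side effects.

```shell
# List the pods in the two namespaces Rook deploys into.
rook_pod_status() {
  kubectl get pods --namespace rook-ceph-system
  kubectl get pods --namespace rook-ceph
}
```

Run rook_pod_status repeatedly, or add --watch to either kubectl command, until every pod reports Running.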

Final tasks

Now I need to do two more things before I can install Prometheus and Grafana:

I need to make Rook the default storage provider for my cluster.

Since the Prometheus Helm chart requests volumes formatted with the XFS filesystem, I need to install XFS tools on all of my Ubuntu Kubernetes nodes. (XFS is not yet installed by Kubespray by default, although there’s currently a PR up that addresses that issue.)

Make Rook the default storage provider

To make Rook the default storage provider I just run a kubectl command:

That updates the rook-ceph-block storage class and makes it the default for storage on the cluster. Any applications that I install will use Rook+Ceph for their data storage unless they request a specific storage class.
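For reference, Kubernetes marks a default storage class with the storageclass.kubernetes.io/is-default-class annotation. A sketch of setting it with kubectl patch follows; the function name is mine, and it assumes kubectl is configured for the cluster.

```shell
# Mark the named storage class as the cluster default.
# $1: storage class name, e.g. rook-ceph-block
make_default_storage_class() {
  kubectl patch storageclass "$1" \
    -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}
```

After running make_default_storage_class rook-ceph-block, kubectl get storageclass should show "(default)" next to the class name.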

Install XFS tools

Normally I would not recommend running one-off commands on a cluster. If you want to make a change to a cluster, you should encode the change in a playbook so it’s applied every time you update the cluster or add a new node. That’s why I submitted a PR to Kubespray to address this problem.

However, since my Kubespray PR has not yet been merged, and I built the cluster using Kubespray, and Kubespray uses Ansible, one of the easiest ways to install XFS tools on all hosts is by using the Ansible “run a single command on all hosts” feature:
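A sketch of such an ad hoc Ansible run, using the apt module to install the xfsprogs package (the Ubuntu package that provides the XFS tools). The function name and the inventory path in the usage note are assumptions; use whatever inventory file the cluster was built from.

```shell
# Install the XFS tools on every host in the inventory.
# $1: path to the Ansible inventory file used to build the cluster
install_xfs_tools() {
  ansible all --inventory "$1" --become \
    --module-name apt \
    --args "name=xfsprogs state=present update_cache=yes"
}
```

For example: install_xfs_tools inventory/mycluster/hosts.ini (a hypothetical path).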

I just upgraded from Ubuntu 17.04 to 17.10 and one of the first things I noticed was all of the disk volumes that are mounted under my home directory appeared on my desktop. In Ubuntu 17.10, all volumes that are mounted under /home or /media appear on your desktop, and none of the switches in the Settings tool will make them go away.

The names of the folders aren’t even useful. They’re names like 10GB Volume and 20GB Volume. If you have two volumes the same size they’ll both have the same useless name. No hint of where the volume is mounted appears.

I have files, documents, databases, and email going back 20 years, much of it archival data that I want to be able to search but which never gets updated, so I keep these archive directories on separate read-only logical volumes. If my home directory’s file system gets corrupted beyond repair, the archives will still be intact. Since the volumes are read-only, a misbehaving program or command-line oops won’t destroy the data.

But I don’t want to see them all over my desktop.

Tweak tool to the rescue! Install the tool and run it:

sudo apt install gnome-tweak-tool
gnome-tweak-tool

Then:

Desktop > Mounted Volumes > Off

No more volume icons on the desktop!

gnome-tweak-tool has other useful settings that are absent from the Settings tool, such as giving you the ability to move the window buttons to the upper left side of your windows.

Want to make the icons on your desktop smaller? Open up the File Manager, browse to Desktop, and select the icon size you want by moving the slider bar. The size of the icons on your Desktop and the size in the File Manager’s Desktop folder both use the same setting.

CockroachDB is a new distributed database which, like its namesake, is really hard to kill.

CockroachDB implements SQL DDL commands for creating schemas, tables, and indexes using the same syntax as PostgreSQL, and it supports the PostgreSQL wire protocol, meaning that any PostgreSQL database driver or client can be used to connect to a CockroachDB database. If you’re currently using PostgreSQL and you want an easier, scale-out, highly available way to deploy a database, you should take a look at CockroachDB. In many cases you can just repoint your application at a CockroachDB server and your application will run the same as it did using PostgreSQL.
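Because CockroachDB speaks the PostgreSQL wire protocol, the stock psql client can be pointed at it. A sketch, assuming a node listening on CockroachDB’s SQL port 26257 and running in insecure mode (so the root user needs no certificate); the function name, host, and database arguments are placeholders.

```shell
# Open a psql session against a CockroachDB node.
# $1: host name or IP of a node, $2: database name
connect_with_psql() {
  # sslmode=disable matches a cluster started in insecure mode.
  psql "postgresql://root@$1:26257/$2?sslmode=disable"
}
```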

The first day I tried using CockroachDB I got a six-node system up and running using CockroachDB’s Docker image on my Apcera cluster using AWS EFS as a backing store in less than an hour. This is what I did to get it working.

Create a namespace and a private network

“roachnet” is a private VxLAN created by the Apcera platform that only containers that I’ve joined to the network can see.

Create the first CockroachDB node

Next I create a container instance called “roach1” from the latest Docker image, open ports 8080 and 26257, tell it to use the EFS provider for storage, and to advertise itself to other CockroachDB nodes so they can find it and join the DB cluster.

I added the “sleep 3” command because when I originally tested this (on CockroachDB 1.1.0) the platform started the containers so fast that the DB got confused and didn’t add all of them to the cluster. All nodes started, but only some joined the cluster. After I added the delay all nodes joined the cluster.

Verify that the containers are all running:

After that the cluster was up and running. I could connect to the database, create schemas, create tables, add, update, and delete records. I’m pretty happy with the initial results. Next step is automatically generating secure certificates so I’m not operating in insecure mode, then I’m going to run actual applications against the cluster.

Hope you found this useful.

CockroachDB overview screen

CockroachDB Storage Screen

CockroachDB Queues

Amazon announced the development of the Amazon Elastic File System (AWS EFS) in 2015. EFS was designed to provide multiple EC2 instances with shared, low-latency access to a fully-managed file system. On June 28, 2016 Amazon announced that EFS was available for production use in the US East (Northern Virginia), US West (Oregon), and Europe (Ireland) Regions.

Apcera’s NFS Service Gateway can be used to access AWS EFS storage volumes within containers. You can use EFS to provide persistent storage to your containers running on AWS-hosted clouds in regions where EFS is available.

Gathering information

Before you begin you will need to know:

The name of the AWS Region where your Apcera Platform is running

The name/ID of the AWS VPC where your Apcera Platform is running

The name/ID of the AWS security group for your Apcera Platform

Setting up an EFS volume

Log into your AWS console.

Select the name of the AWS Region where your Apcera Platform is running on the upper right side of the screen.

Select Elastic File System.

Click Create File System.

Configure the file system access:

Select the name of the VPC.

The availability zone and subnet should be selected for you automatically.

If your VPC has more than one subnet (unusual) then select the subnet containing the Instance Managers that will be connecting to the EFS volume.

Leave IP address set to Automatic.

The first EFS volume you create will create a new security group. Use that security group for this and all future EFS volumes. Write down the name of the new EFS security group – we’ll configure it in the next few steps.

You should see a “Success!” message and a new EFS volume with “Life Cycle State” = “Creating”.

Write down the IP address of the EFS volume.

Update the EFS security group

Go back to the main console menu and select EC2.

Click Security Groups in the left hand nav menu.

Type the name of the new EFS security group into the search filter list.

On the bottom half of the screen delete the default inbound and outbound rules.

Add one inbound rule to allow all TCP traffic on port 2049 from the source “name/ID of the AWS security group for your Apcera Platform”

Add one outbound rule to allow all TCP traffic on port 2049 to the destination “name/ID of the AWS security group for your Apcera Platform”

This allows all VMs within your Apcera Platform security group to connect to your EFS volume on port 2049 (NFS).

No other traffic from any other source or to any other destination is allowed.
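The two rules above can also be added from the AWS CLI. This is a sketch under assumptions: the function name is mine, both arguments are security group IDs you supply, and it only adds the two NFS rules. Deleting the default rules is still a separate step.

```shell
# Add the NFS-only rules to the EFS security group.
# $1: ID of the EFS security group
# $2: ID of the Apcera Platform security group
restrict_efs_security_group() {
  efs_sg="$1"
  platform_sg="$2"

  # Inbound: TCP 2049 only from the platform security group.
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" --protocol tcp --port 2049 \
    --source-group "$platform_sg"

  # Outbound: TCP 2049 only to the platform security group.
  aws ec2 authorize-security-group-egress \
    --group-id "$efs_sg" \
    --ip-permissions "IpProtocol=tcp,FromPort=2049,ToPort=2049,UserIdGroupPairs=[{GroupId=$platform_sg}]"
}
```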

Create an NFS Provider for the EFS volume

We’re going to create a single provider for the EFS volume. Each time you have a container or set of containers that need a persistent file system, just create a new service from the same provider. Each new service will carve out a new namespace on the EFS volume, keeping the files associated with that service separate from the files in all other services that use the same provider.

According to the EFS FAQ, “When you create a file system, you create endpoints in your VPC called ‘mount targets.’ Each mount target provides an IP address and a DNS name, and you use this IP address or DNS name in your mount command. Only resources that can access a mount target can access your file system.” Since the Apcera Platform isn’t using Amazon DNS services internally, we’ll use the IP address to connect to the EFS volume.
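Before handing the volume to the platform, it can be worth confirming that the mount target is reachable by mounting it by hand from a VM inside the permitted security group. A sketch, using the NFS options AWS recommends for EFS; the function name and the mount point in the usage note are placeholders.

```shell
# Mount an EFS volume by the IP address of its mount target.
# $1: mount target IP address, $2: local mount point
mount_efs() {
  sudo mkdir -p "$2"
  sudo mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    "$1:/" "$2"
}
```

For example, mount_efs 10.0.0.112 /mnt/efs followed by df -h /mnt/efs. If the mount hangs, the security group rules are the first thing to check.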

To create the provider, you need to construct a URL describing the volume. In this case, we’ll use the internal IP address of the EFS volume as the hostname and / as the exported volume name. All EFS volumes use the NFS v4.1 protocol. If the IP address of the EFS volume is 10.0.0.112 we’d construct a provider using:

You can bind this service to any container that needs a shared, persistent file system. Each time you need a new shared, persistent file system for a container or group of containers just create a new service using the same provider and bind the service to your job or jobs.

Persistence for Docker

Now that we have a provider that can carve out EFS storage for containers, let’s try spinning up some Docker images.

On the Apcera Platform, if the specification for a Docker image (Dockerfile) specifies that the app requires persisted volumes, you must do one of the following when creating the job:

Include the --provider flag when you create or run the Docker job. You must include this flag if you include the --volume flag when creating or running the Docker job.

Include the --ignore-volumes flag when you create or run the Docker job.

Here is an example of running NGINX inside a Docker container on the Apcera platform, where the content for the site is stored on an EFS volume:

I’m using the Apcera “apc” command-line tool to build the container, pulling the nginx image directly off hub.docker.com, telling it to use the awsefs EFS volume provider I created earlier for persistence, and to mount the EFS volume at the mount point “/usr/share/nginx/html”.

Now connect to the container:

/proc/mounts contains a list of all of the container’s mount points. I can verify that the container does indeed have an EFS volume by grepping /proc/mounts for the mount point:

Grepping for “/usr/share/nginx/html” shows the IP address 10.0.0.112, which is the IP of the EFS volume; the directory name after it, which is the unique namespace for the service; the mount point “/usr/share/nginx/html”; and the mount type “nfs4”.
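To make the layout of a /proc/mounts entry concrete, here is a sketch that picks one apart with awk. The sample line is made up to match the shape described above: the namespace directory and the mount options are placeholders, not real output.

```shell
# A hypothetical /proc/mounts entry for the EFS-backed mount point.
sample='10.0.0.112:/example-namespace /usr/share/nginx/html nfs4 rw,relatime,vers=4.1 0 0'

# Fields are: device, mount point, file system type, options, dump, pass.
# The device field of an NFS mount is "server:exported-path".
summary=$(printf '%s\n' "$sample" | awk '
  $2 == "/usr/share/nginx/html" {
    split($1, dev, ":")
    print "server=" dev[1] " export=" dev[2] " type=" $3
  }')

echo "$summary"
```

Inside a real container, the same awk program can read /proc/mounts directly instead of the sample line.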

There is no content in the directory, so I add some by echoing some HTML code to an index.html file. My container will proclaim to the world “NGINX in a Docker container on Apcera with content stored on EFS” in an H3 typeface!

Now that I have some content I need to add a route to the content. Right now the NGINX container is running, and listening on ports 80 and 443, but it’s completely isolated from the outside world — no one can connect to those ports unless there’s a route (a URL) set up.

My cluster is running on the domain earlruby.apcera-platform.io, so I add a route like so:

I have successfully added the http route http://nginx.earlruby.apcera-platform.io/ to my NGINX container. This is a real public DNS entry. To verify that it works I point my browser at the route I just added:

Success!

Such an amazing app is bound to go viral, and a single NGINX container may not be able to keep up with the load. I want to ensure that my app can keep up and remain highly-available, and that it keeps running even if one or more VMs in my cluster get killed off, so I add more NGINX containers:

Now I’ve got 20 containers running my NGINX app, all serving up the same content, running on multiple VMs across my cluster, all load-balanced under the single URL http://nginx.earlruby.apcera-platform.io/. If any container gets killed off, the Apcera platform will spin up a new one. If any VM in the cluster dies, any containers running on it will automatically be migrated to new hosts. If I want to scale up the app to 100 or 1000 containers, or back down to 1, it’s a one-line command to make the change.

In terms of resources, I’m using slightly less than 45 MiB to run those 20 containers. That’s not a typo: 45 MiB! Containers are much more efficient users of RAM than VMs.

This is a talk I gave last week at the SF Microservices Meetup titled Policy-based Cloud Storage, Persisting Data in a Multi-Site, Multi-Cloud World. In it I cover Apcera’s approach to storage for containers and how to use policy to manage very large scale application deployments.

I have an Ubuntu 15.04 “Vivid” workstation already set up with LUKS full disk encryption, and I have a Synology DS414 NAS with 12TB raw storage on my home network. I wanted to add a disk volume on the Synology DS414 that I could mount on the Ubuntu workstation, but NFS doesn’t support “at rest” encrypted file systems, and using EncFS over NFS seemed like the wrong way to go about it, so I decided to try setting up an iSCSI volume and encrypting it with LUKS. Using this type of setup, all data is encrypted both “on the wire” and “at rest”, because the workstation encrypts every block before sending it to the NAS.
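Once the NAS exports an iSCSI target, the workstation side of this kind of setup can be sketched roughly as follows. This assumes the open-iscsi and cryptsetup packages are installed; the function names, the device path, the mapping name, and the choice of ext4 are all placeholders. Note that luksFormat destroys whatever is on the device.

```shell
# Discover the targets offered by the NAS and log in to them.
# $1: IP address of the NAS
iscsi_login() {
  sudo iscsiadm --mode discovery --type sendtargets --portal "$1"
  sudo iscsiadm --mode node --login
}

# Encrypt the new block device and put a file system on it.
# $1: block device that appeared after login, e.g. /dev/sdX
# $2: name for the decrypted mapping under /dev/mapper
luks_setup() {
  sudo cryptsetup luksFormat "$1"        # prompts for a passphrase
  sudo cryptsetup luksOpen "$1" "$2"     # creates /dev/mapper/$2
  sudo mkfs.ext4 "/dev/mapper/$2"        # file system on the decrypted view
}
```

The NAS only ever sees LUKS ciphertext, which is what gives you encryption both in transit and at rest.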

Log into the Synology Admin Panel and select Main Menu > Storage Manager:

The StratoStor project I’ve been working on for the past 10 months just got a “Top 5 New Products or Technologies to Watch” award from HPCwire announced at this week’s SuperComputing 2014 (SC14) conference in New Orleans.

HPC stands for High Performance Computing, HPCwire is a news bureau for all things HPC, and SC14 is where every major vendor of HPC equipment and products shows off their wares, so getting this bit of recognition from the readers of HPCwire is really nice.

This is the talk I gave at RICON this year on Validating Distributed Application Workloads. It’s about how we set up test environments at Seagate for validating storage system performance at the petabyte scale. This talk centers around the testing done to validate performance of a 2PB rack running Riak CS.

You can use a hard link in Linux to create two file names that both point to the same physical location on a hard disk. For instance, if I type:

> echo xxxx > a
> cp -l a b
> cat a
xxxx
> cat b
xxxx

I create a file named “a” that contains the string “xxxx”. Then I create a hard link “b” that also points to the same spot on the disk. Now if I write to the file “a” whatever I write also appears in file “b” and vice versa:
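A small sketch of that behavior, run in a scratch directory so it doesn’t disturb the files used in the rest of this walkthrough. The variable names are mine.

```shell
# Scratch directory for the demonstration.
d=$(mktemp -d)

echo xxxx > "$d/a"
cp -l "$d/a" "$d/b"      # b is a hard link to a

echo yyyy > "$d/a"       # write through the name "a"...
via_b=$(cat "$d/b")      # ...and read it back through "b"

echo zzzz > "$d/b"       # and the other way round
via_a=$(cat "$d/a")

echo "b after writing a: $via_b"
echo "a after writing b: $via_a"
```

The shell’s > redirection truncates and rewrites the existing file in place, so both names keep pointing at the same data.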

What most people don’t know is that rsync is an exception to this rule. If you use rsync to sync two files, and it sees that the target file is a hard link, it will create a new target file but only if the contents of the two files are not the same:

At this point “a” and “b” both point to the same file on the disk, which contains the string “xxxx”. “c” is a separate file that also contains the string “xxxx” and has the same permissions and timestamp as “a”.

At this point I’ve rsynced file “c” to “b”, but since “c” has the same contents and timestamp as “a” and “b”, rsync does nothing at all (by default it skips any file whose size and modification time already match). It doesn’t break the hard link. If I change “b” it still updates “a”:

> echo yyyy > b
> cat a b c
yyyy
yyyy
xxxx
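Why can rsync break a hard link when an ordinary write cannot? By default (that is, without --inplace) rsync builds the updated file under a temporary name and then renames it over the target, so the target name ends up pointing at a brand new file. The sketch below imitates that mechanism with cp and mv in a scratch directory so it runs without rsync installed; it illustrates the idea and is not rsync itself.

```shell
# Scratch directory for the demonstration.
d=$(mktemp -d)

echo xxxx > "$d/a"
cp -l "$d/a" "$d/b"        # a and b are two names for one file
echo zzzz > "$d/c"         # c is a separate file with different contents

# Roughly what "rsync c b" does when c differs from b:
cp -p "$d/c" "$d/b.tmp"    # write the new data to a temporary file
mv "$d/b.tmp" "$d/b"       # rename it over b, replacing the directory entry

after_a=$(cat "$d/a")      # a still has the old contents
after_b=$(cat "$d/b")      # b now has c's contents
echo "a=$after_a b=$after_b"
```

After the rename, “a” and “b” are no longer linked: “a” keeps the old data and “b” is a new file.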

This is how many modern file system backup programs work. On day 1 you make an rsync copy of your entire file system:

“cp -al” makes a hard link copy of the entire /home/earl/ directory structure from the previous day, then rsync runs against the copy of the tree. If a file remains unchanged then rsync does nothing — the file remains a hard link. However, if the file’s contents changed, then rsync will create a new copy of the file in the target directory. If a file was deleted from /home/earl then rsync deletes the hard link from that day’s copy.

In this way, the $DAY1 directory has a snapshot of the /home/earl tree as it existed on day 1, and the $DAY2 directory has a snapshot of the /home/earl tree as it existed on day 2, but only the files that changed take up additional disk space. If you need to find a file as it existed at some point in time you can look at that day’s tree. If you need to restore yesterday’s backup you can rsync the tree from yesterday, but you don’t have to store a copy of all of the data from each day, you only use additional disk space for files that changed or were added.
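The daily step described above can be sketched as a small function. The function name and argument layout are mine, and the paths in the usage note are hypothetical. It assumes GNU cp and rsync are installed.

```shell
# Create today's snapshot from yesterday's snapshot plus the live tree.
# $1: directory to back up, e.g. /home/earl
# $2: yesterday's snapshot directory
# $3: today's snapshot directory (must not exist yet)
snapshot_backup() {
  src="$1"
  prev="$2"
  today="$3"

  # Hard-link copy of yesterday's tree: new directories, shared file data.
  cp -al "$prev" "$today"

  # Bring the copy up to date. Changed files are replaced by new copies;
  # files deleted from the source are unlinked from today's tree only.
  rsync -a --delete "$src/" "$today/"
}
```

For example: snapshot_backup /home/earl /backup/day1 /backup/day2. rsync’s --link-dest option can achieve the same result in a single step, without the separate cp -al.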

I use this technique to keep 90 daily backups of a 500GB file system on a 1TB drive.

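One caveat: the snapshots do use up inodes. The hard-linked files share their inodes with the previous day’s copies, but directories can’t be hard linked, so every snapshot needs a new inode for each directory in the tree, plus one for each file that changed. If you’re using a file system such as ext3 or ext4, which has a fixed number of inodes set when the file system is created, you should allocate extra inodes on the backup volume when you create it. If you’re using a file system that allocates inodes dynamically, such as XFS, ZFS or btrfs, then you don’t need to worry about this.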
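To see where the inodes go in a hard-link snapshot, this sketch builds a tiny tree (2 directories, 3 files) in a scratch directory, snapshots it with cp -al, and counts names versus distinct inodes. It assumes GNU or BusyBox stat for the -c option; the variable names are mine.

```shell
# Scratch directory for the demonstration.
d=$(mktemp -d)

mkdir -p "$d/day1/sub"
echo one   > "$d/day1/f1"
echo two   > "$d/day1/f2"
echo three > "$d/day1/sub/f3"

cp -al "$d/day1" "$d/day2"   # hard-link snapshot of the whole tree

# Count names, then distinct inode numbers, across both trees.
entries=$(find "$d/day1" "$d/day2" | wc -l | tr -d ' ')
inodes=$(find "$d/day1" "$d/day2" -exec stat -c %i {} + | sort -u | wc -l | tr -d ' ')

echo "entries=$entries inodes=$inodes"
```

The 3 files are shared between the two trees, while each tree has its own 2 directories, giving 7 inodes for 10 names. On a real backup volume, df -i shows how many inodes remain, and on ext3 or ext4 the mkfs -N option sets the inode count at creation time.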