Welcome!

Are you ready to develop or modernize your application for use with SwiftStack cloud storage? This is the place to get started with tips, examples, and links to more resources!

Getting Started: Basic Concepts

Concept #1 — Cloud storage isn't like NAS or SAN

To start, SwiftStack is not a NAS (Network Attached Storage). NAS filers are typically storage systems that store data on one or more disks and share specific portions of that data over a network using file sharing protocols like NFS or SMB. These network protocols are designed to operate over a local area network and present the application or user with a file system that looks just like a local disk.

Cloud "object" storage is not a NAS; it was designed from the start to work at a global scale and be accessible from anywhere. To achieve this scale, the traditional file system interfaces had to be discarded in favor of something that scales globally. Instead of legacy file system interfaces, SwiftStack uses URLs to identify an object, and the systems servicing the request are responsible for retrieving the actual data from the storage cluster. URLs are inherently designed around a distributed, global system of data, which lends itself well to storage and retrieval of unstructured data.

Cloud "object" storage is also not a SAN (Storage Area Network); SANs are clusters of centralized storage designed around low-latency access to storage volumes usually served as block devices (raw, unformatted disks) to servers and virtual machines. SANs are best-suited for virtualization data stores, VDI storage, attached disks, and structured data like relational databases. These systems are optimized for low-latency, random IO, and these optimizations usually result in higher cost (in price-per-capacity) than cloud storage systems optimized for scalable capacity and throughput. Object storage is not designed for low-latency, random IO workloads; instead object storage is best suited for unstructured data to be accessed by applications and users.

While files and objects look similar (e.g., cat.gif on a filesystem looks just like cat.gif on an object storage system), they have some inherent differences.

One key difference is that a file is referenced using a disk path and hierarchical folder structure (e.g., c:\path\to\file.jpg), but an object is referenced using a unique URL identifier in an almost flat namespace. Using the S3 API (see below), objects are organized in "buckets," and there is no nesting of buckets. Using the OpenStack Swift API, there is one additional level of nesting: objects are organized in "containers" nested within "accounts." (Note: The OpenStack Swift "container" is synonymous with the S3 "bucket" and should not be confused with lightweight virtualization technology such as Docker; to avoid confusion, we often refer to "buckets" or "bucket/container" when discussing both the S3 and Swift APIs.)
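As a rough illustration of the two namespaces, the sketch below builds an object URL for each API. The endpoint, account, bucket/container, and object names are made up for the example, and the S3 form assumes path-style addressing; your cluster's actual endpoints may differ.

```python
from urllib.parse import quote


def swift_object_url(endpoint, account, container, obj):
    """Swift API layout: /v1/<account>/<container>/<object>."""
    prefix = "/".join(quote(part, safe="") for part in (account, container))
    # Slashes inside the object name are kept: they are part of the name,
    # not real folders.
    return f"{endpoint}/v1/{prefix}/{quote(obj)}"


def s3_object_url(endpoint, bucket, key):
    """S3 API layout (path-style addressing assumed): /<bucket>/<key>."""
    return f"{endpoint}/{quote(bucket, safe='')}/{quote(key)}"


# Hypothetical endpoint and names, for illustration only.
print(swift_object_url("https://storage.example.com", "AUTH_system", "photos", "2017/cat.gif"))
print(s3_object_url("https://storage.example.com", "photos", "2017/cat.gif"))
```

Note that "2017/cat.gif" is a single object name in both cases; the slash only looks like a folder.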

Also, because object storage was designed for web-scale applications to store large quantities of unstructured data, custom metadata can be created and stored with objects to organize and identify data for fast indexing and retrieval by applications.

Cloud storage uses RESTful APIs—not file system protocols

With traditional storage systems, when a user saves a file to disk, the application uses an operating system function to open a file descriptor (an abstract interface for data access), write the data to disk, and then close the descriptor. The filesystem driver organizes the information into fixed-sized blocks on disk, and saves certain metadata related to the file as well (file name, file size, creation time, etc).

Depending on the programming language the application uses, it might look something like this:

myFile = open("testfile.txt", "w")  # open a file descriptor for writing

myFile.write("This will be output to testfile.txt\n")  # write the data to disk

myFile.close()  # close the descriptor

SwiftStack uses the HTTP protocol, and applications never interact directly with an underlying operating system; nor do applications write directly to disk, since SwiftStack distributes data to the back-end object storage based on internal logic around distribution and durability.

Instead, applications use HTTP verbs to interact with the system, such as GET, PUT, POST, and DELETE. Instead of using file system paths, the API centers around URIs. For example, an object's URI might be http://storage.example.com/path/to/file. To upload a new file, an HTTP PUT is used to create or overwrite the file. To read the file, an HTTP GET is used on that same URI. To update metadata about the file, a POST operation might be used. To remove a file, the DELETE verb is employed.
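The verb-to-operation mapping described above might be sketched as follows. This only builds the requests with Python's standard library and does not send them; the URI is the example from the text, and the helper name build_request is our own.

```python
import urllib.request

OBJECT_URI = "http://storage.example.com/path/to/file"


def build_request(method, uri, body=None):
    """Build (but do not send) an HTTP request for an object URI."""
    return urllib.request.Request(uri, data=body, method=method)


upload = build_request("PUT", OBJECT_URI, body=b"hello object storage\n")  # create or overwrite
read = build_request("GET", OBJECT_URI)            # read the object
update_meta = build_request("POST", OBJECT_URI)    # update metadata
remove = build_request("DELETE", OBJECT_URI)       # delete the object

for req in (upload, read, update_meta, remove):
    print(req.get_method(), req.full_url)

# Sending a request needs a reachable cluster and valid credentials, e.g.:
#   urllib.request.urlopen(upload)
```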

As an example, the following API call stores a file named “file1” as an object in SwiftStack storage accessible at IP address 1.2.3.4, an account called “AUTH_system” and bucket (i.e., "container" in OpenStack Swift parlance) called “mycontainer.” (The authorization token would have been received in a previous API call.)
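A call of the kind described above could look roughly like this in Python. The address, account, container, and object name come from the paragraph; the "/v1" path prefix assumes the usual OpenStack Swift API layout, and the token value is a placeholder rather than a real credential.

```python
import urllib.request

STORAGE_URL = "http://1.2.3.4/v1/AUTH_system"  # "/v1" prefix is an assumption
AUTH_TOKEN = "AUTH_tk_placeholder"             # hypothetical; returned by an earlier auth call


def put_object_request(container, name, data, token):
    """Build a PUT request that stores `data` as <container>/<name>."""
    return urllib.request.Request(
        f"{STORAGE_URL}/{container}/{name}",
        data=data,
        headers={"X-Auth-Token": token},
        method="PUT",
    )


req = put_object_request("mycontainer", "file1", b"contents of file1\n", AUTH_TOKEN)
print(req.get_method(), req.full_url)

# Sending it requires a reachable cluster:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```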

To reiterate (this may seem obvious if you’re new to software development, but it may be a newer idea if you’ve been programming applications that use file storage for a while): the “native language” of cloud storage is a RESTful API over HTTP through which you can create, read, update, or delete objects. Every object is addressed using a unique URL accessible from anywhere; there is no mounting of a filesystem or share as in NFS or SMB/CIFS, and the concepts of opening, locking, or closing files don’t apply.

Concept #4 — Choosing an API

Though there are others, two APIs have emerged as the dominant choices for application development using cloud storage: the S3 API, introduced by Amazon as the native language of its popular S3 public cloud storage service, and the OpenStack Swift API, developed as the native language of the open-source OpenStack Swift private cloud storage project. S3 has become the most popular API in the industry, but some caution that the API is entirely owned by a single company (i.e., Amazon); that said, it has remained stable for several years. The Swift API, while somewhat less ubiquitous, has the benefit of being defined as an open standard and offers a handful of unique functions not present in the S3 API. SwiftStack supports both APIs, so the choice is yours.

Concept #5 — Object metadata

Files on disk can have metadata. Examples are mp3 tags or Exif metadata internal to images and videos (by internal, we mean it's part of the data format and inside the file itself). External metadata might include the filename, content type/file extension, modification time, SELinux security context, and other attributes about the file stored in the filesystem.

Objects have metadata too; in this case, metadata exists as key/value pairs associated with an object that are user- or application-defined as needed.

Some metadata is related to the object itself, such as content-type or date-modified and may be system-generated. Other metadata can be created by the application as needed—such as information related to a specific project or customer; this could prove helpful for organization or subsequent searching for objects. Given that the bucket/object structure is flat, there is no hierarchical organization of nested folders for users to organize data, but within the flat structure, there can be multiple identifying metadata tags associated with a single object.
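With the Swift API, custom object metadata travels as HTTP headers prefixed with X-Object-Meta-. The sketch below converts an application's key/value pairs to such headers and back; the helper names and sample values are ours, chosen for illustration.

```python
META_PREFIX = "X-Object-Meta-"


def to_metadata_headers(metadata):
    """Turn application key/value pairs into Swift object-metadata headers."""
    return {f"{META_PREFIX}{key.title()}": str(value) for key, value in metadata.items()}


def from_response_headers(headers):
    """Extract custom metadata from the headers of a GET or HEAD response."""
    prefix = META_PREFIX.lower()
    return {
        name[len(prefix):].lower(): value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


# Hypothetical project and customer tags.
print(to_metadata_headers({"project": "apollo", "customer-id": 42}))
print(from_response_headers({"Content-Type": "image/gif", "X-Object-Meta-Project": "apollo"}))
```

These headers would be sent with the PUT that creates the object or with a later POST that updates its metadata.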

Concept #6 — Data Durability and Dispersal

When a client app requests an object from storage, the system will generally look for a local copy (or local data segments) in the nearest data center. If multiple copies are available for retrieval, then the system will return one of those. If one copy is unavailable (e.g., if a disk failed or if a node is offline), then the system will attempt to retrieve each of the other copies until successful.

In a globally replicated cluster, if the local replica or data segments are unavailable, then the system will retrieve the necessary data from a remote data center to return to the client. Of course, retrieving data from a remote region means higher latency, but there are many applications for which data availability (even if an entire data center is offline!) is valuable.
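The read path described above happens inside the storage system, not in your application, but its logic can be illustrated with a toy simulation. The replica names and the offline() helper below are invented for the example and do not reflect SwiftStack internals.

```python
def read_with_failover(replica_readers):
    """Return (source, data) from the first replica that answers.

    `replica_readers` is an ordered list of (name, callable) pairs, local
    replicas first. Raises OSError if every replica is unavailable.
    """
    errors = []
    for name, reader in replica_readers:
        try:
            return name, reader()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            errors.append(f"{name}: {exc}")
    raise OSError("all replicas unavailable: " + "; ".join(errors))


def offline():
    raise ConnectionError("node offline")


# Simulated cluster: both local copies are down, the remote one answers.
replicas = [
    ("local-disk-1", offline),
    ("local-disk-2", offline),
    ("remote-region", lambda: b"object bytes"),
]
source, data = read_with_failover(replicas)
print(source, data)
```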

When writing an object, SwiftStack ensures data is written durably before returning a success response (an HTTP 2xx status, such as 201 Created for an object PUT with the Swift API) to the client application. Usually, a quorum of writes (either replicas or data+parity segments) is required to complete the transaction (with the others being asynchronously replicated as needed). This allows the system to continue accepting data even when one or more disks, nodes, or even data centers are offline.
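To make the quorum idea concrete, here is a minimal sketch that assumes a simple-majority rule for replicated data. That rule is an assumption for illustration; the exact quorum a cluster applies depends on its storage policy (erasure-coded policies, for instance, use different thresholds).

```python
def majority_quorum(replica_count):
    """Smallest number of acknowledgements that forms a strict majority (assumed rule)."""
    return replica_count // 2 + 1


def write_succeeds(acks, replica_count):
    """True if enough backend writes were acknowledged to report success."""
    return acks >= majority_quorum(replica_count)


# With three replicas, two acknowledgements are enough, so the write can
# still succeed while one disk, node, or data center is offline.
print(majority_quorum(3))
print(write_succeeds(2, 3), write_succeeds(1, 3))
```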

Concept #7 — Eventual Consistency

According to Wikipedia, "Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value." [wikipedia]

With SwiftStack, an object is often written to a local service/access point (such as a proxy node in the local datacenter), but the data may need to be replicated across multiple regions. Depending on the configuration, this may happen at the time of the write (i.e., all replicas are written simultaneously), while other configurations may write all data locally and then asynchronously replicate some to remote data centers. Therefore, writes in one region followed immediately by reads of the same objects in a remote region may take a little longer at first.

Additionally, if an object is overwritten in one region, the updated data will not be instantly present in all regions, so immediate reads in a remote region could return an outdated version. All copies will eventually be consistent with each other, but this mechanism is designed to allow the system to remain online and serving requests even when there are network partitions between regions or zones, or even unavailability issues within a single area of the cluster.
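An application that must see its own overwrite from another region can poll until the expected version appears. The sketch below compares ETags, relying on the fact that for ordinary (non-segmented) objects the Swift API reports the MD5 of the content as the ETag; the fetcher is simulated, and the function name and retry parameters are our own.

```python
import hashlib
import time


def wait_until_consistent(fetch_etag, expected_etag, attempts=5, delay=0.5):
    """Poll until `fetch_etag()` returns the expected ETag.

    Returns the number of attempts used, or None if the expected version
    never appeared within the allowed attempts.
    """
    for attempt in range(1, attempts + 1):
        if fetch_etag() == expected_etag:
            return attempt
        time.sleep(delay)
    return None


new_body = b"version 2"
expected = hashlib.md5(new_body).hexdigest()
stale = hashlib.md5(b"version 1").hexdigest()

# Simulated remote region: two stale reads, then the updated object.
responses = iter([stale, stale, expected])
attempts_used = wait_until_consistent(lambda: next(responses), expected, delay=0)
print(attempts_used)
```

In a real application, fetch_etag would issue a HEAD request against the object URL and return the ETag response header.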

Concept #8 — Storage Policies

Storage policies are typically defined by the storage operations team to control where data will be physically located within a cluster and what the durability of that data should be. For example, a SwiftStack administrator may define a policy where all data is replicated three times with one copy in each of three data centers. Another policy might leverage specific high-performance disk media such as SAS drives or SSDs and specify a performance policy for high-throughput applications. Or, compliance teams may require policies where certain data must reside only within specific data centers or geopolitical boundaries. Other policies may define erasure coding of data rather than replication.

Developers and applications can leverage these policies by creating a bucket (i.e., "container" in OpenStack Swift parlance) using a specific policy and placing data within that bucket. A bucket/container can only be associated with a single policy.
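With the Swift API, a policy is chosen when the container is created by sending the X-Storage-Policy header on the container PUT. The sketch below only builds that request; the policy name "archive-ec", the container name, and the token are placeholders, since real policy names are defined by your storage administrator.

```python
import urllib.request


def create_container_request(storage_url, container, policy, token):
    """Build a PUT request that creates a container bound to a storage policy."""
    return urllib.request.Request(
        f"{storage_url}/{container}",
        headers={"X-Auth-Token": token, "X-Storage-Policy": policy},
        method="PUT",
    )


# Hypothetical storage URL, container, policy name, and token.
container_req = create_container_request(
    "https://storage.example.com/v1/AUTH_username",
    "inactive-data",
    "archive-ec",
    "AUTH_tk_placeholder",
)
print(container_req.get_method(), container_req.full_url)
print(sorted(container_req.header_items()))
```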

These concepts can provide options around data lifecycle management; for example, a bucket/container with a three-replica policy might be used for active data, but then an application may later move inactive data to a bucket with an archive policy using erasure coding to conserve space.

Concept #9 — Performance Expectations

With traditional NAS or SAN storage systems (described above), IOPS is a common metric for measuring performance. IOPS is short for I/O Operations Per Second and refers to how many disk operations (e.g., reads or writes) an application or operating system can perform. Disk manufacturers publish IOPS numbers related to individual disks, and RAID systems can distribute read and write requests across multiple disks to increase overall IOPS. This is useful for applications with random I/O characteristics like databases, VDI, or virtualization datastores.

SwiftStack is a scale-out object store that is not suitable for these types of workloads. Instead of random I/O like a database with many read/write/append/update operations, SwiftStack stores large amounts of unstructured data that is relatively static, and data streams are most often sequential in nature rather than random. So, SwiftStack is optimized for scaling capacity and throughput rather than IOPS.

That said, because of SwiftStack's distributed nature, interactions can be parallelized to increase throughput; that is, for a given file, multiple transfer streams can be employed to ingest or read data at much larger throughputs than traditional storage systems. It is not uncommon for applications to read or write data from/to SwiftStack at line-speed on 10Gb or 40Gb networks.
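One way to parallelize a large read is to split the object into byte ranges and fetch them concurrently, each with an HTTP Range header. The sketch below shows the splitting and reassembly with a simulated fetcher; the chunk size and worker count are arbitrary, and fake_fetch stands in for a real ranged GET.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size, chunk_size):
    """Inclusive (start, end) pairs covering an object of `total_size` bytes."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def parallel_read(fetch_range, total_size, chunk_size, workers=4):
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of `ranges`.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


OBJECT = bytes(range(256)) * 40  # stand-in for a stored object


def fake_fetch(start, end):
    # A real fetcher would send "Range: bytes=<start>-<end>" on a GET.
    return OBJECT[start:end + 1]


result = parallel_read(fake_fetch, len(OBJECT), 1000)
print(len(result), result == OBJECT)
```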

Getting Started: What you need to start coding

If you have decided to use SwiftStack cloud storage for your application, you'll need three key pieces of information to get started:

Auth URL

User account or API key

Password or API secret

Depending on your environment, you may also need the following:

Tenant or Project name

Region

The Auth URL is where everything begins: When you initially connect to a SwiftStack cluster, you contact an authorization service with your credentials (either a user/password or API keys) to receive an auth token. This authorization service may reside on a Keystone server in an OpenStack environment or use another service like SwiftStack Auth in a standalone deployment or LDAP or Active Directory in many enterprise data centers.

An Auth URL might look like either of these:

https://storage.example.com/auth/v1.0

https://auth.example.com/auth/v2.0

The v1.0 or v2.0 indicates the authorization version. Many SwiftStack deployments support both, though specific environments may prefer one or the other. SwiftStack supports both in a standalone mode, while OpenStack deployments with SwiftStack may require v2 or even v3 authorization through Keystone. Either way, use what your Ops team provides to you.

Once your initial request is authorized, the service running on the Auth URL provides a storage URL, which identifies the actual location of your data and is used for further API requests. The storage URL may use a different hostname entirely, and its path includes the storage account matching the user or tenant.

The storage URL could look something like these:

https://storage.example.com/v1/AUTH_username

https://api.example.com/v1/TENANT_235298273509

Subsequent requests should always operate on containers and objects using this storage URL. If you are configuring an application to use with SwiftStack storage, you typically only use the Auth URL for entering your credentials but may not need to use the Storage URL directly.
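For v1.0-style authorization, the exchange described above is a single GET to the Auth URL with the credentials in the X-Auth-User and X-Auth-Key headers; the response carries the X-Auth-Token and X-Storage-Url headers. A minimal sketch follows, using made-up credentials and a simulated response; Keystone (v2/v3) deployments use a different, JSON-based exchange not shown here.

```python
import urllib.request


def auth_request(auth_url, user, key):
    """Build a v1.0-style auth request (credentials sent as headers)."""
    return urllib.request.Request(
        auth_url,
        headers={"X-Auth-User": user, "X-Auth-Key": key},
        method="GET",
    )


def parse_auth_response(headers):
    """Return (storage_url, token) from the auth response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered["x-storage-url"], lowered["x-auth-token"]


# Hypothetical user and key.
auth_req = auth_request("https://storage.example.com/auth/v1.0", "username", "secret")
print(auth_req.get_method(), auth_req.full_url)

# Simulated response headers; a real call would read them from
# urllib.request.urlopen(auth_req).headers
sample_headers = {
    "X-Storage-Url": "https://storage.example.com/v1/AUTH_username",
    "X-Auth-Token": "AUTH_tk_placeholder",
}
storage_url, token = parse_auth_response(sample_headers)
print(storage_url, token)
```

Later requests then send the token in the X-Auth-Token header and address containers and objects under the returned storage URL.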

SDKs and Sample Code Snippets

The following section of the SwiftStack administrators documentation includes several examples of using cURL to interact with the Swift API directly and parallel examples using an available command-line utility written in Python. (Note, this also includes documentation of the SwiftStack Controller API, which storage administrators can use to programmatically manage the SwiftStack system.)

Moving Beyond the Basics

Several instructional talks have been given to assist developers who begin using more advanced features of the Swift and S3 APIs. The following videos reinforce concepts and provide example code and explanations that may be helpful as you expand your skills.

Using Advanced SwiftStack Features

Cloud Sync

Cloud Sync allows replication of data from a SwiftStack cluster into a public cloud bucket based on preset policies. This may be useful for collaboration, "cloud-bursting," archiving, or disaster recovery. More information can be found on the SwiftStack product page, and administrative and usage details are included in the SwiftStack documentation.

Metadata Sync and Search

SwiftStack can automatically send metadata for each object to Elasticsearch, a popular open search platform, so it is indexed and searchable. Applications and users can then use that index to quickly find what they need—even among a multi-petabyte storage cloud containing billions of objects. More information can be found on the SwiftStack product page, and administrative and usage details are included in the SwiftStack documentation.

SwiftStack Auth

SwiftStack Auth is a fast and simple authentication system built into SwiftStack that can be leveraged in environments when LDAP, AD, or Keystone are unavailable, unnecessary, or inadequate. Configuration of SwiftStack Auth is typically managed by the SwiftStack storage administrator, but the available documentation will provide a developer with insight into how the authentication system works.

Delegated Auth

To support additional asset workflows, the SwiftStack Controller includes the Delegated Authorization middleware. This middleware delegates authorization to an external web service, or "Permit Server." Configuration of Delegated Auth is typically managed by the SwiftStack storage administrator, but the available documentation will provide a developer with insight into how the authentication system works.

Object Notifications

SwiftStack has developed an event system that enables applications to detect and act on object changes in the cluster, e.g., when an object is created or deleted or when its metadata is modified. Documentation for this is not yet publicly available; if you are interested in leveraging this in your application development, please contact SwiftStack directly at info@swiftstack.com.

File Access

While the best way to leverage the scalability, flexibility, and added features of SwiftStack cloud storage is via its native APIs, SwiftStack also recognizes that there are many legacy applications written for the NFS and SMB/CIFS protocols that may be difficult to change. For that reason, SwiftStack has developed NFS and SMB/CIFS support for accessing SwiftStack cloud storage. This new feature is available in a "limited release" form today; if you are interested in leveraging this in your application development, please contact SwiftStack directly at info@swiftstack.com.

Example Applications

The following example applications are for demonstration purposes only; in most cases, they lack the error-handling and performance optimizations that would be appropriate in production-ready code. Nonetheless, they provide functional examples of how SwiftStack can be used from within real-world application scenarios.

Example #1 — Mapping photos using EXIF metadata

This sample web application uses three independent pieces of code to showcase some interesting possibilities enabled by SwiftStack features:

First, a piece of custom middleware identifies objects as they are uploaded to SwiftStack and calls a second piece of code to process them.

When a JPEG image is found, a second piece of code extracts the EXIF data and applies the metadata fields as object metadata; when that metadata is applied, SwiftStack's "metadata search" feature sends that metadata to an Elasticsearch instance for indexing.

The third piece of code is a simple HTML page that allows users to define a box on a Google Map and then queries the Elasticsearch index for photos with GPS coordinates within that region; then, for photos in that region, it pulls the photos themselves from SwiftStack to display.
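The map query in the third piece could be expressed with Elasticsearch's geo_bounding_box filter. The sketch below builds such a query body; the field name "location" is hypothetical, and it assumes the GPS coordinates were indexed as a geo_point field, which depends on how the index mapping was set up.

```python
import json


def photos_in_box_query(top, left, bottom, right, field="location"):
    """Build an Elasticsearch query for documents inside a lat/lon box."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        }
    }


# Example box with arbitrary coordinates.
query = photos_in_box_query(top=37.9, left=-122.6, bottom=37.6, right=-122.2)
print(json.dumps(query, indent=2))
```

The resulting JSON would be sent to the index's _search endpoint, and each hit would identify an object to fetch from SwiftStack.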

Example #2 — Research computing automation in a BASH script

This simple script is one that SwiftStack uses commonly in demonstrations within the Life Sciences industry (see swiftstack.com/life-sciences); it aims to demonstrate—in an overly simplified manner—how data can be stored, retrieved, and leveraged throughout a genomic sequencing pipeline. In this case, the python-swiftclient command-line utility is used from within a bash script. Note in particular the use of metadata tags when objects are PUT into SwiftStack, the use of the Temporary URL (TempURL) feature to provide time-limited access to a single object, and the use of the X-Delete-After tag so that SwiftStack automatically deletes specific objects after the defined period of time. If SwiftStack's "metadata search" feature is used to index the object metadata in Elasticsearch, it's trivial to develop a dashboard and search front-end for that data as well.
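As background for the TempURL feature mentioned above: a temporary URL is the object path plus an expiry time and an HMAC-SHA1 signature computed over the method, expiry, and path with a secret key stored on the account or container. A minimal sketch follows; the key, host, and path are placeholders, and the demonstration script itself uses the command-line client rather than this code.

```python
import hmac
import time
from hashlib import sha1


def temp_url(host, path, key, expires, method="GET"):
    """Build a Swift TempURL for `path`, valid until the Unix time `expires`."""
    message = f"{method}\n{expires}\n{path}".encode()
    signature = hmac.new(key.encode(), message, sha1).hexdigest()
    return f"{host}{path}?temp_url_sig={signature}&temp_url_expires={expires}"


# Hypothetical key; the real one is set by the account owner
# (for example through the X-Account-Meta-Temp-URL-Key header).
one_hour_from_now = int(time.time()) + 3600
print(temp_url(
    "https://storage.example.com",
    "/v1/AUTH_system/mycontainer/file1",
    "my-secret-key",
    one_hour_from_now,
))
```

Anyone holding the printed URL can GET that one object until the expiry time, without an auth token.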

The following snippet uploads all files that start with the prefix defined in variable $DEMO_ID from a local folder $SQR_DIR to a SwiftStack bucket named $SCRATCH_CONTAINER, and the objects are set to be automatically deleted after 300 seconds.
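The original script performs this step with the python-swiftclient command-line utility from bash. As a rough Python equivalent, the sketch below builds one PUT request per matching file and sets the X-Delete-After header to 300 seconds. The storage URL, token, container, and file names are placeholders, and the requests are built but not sent.

```python
import glob
import os
import tempfile
import urllib.request


def expiring_upload_requests(storage_url, container, local_dir, prefix, token, ttl_seconds=300):
    """Build PUT requests for files in `local_dir` whose names start with `prefix`."""
    pattern = os.path.join(glob.escape(local_dir), glob.escape(prefix) + "*")
    requests = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            body = handle.read()
        requests.append(urllib.request.Request(
            f"{storage_url}/{container}/{os.path.basename(path)}",
            data=body,
            # X-Delete-After asks the cluster to delete the object after N seconds.
            headers={"X-Auth-Token": token, "X-Delete-After": str(ttl_seconds)},
            method="PUT",
        ))
    return requests


# Demonstration with a throwaway directory standing in for $SQR_DIR.
with tempfile.TemporaryDirectory() as sqr_dir:
    for name in ("demo42_reads.fastq", "demo42_report.txt", "other.txt"):
        with open(os.path.join(sqr_dir, name), "w") as handle:
            handle.write("sample\n")
    pending = expiring_upload_requests(
        "https://storage.example.com/v1/AUTH_username",  # hypothetical storage URL
        "scratch",                                       # stands in for $SCRATCH_CONTAINER
        sqr_dir,
        "demo42",                                        # stands in for $DEMO_ID
        "AUTH_tk_placeholder",
    )

names = [req.full_url.rsplit("/", 1)[-1] for req in pending]
print(names)
```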

This simple bash script uses widely available tools to automatically archive Avid shared environments to SwiftStack, leveraging both the cloud APIs and the file protocols supported by SwiftStack. It also builds a searchable index of media content in Elasticsearch.

SwiftStack Client: A simple UI if you need one

If you’d like a simple and free but powerful GUI tool to access and interact with SwiftStack storage as you develop your application, visit https://www.swiftstack.com/downloads to download the SwiftStack Client. If it’s any inspiration for you, this client application is written entirely in Node.js, using the Electron framework and the OpenStack Swift API for interaction with SwiftStack storage!

Questions?

If you can't find what you need on this page or the resources linked from it, let us know. We'd be happy to help you with your application. You can reach us at info@swiftstack.com.