The Panda Kaltura AWS Cluster – S3 and CloudFront

In the previous posts we discussed the various AWS services and how we use them to operate Kaltura CE clusters. In this post we discuss the last part of our cluster: video storage and streaming. A Kaltura deployment usually stores videos locally on the file system. While this works well for small installations, sometimes you want the flexibility and durability of remote storage.

Simple Storage Service (S3)

Amazon S3 is a scalable storage service. It enables REST access and has a permission management system.

Concepts

Buckets – a bucket is the highest-level container in S3, and every object in S3 is stored in a bucket. The bucket name is globally unique and defines the bucket's access URL. For example, if your bucket is called panda, then the bucket URL is http://panda.s3.amazonaws.com. Buckets also play a role in access control and usage reporting. A bucket is created in a specific region, and it can be configured with versioning so that every object stored in it is assigned a unique version ID.

Objects – objects are similar to files in that they are the fundamental units of data. They also carry metadata, including default HTTP metadata such as Content-Type, and you can add your own. Objects can also have versions.

Keys – Keys identify objects in a bucket. A combination of bucket, key and version uniquely identifies an object.
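
To make the relationship between buckets, keys and versions concrete, here is a minimal Python sketch that builds the access URL for an object. The helper name object_url and the sample key are our own illustration and not part of any AWS SDK; versionId is the query parameter S3 uses to address a specific version of an object.

```python
from urllib.parse import quote


def object_url(bucket, key, version_id=None):
    """Build the virtual-hosted-style URL for an object in an S3 bucket."""
    # The bucket name becomes part of the host name; the key becomes the path.
    url = "http://{}.s3.amazonaws.com/{}".format(bucket, quote(key))
    if version_id is not None:
        # A specific version is addressed with the versionId query parameter.
        url += "?versionId=" + quote(version_id, safe="")
    return url
```

For the panda bucket mentioned above, object_url("panda", "videos/intro.mp4") gives http://panda.s3.amazonaws.com/videos/intro.mp4.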

In our Kaltura AWS cluster we use S3 to store videos. When a video is uploaded to our Kaltura cluster, we store it locally on an NFS volume. Once the file is converted to all the required flavors, it is automatically exported by our Kaltura server to an S3 bucket.

S3 can deliver access logs for a bucket. We use these logs to produce bandwidth reports for our different partners.
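
As a rough sketch of how such a report could be computed, the Python below sums the bytes sent per partner from S3 server access log records. It is not our production code. It assumes, purely for illustration, that every object key starts with the partner ID as its first path segment (for example 101/videos/a.mp4); the function name bytes_by_partner is also hypothetical. The field order in the regular expression follows the S3 server access log format.

```python
import re
from collections import defaultdict

# Leading fields of an S3 server access log record, in order:
# bucket owner, bucket, [time], remote IP, requester, request ID,
# operation, key, "request-URI", HTTP status, error code, bytes sent.
LOG_RECORD = re.compile(
    r'^\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) (?:"[^"]*"|-) '
    r'(?P<status>\S+) \S+ (?P<bytes_sent>\S+)'
)


def bytes_by_partner(log_lines):
    """Sum the bytes sent for object downloads, grouped by partner prefix."""
    totals = defaultdict(int)
    for line in log_lines:
        match = LOG_RECORD.match(line)
        # Only object downloads count towards delivered bandwidth.
        if match is None or match.group("operation") != "REST.GET.OBJECT":
            continue
        sent = match.group("bytes_sent")
        if sent == "-":  # S3 writes "-" when no bytes were sent
            continue
        # ASSUMPTION: the first path segment of the key is the partner ID.
        partner = match.group("key").split("/", 1)[0]
        totals[partner] += int(sent)
    return dict(totals)
```

Grouping by key prefix keeps the report independent of how many buckets are used; if each partner had its own bucket, the bucket field could be used instead.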

CloudFront Content Delivery Network

CloudFront is a web service that speeds up distribution of your static and dynamic web content to end users. CloudFront delivers your content through a worldwide network of edge locations. When an end user requests content that you’re serving with CloudFront, the user is routed to the edge location that provides the lowest latency, so content is delivered with the best possible performance. If the content is already in that edge location, CloudFront delivers it immediately. If the content is not currently in that edge location, CloudFront retrieves it from an Amazon S3 bucket or an HTTP server (for example, a web server) that you have identified as the source for the definitive version of your content.
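
The request flow described above can be pictured with a toy Python model. This is only an illustration of the idea, not how CloudFront is implemented: the class EdgeLocation, the function nearest_edge and the latency table are all hypothetical names we made up for the sketch.

```python
class EdgeLocation:
    """Toy model of one edge location with an in-memory cache."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        # Callable that takes a key and returns the object from the origin.
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}

    def get(self, key):
        """Return (content, was_cache_hit) for the requested object."""
        if key in self.cache:
            # The object is already at this edge: deliver it immediately.
            return self.cache[key], True
        # Cache miss: retrieve the definitive version from the origin.
        content = self.fetch_from_origin(key)
        self.cache[key] = content
        return content, False


def nearest_edge(edges, latency_ms):
    """Pick the edge location with the lowest latency for this user."""
    return min(edges, key=lambda edge: latency_ms[edge.name])
```

The first request for an object at a given edge goes back to the origin; later requests for the same object at that edge are served from its cache.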

Concepts

Objects – the files that CloudFront delivers. An object can be anything that can be served over HTTP or Adobe RTMP.

Origin Server – The location of the original version of your objects, for example an S3 bucket or an HTTP server.

Distributions – A distribution tells CloudFront where your objects are. A distribution comes with a domain name, such as d111111abcdef8.cloudfront.net, that you can use to access your objects. You can also associate your own domain names with a distribution. There are two types of distributions:

A download distribution – delivers content using HTTP(S) from up to 10 buckets and custom origins.

A streaming distribution – delivers media using Adobe Flash Media Server and RTMP from a bucket.

Expiration – by default, an object expires after 24 hours. After that, CloudFront checks the origin for a newer version of the object.
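
The expiration rule can be sketched in a few lines of Python. This is a simplified model under our own assumptions: it uses the 24-hour default described above and lets the origin override it with a Cache-Control max-age value, while the real service also takes other headers and distribution settings into account. The function names and the plain-dict representation of the origin headers are hypothetical.

```python
import re

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24-hour default


def ttl_seconds(origin_headers):
    """Return how long an edge may serve the object before rechecking."""
    # ASSUMPTION: headers are a plain dict keyed by "Cache-Control".
    cache_control = origin_headers.get("Cache-Control", "")
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:
        return int(match.group(1))
    return DEFAULT_TTL_SECONDS


def is_expired(cached_at, now, origin_headers):
    """True when the edge should check the origin for a newer version."""
    # Both timestamps are in seconds, for example from time.time().
    return now - cached_at >= ttl_seconds(origin_headers)
```

For video files that never change once transcoded, a long max-age on the origin objects means the edges rarely need to go back to the origin.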

Costs

You pay for storage in the origin server (such as S3), for copying objects to edge locations, and for serving objects from edge locations (which costs less than serving them directly from S3). See the Amazon CloudFront pricing page for details.
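
To show how these three components add up, here is a small Python sketch of a monthly estimate. The rates in EXAMPLE_RATES are placeholders we invented to keep the arithmetic easy to follow; they are not AWS prices, and the function and key names are hypothetical.

```python
def monthly_cost(stored_gb, origin_to_edge_gb, delivered_gb, rates):
    """Add up the three cost components, using per-GB rates."""
    storage = stored_gb * rates["storage_per_gb"]
    origin_transfer = origin_to_edge_gb * rates["origin_to_edge_per_gb"]
    delivery = delivered_gb * rates["edge_delivery_per_gb"]
    return storage + origin_transfer + delivery


# PLACEHOLDER rates for illustration only, NOT real AWS prices.
EXAMPLE_RATES = {
    "storage_per_gb": 0.10,
    "origin_to_edge_per_gb": 0.02,
    "edge_delivery_per_gb": 0.12,
}
```

With these placeholder rates, 100 GB stored, 50 GB copied to the edges and 1000 GB delivered come to roughly 10 + 1 + 120 = 131 per month. For popular videos the delivery term dominates, which is why the lower edge delivery rate matters.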

You can have CloudFront access logs delivered to an S3 bucket.

At Panda OS, we usually use CloudFront for delivery of videos to clients. This provides performance and scalability benefits, and integrates well with S3.

Notes

Reduced Redundancy Storage (RRS) can help lower costs. It is ideal for easily reproducible data such as thumbnails and transcoded videos. The data is replicated fewer times, but RRS still provides 99.99% durability of objects over a year, which means an expected average loss of 0.01% of objects per year.
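
To get a feel for what 99.99% durability means in practice, the expected loss can be worked out with a one-line Python calculation. The function name is our own, and the result is an average over many objects, not a guarantee for any particular file.

```python
def expected_annual_loss(object_count, durability=0.9999):
    """Expected number of objects lost per year at the given durability."""
    # An annual durability of 99.99% leaves a 0.01% chance of loss per object.
    return object_count * (1.0 - durability)
```

For example, with 10,000 thumbnails stored in RRS you should expect to lose about one per year on average, which is acceptable when the thumbnail can simply be regenerated from the source video.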

You can choose the geographical Region where Amazon S3 will store the buckets you create. You might choose a Region to optimize latency, minimize costs, or address regulatory requirements. For example, if you reside in Europe, you will probably find it advantageous to create buckets in the EU (Ireland) Region.

You can also use S3's BitTorrent support to distribute content at high scale.

About Us

PandaOS is a freelance open-source software development and consulting agency. We specialise in web applications, online video, CMS, and much more!
Feel free to contact us. We are always available for new projects and ideas.