Welcome to the Cloud Computing Applications course, the first part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data!
In this first course we cover a multitude of technologies that comprise the modern concept of cloud computing. Cloud computing is an information technology revolution that has begun to reshape enterprise computing systems in major ways, and it will change the face of computing in the years to come.
We start the first week by introducing some major concepts in cloud computing and its economic foundations, and we introduce the concept of big data. We also cover software-defined architectures, how virtualization results in cloud infrastructure, and how cloud service providers organize their offerings. In week two, we cover virtualization and containers in greater depth, including lectures on Docker, the JVM and Kubernetes. We finish up week two by comparing the infrastructure-as-a-service offerings of the big three: Amazon, Google and Microsoft.
Week three moves to higher-level cloud offerings, including platform as a service, mobile backend as a service and even serverless architectures. We also talk about some of the cloud middleware technologies that are fundamental to cloud-based applications, such as RPC and REST, JSON and load balancing. Week three also covers metal as a service (MaaS), where physical machines are provisioned in a cloud environment.
Week four introduces higher-level cloud services with a special focus on cloud storage services. We introduce Hive, HDFS and Ceph as big data storage and file systems, and move on to cloud object storage systems, virtual hard drives and virtual archival storage options. A discussion of the Dropbox cloud solution wraps up week four and the course.

CK

This course had great content (best of the first 3) and covers a lot of the key technologies used in cloud systems.

MD

May 06, 2018


Great Course! It gave me a solid foundation in understanding cloud computing and its applications.

From the lesson

Module 4: Storage: Ceph, SWIFT, HDFS, NAAS, SAN, Zookeeper

Welcome to the final module of the cloud computing course! So far we have covered various methods of running computations on the cloud. Now it's time to focus on data storage in the cloud. In this module, we introduce big data and cloud file systems such as HDFS and Ceph, cloud object stores such as OpenStack Swift and Amazon S3, virtualized block storage devices such as Amazon EBS, and archival storage options like Amazon Glacier. Finally, we conclude the module by introducing the Dropbox cloud API, which enables developers to quickly integrate cloud storage options into their applications.

Taught By

Roy H. Campbell

Reza Farivar

Transcript

[SOUND] In this video, we will talk about another cloud storage service provided by Amazon. And of course, we talk about Amazon here because it's a common cloud provider, but it's really a case study; other cloud providers offer similar services. Amazon Glacier is an archival storage service, unlike the previous two types of Amazon storage services we talked about: Amazon S3, which was an object storage system that provides you buckets into which you can put whole files, and the block storage based services, Amazon EBS (Elastic Block Store) and the instance store, which mimic a hard drive. If you want to extend that analogy, AWS Glacier basically mimics a tape drive. Typically, in an enterprise, when you want to create an archive of your data, especially a large amount of data, you use a tape drive system to back up your archives. You take out the tape, you put it in some sort of very secure storage environment, maybe an abandoned mine or something, and you don't touch it. Any time you want to get that data out again, you have to send somebody to go grab that tape, come back, put the tape in the machine, and read it off, so a couple of hours of latency are required to access your data. It's basically the same idea for Amazon Glacier. It's very low cost, only $0.007 per GB per month as of 2016. It's also very durable. Amazon doesn't quite clarify whether the facilities designed to store your data really use tape or not, but they probably do, and they guarantee an average annual durability of 99.999999999% (eleven nines). So it's very durable, and your files are guaranteed to remain there pretty much forever.
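The $0.007 per GB-month figure above makes Glacier's storage cost easy to estimate. Here is a minimal sketch in Python; the S3 standard rate used for comparison is an approximation of the same era's pricing, not a figure from the lecture.

```python
# Storage-only cost estimates; retrieval requests are billed separately.
GLACIER_RATE = 0.007  # USD per GB-month, the 2016 figure quoted in the lecture
S3_RATE = 0.03        # USD per GB-month for S3 standard, approximate 2016 rate

def monthly_storage_cost(gb, rate_per_gb_month):
    """Cost of keeping `gb` gigabytes stored for one month."""
    return gb * rate_per_gb_month

# Archiving 10 TB (10,240 GB) for a year:
glacier_year = monthly_storage_cost(10_240, GLACIER_RATE) * 12  # roughly $860
s3_year = monthly_storage_cost(10_240, S3_RATE) * 12            # roughly $3,686
```

The gap is why old, rarely accessed data gets moved out of S3: for purely cold data, Glacier is several times cheaper per byte stored.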
Each archive that you put in Glacier can hold up to 40 TB of data. Archives are themselves organized into vaults, and you can have many vaults per user account. So if you want to archive a huge amount of data, you can just use Amazon Glacier. Now, the main access point to Amazon Glacier is Amazon S3. You don't directly read files from Glacier, and you don't directly write into Glacier from your application or web service. You do your work with S3, and you keep your data in S3 for, say, a month or two or three. Once you're pretty sure the data is old, that nobody is ever going to use it again but you still have to keep it, maybe for regulatory reasons, then you move it to Glacier. When you put data in Glacier and later want to read it back, it typically takes between three to five hours to prepare your download request. So you say, hey, I want that archive, and it takes a few hours before it's ready. After that, you have a 24-hour window to download it from the staging location, or the file will be removed from the staging location and you have to start a new request. All of these retrievals add cost. So it's very inexpensive, but only for the storage; if you want to actually use the data, a whole bunch of costs get added on. You use Glacier for data that you really never want to touch again. Taking a quick look at this diagram from Amazon itself: to access data on Glacier, you create a vault, and to control who can access it you can use Amazon's IAM user access management tools. Basically, you work with your data inside S3 and then you upload your archives to Glacier.
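The retrieval flow just described (initiate a job, wait roughly three to five hours for staging, then download within the 24-hour window) can be sketched with the Glacier API as exposed by a boto3-style client. This is a sketch under assumptions: the vault name, archive ID, and polling strategy are hypothetical, and in real code the client would come from `boto3.client("glacier")`.

```python
def initiate_retrieval(glacier, vault_name, archive_id):
    """Start an archive-retrieval job. Glacier typically takes
    ~3-5 hours before the job completes and the data is staged."""
    resp = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    return resp["jobId"]

def download_if_ready(glacier, vault_name, job_id, out_path):
    """Poll the job; once complete, the output stays staged for ~24 hours.
    Returns True if the archive was downloaded, False if still staging."""
    status = glacier.describe_job(vaultName=vault_name, jobId=job_id)
    if not status["Completed"]:
        return False  # not staged yet; try again later
    out = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    with open(out_path, "wb") as f:
        f.write(out["body"].read())
    return True
```

In practice the caller would poll `download_if_ready` every few minutes (or subscribe to an SNS notification) rather than block for hours, and a missed 24-hour window means calling `initiate_retrieval` again, paying the staging delay a second time.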
And any time you want to download your data back, it takes, as it says on the diagram, three to five hours to initiate and track the job, and only then is your download ready to go. So, to wrap up this service: it's only for archives. Do not use it for day-to-day activities. The best use case is data that you will probably never touch again but need to keep, for example for regulatory reasons; that's when you use Amazon Glacier. [MUSIC]
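The S3-to-Glacier handoff described above is usually automated with an S3 lifecycle rule rather than done by hand. A minimal sketch of such a rule as a Python dict, in the shape boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, day count, and bucket name are assumptions, not values from the lecture.

```python
# Hypothetical lifecycle rule: objects under "logs/" transition to the
# GLACIER storage class after 90 days of sitting in S3.
lifecycle_rule = {
    "ID": "archive-old-objects",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
}

# With a boto3 S3 client this dict would be applied roughly as:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",  # hypothetical bucket name
#     LifecycleConfiguration={"Rules": [lifecycle_rule]},
# )
```

With a rule like this in place, the "work in S3, archive to Glacier" pattern from the lecture happens automatically, with no application code involved.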
