Saturday, October 10, 2009

Cloud'ed Thoughts

Each one of you has a different view (PaaS, services, testing, startup, management) in the domain. A 5-minute warmer on your take on cloud computing based on your current work will be great. This will set the stage nicely for the discussion.

There are many “definitions” of cloud computing but for me “Cloud Computing is the fifth generation of computing after Mainframe, Personal Computer, Client-Server and the Web.” Its not often that we have a whole new platform and delivery model to create businesses on. And what's more its a new business model as well – using a 1000 servers for 1 hour costs the same as using 1 server for 1000 hours – no upfront costs, completely pay as you go!

How has cloud computing suddenly creeped on us and become technologically and economically viable? Because of 3 reasons:

Use of commodity hardware and increased software complexity to manage redundancy on such hardware. The perfect example of such softwares is virtualisation, MapReduce, Google File System, Amazon's Dynamo, etc.

Economies of scale. In a medium sized data center it costs $2.2 /GB/month while in a large data center it costs $0.40/GB/month. That is a cost saving of 5.7 times which cloud computing vendors have been possible to pass on to the customers. In general, cloud infrastructure players can avail 5 to 7 times decrease in cost.

The third and according to me the most important reason: there was a need to scale for many organizations but not the ability to scale: As the world became data intensive, players realized that unless scalable computing, scalable storage and scalable software was available, their business models won't scale. Consider analytics as an example. Some years back it was possible for mid-sized companies to mine the data in their own data center but with data doubling every year they have been unable to keep up. They have decided to scale out to the cloud. Amazon, Google realized this from their own needs very early and look here we are eating their dog-food!

Developers with new ideas for innovative internet services no longer require large capital investments in hardware to deploy their service. They can potentially go from 1 customer to 100k customers in a matter of days. Over-provisioning or under-provisioning is no longer a factor if your product is hosted on cloud computing platforms. This enables small companies to focus on their core competency rather than worrying about infrastructure. This enables a much quicker go-to-market strategy.

Another advantage is that clouds are available in various forms:

Amazon EC2 is as good as a physical machine and you can control the entire software stack.

Google AppEngine and salesforce.com are platforms which are highly restrictive but good for quick development and allows the scaling complexity to be handled by the platform itself.

Microsoft Azure is at an intermediate point between the above two.

So depending on your needs, you can choose the right cloud!

As I said earlier its a new development environment and there is lot of scope for innovation which is what my company “Clogeny” is focusing on.

Cloud computing is not just about “compute” – it is also storage, content distribution and a new way of visualizing and using unlimited storage. How has storage progressed from multi-million dollar arrays and tapes to S3 and Azure and Google Apps?

I remember that when I started writing filesystems I needed to check for an error indicating that the filesystem was full. It just struck me that I have no need for such error checking when using cloud storage. So yes, its actually possible to have potentially infinite storage.

Storage: Storage arrays have grown in capacity and complexity over the years to satisfy the ever-increasing demand for size and speed. But cloud storage is pretty solid as well. Amazon, Microsoft and most other cloud vendors keep 3 copies of data and atleast 1 copy is kept at a separate geographical location. When you factor this into the costs, cloud storage is pretty cheap. Having said that, cloud storage is not going to replace local storage, fast and expensive arrays will still be needed for IOPS and latency hungry applications. But the market for such arrays may taper off.

Content Distribution: A content delivery network is a system of nodes in multiple locations which co-operate to satisfy requests for content efficiently. These nodes move the content around to serve it optimally where the node nearest to the user, serves the request. All the cloud providers offer content distribution services thereby improving reach and performance since requests can be served around the world from the nearest available server. This makes the distribution extremely scalable and cost efficient. The fun part is that the integration between cloud and CDN is seamless and can be done through simple APIs.

Visualizing storage: Storage models for the cloud have undergone a change as compared to the POSIX model and relational databases that we are used to. The POSIX model has given way to a more scalable flat key-value store in which a “bucket-name, object-name” tuple points to a piece of data. There is no concept of folder and files that we are used to. Note that for ease of use a folder-file hierarchy can be emulated. Amazon provides SimpleDB, a non-traditional database which is again easier to scale but your data organization and modeling will need to change when migrating to SimpleDB. MapReduce is a framework to operate on very large data sets in highly parallel environments. MapReduce can work on structured or unstructured data.

Consider this as an example, there is a online photo sharing company called SmugMug which estimates that it has saved $500,000 in storage expenditures and cut its disk storage array costs in half by using Amazon S3.

CC breaks the traditional models of scalability and infrastructure investment, especially for startups. A 1-person startup can easily compare with an IBM or Google on infrastructure availability if the revenue model is in place. What are the implications and an example of how?

Definitely, startups need to only focus on their revenue model and implementing their differentiators. The infrastructure, management and scaling are inherently available in a pay as you go manner so that ups and downs in traffic can be sustained. For examples, some sites get hit by very high traffic in first few weeks and need high infrastructure costs to service this traffic. But then the load tapers off and infrastructure lies unused. This is where the pay as you go model works very well. So yes, cloud computing is a leveller fostering many start-ups.

Also many businesses are using cloud computing for scale-out whereby their in-house data center is enough to handle certain amount of load but when load goes beyond a certain point they avail the cloud. Such hybrid computing is sometimes more economically viable.

Xignite employs Amazon EC2 and S3 to deliver financial market data to enterprise applications, portals, and websites for clients such as Forbes, Citi and Starbucks. This data needs to be delivered in real-time and needs rapid scale up and scale down.

What do you see when you gaze in the crystal bowl?

Security is a concern for many customers but consider that the most paranoid customer – the US government has started a cloud computing initiative called “App.gov” where they are providing SaaS applications for federal use. Even if there are some issues, they are being surmounted as we speak. Cloud computing has now reached a critical mass and the ecosystem will continue to grow.

In terms of technology, I believe that there will be some application software running on-premise and another piece running on the cloud for scaling out. The client part can provide service in case of disconnected operations and importantly can help to resolve latency issues. Most cloud computing applications will have in-built billing systems that will either be a standard or software that both the vendor and customer trust. I would love to see some standards emerging in this space since that will help to accelerate acceptance.

“Over the long term, absent of other barriers, economics always wins!” and the economics of cloud computing are too strong to be ignored.

7 comments:

Could you give some detail on how virtualization helps in cloud computing and reduces the stack? I am also curious about the market segmentation..is Force winning the race among SMEs and Amazon more popular among smaller units? How is IBM doing in this space? Congratulations on Clogeny! Starting out, what are you working on?

* Virtualization enables to improve utilization and hence the ROI. Virtualization provides flexibility with which a server can be "sold" as 3 high-end VMs or 30 low-end VMs. It also allows fast provisioning and setting up of machines.

* Segmentation is in the form of IaaS, PaaS and SaaS. Force.com and Google AppEngine are PaaS offerings while Amazon/Rackspace are IaaS offerings. Both are popular in their own way. They cannot be compared since they solve pretty different problems. I think all of the 3 will continue to grow and we may see more specialized offerings as well.

* IBM came up with a cloud email service a few days back. IBM is coming up with many cloud products, have a look here - http://www.ibm.com/grid/

Hi! Your blog is simply super. you have create a differentiate. Thanks for the sharing this website. it is very useful professional knowledge. Great idea you know about company background. Customized application development

Can anyone recommend the well-priced MSP system for a small IT service company like mine? Does anyone use Kaseya.com or GFI.com? How do they compare to these guys I found recently: N-able N-central service management? What is your best take in cost vs performance among those three? I need a good advice please... Thanks in advance!