Google Launches Cloud Dataproc, A Tool To Simplify Spark And Hadoop

Google today announced Cloud Dataproc, a new service for big data management. According to the tech giant, the tool is a managed Spark and Hadoop service that helps you create clusters quickly and manage them easily. It is also meant to be economical, since you can turn clusters off when you don’t need them.

The tool lets you take advantage of open source data tools for batch processing, querying, streaming and machine learning, according to Google. The new service simplifies cluster creation, configuration and management, which should shorten the time it takes to get your work done.

The tool is fast. And by fast, I mean that Cloud Dataproc clusters can be started, scaled and shut down in an average of 90 seconds per operation. In comparison, standard Spark and Hadoop clusters running on-premises or through IaaS providers could take from 5 to 30 minutes.

Cloud Dataproc is priced at 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources that you might be using. Your clusters can also include preemptible instances, which have lower compute prices and reduce your costs even further. In addition, usage is billed with a minimum of 10 minutes instead of being rounded to the nearest hour.
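
To make the pricing concrete, here is a rough sketch of the arithmetic in Python. The rate and the 10-minute minimum come from the figures above; the function name is made up for illustration, and the assumption that time beyond the minimum is billed by the minute is mine, not something stated in the announcement. The result covers only the Cloud Dataproc fee, not the underlying Cloud Platform resources.

```python
from fractions import Fraction

RATE_CENTS_PER_VCPU_HOUR = 1  # from the article: 1 cent per virtual CPU per hour
MINIMUM_MINUTES = 10          # from the article: 10-minute minimum charge


def dataproc_fee_cents(vcpus: int, minutes_used: int) -> Fraction:
    """Estimate the Cloud Dataproc fee in cents (hypothetical helper).

    Assumes usage beyond the minimum is billed by the minute, which is
    an assumption made for this sketch. Compute and storage costs are
    not included.
    """
    if vcpus < 0 or minutes_used < 0:
        raise ValueError("vcpus and minutes_used must be non-negative")
    # Apply the 10-minute minimum, then prorate the hourly rate.
    billed_minutes = max(minutes_used, MINIMUM_MINUTES)
    return Fraction(RATE_CENTS_PER_VCPU_HOUR * vcpus * billed_minutes, 60)


# A 24-vCPU cluster that runs for 25 minutes.
print(float(dataproc_fee_cents(24, 25)))  # 10.0

# The same cluster shut down after 3 minutes is billed for 10 minutes.
print(float(dataproc_fee_cents(24, 3)))   # 4.0
```

Under hourly billing, both of those runs would have cost the full 24 cents, which is why the short minimum matters for clusters that you turn off as soon as a job finishes.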

In the blog post announcing the service, James Malone, Product Manager at Google, also states that the new tool has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging and Cloud Monitoring.

Also, because the tool uses the standard Spark and Hadoop distributions, you can expect it to be compatible with virtually any existing Hadoop-based product. This means that porting your existing workloads to the new service should be straightforward.