Amazon Debuts Low-Cost, Big Data Warehousing

Amazon Web Services (AWS) on Wednesday announced Amazon Redshift, a cloud-based data warehouse service that it says will deliver better scalability and performance than conventional on-premises data warehouses at dramatically lower costs.

"We did the math and found that it generally costs between $19,000 and $25,000 per terabyte per year, at list prices, to build and run a good-sized data warehouse on your own," stated AWS Evangelist Jeff Barr in a blog post on the announcement. "Amazon Redshift, all-in, will cost you less than $1,000 per terabyte per year."

Promising more than a cost advantage, Amazon said its managed service approach also liberates data warehouse administrators from the tasks of monitoring, tuning, doing backups, patching software and recovering from faults. Users launch and manage Redshift nodes and clusters from the AWS Management Console, and Amazon said they can start with a few hundred gigabytes and scale up to more than a petabyte.

Redshift is based on relational database technology, so it uses SQL as its query language and is compatible with existing BI tools. It's pretty clear that the database in question is ParAccel, as Amazon is an investor in that company and statements about Redshift acknowledge licensing key technology from the company.

ParAccel's database includes advanced features such as columnar data storage and data compression, but these are also offered by competitors including EMC Greenplum, HP Vertica and Teradata, and they are promised in the next release of Oracle Database. Despite Amazon's "ten times faster" claim, performance will clearly vary depending on the workload and the "conventional database" point of comparison.

The distinction between the previously available Amazon Relational Database Service (RDS) and Redshift is that the latter is exclusively for warehousing and analytics (as opposed to transactional database uses) and is capable of big-data scale. "RDS is based on Microsoft SQL Server, Oracle and MySQL, and those aren't systems that are designed to do petabyte-scale data warehousing," said Jaspersoft's Karl Van den Bergh, VP of product and alliances. Jaspersoft is one of two initial business intelligence partners on Redshift, along with MicroStrategy, though Amazon said that other BI partners will soon follow.

Despite the potential for big data analysis, Amazon seemed intent on highlighting the potential for small and midsize companies to get into data warehousing at a very low cost. Customers can choose between two node types, holding either 2 terabytes or 16 terabytes of compressed customer data per node. Pricing starts at $0.85 per hour for a 2-terabyte data warehouse. Reserved-instance pricing lowers the price to $0.228 per hour, or under $1,000 per terabyte per year, according to Amazon.
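
To see how the hourly rates translate into the per-terabyte figures Amazon quotes, here is a rough back-of-the-envelope sketch in Python. It assumes a single 2-terabyte node running around the clock for a 365-day year, and it treats the quoted hourly rates as all-in; any upfront reserved-instance fee is not broken out in the announcement and is ignored here. The function name is purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes nonstop use, ignores leap years


def annual_cost_per_tb(hourly_rate, terabytes):
    """Yearly cost per terabyte for a node billed by the hour and never shut down."""
    return hourly_rate * HOURS_PER_YEAR / terabytes


# Hourly rates quoted in the article for a 2-terabyte data warehouse
on_demand = annual_cost_per_tb(0.85, 2)
reserved = annual_cost_per_tb(0.228, 2)

print(f"On-demand: ${on_demand:,.2f} per terabyte per year")
print(f"Reserved:  ${reserved:,.2f} per terabyte per year")
```

Under those assumptions, the reserved rate works out to roughly $999 per terabyte per year, which squares with Amazon's "under $1,000" claim, while the on-demand rate comes to about $3,700 per terabyte per year. Both sit well below the $19,000-to-$25,000 range Barr cites for on-premises warehouses.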

"Like anything that Amazon does, they're disrupting the market and offering something that nobody else has been able to offer from a cost-value perspective," said Van den Bergh. "This is a big deal for the data warehousing space, so it will be interesting to see how much uptake it gets."

One thing Amazon doesn't address in detail on its Redshift site is just how companies large and small will upload and synchronize their data with Redshift. Uploading data from one source isn't complicated, but the delays and complexities of data movement multiply as the number of sources increases. Presumably, BI systems will also have to operate in the cloud in order to avoid the potentially time-consuming step of moving data back and forth between on-premises systems and the cloud.

Amazon representatives were not available for comment at press time, but InformationWeek will follow up with deeper analysis of Redshift capabilities and how it might impact the data warehousing industry.

Amazon's Redshift announcement validates that enterprises are ready for cloud-based big data warehousing solutions. XtremeData, which is available on Amazon as well as other clouds, is aimed at organizations that need a massively scalable DBMS for mixed read and write workloads, such as those involving heavy ELT. Redshift (a column store licensed from ParAccel) is well-suited for read-only data marts of all sizes. The market is rapidly approaching a tipping point where the specialized solutions available on premises are becoming available in the cloud, on Amazon and elsewhere.
