Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Without capable data analytics, it is practically impossible to compete in any contemporary business sector. But to do analysis right, you first need data that is centralized and easy to query. The problem many companies face is that their data is siloed: it resides in numerous different locations, each potentially with its own data model and query language. Perhaps each department in the company maintains its own dataset, or perhaps a whole group of additional databases came with an acquisition. Either way, having as few as two disparate data stores can make analysis difficult.

Data Warehouses

The standard solution to siloed data is a data warehouse (DWH), which unites separate data stores so that you can query them from a central location and connect business intelligence and visualization tools such as Tableau and Qlik. Most DWHs use massively parallel processing (MPP), which divides query work among the servers in the setup. This differs from a traditional database cluster, which uses a more centralized control structure, and it means that a DWH setup can scale out to a very large number of servers.
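To make the idea of dividing query work concrete, here is a toy, single-process sketch of the scatter-gather pattern that MPP systems commonly rely on. It assumes a simple sum-by-region query and hypothetical in-memory "segments" (one list per server); the worker threads only stand in for separate machines, and nothing here reflects the internals of TaranHouse or any other specific product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows stored on one server.
SEGMENTS = [
    [("north", 120.0), ("south", 80.0)],
    [("north", 40.0), ("east", 60.0)],
    [("south", 20.0), ("east", 10.0), ("north", 5.0)],
]


def partial_sum_by_region(rows):
    """Runs on one segment: aggregates only the rows that segment holds."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def parallel_sum_by_region(segments):
    """Scatter the same partial query to every segment, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(partial_sum_by_region, segments))
    # The coordinator only merges small per-segment results, not raw rows.
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0.0) + subtotal
    return merged


print(parallel_sum_by_region(SEGMENTS))  # {'north': 165.0, 'south': 100.0, 'east': 70.0}
```

Because a sum can be computed per segment and then merged, each server touches only its own rows. Aggregates that don't decompose this way (an exact median, for instance) need more coordination between servers.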

A primary advantage of a DWH is that operational and analytical workloads are kept separate: the former run on your original data stores and the latter on your DWH. This means analytical queries don't add load to your operational databases, and it also means you can accumulate large amounts of historical data in your DWH. Over time, that history lets you run sophisticated analyses, such as machine learning, on genuinely "big" data.
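The separation between operational and analytical data can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: two in-memory SQLite databases stand in for an operational store and a warehouse, and the `orders`/`order_history` tables and the `load_batch` helper are hypothetical. A real DWH would use its own bulk-loading tools rather than row-by-row copies.

```python
import sqlite3

# Hypothetical setup: two separate in-memory databases, one standing in for the
# operational store and one for the analytical warehouse.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
warehouse.execute(
    "CREATE TABLE order_history ("
    "id INTEGER PRIMARY KEY, region TEXT, amount REAL, load_batch INTEGER)"
)

INSERT_ORDER = "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)"


def load_batch(batch_no):
    """Copy current operational rows into the warehouse; rows loaded earlier are kept."""
    rows = operational.execute("SELECT id, region, amount FROM orders").fetchall()
    warehouse.executemany(
        "INSERT OR IGNORE INTO order_history (id, region, amount, load_batch) "
        "VALUES (?, ?, ?, ?)",
        [(oid, region, amount, batch_no) for oid, region, amount in rows],
    )
    warehouse.commit()


# Day 1: two orders exist in the operational store; load them into the warehouse.
operational.executemany(INSERT_ORDER, [(1, "north", 100.0), (2, "south", 50.0)])
operational.commit()
load_batch(1)

# Day 2: the operational store purges old orders and takes a new one.
operational.execute("DELETE FROM orders")
operational.execute(INSERT_ORDER, (3, "north", 25.0))
operational.commit()
load_batch(2)

# The analytical query touches only the warehouse, which still holds the full history.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM order_history GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 125.0), ('south', 50.0)]
```

The operational store ends up holding a single current order, while the warehouse keeps all three, so historical analysis never competes with day-to-day transactions.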

There is no doubt that a DWH can scale with, and support the growth of, a business and its data. So the real question is: which data warehouse should you choose?

TaranHouse

A recent entrant to the DWH market is TaranHouse, an economical solution aimed at businesses that are just starting to feel the limitations of traditional relational databases and are ready to experiment with what a DWH can offer. Such businesses would see immediate speed gains just by dumping a single relational database and restoring it into TaranHouse. For example, a moderately complex SQL query runs roughly five times faster against TaranHouse than the same query against a relational database. Likewise, querying TaranHouse directly from Tableau in “live mode” is a very different experience from querying a relational database or Tableau’s embedded server.

TaranHouse can run in the cloud of your choice, on-premises, or in a combination of the two. With TaranHouse cloud, your data is protected: all data is replicated to three data centers, and any node failure triggers an automatic failover. Unlike competing products that may require proprietary hardware, TaranHouse scales out horizontally on commodity servers, which yields significant cost savings. TaranHouse also offers many user-selectable options, such as choosing between row- and column-based configurations and among ODBC, JDBC, Python, and R connectors for BI tools. Finally, TaranHouse is designed to be combined with the Tarantool in-memory database for real-time monitoring, alerting, and dynamic decision-making.
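The choice between row- and column-based configurations reflects a general storage trade-off. The sketch below models the same records in both layouts using plain Python structures; it is illustrative only, makes no claim about how TaranHouse implements either configuration, and all helper names are hypothetical.

```python
# The same three records modeled two ways (illustrative only, not a real on-disk format).
ROWS = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "south", "amount": 80.0},
    {"id": 3, "region": "north", "amount": 40.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


COLUMNS = to_columns(ROWS)


def total_amount_row_store(rows):
    # Row layout: the scan walks over whole records even though only `amount` is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Column layout: the scan touches only the `amount` column.
    return sum(columns["amount"])


def fetch_record_row_store(rows, record_id):
    # Row layout: a whole record is stored together, so a lookup returns it directly.
    return next(row for row in rows if row["id"] == record_id)


def fetch_record_column_store(columns, record_id):
    # Column layout: the record must be reassembled from every column.
    i = columns["id"].index(record_id)
    return {name: values[i] for name, values in columns.items()}


print(total_amount_row_store(ROWS), total_amount_column_store(COLUMNS))  # 240.0 240.0
```

Analytical queries that aggregate a few columns across many rows tend to favor the column layout, while fetching or updating whole records tends to favor the row layout.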
