Category: Big Data

Josh Klahr is vice president of product management at AtScale. Over the past few years, there has been a subtle but significant shift in the way that data is structured in databases. Whereas yesterday’s databases were typically limited to storing data in rows and tables, today’s modern databases often make use of nested data structures.

The Open Compute Project’s open hardware standards have done much to push forward the development of cloud-scale hardware. By sharing designs for connectors, racks, servers, switches, and storage hardware, the OCP has defined a new generation of data center technologies and made them widely available – and able to be manufactured at the scale the…

Recently, I visited a few conferences and I noticed a somewhat hidden theme. While a lot of attention was being paid to moving to a (hybrid) cloud-based architecture and what you need for that (such as cloud management platforms), a few presentations showed an interesting overall development that everybody acknowledges but that does not get…

InfoWorld Machine learning, and especially deep learning, have turned out to be incredibly useful in the right hands, as well as incredibly demanding of computer hardware. The boom in availability of high-end GPGPUs (general purpose graphics processing units), FPGAs (field-programmable gate arrays), and custom chips such as Google’s Tensor Processing Unit (TPU) isn’t an accident,…

Venerable Shogun was created in 1999 and written in C++, but can be used with Java, Python, C#, Ruby, R, Lua, Octave, and Matlab. The latest version, 6.0.0, adds native support for Microsoft Windows and the Scala language. Though popular and wide-ranging, Shogun has competition. Another C++-based machine learning library, Mlpack, has been around only since 2011,…

We’ve all heard of exciting new technologies in the data warehouse world: tools like Amazon Redshift, Google BigQuery, and more recently Azure SQL Data Warehouse. What would you call this category of tools? Well, of course, “cloud data warehouse.” Check out the Google Trends graph for this search term. Explosive growth. Gilad David Maayan But look…

H2O, now in its third major revision, provides access to machine learning algorithms by way of common development environments (Python, Java, Scala, R), big data systems (Hadoop, Spark), and data sources (HDFS, S3, SQL, NoSQL). H2O is meant to be used as an end-to-end solution for gathering data, building models, and serving predictions. For instance,…

Apache Kafka is an open source, distributed, scalable, high-performance, publish-subscribe message broker. It is a great choice for building systems capable of processing high volumes of data. In this article we’ll look at how we can create a producer and consumer application for Kafka in C#. To get started using Kafka, you should download Kafka…