GridGain recently announced that the GridGain In-Memory Data Fabric has been accepted into the Apache Incubator program under the name "Apache Ignite."

Earlier this year, GridGain moved to an open source model under the Apache 2.0 license. Now the product will become part of the Apache Software Foundation (ASF) project portfolio. The goal of contributing the framework to the ASF is to foster community-driven development of the in-memory data and compute solution, going beyond open source consumption alone.

InfoQ caught up with Nikita Ivanov, CTO and founder of GridGain, to talk about the In-Memory Computing framework becoming an Apache project, the motivation behind this decision, and upcoming features and enhancements of GridGain.

InfoQ: Why become an Apache project now, so soon after GridGain was made open source?

Earlier this year, GridGain finally backported to its open source product many of the extensions and improvements made in the commercial product over the last couple of years, and changed the license of its open source project to Apache 2.0. A 2000% increase in downloads since going open source in March 2014 quickly showed us that the project was gaining a significant following and interest from the global community. The further move to the ASF, as Apache Ignite, was the natural and logical next step in this development for several reasons. Not only does being part of the Apache Software Foundation community promote continued adoption of the GridGain core code base, it also encourages further adaptation and rapid innovation within the developer community and ensures the long-term viability of this code base for its growing user base. There's simply no better place to grow and foster a community of developers and users than the Apache Software Foundation, especially for infrastructure software such as GridGain’s.

InfoQ: Which parts of GridGain’s product are being contributed to Apache? Is it only the In-Memory Computing framework, or the In-Memory Data Store as well?

Over 90% of the code base is being contributed to the ASF, which means that all core capabilities of the GridGain In-Memory Data Fabric, including the data grid, compute grid and streaming engine, will be part of the Apache Ignite code base. What remains defines the commercial product that GridGain continues to develop, where innovation will focus on high-end enterprise features such as security, data center replication, and advanced management and monitoring.

InfoQ: What is the license model for the new Apache initiative? How will the Apache project be managed compared to GridGain’s open source and commercial products?

There's no difference between Apache Ignite and any other ASF project. The license is Apache 2.0, and the project will be managed by a PMC (Project Management Committee) consisting of GridGain contributors as well as outside committers. The commercial product will continue to be made available through GridGain’s enterprise licensing program.

InfoQ: What is the timeline for the new initiative to graduate from incubation?

We expect to make the first code drop by the end of the year, and expect Apache Ignite to graduate to a top-level project within 12 months.

InfoQ: How does GridGain work with Hadoop, which is already a popular choice for data processing needs?

Apache Ignite and Hadoop (including Spark) solve different problems, even though they may to some extent build on similar underlying technology, as in the case of Spark. Apache Ignite is a multi-purpose OLAP/OLTP in-memory data fabric, and Hadoop is just one of the many data sources that it supports (and accelerates) natively.

InfoQ: How does GridGain fit into the emerging data analytics landscape, where frameworks like Apache Spark (specifically Spark Streaming) already support memory-based, real-time data processing use cases?

Spark is a sort of sister project to Apache Ignite. While Spark is squarely focused on the OLAP world, Apache Ignite excels at newly emerging hybrid OLTP/OLAP use cases with its industry-leading transactional capabilities. Specifically with respect to Hadoop, Apache Ignite will provide plug-and-play acceleration for existing MapReduce, Pig or Hive jobs, avoiding a rip-and-replace approach, while Spark requires data ETL and is better suited to newly written analytics applications.

InfoQ: Can you discuss the upcoming features and enhancements in the In-Memory Data Fabric product?

More plug-and-play capabilities, more integration and more simplification are the main initial themes on the Apache Ignite roadmap, aimed at further accelerating adoption and easing adaptation of the core Apache code base.

Nikita also talked about how GridGain could fit into the Big Data management and analytics landscape alongside tools like Hadoop.

Every day, we see new and interesting use cases for in-memory technologies in areas where you might not have expected them even a few years ago – such as the real-time tracking of the NYC marathon runners the other day. We believe strongly that new classes of transactional, analytical and hybrid real-time applications will emerge that will allow even the smallest organizations to gain a competitive advantage from fast, data-driven decisions and operations. And we are convinced that community-driven adoption and adaptation of in-memory data fabric technology will play a big part in driving that innovation, making Apache Ignite for the Fast Data world of the future what Hadoop is for Big Data today.

About the Interviewee

Nikita Ivanov is the founder and CTO of GridGain Systems, which was started in 2007 and is funded by RTP Ventures and Almaz Capital. Nikita has led GridGain in developing advanced, distributed in-memory data processing technologies, resulting in the top Java in-memory computing platform, with an instance starting every 10 seconds somewhere in the world today. Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, and contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. Nikita was one of the pioneers in using Java technology for server-side middleware development while working for one of Europe’s largest system integrators in 1996.