Why The Open Data Platform Is Such A Big Deal For Big Data

Today, fifteen industry leaders in the big data space announced their intent to create a new industry initiative, identified as the Open Data Platform (“ODP”), to promote open source-based big data technologies and standards for enterprises building data-driven applications (opendataplatform.org). The initial group of member companies includes Platinum members GE, Hortonworks, IBM, Infosys, Pivotal, SAS, and a large international telecommunications firm, and Gold members AltiScale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata, and VMware.

Born from the playbook Pivotal used just a year ago to leverage open source and open collaboration to accelerate Cloud Foundry into becoming the biggest open source success in recent years, Open Data Platform promises to do the same for the Apache Hadoop® ecosystem and big data, and do it quickly.

Everything Starts With Open Source

Last year, Pivotal published its open source manifesto, detailing why open source is pivotal to the success of any technology. From recruiting top talent to accelerating adoption, feedback, and innovation, open source has long since proven that no proprietary technology can compete with a viable open source alternative.

However, while single technologies have thrived with open source, ecosystems naturally lag in development without an organizing force. By openly joining forces with the leading vendors, service providers, and users of Apache Hadoop® to focus specifically on the needs of the enterprise, the Open Data Platform aims to reduce fragmentation and accelerate development and innovation across the Hadoop ecosystem.

Open Collaboration: A Rising Tide That Lifts All Boats

A thriving ecosystem is the key to the real viability of any technology. With lots of eyes on the prize, the technology becomes more stable, offers more capabilities, and, importantly, supports greater interoperability across technologies, making it easier and faster to adopt and use. By creating a formal organization, the Open Data Platform will act as a forcing function to accelerate the maturation of an ecosystem around big data.

Of course, the caliber of the organization’s members is also very important. The members have to have relevant expertise and investment in the area. They also should be looking at the challenges from a variety of angles, balancing the views of consumers of the technologies with those of providers. This is why, when we set out to recruit for the Cloud Foundry Foundation, we recruited a variety of tech-savvy companies, from software giants like IBM, EMC, and SAP to service providers like Savvis, Rackspace, and NTT and industry-leading consumers of PaaS like Monsanto, eBay, and BNY Mellon.

For the Open Data Platform, the first wave of members combines heavyweight brands across Hadoop software providers including EMC, Hortonworks, IBM, Pivotal, Teradata, Splunk, and VMware; service providers like AltiScale, CenturyLink, and Verizon Enterprise Solutions; advanced ISVs like Capgemini, Infosys, and SAS; and, finally, leading Hadoop consumers like General Electric and another large international telco. This is just the first wave, and as an open foundation, we expect to expand the ranks quickly.

Once working under the foundation’s framework, each of these companies will pool resources and efforts in cooperation, eliminating redundancies and establishing a clear and agreed way for us all to work. Simply put, this creates operational efficiencies across an entire ecosystem. More investment will flow into the standardized open source, and more innovation and interoperability will flow out of the vendors in the ecosystem, accelerating benefits for all.

First Goals for the Open Data Platform Initiative

Translating this into real tactics and benefits, look for significant progress on three milestones toward a successful ecosystem in the Open Data Platform’s first year:

An industry-standard, open data management core. Initially focused on Apache Hadoop®, the Open Data Platform will develop and promote a set of open, enterprise-focused Hadoop® standards and technologies. This translates to immediate benefits that will increase stability, capabilities, and compatibility among Hadoop® distributions.

Certifying a common reference core. The Open Data Platform will deliver a certified, packaged, and tested reference core, giving the industry a coveted “test once, use everywhere” solution. With the entire industry enabled to create big data offerings using this consistent reference implementation, software applications will be more likely to run on any distribution based on the Open Data Platform’s Hadoop® core, reducing risk and vendor lock-in while focusing vendor resources on more innovation.

More support and contributions for the Apache Software Foundation. The Open Data Platform is expected to be complementary and beneficial to the efforts and stewardship of the Apache Software Foundation (ASF), using the existing ASF processes to contribute code, perform testing and integration, and provide infrastructure support, as well as to increase participation in events and collaboration with the developer community.

The Future Is Near

Today’s announcement is about an organization that will be created in the near future. However, progress is not waiting for the Open Data Platform to stand itself up. The initiative unites many partners who are already collaborating on big data initiatives. GE helped get Pivotal started specifically to tackle modern challenges of combining big data and the Internet of Things (IoT), with results stacking up to save trillions in the next few years. Hortonworks and Pivotal announced today that they will be combining efforts to support Hadoop distributions and partner on data lake technologies. Real code contributions are also prepared, with Pivotal open sourcing our SQL-on-Hadoop engine, HAWQ, allowing it to run across any distribution of Hadoop based on the Open Data Platform Core.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

About the Author

Scott Yara is Pivotal’s President and Head of Products.
Scott co-founded Greenplum and previously served as Senior Vice President of EMC’s Greenplum Division. He’s credited with having a deep knowledge of product development and go-to-market, spanning engineering, product management, and marketing.