Hortonworks Blog » Venkatesh Seetharam
http://hortonworks.com

Announcing Apache Falcon 0.6.0
Wed, 07 Jan 2015
http://hortonworks.com/blog/announcing-apache-falcon-0-6-0/

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed these engines, Apache Falcon is a crucial framework for simplifying data management and pipeline processing.

Falcon enables data architects to automate the movement and processing of datasets for ingest, pipeline, disaster recovery and data retention use cases.

We recently released Apache Falcon 0.6.0. With this release, the community addressed more than 220 JIRA issues. Among these many bug fixes, improvements and new features, four stand out as particularly important:

Authorization with ACLs for entities

Enhancements to lineage metadata

Cloud archival

Falcon recipes

This blog gives an overview of these new features and how they integrate with other Hadoop services. We’ll also touch on additional innovation we plan for upcoming releases.

Authorization with ACLs for entities

Apache Falcon now supports an access control list (ACL) that provides authorization for Feed, Cluster and Process entities. This allows Falcon to leverage existing security work to maintain consistent controls throughout the HDP stack. This security enhancement lays the foundation for broader enterprise adoption and a variety of new use cases that will flow from that.
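As a sketch of what this looks like in practice, an entity definition can carry an ACL element naming its owner, group and permission (the feed name and owner/group values below are illustrative, not from the release notes):

```xml
<!-- Illustrative Falcon feed entity fragment showing the ACL element.
     The "etl-user"/"etl" values are hypothetical placeholders. -->
<feed name="rawEmailFeed" description="Raw email data" xmlns="uri:falcon:feed:0.1">
  <!-- ... frequency, clusters, locations elided ... -->
  <ACL owner="etl-user" group="etl" permission="0x755"/>
</feed>
```

With the ACL in place, only authorized users can view or modify the entity through Falcon's CLI and REST interfaces.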

Enhancements to lineage metadata

This Falcon release provides better access to lineage metadata. This facilitates the quick and efficient search and retrieval of lineage information, which makes it easier to comply with data retention and discoverability regulations.

Cloud archival

Falcon can now archive data to cloud infrastructure such as Amazon S3 and Microsoft Azure. We’re excited about this change because it extends the archive use case for continuity and ad hoc analysis.
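A minimal sketch of the idea, assuming a feed whose target cluster stores data in S3 (cluster names, dates and the bucket path are hypothetical):

```xml
<!-- Illustrative feed clusters fragment: data retained briefly on the
     primary cluster, archived long-term to an S3-backed target. -->
<clusters>
  <cluster name="primaryCluster" type="source">
    <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
    <retention limit="days(30)" action="delete"/>
  </cluster>
  <cluster name="archiveCluster" type="target">
    <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
    <retention limit="months(36)" action="delete"/>
    <locations>
      <location type="data" path="s3://my-archive-bucket/falcon/${YEAR}-${MONTH}-${DAY}"/>
    </locations>
  </cluster>
</clusters>
```

Replication from the source to the target cluster then doubles as an archival step, with retention policies keeping each copy for the appropriate duration.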

Falcon recipes

A Falcon recipe is a static process template with a parameterized workflow that realizes a specific use case. Recipes are defined in the user space. Every recipe is modeled as a Process within Falcon, which then periodically executes the user workflow. Because the process and its associated workflow are parameterized, the user provides a properties file with name/value pairs that Falcon substitutes into the workflow definition before scheduling the resulting process entity. Recipes enable non-programmers to capture and re-use very complex business logic.
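To make the name/value substitution concrete, a recipe's properties file might look like the following (the key names and values here are hypothetical placeholders, not taken from a shipped recipe):

```properties
# Illustrative recipe properties file: each name/value pair is
# substituted into the parameterized workflow template by Falcon
# before the resulting process entity is scheduled.
recipe.name=hdfs-replication-daily
recipe.frequency=days(1)
sourceCluster=primaryCluster
targetCluster=backupCluster
sourceDir=/apps/falcon/demo/input
targetDir=/apps/falcon/demo/replica
```

A non-programmer fills in a file like this once, and the recipe machinery turns it into a fully scheduled pipeline.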

Plans for the Future of Falcon

We want to thank the Apache Falcon community for all of its hard work delivering this release. Looking forward to future releases, the Apache Falcon team plans:

Usability improvements, with a new UI, REST API additions and enhanced documentation

Further strengthening of HA capabilities, allowing Falcon to meet ever more stringent SLAs

Download Apache Falcon and Learn More

Project Falcon: Tackling Hadoop Data Lifecycle Management via Community Driven Open Source
Tue, 02 Apr 2013
http://hortonworks.com/blog/project-falcon-tackling-hadoop-data-lifecycle-management-via-community-driven-open-source/

Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.

All About Falcon and Data Lifecycle Management

Falcon is a data lifecycle management framework for Apache Hadoop that enables users to configure, manage and orchestrate data motion, disaster recovery, and data retention workflows in support of business continuity and data governance use cases.

Falcon’s goal is to simplify data management on Hadoop, and it achieves this by providing important data lifecycle management services that any Hadoop application can rely on. Instead of hard-coding complex data lifecycle capabilities, apps can now rely on a proven, well-tested and extremely scalable data management system built specifically for the unique capabilities that Hadoop offers.

For example, consider the challenge of preparing raw data so that it can be consumed by business intelligence applications. In addition to this routine use case, suppose you also want to replicate data to a failover cluster that is smaller than the primary cluster. In this case you probably only want to replicate the staged data as well as the data presented to BI applications, relying on the primary cluster to be the sole source of intermediate data.

We see our customers building solutions like this, but they are tricky to develop, difficult to test and error-prone. With Falcon, however, the data processing pipeline and all replication points are expressed in a single configuration file, using well-tested Falcon services to ensure data is processed and replicated reliably. Using Falcon, you speed app development while improving overall quality.
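The scenario above can be sketched as a single feed definition (all names, dates and paths here are hypothetical): the presented data names both clusters, so Falcon replicates it, while intermediate feeds would simply omit the target cluster and live only on the primary.

```xml
<!-- Illustrative sketch: presented data replicated from the primary
     cluster to a smaller failover cluster. -->
<feed name="presentedData" description="BI-ready data" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="primaryCluster" type="source">
      <validity start="2013-04-01T00:00Z" end="2014-04-01T00:00Z"/>
      <retention limit="days(90)" action="delete"/>
    </cluster>
    <cluster name="failoverCluster" type="target">
      <validity start="2013-04-01T00:00Z" end="2014-04-01T00:00Z"/>
      <retention limit="days(90)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/presented/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
</feed>
```

One declarative file like this replaces hand-written replication jobs, and Falcon's scheduler and retry handling take care of the reliability concerns.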

The Power of Community

Unwavering belief in the power of community-driven open source software is the cornerstone of Hortonworks’ approach. As we discussed in “The Road Ahead for Hortonworks and Hadoop”, one of the key areas of investment for enterprise Hadoop this year is features that address the business continuity and data governance needs of the mainstream enterprise.

New but proven

The team at InMobi (who have been significant contributors to Apache Hadoop since their inception) couldn’t agree more, which is why they built Falcon for their own use almost 18 months ago. Having proven it at scale in their production environment for more than 12 months, managing hundreds of data feeds into and out of Hadoop, they have now contributed the technology to the Apache Software Foundation so that the entire community may benefit.

We are thrilled to welcome Falcon as yet another example of the relentless march of innovation that is community-driven, open source Apache Hadoop, and we hope you join us on the journey.