Announcing Apache Pig 0.12…The Community Breeds a More Powerful Pig

Today we are proud to announce the general availability of Apache Pig 0.12!

If you are a Pig user who has been wanting support for additional languages, more data validation tools, and more expressions, operators and data types, then read on. Version 0.12 includes all of those additions, and Pig now runs on Windows without Cygwin.

This was a great team effort over the past six months with over 30 engineers from Twitter, Yahoo, LinkedIn, Netflix, Microsoft, IBM, Salesforce, Mortardata, Cloudera and several others (including Hortonworks of course). Between Pig 0.11 and Pig 0.12, we resolved 305 Jira issues.

Improvements in Apache Pig 0.12

Assert operator

An assert operator can be used for data validation. For example, the following script will fail if any record has a value of a0 that is not greater than zero:

a = load 'something' as (a0:int, a1:int);
assert a by a0 > 0, 'a cant be negative for reasons';

Streaming UDF

Users can now write UDFs in languages that have no JVM implementation. In particular, this version adds CPython UDFs, so users can write Python UDFs that rely on CPython extensions, which is not possible in Jython.
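As a rough illustration of what a CPython UDF file might look like, here is a minimal sketch. The file name my_udfs.py and the function square are hypothetical. The sketch assumes Pig supplies a pig_util module with an outputSchema decorator when it runs the script; the fallback decorator is our own addition so the file can also be imported and tested outside of Pig.

```python
# my_udfs.py: a hypothetical streaming (CPython) UDF module for Pig.
# Assumption: when Pig runs this file it provides a `pig_util` module with an
# `outputSchema` decorator. The fallback below is only a local stand-in so the
# file can be imported and tested without Pig.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """Stand-in decorator that just records the declared schema."""
        def wrap(func):
            func.output_schema = schema
            return func
        return wrap


@outputSchema("squared:long")
def square(value):
    """Return value squared, or None when the input field is null."""
    if value is None:
        return None
    return value * value
```

On the Pig side, a file like this would typically be registered with a statement along the lines of register 'my_udfs.py' using streaming_python as my_udfs; and then called as my_udfs.square(a0) inside a foreach. Check the UDF documentation for your Pig version for the exact syntax.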

Rewrite of AvroStorage

We completely rewrote AvroStorage, and it is now one of Pig's built-in functions. It uses the latest version of Avro and is significantly faster, with many bug fixes.

IN operator

Previously, Pig had no IN operator. To mimic one, users had to chain several OR conditions together, as in this example: