Chapter 13. Use Cases and Programming Examples

In this chapter we will take a look at several comprehensive Pig
examples and real-world Pig use cases.

Sparse Tuples

In “Schema Tuple Optimization” we introduced a more compact tuple implementation called the schema tuple.
However, if your input data is sparse, a schema tuple is not the most
efficient way to represent it. You only need to store the position
and value of each nonempty field of the tuple, which you
can do with a sparse tuple. When the vast majority
of fields in the tuple are empty, this data structure saves a lot of
space. Sparse tuples are not natively supported by Pig.
However, Pig allows users to define custom tuple implementations, so you
can implement them yourself. In this section, we will show you how to
implement a sparse tuple and use it in Pig.
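
Before turning to the Pig-specific class, the core idea can be sketched in plain Java. The following is a minimal illustration only, not Pig code: the class name SparseRow and all of its methods are hypothetical, and it does not implement Pig's Tuple interface. It assumes that an empty field is represented by null and that only non-null fields need to be stored, keyed by position.

```java
import java.util.TreeMap;

// Hypothetical stand-alone class for illustration; it is not part of Pig
// and does not implement Pig's Tuple interface.
class SparseRow {
    // Position -> value for nonempty (non-null) fields only, sorted by position.
    private final TreeMap<Integer, Object> fields = new TreeMap<>();
    // Logical number of fields, including the empty ones.
    private int size;

    SparseRow(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        this.size = size;
    }

    // Logical width of the row.
    int size() {
        return size;
    }

    // Number of entries actually held in memory.
    int storedCount() {
        return fields.size();
    }

    // Returns the value at the given position, or null if the field is empty.
    Object get(int index) {
        checkIndex(index);
        return fields.get(index);
    }

    // Stores a value; setting null clears the entry so empty fields cost nothing.
    void set(int index, Object value) {
        checkIndex(index);
        if (value == null) {
            fields.remove(index);
        } else {
            fields.put(index, value);
        }
    }

    // Adds a field at the end; an empty (null) field only grows the logical size.
    void append(Object value) {
        if (value != null) {
            fields.put(size, value);
        }
        size++;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    public static void main(String[] args) {
        SparseRow row = new SparseRow(1000);
        row.set(3, "pig");
        row.set(998, 42);
        // Prints: 1000 logical fields, 2 stored
        System.out.println(row.size() + " logical fields, " + row.storedCount() + " stored");
    }
}
```

In this sketch a row with 1,000 logical fields and two values holds only two map entries, which is where the space saving comes from. The trade-off is that each lookup goes through a sorted map rather than a direct array access.
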

First, we need to write a SparseTuple
class that implements the Tuple interface. Implementing
every method of the Tuple interface is
tedious, so to make it easier we derive SparseTuple
from AbstractTuple, which already implements most
of the common methods. Inside SparseTuple, we create a
TreeMap that stores the index and value of each
nonempty field. We also keep track of the size of the tuple. Together,
these two fields hold the complete state of the sparse tuple. Here is the data
structure along with the getter and setter methods of
SparseTuple: