Partitions
The input paths are scanned by the loader for [partition name]=[value]
patterns in the subdirectories.
If detected, these partitions are appended to the table schema.
For example, given the directory structure:

/user/hive/warehouse/mytable
/year=2010/month=02/day=01

The schema of mytable is (id int,name string).
The final schema returned in Pig will be (id:int, name:chararray,
year:chararray, month:chararray, day:chararray).
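
Loading that layout might look like the following sketch; the loader class path and the schema string are assumptions based on the piggybank HiveColumnarLoader constructors described below:

```pig
-- hypothetical example: load the partitioned table above
a = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader('id int,name string');
-- DESCRIBE a; would then show the partition columns appended:
-- a: {id: int, name: chararray, year: chararray, month: chararray, day: chararray}
```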

To load a Hive table with schema uid bigint, ts long, arr ARRAY, m
MAP, reading only the columns uid and ts for the dates 2009-10-01 to
2009-10-02.

Old usage note: this behaviour is now supported in Pig by LoadPushDown.
Listing the columns to be loaded, as below, is ignored, and Pig will
automatically send the columns used by the script to the loader.
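
A sketch of that load; the two-argument constructor (schema plus date range) is described below, and the start:end form of the date-range string is an assumption:

```pig
-- hypothetical example: restrict the load to a date range
a = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader(
        'uid bigint,ts long', '2009-10-01:2009-10-02');
-- Pig's LoadPushDown sends only the columns the script actually uses:
b = FOREACH a GENERATE uid, ts;
```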

Table schema definition
The schema definition must be a column name, followed by a space, then the
type, then a comma with no space before the next column name, and so on.
That is, column1 string, column2 string will not work; it must be column1
string,column2 string
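
A sketch contrasting the two forms (the table path and loader class path are assumptions):

```pig
-- valid: no space after the comma in the schema string
a = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader('column1 string,column2 string');

-- invalid: the space after the comma breaks schema parsing
-- ... HiveColumnarLoader('column1 string, column2 string');
```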

Partitioning
Partitions must be in the format [partition name]=[partition value].
Only strings are supported as partition values.
Partitions must follow the same naming convention for all subdirectories in
a table.
For example, a table that mixes naming conventions in its subdirectories
(say, some under year=2010/month=02 and others under yr=2010/month=02)
is not valid.

HiveColumnarLoader

Table schema should be a comma-separated string describing the Hive schema:
a space between each column name and its type, and no space after the comma.
For example, uid BIGINT,pid LONG means one column uid of type BIGINT and
one column pid of type LONG.
The types are not case sensitive.

Parameters:

table_schema - This property cannot be null

HiveColumnarLoader

This constructor is for backward compatibility.
Table schema should be a comma-separated string describing the Hive schema:
a space between each column name and its type, and no space after the comma.
For example, uid BIGINT,pid LONG means one column uid of type BIGINT and
one column pid of type LONG.
The types are not case sensitive.

Parameters:

table_schema - This property cannot be null

dateRange - String giving the date range to load

columns - String; no longer used (Pig now pushes the required columns automatically)

HiveColumnarLoader

This constructor is for backward compatibility.
Table schema should be a comma-separated string describing the Hive schema:
a space between each column name and its type, and no space after the comma.
For example, uid BIGINT,pid LONG means one column uid of type BIGINT and
one column pid of type LONG.
The types are not case sensitive.

Parameters:

table_schema - This property cannot be null

dateRange - String giving the date range to load

Method Detail

getInputFormat

This will be called during planning on the front end. This is the
instance of InputFormat (rather than the class name) because the
load function may need to instantiate the InputFormat in order
to control how it is constructed.

prepareToRead

Initializes LoadFunc for reading data. This will be called during execution
before any calls to getNext. The RecordReader needs to be passed here because
it has been instantiated for a particular InputSplit.

setLocation

Communicate to the loader the location of the object(s) being loaded.
The location string passed to the LoadFunc here is the return value of
LoadFunc.relativeToAbsolutePath(String, Path). Implementations
should use this method to communicate the location (and any other information)
to its underlying InputFormat through the Job object.
This method will be called in the backend multiple times. Implementations
should bear in mind that this method is called multiple times and should
ensure there are no inconsistent side effects due to the multiple calls.

job - The Job object - this should be used only to obtain
cluster properties through JobContext.getConfiguration() and not to set/query
any runtime job information.

Returns:

schema for the data to be loaded. This schema should represent
all tuples of the returned data. If the schema is unknown or it is
not possible to return a schema that represents all returned data,
then null should be returned. The schema should not be affected by pushProjection, i.e.,
getSchema should always return the original schema, even after pushProjection.

setPartitionFilter

Set the filter for partitioning. It is assumed that this filter
will only contain references to fields given as partition keys in
getPartitionKeys. So if the implementation returns null in
LoadMetadata.getPartitionKeys(String, Job), then this method is not
called by Pig runtime. This method is also not called by the Pig runtime
if there are no partition filter conditions.
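
As a sketch, a filter in the script that references only partition columns is what gets pushed down here; the table and column names are the hypothetical ones from the partition example above:

```pig
-- the filter references only the partition column 'year',
-- so Pig can hand it to the loader via setPartitionFilter
a = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader('id int,name string');
b = FILTER a BY year == '2010';
```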

getFeatures

Determine the operators that can be pushed to the loader.
Note that by indicating a loader can accept a certain operator
(such as selection) the loader is not promising that it can handle
all selections. When it is passed the actual operators to
push down it will still have a chance to reject them.

pushProjection

Indicate to the loader fields that will be needed. This can be useful for
loaders that access data that is stored in a columnar format where indicating
columns to be accessed ahead of time will save scans. This method will
not be invoked by the Pig runtime if all fields are required. So implementations
should assume that if this method is not invoked, then all fields from
the input are required. If the loader function cannot make use of this
information, it is free to ignore it by returning an appropriate Response.
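
For instance, in a sketch using the hypothetical schema from earlier, a script that touches only two fields lets Pig call pushProjection with just those fields:

```pig
-- only uid and ts are referenced downstream, so Pig asks the
-- loader to read just these two columns from the columnar files
a = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader('uid bigint,ts long');
b = FOREACH a GENERATE uid, ts;
DUMP b;
```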