Below are a few more Pig Interview Questions and Answers

1. What is a tuple?

A tuple is an ordered set of fields. A field is a piece of data.

2. What is a relation in Pig?

A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table. Unlike a relational table, however, Pig relations don’t require that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.

3. What is meant by an unordered collection in a bag or in a relation?

Saying that relations are unordered means there is no guarantee that tuples are processed in any particular order. Furthermore, processing may be parallelized, in which case tuples are not processed according to any total ordering.

4. How are fields referenced in a relation?

Fields in a relation can be referenced in two ways: by positional notation or by name (alias).

Positional notation is generated by the system. Positional notation is indicated with the dollar sign ($) and begins with zero (0); for example, $0, $1, $2.

Names are assigned by the user through a schema (or, in the case of the GROUP operator and some functions, by the system). Any name that is not a Pig keyword can be used.
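
The two reference styles can be modelled in a few lines of Python. This is only an illustration of the idea, not Pig Latin itself; the schema names (name, age, gpa), the sample tuple, and the helper functions are all made up for the example.

```python
# Hypothetical schema and tuple, invented for this illustration.
schema = ("name", "age", "gpa")
row = ("John", 18, 4.0)

def by_position(tup, index):
    """Model of positional notation: $0 is index 0, $1 is index 1, and so on."""
    return tup[index]

def by_name(tup, schema, alias):
    """Model of reference by name: find the alias's position in the schema."""
    return tup[schema.index(alias)]

first_field = by_position(row, 0)           # corresponds to $0
same_field = by_name(row, schema, "name")   # corresponds to the alias name
```

Both calls reach the same field, which is the point of the answer above: a name is just a user-supplied label for a position.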

5. What are the simple data types supported by Pig?

Simple Type   Description             Example
int           Signed 32-bit integer   10
long          Signed 64-bit integer   Data: 10L or 10l
float         32-bit floating point   Data: 10.5F or 10.5f or 10.5e2f
double        64-bit floating point   Data: 10.5 or 10.5e2 or 10.5E2
chararray     Character array         hello world
bytearray     Byte array
boolean       boolean                 true/false (case insensitive)
datetime      datetime                1970-01-01T00:00:00.000+00:00
biginteger    Java BigInteger         200000000000
bigdecimal    Java BigDecimal         33.4567833213

6. What are the complex data types supported in Pig Latin?

Data Type   Description                 Example
tuple       An ordered set of fields.   (19,2)
bag         A collection of tuples.     {(19,2), (18,1)}
map         A set of key/value pairs.   [open#apache]

7. What are the features of a bag?

A bag can have duplicate tuples.

A bag can have tuples with differing numbers of fields. However, if Pig tries to access a field that does not exist, a null value is substituted.

A bag can have tuples with fields that have different data types. However, for Pig to effectively process bags, the schemas of the tuples within those bags should be the same.
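
A rough Python model of the first two features may help. It is a sketch of the behaviour described above, not Pig code: the bag is represented as a list (so duplicates survive), None stands in for Pig's null, and get_field is a hypothetical helper.

```python
def get_field(tup, index):
    """Return the field at index, or None (standing in for null) if the tuple is too short."""
    return tup[index] if 0 <= index < len(tup) else None

# A bag modelled as a list: duplicates are kept and tuples may differ in length.
bag = [(1, 2, 3), (4, 2), (1, 2, 3)]

# Reading the third field of every tuple; the short tuple yields None.
third_fields = [get_field(t, 2) for t in bag]
```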

8. What is an outer bag?

An outer bag is simply a relation.

9. What is an inner bag?

An inner bag is a relation inside any other bag.

Example: (4,{(4,2,1),(4,3,3)})

In the above example, the complete relation is an outer bag and {(4,2,1),(4,3,3)} is an inner bag.
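
The same nesting can be written out as a Python structure to make the two levels visible. This is only a model of the example tuple: lists stand in for bags, and the variable names are invented here.

```python
# The tuple from the example: a plain field followed by an inner bag.
outer_tuple = (4, [(4, 2, 1), (4, 3, 3)])

# The outer bag (the relation) is the collection that holds such tuples.
relation = [outer_tuple]

# Unpack the first field and the inner bag of the tuple.
key, inner_bag = outer_tuple

# Number of tuples in the inner bag of every tuple in the relation.
inner_sizes = [len(bag) for _, bag in relation]
```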

10. What is a Map?

A map is a set of key/value pairs. Key values within a relation must be unique.

11. What does FOREACH do?

FOREACH is used to apply transformations to the data and to generate new data items. As the name indicates, the given action is performed for each element of a data bag.

Syntax: FOREACH bagname GENERATE expr1, expr2, ...

This statement applies the expressions listed after GENERATE to each record of the data bag in turn.
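
The per-record behaviour can be sketched in Python. This is a model of the semantics rather than Pig itself: foreach_generate is a hypothetical helper, each expression is passed as a function of one record, and the input bag A is sample data invented for the illustration.

```python
def foreach_generate(bag, *exprs):
    """Apply every expression to each record and emit one output tuple per record."""
    return [tuple(expr(record) for expr in exprs) for record in bag]

# Hypothetical input bag with three integer fields per tuple.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4)]

# Roughly what FOREACH A GENERATE $0, $1 + $2 expresses.
B = foreach_generate(A, lambda r: r[0], lambda r: r[1] + r[2])
```

Each input tuple produces exactly one output tuple, whose fields are the results of the expressions in the order they were listed.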

14. What does the GROUP operator do in Pig?

The GROUP operator groups together tuples that have the same group key. The result is a relation containing one tuple per group, with two fields: the group key (named group) and a bag holding all the tuples that share that key.

15. What is the difference between GROUP and COGROUP?

The GROUP and COGROUP operators are identical. Both operators work with one or more relations. For readability, GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations. At most 127 relations can be cogrouped at a time.
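
To show why the two operators can be treated as one, here is a minimal Python model in which a single function covers both cases. It is a sketch under stated assumptions, not how Pig implements grouping: cogroup is a hypothetical helper, keys are supplied as functions, and the owners and friends relations are invented sample data.

```python
from collections import defaultdict

def cogroup(*keyed_relations):
    """Group one or more relations by key.

    Each argument is a (relation, key_function) pair. The result maps every
    key to a list of bags, one bag per input relation.
    """
    if len(keyed_relations) > 127:
        raise ValueError("at most 127 relations can be cogrouped")
    n = len(keyed_relations)
    groups = defaultdict(lambda: [[] for _ in range(n)])
    for i, (relation, key_fn) in enumerate(keyed_relations):
        for tup in relation:
            groups[key_fn(tup)][i].append(tup)
    return dict(groups)

# Hypothetical sample relations.
owners = [("Alice", "turtle"), ("Bob", "cat"), ("Alice", "dog")]
friends = [("Cindy", "Alice"), ("Mark", "Bob")]

# With a single relation this behaves like GROUP.
grouped = cogroup((owners, lambda t: t[0]))

# With two relations it behaves like COGROUP.
cogrouped = cogroup((owners, lambda t: t[0]), (friends, lambda t: t[1]))
```

For the key "Alice", grouped holds one bag with her two owner tuples, while cogrouped holds that bag plus a second bag with the matching tuple from friends.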