45.
GROUP
- Groups data in one or more relations
- Groups tuples that have the same group key
- Similar to the SQL GROUP BY operator

    outerbag = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
    DUMP outerbag;
    innerbag = GROUP outerbag BY f1;
    DUMP innerbag;

https://github.com/sudar/pig-samples/group-by.pig
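To make the shape of GROUP's output concrete, here is a small plain-Python sketch (not Pig code) that mimics what grouping by f1 produces: one output tuple per distinct key, holding the key and a bag of the complete input tuples. The sample rows and the helper name `group_by` are made up for illustration and are not taken from the linked repository.

```python
from collections import defaultdict


def group_by(relation, key_index):
    """Mimic Pig's GROUP: return (key, bag) pairs; each bag holds the full input tuples."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    # Pig does not promise any output order; sorting by key just keeps this sketch stable.
    return sorted(bags.items())


# Hypothetical sample rows in the (f1, f2, f3) layout.
outerbag = [(1, 2, 3), (4, 2, 1), (1, 5, 6), (4, 3, 3)]
innerbag = group_by(outerbag, 0)

for key, bag in innerbag:
    print(key, bag)
```

Note that the bag keeps whole tuples, including the grouping field itself, which is why a grouped relation in Pig is usually followed by a FOREACH that aggregates or flattens the bag.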

48.
ORDER BY
- Sorts a relation based on one or more fields
- Similar to SQL ORDER BY

    data = LOAD 'data/nested-sample.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
    DUMP data;
    ordera = ORDER data BY f1 ASC;
    DUMP ordera;
    orderd = ORDER data BY f1 DESC;
    DUMP orderd;

https://github.com/sudar/pig-samples/order-by.pig

55.
Why UDF?
- Do operations on more than one field
- Do more than grouping and filtering
- Programmer is comfortable writing code
- Want to reuse existing logic
Traditionally, UDFs could be written only in Java. Other languages, such as Python, are now also supported.

57.
Eval Functions
- Can be used in FOREACH statements
- Most common type of UDF
- Can return simple types or Tuples

    b = FOREACH a GENERATE udf.Function($0);
    b = FOREACH a GENERATE udf.Function($0, $1);

58.
Eval Functions
- Extend the EvalFunc<T> abstract class
- The generic <T> should be the return type
- Input comes as a Tuple
- Should check for empty and null input
- Override exec(); it should return the value
- Override getArgToFuncMapping() to let Pig know about the argument mapping
- Override outputSchema() to let Pig know about the output schema
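The checklist above is for Java. As a rough companion, here is a sketch of the same ideas (check for null and empty input, return a value, declare the output schema) written as a Python UDF. The function `strip_quote` is a hypothetical stand-in for the StripQuote UDF used elsewhere in this deck; the real logic lives in the linked repository and may differ. How the `outputSchema` decorator is provided depends on which Python engine Pig runs the script with, so the fallback below is only an assumption that lets the file run outside Pig.

```python
try:
    # Provided by Pig's Python UDF support when the script runs under Pig.
    from pig_util import outputSchema
except ImportError:
    # Stand-in (an assumption) so this sketch can be run and tested without Pig.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("stripped:chararray")
def strip_quote(value):
    """Remove one pair of surrounding double quotes; pass nulls and empties through."""
    if value is None:
        return None  # null in, null out
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


print(strip_quote('"TwitVim"'))
```

The schema string plays the role that outputSchema() plays in a Java UDF: it tells Pig the name and type of the value the function returns.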

59.
Using Java UDF in Pig Scripts
- Create a jar file which contains your UDF classes
- Register the jar at the top of the Pig script
- Register other jars if needed
- Define the UDF function
- Use your UDF function

60.
Let’s see an example which returns a string https://github.com/sudar/pig-samples/strip-quote.pig

61.
Let’s see an example which returns a Tuple https://github.com/sudar/pig-samples/get-twitter-names.pig

62.
Filter Functions
- Can be used in FILTER statements
- Return a boolean value
Eg:
    vim_tweets = FILTER data BY FromVim(StripQuote($6));

63.
Filter Functions
- Extend FilterFunc, which is an EvalFunc<Boolean>
- Should return a boolean
- Input is the same as for EvalFunc<T>
- Should check for empty and null input
- Override getArgToFuncMapping() to let Pig know about the argument mapping
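The contract of a filter function (take a field, guard against null and empty input, return a boolean) can be sketched in plain Python. This is not the FromVim implementation from the repository: the rule "the source field mentions vim" and the sample rows are assumptions made only for illustration, and the list comprehension merely imitates what a FILTER statement does with the predicate.

```python
def from_vim(source):
    """Return True when the source field mentions Vim (assumed rule, for illustration)."""
    if not source:  # covers both None and the empty string
        return False
    return "vim" in source.lower()


# Hypothetical (tweet_id, source) rows.
rows = [("t1", "TwitVim"), ("t2", "web"), ("t3", None), ("t4", "")]

# Imitates: vim_tweets = FILTER data BY FromVim(source);
vim_tweets = [row for row in rows if from_vim(row[1])]
print(vim_tweets)
```

Returning False for null or empty input simply drops those rows, which is usually the least surprising behaviour for a filter.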

64.
Let’s see an example which returns a Boolean https://github.com/sudar/pig-samples/from-vim.pig

65.
Error Handling in UDF
- If the error affects only a particular row, return null.
- If the error affects other rows but the UDF can recover, throw an IOException.
- If the error affects other rows and the UDF can’t recover, also throw an IOException. Pig and Hadoop will quit if there are many IOExceptions.
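The distinction between row-level and wider errors can be illustrated with a short Python sketch. Both helpers are hypothetical: `to_int` shows a problem confined to one row (return null, here `None`), while `load_lookup` shows a problem that would affect every row (raise an I/O error, the Python counterpart of throwing an IOException in a Java UDF).

```python
def to_int(value):
    """Row-level problem: a malformed field affects only this row, so return None."""
    if value is None or value == "":
        return None
    try:
        return int(value)
    except ValueError:
        return None


def load_lookup(path):
    """Problem affecting all rows: without the lookup file nothing can be processed, so raise."""
    try:
        with open(path) as handle:
            # Each usable line is "key,value"; lines without a comma are skipped.
            return dict(line.rstrip("\n").split(",", 1) for line in handle if "," in line)
    except (IOError, OSError) as err:
        raise IOError("cannot read lookup file %s: %s" % (path, err))


print(to_int("42"), to_int("not a number"))
```

Returning `None` keeps the job running and leaves a null in the output that can be filtered later, whereas raising signals that continuing would produce wrong results for many rows.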

72.
Debugging Pig Scripts
- DUMP is your friend, but use it with LIMIT
- DESCRIBE – prints the schema names
- ILLUSTRATE – shows the structure of the schema
- In UDFs, we can use the warn() function. It supports up to 15 different debug levels
- Use Penny - https://cwiki.apache.org/PIG/pennytoollibrary.html

73.
Optimizing Pig Scripts
- Project early and often
- Filter early and often
- Drop nulls before a join
- Prefer DISTINCT over GROUP BY
- Use the right data structure