set mapred.compress.map.output true;
set mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

This should get you going with Snappy for map output compression in Pig. You can read and write Snappy-compressed files as well, though I would not recommend it: Snappy is not very space-efficient compared to other compression algorithms. There is also work in progress to use Snappy for the intermediate/temporary files Pig creates between multiple MR jobs; you can follow the work item at https://issues.apache.org/jira/browse/PIG-2319
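For context, here is how those two settings might sit in a complete Pig script; the input path, alias names, and output path are hypothetical:

```pig
-- Compress map output with Snappy; only the shuffle data is affected,
-- the final STORE output is written uncompressed.
set mapred.compress.map.output true;
set mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

logs    = LOAD '/data/logs' AS (user:chararray, bytes:long);
grouped = GROUP logs BY user;   -- forces a shuffle, so map output volume matters here
totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);
STORE totals INTO '/data/totals';
```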

Snappy is not CPU intensive, which means MR tasks have more CPU left over for user code.

What you SHOULD use Snappy for

Map output: Snappy works great if you have large amounts of data flowing from the Mappers to the Reducers (you might not see a significant difference if the data volume between map and reduce is low)

Temporary intermediate files (not available as of Pig 0.9.2; applicable only to native MapReduce): If you have a series of MR jobs chained together, Snappy compression is a good way to store the intermediate files. Do make sure these intermediate files are cleaned up promptly so the cluster does not run into disk space issues.
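For the native MapReduce case, each job in the chain can compress the files it hands to the next job using the regular job-output properties (old-API property names from the Hadoop 0.20/1.x line this post targets); note this is job output compression, distinct from the Pig tmp-file work tracked in PIG-2319:

```properties
# Compress the output of each intermediate job in the chain with Snappy.
mapred.output.compress=true
mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
# BLOCK applies when the intermediate files are SequenceFiles, which also
# keeps them splittable for the next job despite Snappy itself not being so.
mapred.output.compression.type=BLOCK
```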

What you should NOT use Snappy for

Permanent storage: Snappy compression is not space-efficient, and storing data on HDFS is expensive (3-way replication)

Plain text files: Like Gzip, Snappy is not splittable. Do not store plain text files in Snappy-compressed form; instead use a container format like SequenceFile.

Map output forwarded to reducers is not stored permanently on disk; it is deleted after the job completes. Reducers simply accept input from the mappers, decompress it, and process it. Splittability does not matter here because each reducer has to process its entire partition anyway. That is, partitioning is independent of input splits.
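The point can be sketched with a toy shuffle in Python; zlib stands in for Snappy (python-snappy is a third-party package), and the partition contents are made up:

```python
import zlib

# Toy "map output": records already partitioned by reducer number.
partitions = {
    0: b"apple\t1\nant\t1\n",   # everything destined for reducer 0
    1: b"zebra\t1\n",           # everything destined for reducer 1
}

# Mapper side: each partition is compressed as one opaque blob.
# Nothing ever needs to start reading in the middle of a blob,
# so the codec does not have to be splittable.
shuffled = {r: zlib.compress(data) for r, data in partitions.items()}

# Reducer side: decompress the whole partition, then process it.
for r, blob in shuffled.items():
    records = zlib.decompress(blob).decode().splitlines()
    print(r, records)
```

Each reducer sees exactly one compressed stream per map task and always reads it end to end, which is why a non-splittable codec is harmless for shuffle data.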

I think I answered your question on our internal site; re-pasting it here 🙂

Your key/value types should not change due to Snappy. A SequenceFile has a header that records the key/value types and the codec being used. In short, your map function definition would not change.
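As a toy illustration of why the reader does not need to be told the types or the codec up front, here is an invented self-describing container in Python; the real SequenceFile layout differs in detail (this magic number and field encoding are made up, and zlib again stands in for Snappy):

```python
import io
import struct
import zlib

MAGIC = b"TOY1"  # invented magic number, not the real SequenceFile magic


def write_container(buf, key_cls, val_cls, codec, payload):
    """Write a header naming the key/value types and codec, then the payload."""
    buf.write(MAGIC)
    for field in (key_cls.encode(), val_cls.encode(), codec.encode()):
        buf.write(struct.pack(">I", len(field)))
        buf.write(field)
    data = zlib.compress(payload)
    buf.write(struct.pack(">I", len(data)))
    buf.write(data)


def read_container(buf):
    """The reader learns the types and codec from the header alone."""
    assert buf.read(4) == MAGIC

    def field():
        (n,) = struct.unpack(">I", buf.read(4))
        return buf.read(n).decode()

    key_cls, val_cls, codec = field(), field(), field()
    (n,) = struct.unpack(">I", buf.read(4))
    payload = zlib.decompress(buf.read(n))
    return key_cls, val_cls, codec, payload


buf = io.BytesIO()
write_container(buf, "org.apache.hadoop.io.Text",
                "org.apache.hadoop.io.LongWritable",
                "zlib-stand-in-for-Snappy", b"key\t42\n")
buf.seek(0)
print(read_container(buf))
```

Because the header is self-describing, the consuming code (here `read_container`, and in the real case your map function) keeps the same signature regardless of which codec wrote the file.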