Hey Tim,
You have an interesting problem. Have you tried writing a UDTF (user-defined
table function) for your case? A UDTF can emit more than one output record
for each input row, so a single pass over the table may be enough.
http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF
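
To make the idea concrete, here is a rough sketch in plain Java of the "emit many rows per input row" logic, with no Hive dependency. Everything in it is an assumption for illustration: the class name MultiEmitSketch, the TileRow holder, and the tile formulas, which use the standard Web Mercator tiling scheme as a stand-in for your own toTileX and toTileY UDFs. It covers only the 23 zoom levels, not the 8 ways of deriving name_id. In a real UDTF I would expect this loop to sit in the process() method of a GenericUDTF subclass and call forward() once per output row, but please check the guide above for the exact signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one input record fans out to one row per zoom level.
class MultiEmitSketch {

    // Simple holder for one emitted row: (name_id, x, y, zoom).
    static final class TileRow {
        final long nameId;
        final int x;
        final int y;
        final int zoom;

        TileRow(long nameId, int x, int y, int zoom) {
            this.nameId = nameId;
            this.x = x;
            this.y = y;
            this.zoom = zoom;
        }
    }

    // Stand-in for the poster's toTileX UDF (assumed Web Mercator tiling).
    static int toTileX(double longitude, int zoom) {
        return (int) Math.floor((longitude + 180.0) / 360.0 * (1 << zoom));
    }

    // Stand-in for the poster's toTileY UDF (assumed Web Mercator tiling).
    static int toTileY(double latitude, int zoom) {
        double latRad = Math.toRadians(latitude);
        double merc = Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad));
        return (int) Math.floor((1.0 - merc / Math.PI) / 2.0 * (1 << zoom));
    }

    // Emits one row per zoom level in [0, zoomLevels) for a single input record.
    // In a Hive UDTF, each out.add(...) would instead be a forward(...) call.
    static List<TileRow> emitAllZooms(long nameId, double latitude,
                                      double longitude, int zoomLevels) {
        List<TileRow> out = new ArrayList<>();
        for (int zoom = 0; zoom < zoomLevels; zoom++) {
            out.add(new TileRow(nameId,
                                toTileX(longitude, zoom),
                                toTileY(latitude, zoom),
                                zoom));
        }
        return out;
    }
}
```

Calling MultiEmitSketch.emitAllZooms(nameId, latitude, longitude, 23) returns 23 rows for one input record. Grouping by (name_id, x, y, zoom) with a count over the emitted rows would then replace the chain of UNION ALL branches.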
Thanks and Regards,
Sonal
Sonal Goyal | Founder and CEO | Nube Technologies LLP
http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal
On Mon, Nov 8, 2010 at 2:31 AM, Tim Robertson <timrobertson100@gmail.com> wrote:
> Hi all,
>
> I am porting custom MR code to Hive and have written working UDFs
> where I need them. Is there a workaround that avoids having to do
> this in Hive:
>
> select * from
> (
> select name_id, toTileX(longitude,0) as x, toTileY(latitude,0) as
> y, 0 as zoom, funct2(longitude, 0) as f2_x, funct2(latitude,0) as
> f2_y, count (1) as count
> from table
> group by name_id, x, y, f2_x, f2_y
>
> UNION ALL
>
> select name_id, toTileX(longitude,1) as x, toTileY(latitude,1) as
> y, 1 as zoom, funct2(longitude, 1) as f2_x, funct2(latitude,1) as
> f2_y, count (1) as count
> from table
> group by name_id, x, y, f2_x, f2_y
>
> --- etc etc increasing in zoom
> )
>
> The issue being that this does many passes over the table, whereas
> previously in my Map() I would just emit many times from the same
> input record and then let it all group in the shuffle and sort.
> I actually emit 184 times per input record (23 zoom levels of
> Google Maps times 8 ways to derive the name_id), which would mean
> 184 UNION ALL branches. Is it possible in Hive to force it to emit
> many times from the source record in the stage-1 map?
>
> (ahem) Does anyone know if Pig can do this if not in Hive?
>
> I hope I have explained this well enough to make sense.
>
> Thanks in advance,
> Tim
>