hadoop-pig-dev mailing list archives

[jira] Commented: (PIG-1195) InternalSortedBag should take care of sort order

Date: Tue, 19 Jan 2010 18:24:54 GMT

[ https://issues.apache.org/jira/browse/PIG-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802397#action_12802397 ]
Alan Gates commented on PIG-1195:
---------------------------------
The sorting algorithm in DefaultComparator does not match the sorting algorithm in DefaultTuple.compare.
DefaultComparator first compares the values of each column, and only considers the overall
size of the tuples once one tuple has run out of fields. DefaultTuple.compare
first compares tuple size, then individual column values. So under DefaultComparator's algorithm
(5, 3) > (4, 3, 1), but under DefaultTuple's algorithm (5, 3) < (4, 3, 1). We should use the same
algorithm in both places.
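
To make the mismatch concrete, here is a minimal sketch of the two orderings in Java. It is not the Pig source: tuples are modeled as plain List<Integer>, and the class name TupleOrderSketch and the comparator names COLUMN_FIRST and SIZE_FIRST are made up for illustration. It only reproduces the two strategies described above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TupleOrderSketch {
    // Hypothetical stand-in for the column-first strategy: compare field by
    // field, and fall back to tuple size only when one tuple runs out of fields.
    static final Comparator<List<Integer>> COLUMN_FIRST = (a, b) -> {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Hypothetical stand-in for the size-first strategy: compare tuple size
    // first, and look at individual fields only when the sizes are equal.
    static final Comparator<List<Integer>> SIZE_FIRST = (a, b) -> {
        int c = Integer.compare(a.size(), b.size());
        if (c != 0) {
            return c;
        }
        for (int i = 0; i < a.size(); i++) {
            c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    };

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(5, 3);
        List<Integer> y = Arrays.asList(4, 3, 1);
        // Column-first: 5 > 4 decides it, so x sorts after y.
        System.out.println(COLUMN_FIRST.compare(x, y) > 0); // prints "true"
        // Size-first: 2 fields < 3 fields decides it, so x sorts before y.
        System.out.println(SIZE_FIRST.compare(x, y) < 0);   // prints "true"
    }
}
```

The two comparators disagree on the same pair of tuples, which is why data sorted by one path and merged or compared by the other could come out in an inconsistent order.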
> InternalSortedBag should take care of sort order
> ------------------------------------------------
>
> Key: PIG-1195
> URL: https://issues.apache.org/jira/browse/PIG-1195
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1195-1.patch, PIG-1195-2.patch
>
>
> InternalSortedBag always uses ascending order. It should obey the sort order specified in the script.
> For example, the following script does not do the right thing if we turn off secondary sort (which means we rely on InternalSortedBag to sort):
> {code}
> A = load 'input' as (a0:int);
> B = group A ALL;
> C = foreach B {
>     D = order A by a0 desc;
>     generate D;
> };
> dump C;
> {code}
> If we run it with the command line "java -Xmx512m -Dpig.exec.nosecondarykey=true -jar pig.jar 1.pig", the sort order for D is ascending.
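
As a rough illustration of the behavior the report asks for, the sketch below sorts a list in the direction requested by the caller instead of always ascending. It does not use Pig's classes: SortOrderSketch and its sorted helper are hypothetical, and the boolean flag stands in for whatever per-column ordering information the script's "order ... by ... desc" would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {
    // Hypothetical helper: returns a sorted copy of the values, honoring the
    // requested direction rather than assuming ascending order.
    static List<Integer> sorted(List<Integer> values, boolean ascending) {
        Comparator<Integer> cmp = Comparator.naturalOrder();
        if (!ascending) {
            // Reverse the comparator when the script asked for "desc".
            cmp = cmp.reversed();
        }
        List<Integer> out = new ArrayList<>(values);
        out.sort(cmp);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a0 = Arrays.asList(3, 1, 2);
        System.out.println(sorted(a0, true));  // prints "[1, 2, 3]"
        System.out.println(sorted(a0, false)); // prints "[3, 2, 1]"
    }
}
```

With the bug described above, the equivalent of the second call would still yield ascending output, because the requested direction never reaches the comparator.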