On 21/12/2008, at 10:48 PM, Paul Davis wrote:
> On Sun, Dec 21, 2008 at 1:20 AM, Antony Blakey wrote:
>>
>> On 21/12/2008, at 3:25 PM, Paul Davis wrote:
>>
>>> Hmm. I don't think so. Technically we could change the
>>> implementation
>>> to feed reduce functions a single map-row at a time which would be
>>> O(N). Might end up violating some (incorrect) expectations of reduce
>>> input, but I'm pretty certain it shows that it's O(N).
>>
>> But you have to combine it with previously reduced input, which means
>> fetching a number of previous results, with a cost that isn't clearly
>> related to N. Isn't the number of reductions that need to be combined
>> going to be related to key distribution?
>>
>
> Good point. I just woke up so the brain is working kinda slow, but are
> we describing best-case and worst-case scenarios too? If each key had
> exactly one row and it was group=true, then it'd be O(N). If it was
> group=false and every key was identical it'd be O(log N). And then
> normal use cases are somewhere in the middle?
Where the original N was new rows, versus this N being total rows?
e.g. the average case would be O(new rows) + O(log old rows).
My complexity maths is weak.
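As a rough illustration of the group=false worst case above, here is a toy
model (not CouchDB's actual btree code, and `sum` standing in for an
arbitrary re-reducible function): a complete binary reduction tree over N
map rows, where changing one row only forces re-reduction along the path
to the root, i.e. O(log N) nodes rather than N.

```python
import math

def build_tree(values):
    """Build a binary reduction tree; levels[0] holds the map rows (leaves),
    levels[-1][0] is the fully reduced value."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels

def update(levels, idx, value):
    """Replace leaf idx with value, re-reducing only its ancestors.
    Returns the number of internal nodes recomputed."""
    levels[0][idx] = value
    recomputed = 0
    for lvl in range(1, len(levels)):
        idx //= 2
        lo = idx * 2
        levels[lvl][idx] = sum(levels[lvl - 1][lo:lo + 2])
        recomputed += 1
    return recomputed

# 1024 rows, each contributing 1: reduced total is 1024.
levels = build_tree([1] * 1024)
assert levels[-1][0] == 1024

# Updating a single row recomputes only log2(1024) = 10 nodes.
touched = update(levels, 37, 5)
assert levels[-1][0] == 1028
assert touched == math.ceil(math.log2(1024))
```

With one row per distinct key and group=true there is no tree to share, so
every output row is its own reduction and the work is O(N) in the new rows,
which matches the best/worst cases sketched above.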
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
In anything at all, perfection is finally attained not when there is
no longer anything to add, but when there is no longer anything to
take away.
-- Antoine de Saint-Exupery