The issue here is that although we sort runs within a single reducer by split number, partitions are calculated from map tasks, not from splits. Instead, we should use split information to decide which map output goes to which reduce task.
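To make the difference concrete, here is a minimal sketch of one way to read this. It is not the actual code: the class `SplitPartitionSketch`, the helpers `reducerFor` and `reducerOutputOrder`, and the contiguous-range assignment are all hypothetical, and it assumes each map task handles exactly one split whose number need not match the task number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; names and the range-based assignment are assumptions.
final class SplitPartitionSketch {

    // Maps an index in [0, total) to a reducer in [0, numReducers) by contiguous ranges.
    static int reducerFor(int index, int total, int numReducers) {
        return (int) ((long) index * numReducers / total);
    }

    // taskToSplit[t] is the split number processed by map task t.
    // Returns the split numbers in the order they appear when the reducers'
    // outputs are concatenated (reducer 0 first, then reducer 1, and so on).
    static List<Integer> reducerOutputOrder(int[] taskToSplit, int numReducers, boolean bySplit) {
        int n = taskToSplit.length;
        List<List<Integer>> perReducer = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            perReducer.add(new ArrayList<>());
        }
        for (int task = 0; task < n; task++) {
            int split = taskToSplit[task];
            // The choice under discussion: partition on the task index or on the split number.
            int key = bySplit ? split : task;
            perReducer.get(reducerFor(key, n, numReducers)).add(split);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> runs : perReducer) {
            Collections.sort(runs); // runs within one reducer are sorted by split number
            out.addAll(runs);
        }
        return out;
    }
}
```

With `taskToSplit = {3, 0, 5, 1, 4, 2}` and three reducers, partitioning on the task index gives the concatenated order 0, 3, 1, 5, 2, 4: each reducer is sorted internally, but the overall order is not. Partitioning on the split number gives 0, 1, 2, 3, 4, 5.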