repartitionAndSortWithinPartitions

Repartitions the RDD according to the given partitioner and, within each resulting partition, sorts records by their keys. This is more efficient than calling repartition and then sorting within each partition, because the sorting can be pushed into the shuffle machinery.


// first we will do range partitioning, which is not sorted
val randRDD = sc.parallelize(List((2, "cat"), (6, "mouse"), (7, "cup"), (3, "book"), (4, "tv"), (1, "screen"), (5, "heater")), 3)
val rPartitioner = new org.apache.spark.RangePartitioner(3, randRDD)
val partitioned = randRDD.partitionBy(rPartitioner)
def myfunc(index: Int, iter: Iterator[(Int, String)]): Iterator[String] = {
  iter.toList.map(x => "[partID:" + index + ", val: " + x + "]").iterator
}
partitioned.mapPartitionsWithIndex(myfunc).collect

res0: Array[String] = Array([partID:0, val: (2,cat)], [partID:0, val: (3,book)], [partID:0, val: (1,screen)], [partID:1, val: (4,tv)], [partID:1, val: (5,heater)], [partID:2, val: (6,mouse)], [partID:2, val: (7,cup)])

// now let's repartition, but this time have it sorted
val partitioned = randRDD.repartitionAndSortWithinPartitions(rPartitioner)
def myfunc(index: Int, iter: Iterator[(Int, String)]): Iterator[String] = {
  iter.toList.map(x => "[partID:" + index + ", val: " + x + "]").iterator
}
partitioned.mapPartitionsWithIndex(myfunc).collect

res1: Array[String] = Array([partID:0, val: (1,screen)], [partID:0, val: (2,cat)], [partID:0, val: (3,book)], [partID:1, val: (4,tv)], [partID:1, val: (5,heater)], [partID:2, val: (6,mouse)], [partID:2, val: (7,cup)])
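To make the semantics concrete without a Spark cluster, here is a minimal plain-Scala sketch of what the operation does conceptually: assign each key-value pair to a partition, then sort the records of each partition by key. The names `rangePartition` and `repartitionAndSort` are illustrative only and not part of the Spark API; the hard-coded key ranges mirror the partition boundaries the `RangePartitioner` chose in the example above.

```scala
object RepartitionSortSketch {
  // Same sample data as the Spark example above
  val data = List((2, "cat"), (6, "mouse"), (7, "cup"),
                  (3, "book"), (4, "tv"), (1, "screen"), (5, "heater"))

  // Illustrative range partitioner over keys 1..7:
  // keys 1-3 -> partition 0, keys 4-5 -> partition 1, keys 6-7 -> partition 2
  def rangePartition(key: Int): Int =
    if (key <= 3) 0 else if (key <= 5) 1 else 2

  // Conceptual repartitionAndSortWithinPartitions: group records by their
  // target partition, then sort each partition's records by key
  def repartitionAndSort(records: List[(Int, String)]): Map[Int, List[(Int, String)]] =
    records.groupBy { case (k, _) => rangePartition(k) }
           .map { case (pid, recs) => pid -> recs.sortBy(_._1) }

  def main(args: Array[String]): Unit = {
    repartitionAndSort(data).toList.sortBy(_._1).foreach { case (pid, recs) =>
      recs.foreach(r => println(s"[partID:$pid, val: $r]"))
    }
  }
}
```

Running the sketch reproduces the layout of `res1` above: each partition holds a contiguous key range, and records inside a partition come out in key order.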