OK, next guess: your partitioner isn’t a real partitioner. In particular, if join gets RDDs that are already partitioned, it’s going to choose one of their partitioners for the output, repartition the other RDD to match it, and then do the join, so only one side gets shuffled.
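To make that concrete, here is a toy model in plain Python. It is not Spark code and none of these names exist in Spark: `ValuePartitioner`, `IdentityPartitioner`, and `plan_join` are hypothetical. The model assumes join reuses the left side's partitioner when it has one, and shuffles any side whose partitioner does not compare equal to the chosen one. It also assumes one way a partitioner can fail to be "real" is by not defining equality, so two instances with identical settings still look different.

```python
class ValuePartitioner:
    """Toy partitioner whose equality is defined by its settings."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        return hash(key) % self.num_partitions

    def __eq__(self, other):
        return (isinstance(other, ValuePartitioner)
                and other.num_partitions == self.num_partitions)

    def __hash__(self):
        return hash(self.num_partitions)


class IdentityPartitioner:
    """Same partitioning logic, but no __eq__: instances compare by identity."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        return hash(key) % self.num_partitions


def plan_join(left, right, default):
    """Return (chosen partitioner, names of the sides that must be shuffled).

    Assumption of this sketch: prefer the left partitioner, then the right,
    then fall back to `default`. A side with no partitioner, or with one that
    is not equal to the chosen partitioner, has to be shuffled.
    """
    if left is not None:
        chosen = left
    elif right is not None:
        chosen = right
    else:
        chosen = default
    shuffled = [name for name, part in (("left", left), ("right", right))
                if part is None or part != chosen]
    return chosen, shuffled


default = ValuePartitioner(2)
# Equal partitioners on both sides: nothing needs to move.
print(plan_join(ValuePartitioner(4), ValuePartitioner(4), default)[1])
# Different partition counts: only the right side is shuffled.
print(plan_join(ValuePartitioner(4), ValuePartitioner(8), default)[1])
# Same settings but no equality defined: the right side is shuffled anyway.
print(plan_join(IdentityPartitioner(4), IdentityPartitioner(4), default)[1])
```

The last case is the one to look at: both sides were partitioned the same way, but the model can't tell, so one side still gets shuffled.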