When RDD.reduceByKey is called on an RDD created from a Dataset, it triggers the RDD.partitions method, which requires a transaction in order to get the splits from the underlying Dataset. If there is no explicit transaction, a new implicit transaction is started and left open, to be committed when the Spark job that involves the reduceByKey RDD completes with an action. However, when RDD.saveAsDataset is then called, it detects that a transaction is already open and reuses it. The correct behavior in this case is to change the existing transaction so that it is committed when the saveAsDataset method completes, rather than when the Spark job completes, because the onSuccess method on the PartitionedFileSet must be executed transactionally.
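
The difference in commit timing can be illustrated with a small standalone model. This is only a sketch: it does not call the real Spark or dataset APIs, the names `TxCommitSketch`, `Tx`, `CommitOn`, and `simulate` are hypothetical, and the event ordering is an assumption taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the transaction lifecycle; not the actual implementation.
final class TxCommitSketch {

    /** Hypothetical: the point at which an open transaction is committed. */
    enum CommitOn { JOB_END, SAVE_END }

    /** Hypothetical stand-in for the implicit transaction. */
    static final class Tx {
        CommitOn commitOn = CommitOn.JOB_END; // default for an implicit transaction
        boolean open = true;
        private final List<String> log;

        Tx(List<String> log) {
            this.log = log;
            log.add("tx started (implicit)");
        }

        void commit() {
            open = false;
            log.add("tx committed");
        }
    }

    /**
     * Replays the sequence from the description and returns the event log.
     * rebindOnSave = false models the old behavior, true models the fix.
     */
    static List<String> simulate(boolean rebindOnSave) {
        List<String> log = new ArrayList<>();

        // reduceByKey triggers partitions(), which needs a transaction for the splits.
        Tx tx = new Tx(log);
        log.add("splits computed");

        // saveAsDataset finds the open transaction and reuses it.
        // With the fix, it also moves the commit point to its own completion.
        if (rebindOnSave) {
            tx.commitOn = CommitOn.SAVE_END;
        }

        // The Spark job runs and completes.
        log.add("job ran");
        if (tx.commitOn == CommitOn.JOB_END) {
            tx.commit();
        }

        // onSuccess on the output dataset runs after the job.
        log.add(tx.open ? "onSuccess inside tx" : "onSuccess outside tx");

        // saveAsDataset completes; commit here if the transaction is still open.
        if (tx.open) {
            tx.commit();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("old: " + simulate(false));
        System.out.println("new: " + simulate(true));
    }
}
```

In this model, the old behavior commits before onSuccess runs, so onSuccess executes outside the transaction. Once the commit point is rebound to the end of saveAsDataset, onSuccess runs while the transaction is still open.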