Abstract: We have developed a highly efficient, high-fidelity approach to parallel volume rendering called permutation warping. Permutation warping can use any one-pass filter kernel, such as trilinear reconstruction, which is an advantage over the shear-warp approach. This work describes experiments in improving permutation warping with data-dependent optimizations to make it more competitive in speed with the shear-warp algorithm. Each processor uses a linear octree to collapse homogeneous regions and eliminate empty space. Static load balancing then redistributes nodes from each processor's octree to achieve higher efficiency. In studies on a 16,384-processor MasPar MP-2, we measured improvements of 3 to 5 times over our previous results. A 128^3 volume renders in 73 milliseconds (29 Mvoxels/second, or 14 frames/second), the fastest MasPar volume rendering numbers in the literature. A 256^3 volume renders in 427 milliseconds (39 Mvoxels/second, or 2 frames/second). These performance numbers show that coherency adaptations are effective for permutation warping. Because permutation warping has good scalability characteristics, it proves to be a superior approach for massively parallel computers when image fidelity is required. This work provides further evidence for the utility of permutation warping as a scalable, high-fidelity, and high-performance approach to parallel volume visualization.
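To illustrate the coherency adaptation mentioned above, the following is a minimal serial sketch of a linear octree that collapses homogeneous regions into single leaves and discards empty space. It is not the MasPar implementation: the function names (`morton_key`, `build_linear_octree`), the Morton-key leaf layout, the exact-equality homogeneity test, and the choice of 0 as the empty value are all assumptions made for illustration.

```python
def morton_key(x, y, z, bits):
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def build_linear_octree(vol, size, empty=0):
    """Return leaves as (morton_key, edge_length, value), sorted by Morton key.

    `vol` is a size x size x size nested list indexed vol[z][y][x];
    `size` must be a power of two. Both the layout and the `empty`
    sentinel are assumptions of this sketch.
    """
    bits = size.bit_length() - 1
    leaves = []

    def visit(x0, y0, z0, edge):
        first = vol[z0][y0][x0]
        # A block is homogeneous when every voxel equals its first voxel.
        homogeneous = all(
            vol[z][y][x] == first
            for z in range(z0, z0 + edge)
            for y in range(y0, y0 + edge)
            for x in range(x0, x0 + edge)
        )
        if homogeneous:
            # Collapse the block into one leaf; drop it entirely if empty.
            if first != empty:
                leaves.append((morton_key(x0, y0, z0, bits), edge, first))
            return
        # Otherwise split into eight octants and recurse.
        half = edge // 2
        for dz in (0, half):
            for dy in (0, half):
                for dx in (0, half):
                    visit(x0 + dx, y0 + dy, z0 + dz, half)

    visit(0, 0, 0, size)
    leaves.sort()
    return leaves


# Usage: a 4^3 volume with one isolated voxel and one uniform 2^3 block.
vol = [[[0] * 4 for _ in range(4)] for _ in range(4)]
vol[0][0][0] = 3
for z in (2, 3):
    for y in (2, 3):
        for x in (2, 3):
            vol[z][y][x] = 7

print(build_linear_octree(vol, 4))  # [(0, 1, 3), (56, 2, 7)]
```

In this toy case the 64 voxels reduce to two leaves, so a renderer would resample two nodes instead of 64 voxels. Because the leaves form a flat, ordered list rather than a pointer-based tree, they could in principle be counted and moved between processors, which is the kind of redistribution that static load balancing relies on.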