Optimization of access patterns using collective I/O imposes the overhead of exchanging data between processes. In a multi-core-based cluster, the costs of inter-node and intra-node data communication are vastly different, and this heterogeneity in the efficiency of data exchange poses both a challenge and an opportunity for implementing efficient collective I/O. The opportunity is to effectively exploit fast intra-node communication. We propose to improve communication locality for greater data-exchange efficiency. However, such an effort is at odds with improving access locality for I/O efficiency, which can also be critical to collective-I/O performance. To address this issue, we propose a framework, Orthrus, that can accommodate multiple collective-I/O implementations, each optimized for some performance aspects, and dynamically select the best-performing one according to current workload and system patterns. We have implemented Orthrus in the ROMIO library. Our experimental results with representative MPI-IO benchmarks on both a small dedicated cluster and a large production HPC system show that Orthrus can significantly improve collective I/O's performance under various workloads and system scenarios.