<p>Mesh partitioning for homogeneous systems has been studied extensively; mesh partitioning for distributed systems, however, is a relatively new area of research. To ensure efficient execution on a distributed system, the partitioning process must take into account the heterogeneities in processor and network performance; equal-size subdomains and a small cut set, which result from conventional mesh partitioning, are no longer the primary goals. In this paper, we address various issues related to mesh partitioning for distributed systems. These issues include the metric used to compare different partitions, the efficiency of the application executing on a distributed system, and the advantage of exploiting heterogeneity in network performance. We present a tool called PART for automatic mesh partitioning for distributed systems. The novel feature of PART is that it considers heterogeneities in both the application and the distributed system. PART uses simulated annealing to perform the backtracking search for desired partitions. Because simulated annealing is well known to be computationally intensive, we describe the parallel version of simulated annealing that is used with PART. The parallelization exhibits superlinear speedup in most cases and nearly perfect speedup in the remaining cases. Experimental results are also presented for partitioning regular and irregular finite element meshes for an explicit, nonlinear finite element application called WHAMS2D, executing on a distributed system consisting of two IBM SPs with different processors. The results from the regular problems indicate a 33 to 46 percent increase in efficiency when processor performance is considered, as compared to conventional even partitioning.
The results also indicate a 5 to 15 percent increase in efficiency when network performance is considered, as compared to considering only processor performance; this is significant given that the optimal improvement for this application is 15 percent. The results from the irregular problem indicate an increase in efficiency of up to 36 percent when processor and network performance are considered, as compared to even partitioning.</p>