ABSTRACT: As computational science applications grow in parallelism and target very large multicore systems, it will become increasingly difficult for linear solvers to scale. To address this problem, we use hybrid MPI/threaded algorithms to solve the linear systems, exploiting the shared memory within each node to increase the parallel efficiency of the algorithm. For this approach to yield scalable linear solvers, we need efficient threaded triangular solvers (which are important for preconditioning) on the multicore nodes. We briefly describe such a threaded triangular solver and present numerical results. To make integrating these hybrid MPI/threaded linear solvers into existing large-scale scientific simulations painless, we advocate using MPI methods for shared-memory allocation on the multicore node. We give an example of how MPI shared-memory allocation can be used in the preconditioned conjugate gradient (PCG) method to reduce the number of iterations without significantly altering the basic algorithm.
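The abstract does not specify how the threaded triangular solver exposes parallelism. One widely used technique, level scheduling, is sketched here under that assumption only. The rows of a sparse lower-triangular matrix are grouped into levels so that every row in a level depends only on rows in earlier levels. The rows within a level can therefore be solved concurrently by separate threads. The function names (`compute_levels`, `level_scheduled_solve`), the row-list storage format, and the use of Python's `ThreadPoolExecutor` are hypothetical illustrative choices, not the authors' implementation. In CPython the global interpreter lock prevents real speedup, so the sketch shows the dependency structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_levels(n, rows):
    """Group rows into levels for a sparse lower-triangular solve.

    Hypothetical format: rows[i] lists (j, value) pairs for the strictly
    lower part of row i (j < i); diagonal entries are stored separately.
    """
    level = [0] * n
    for i in range(n):
        # Row i must wait for every earlier row j with a nonzero L[i, j];
        # rows with no such dependencies land in level 0.
        level[i] = 1 + max((level[j] for j, _ in rows[i]), default=-1)
    n_levels = max(level, default=-1) + 1
    buckets = [[] for _ in range(n_levels)]
    for i, lv in enumerate(level):
        buckets[lv].append(i)
    return buckets


def level_scheduled_solve(n, rows, diag, b, workers=4):
    """Solve L x = b level by level, using a thread pool within each level."""
    x = [0.0] * n

    def solve_row(i):
        # Reads only x[j] from earlier levels and writes only x[i],
        # so rows in the same level never conflict.
        s = b[i] - sum(v * x[j] for j, v in rows[i])
        x[i] = s / diag[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for bucket in compute_levels(n, rows):
            # list() waits for the whole level (and re-raises any error)
            # before the next level starts.
            list(pool.map(solve_row, bucket))
    return x


# Example: rows 0 and 1 are independent (level 0); rows 2 and 3 depend
# only on level 0 (level 1).
rows = [[], [], [(0, 1.0), (1, 1.0)], [(1, 2.0)]]
diag = [1.0, 2.0, 1.0, 2.0]
b = [1.0, 4.0, 6.0, 12.0]
print(compute_levels(4, rows))                   # [[0, 1], [2, 3]]
print(level_scheduled_solve(4, rows, diag, b))   # [1.0, 2.0, 3.0, 4.0]
```

The number of levels bounds the number of synchronization points, so matrices whose levels are few and wide expose the most thread-level parallelism in this scheme.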