This paper describes our new hybrid parallelization of the Finite Element Tearing and
Interconnecting (FETI) method for multi-socket, multi-core computer clusters.
This is an essential step in our development of the Hybrid FETI solver, where a
small number of neighboring subdomains is aggregated into clusters and each cluster
is processed by a single compute node.
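The aggregation of neighboring subdomains into clusters can be illustrated with a small sketch. It assumes a regular 3D grid of subdomains split into equal blocks, with cluster `c` assigned to compute node `c`; the functions `aggregate_subdomains` and `clusters_to_nodes` and this block layout are hypothetical and are not taken from the ESPRESO implementation.

```python
from itertools import product


def aggregate_subdomains(nx, ny, nz, cx, cy, cz):
    """Map each subdomain (i, j, k) of an nx x ny x nz grid to a cluster id.

    Hypothetical layout: each cluster is a block of cx x cy x cz
    neighboring subdomains, numbered in x-fastest order.
    """
    if nx % cx or ny % cy or nz % cz:
        raise ValueError("grid must divide evenly into cluster blocks")
    ncx, ncy = nx // cx, ny // cy  # number of cluster blocks along x and y
    mapping = {}
    for i, j, k in product(range(nx), range(ny), range(nz)):
        # Block coordinates of the cluster that contains subdomain (i, j, k)
        bi, bj, bk = i // cx, j // cy, k // cz
        mapping[(i, j, k)] = bi + ncx * (bj + ncy * bk)
    return mapping


def clusters_to_nodes(mapping):
    """Group subdomains by cluster; assume cluster c runs on compute node c."""
    nodes = {}
    for sub, c in sorted(mapping.items()):
        nodes.setdefault(c, []).append(sub)
    return nodes


if __name__ == "__main__":
    nodes = clusters_to_nodes(aggregate_subdomains(4, 4, 4, 2, 2, 2))
    print(len(nodes), len(nodes[0]))  # 8 8
```

Here a 4 x 4 x 4 grid of 64 subdomains is grouped into 8 clusters of 8 neighboring subdomains each, so 8 compute nodes would be used instead of one MPI process per subdomain.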

In our previous work, the FETI method was implemented in our ESPRESO solver using
MPI-only parallelization. The proposed hybrid implementation makes better use of the
resources of modern HPC machines by employing advanced shared-memory runtime
systems such as the Cilk++ runtime. ESPRESO uses Cilk++, an alternative to OpenMP,
for shared-memory parallelization.

We have compared the performance of the hybrid parallelization with that of MPI-only parallelization.
The results show that the hybrid approach reduces both solver runtime and memory
usage. This allows the solver to use a larger number of smaller subdomains and
thus to solve larger problems on a limited number of compute nodes. This
feature is essential for users with smaller computer clusters.