Mixing the Python C API and MPI (or mpi4py)

I wrote a couple of hand-written C extensions for Python, but one of them is really slow, so I would like to make it faster using MPI (I'm a newbie). I have no idea how to implement the code, though. I googled for options but couldn't find anything about mixing the Python C API and MPI. Any good references or ideas? Thanks!

In my case, I know there's a much faster algorithm for finding a matrix determinant than the recursive cofactor approach. I'm just trying to learn about Open MPI.

Have you used the best algorithms? Have you factored common subexpressions? Do you compute unnecessary values? Are you solving the right problem? Did you profile the code to determine the critical parts? Are these eligible for parallel execution?
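
On the last question: in the recursive cofactor expansion mentioned above, the first-row terms do not depend on each other, so each MPI rank can compute a subset of them and the partial sums can be combined with a reduction. Here is a minimal sketch using mpi4py, assuming it is installed. The helper names (`minor`, `det`, `partial_det`, `run`) are made up for illustration, and the pure-Python `det` only stands in for the slow C extension routine.

```python
def minor(m, col):
    """Return matrix m (list of lists) without row 0 and column col."""
    return [row[:col] + row[col + 1:] for row in m[1:]]


def det(m):
    """Determinant by recursive cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, j)) for j in range(len(m)))


def partial_det(m, rank, size):
    """Sum of the first-row cofactor terms assigned to this rank.

    Columns are dealt out round-robin: rank r handles r, r + size, ...
    """
    return sum((-1) ** j * m[0][j] * det(minor(m, j))
               for j in range(rank, len(m), size))


def run(m):
    """Compute det(m) across all ranks; rank 0 gets the result, others None."""
    # Imported here so the helpers above can be used without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    # Every rank computes its share, then the shares are summed on rank 0.
    return comm.reduce(partial_det(m, rank, size), op=MPI.SUM, root=0)
```

Saved as, say, `det_mpi.py` (hypothetical name) with a final line such as `print(run(matrix))`, this would be launched with something like `mpiexec -n 4 python det_mpi.py`. Every rank has to build the same `matrix`, and only rank 0 prints the determinant. With only n top-level terms, more than n ranks gives no further speedup.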