I've been working in Python 3 on an an embarrassingly parallel task of parsing and importing data into Postgres. Once I've done single-threaded implementation I looked around for parallelizing the program and after a few tries I managed to get everything running in parallel.

This is a quick note on how to get a thread pool working in Python 3.6

In the code block above, data_source function is an iterator that generates one value at a time. pool.submit calls process_data with value as a parameter of process_data in parallel until thread pool limited in size by MAX_THREADS is exhausted, then the program waits for a thread to become available and fetches the next value, until data_sources generator is exhausted.

In this example pool.submit passes single parameter value to process_data, but pool.submit can pass any number of parameters required for the callee function i.e.

I really like this threading implementation because it's really simple, there's no need to write any code to manage threading pool, single- and multi- threaded implementation can live side-by-site if there's ever a need to debug any logic in process_data function.

Still this threading library implemented in python, whic is not true threading and multi-threaded code is a subject of Global Interpreter Lock so the CPU-bound tasks will not benefit from this.