The changes above didn't implement any multiprocessing yet. From what I have read and experienced myself, the most important thing is to fix as many inefficiencies in your non-parallel code as possible before attempting any multiprocessing (unless you are already an experienced parallel programmer).

Otherwise you will carry those inefficiencies over into the parallel code, which in the best-case scenario means extra work, considering that parallelism is not easy.

That said, parallelism is certainly an extraordinary way to improve your code even further by assigning more resources to your program.

@GoldbergData Great!!! I will let you know for sure. What I am going to do is upload the data to Kaggle, and then we can work on specific analyses related to the projects.

The data on Kaggle won't be the whole dataset to be used in the projects I have in mind, so your analyses will be more about tuning up the analytical tools we are going to use for the project as a whole. As soon as I get the complete data, I will share it with you for the full analysis.

PEOPLE

Why should multiprocessing be taken with care? Because forking processes with multiprocessing actually makes copies of the existing parent process. If the parent is big, the children will be big too.
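A minimal sketch of that copy-the-parent behavior, assuming a POSIX system where the `"fork"` start method is available (the list name `big_data` and the worker function are made up for illustration): the forked child gets its own copy of the parent's memory, so mutating it in the child does not affect the parent.

```python
import multiprocessing as mp

# Hypothetical "large" parent state; a real workload might hold far more.
big_data = list(range(1_000_000))

def mutate_and_report(q):
    # Under "fork", this runs on the child's private copy of big_data:
    # the append below is invisible to the parent process.
    big_data.append(-1)
    q.put(len(big_data))

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # "fork" duplicates the parent's memory
    q = ctx.Queue()
    p = ctx.Process(target=mutate_and_report, args=(q,))
    p.start()
    child_len = q.get()
    p.join()
    print(child_len)      # 1000001: the child's copy grew
    print(len(big_data))  # 1000000: the parent's list is untouched
```

The flip side is exactly the memory concern above: every forked child starts out holding a copy of everything the parent had in memory at fork time.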

Additionally, each new process starts another Python interpreter, and each Python interpreter consumes memory (on my computer, IPython without libraries, just the shell: 17KB). If you open too many processes, even with not much data but many libraries loaded, you can add a lot of memory overhead.

@RobertCC18 I hope you found what you were looking for in the :point_left: HelpDataViz channel. Asking in the forum, or even using Google or the forum's internal search engine, is very advisable too. I personally can't help because I haven't done any of the exercises.