2009/10/18 Brian Granger <ellisonbg.net@gmail.com>
> Looks like you have been making progress...some comments:
>
> * Something quite odd is going on. While it would be nice if you could get
> 2.4-2.7x speedup on a dual core
> system, I don't think that result is real. I am not sure why you are
> seeing this, but it is *extremely* rare
> to see a speedup greater than the number of cores. It is possible, but I
> don't think your problem has
> any of the characteristics that would make it so.
> * From your description of the problem, ipython should be giving you nearly
> 2x speedup, but it is quite
> a bit lower.
>
> The combination of these things makes me think there is an aspect of all of
> this we are not understanding yet.
> I am suspecting that the method you are using to time your code is not
> accurate. I have seen this type of
> thing before. Can you time it using a more accurate approach? Something
> like:
>
> from timeit import default_timer as clock
>
> t1 = clock()
> ...
> t2 = clock()
>
> It is possible that IPython is slower than multiprocessing in this case,
> but something else is going on here.
>
> Cheers,

Here are new benchmark results (in seconds) using your suggested timing
approach:
0-) Duration using the linear processing: 1048.07685399
1-) Duration using TaskClient and 2 Engines: 701.550107956
2-) Duration using MultiEngineClient and 2 Engines: 663.629260063
3-) I can't get timings using this method when I use the multiprocessing
module.

I will send my 4 scripts to your email for further investigation. So far,
the results don't seem much different from what they were originally.
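The speed-ups these figures imply can be worked out directly; this is a minimal sketch of the suggested timeit.default_timer pattern wrapped around a stand-in workload (process_one is a hypothetical placeholder, not the real postprocessing call):

```python
from timeit import default_timer as clock

# Durations measured above (seconds).
linear = 1048.07685399
task_client = 701.550107956
multi_engine = 663.629260063

# The suggested timing pattern, wrapped around a stand-in workload;
# process_one is a hypothetical placeholder for the real postprocessing call.
def process_one(name):
    return name.endswith('.sea')

t1 = clock()
results = [process_one(f) for f in ['a.sea', 'b.sea', 'c.txt']]
t2 = clock()

# Speed-ups over the linear run implied by the numbers above:
print('TaskClient:        %.2fx' % (linear / task_client))    # 1.49x
print('MultiEngineClient: %.2fx' % (linear / multi_engine))   # 1.58x
```

So both IPython variants sit well below the roughly 2x one would hope for from two engines, which is exactly the gap being discussed.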
> Brian
> On Sun, Oct 18, 2009 at 2:01 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>
>> On Sun, Oct 18, 2009 at 2:34 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>> Moreeeeee speed-up :)
>>
>> Next step is to use the multiprocessing module.
>>
>> I did two tests since I was not sure which timing to believe:
>>
>> real    6m37.591s
>> user    10m16.450s
>> sys     0m4.808s
>>
>> real    7m22.209s
>> user    11m21.296s
>> sys     0m5.540s
>>
>> from which I figured out that "real" is the number I want to look at. So the
>> improvement with respect to the original linear 18m 5s run is a 2.4 to 2.7x
>> speed-up on a dual-core 2.5 GHz laptop using Python's multiprocessing
>> module, which is great for only adding a few lines of code and slightly
>> modifying my original process_all wrapper script.
>>
>> Here is the code:
>>
>> #!/usr/bin/env python
>> """
>> Execute postprocessing_saudi script in parallel using multiprocessing
>> module.
>> """
>>
>> from multiprocessing import Pool
>> from subprocess import call
>> import os
>>
>> def find_sea_files():
>>     file_list, path_list = [], []
>>     init = os.getcwd()
>>     for root, dirs, files in os.walk('.'):
>>         dirs.sort()
>>         for file in files:
>>             if file.endswith('.sea'):
>>                 file_list.append(file)
>>                 os.chdir(root)
>>                 path_list.append(os.getcwd())
>>                 os.chdir(init)
>>     return file_list, path_list
>>
>> def process_all(pf):
>>     os.chdir(pf[0])
>>     call(['postprocessing_saudi', pf[1]])
>>
>> if __name__ == '__main__':
>>     pool = Pool(processes=2)    # start 2 worker processes
>>     files, paths = find_sea_files()
>>     pathfile = [[paths[i], files[i]] for i in range(len(files))]
>>     pool.map(process_all, pathfile)
>>
>> The main difference is the changed map call, since multiprocessing's
>> Pool.map, unlike Python's built-in map, supports only one iterable
>> argument. This approach also shows execution results on the terminal
>> screen, unlike IPython's. I am assuming that, like IPython, the
>> multiprocessing module should be able to run on external nodes, which
>> means that once I can set up a few fast external machines I can perform
>> a few more tests.
>>
>> --
>> Gökhan
--
Gökhan