As expected, most operations in the application run significantly faster with the optimized version. However, one operation dominated by READing a large file is actually 2x SLOWER on the optimized version than on the debug version.

Anybody got any ideas what could be going on here? One thing: I've always noticed that IVF's file I/O is much slower than Lahey's LF95 (IVF is faster in everything else).

Al Greynolds www.ruda.com

Tue, 09 Mar 2010 01:48:20 GMT

Dr Ivan D. Rei#2 / 16

IVF 10: IO speed slower with full optimizations than with a debug build

Quote:

> As expected, most operations in the application run significantly faster with the optimized version. However, one operation dominated by READing a large file is actually 2x SLOWER on the optimized version than on the debug version.

I think you'd need to show us the code before we could begin to make even educated guesses. There are _many_ ways to read a file!

IVF 10: IO speed slower with full optimizations than with a debug build

Quote:

> /traceback /QxW /CU /Qtrapuv /CB /Od

> As expected, most operations in the application run significantly faster with the optimized version. However, one operation dominated by READing a large file is actually 2x SLOWER on the optimized version than on the debug version.

Really the best thing to do is to contact support and supply an example of your program. As a side comment, you may as well remove /Qtrapuv as it doesn't do anything useful.

Steve

Tue, 09 Mar 2010 04:37:57 GMT

Al Greynold#4 / 16

Quote:

> Thought I'd ask here first (before going to Intel) to see if anybody has seen this odd behavior with IVF 10.0. I create a fully optimized version of my application with:

> As expected, most operations in the application run significantly faster with the optimized version. However, one operation dominated by READing a large file is actually 2x SLOWER on the optimized version than on the debug version.

> Anybody got any ideas what could be going on here? One thing: I've always noticed that IVF's file I/O is much slower than Lahey's LF95 (IVF is faster in everything else).

Elapsed times in seconds on a 150MB file (3 million lines, each up to 73 characters long):

               Optimized   Debug
Non-standard        10.9     3.1
Standard            23.2    14.9

Tue, 09 Mar 2010 05:54:09 GMT

Steve Lione#5 / 16

Quote:

> Elapsed times in seconds on a 150MB file (3 million lines, each up to 73 characters long):

>                Optimized   Debug
> Non-standard        10.9     3.1
> Standard            23.2    14.9

Interesting. You could simply use the variable line instead of line(:l). It won't change the meaning. I'll play with this and see what I can find.

Steve

Tue, 09 Mar 2010 07:15:50 GMT

Al Greynold#6 / 16

Quote:

> > Elapsed times in seconds on a 150MB file (3 million lines, each up to 73 characters long):

> >                Optimized   Debug
> > Non-standard        10.9     3.1
> > Standard            23.2    14.9

> Interesting. You could simply use the variable line instead of line(:l). It won't change the meaning. I'll play with this and see what I can find.

> Steve

Actually, replacing "line(:l)" with just "line" produces a significant slowdown (I suspect due to having to initialize line(l+1:1836), especially when l<<1836).
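A minimal sketch of the blank-padding behaviour suspected here, assuming the READ uses an A edit descriptor. It uses an internal file in place of the real 150MB input, and the names pad_demo and record and the 20-character buffer are illustrative, not taken from the original program:

```fortran
program pad_demo
  implicit none
  character(len=20) :: line              ! stands in for the 1836-character buffer
  character(len=5)  :: record = 'hello'  ! stands in for one short input record
  integer :: l

  l = len(record)

  ! Reading into the substring transfers l characters; the tail keeps its old contents.
  line = repeat('#', len(line))
  read(record, '(a)') line(:l)
  print '(3a)', '[', line, ']'

  ! Reading into the whole variable blank-fills every position after the record data.
  line = repeat('#', len(line))
  read(record, '(a)') line
  print '(3a)', '[', line, ']'
end program pad_demo
```

With an 1836-character buffer and records of at most 73 characters, the second form has to blank-fill more than 1700 characters per record, which would be consistent with the slowdown reported above.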

Al

Tue, 09 Mar 2010 08:40:24 GMT

Sjouke Burr#7 / 16

Quote:

>>> Elapsed times in seconds on a 150MB file (3 million lines, each up to 73 characters long):

>>>                Optimized   Debug
>>> Non-standard        10.9     3.1
>>> Standard            23.2    14.9

>> Interesting. You could simply use the variable line instead of line(:l). It won't change the meaning. I'll play with this and see what I can find.

>> Steve

> Actually, replacing "line(:l)" with just "line" produces a significant slowdown (I suspect due to having to initialize line(l+1:1836), especially when l<<1836).

> Al

You forgot the smiley.....:)

Tue, 09 Mar 2010 11:32:52 GMT

Steve Lione#8 / 16

Quote:

> Actually, replacing "line(:l)" with just "line" produces a significant slowdown (I suspect due to having to initialize line(l+1:1836), especially when l<<1836).

Yes, you are right regarding my suggested change. However, so far I have been unable to reproduce the behavior you are describing. Please do file a report with Intel Premier Support and we will be glad to look at it in more detail.

Steve

Wed, 10 Mar 2010 04:58:37 GMT

Al Greynold#9 / 16

> Yes, you are right regarding my suggested change. However, so far I have been unable to reproduce the behavior you are describing. Please do file a report with Intel Premier Support and we will be glad to look at it in more detail.

> Steve

To be sure I ran the cases on both the original Dell Pentium-4 Xeon workstation and an Apple Core 2 Duo laptop (both running XP-Pro). I got the same odd behavior.

I tracked it down to the /Qparallel option on the optimized version. If I remove it, the optimized version is now slightly faster than the debug, as expected. Do you link in a different set of runtimes with the /Qparallel option?

Al

Wed, 10 Mar 2010 06:56:36 GMT

Al Greynold#10 / 16

> > Yes, you are right regarding my suggested change. However, so far I have been unable to reproduce the behavior you are describing. Please do file a report with Intel Premier Support and we will be glad to look at it in more detail.

> > Steve

> To be sure I ran the cases on both the original Dell Pentium-4 Xeon workstation and an Apple Core 2 Duo laptop (both running XP-Pro). I got the same odd behavior.

> I tracked it down to the /Qparallel option on the optimized version. If I remove it, the optimized version is now slightly faster than the debug, as expected. Do you link in a different set of runtimes with the /Qparallel option?

> Al

The above also applies to the /Qopenmp option. If I remove both the /Qparallel and /Qopenmp options, one I/O-bound part of my application runs 3 times faster, but of course another computationally intensive part that uses OpenMP directives now runs more than 2 times slower on my 2-processor box.

Your multi-threaded runtime must not be as fast as your single-threaded one at this particular I/O. Is there a way to build my application so that some parts are compiled for OpenMP while other parts aren't?

Al

Wed, 10 Mar 2010 07:18:12 GMT

Wade War#11 / 16

>> > Yes, you are right regarding my suggested change. However, so far I have been unable to reproduce the behavior you are describing. Please do file a report with Intel Premier Support and we will be glad to look at it in more detail.

>> > Steve

>> To be sure I ran the cases on both the original Dell Pentium-4 Xeon workstation and an Apple Core 2 Duo laptop (both running XP-Pro). I got the same odd behavior.

>> I tracked it down to the /Qparallel option on the optimized version. If I remove it, the optimized version is now slightly faster than the debug, as expected. Do you link in a different set of runtimes with the /Qparallel option?

>> Al

> The above also applies to the /Qopenmp option. If I remove both the /Qparallel and /Qopenmp options, one I/O-bound part of my application runs 3 times faster, but of course another computationally intensive part that uses OpenMP directives now runs more than 2 times slower on my 2-processor box.

> Your multi-threaded runtime must not be as fast as your single-threaded one at this particular I/O. Is there a way to build my application so that some parts are compiled for OpenMP while other parts aren't?

> Al

And what does the above have to do with Fortran?

--
Wade Ward
"I apparently do have time to bleed. Unusual luxury."

Wed, 10 Mar 2010 18:40:59 GMT

Steve Lione#12 / 16

Quote:

> Your multi-threaded runtime must not be as fast as your single-threaded one at this particular I/O. Is there a way to build my application so that some parts are compiled for OpenMP while other parts aren't?

Ah, I should have asked you what you meant by "optimized". The thread-safe libraries clearly need to protect themselves against operations in other threads. This involves synchronization primitives that do take extra time. I suppose one option is to make sure that all your I/O is done in a single thread and link against the non-thread-safe libraries.
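A minimal sketch of the "all I/O in a single thread" arrangement, assuming an OpenMP-capable compiler. The program name, array size, and scratch file are illustrative only. Whether the non-thread-safe runtime can actually be linked together with OpenMP code is a question for the compiler documentation or Intel support, so the sketch shows only the source-level separation:

```fortran
program serial_io_parallel_compute
  implicit none
  integer, parameter :: n = 1000   ! illustrative problem size
  real    :: x(n), total
  integer :: i, u

  ! Serial section: all file I/O happens here, outside any parallel region.
  open(newunit=u, status='scratch', form='formatted')
  do i = 1, n
     write(u, '(f10.3)') real(i)
  end do
  rewind(u)
  do i = 1, n
     read(u, '(f10.3)') x(i)
  end do
  close(u)

  ! Parallel section: computation only, with no I/O statements inside the loop.
  total = 0.0
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + sqrt(x(i))
  end do
  !$omp end parallel do

  print '(a, f12.3)', 'sum of square roots: ', total
end program serial_io_parallel_compute
```

Compiled without OpenMP, the directives are treated as comments and the program still runs serially, which makes it easy to time the I/O section with and without the threaded runtime.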

Please do submit a report to Intel Premier support and provide all of the details that you have listed here. I know there has been some recent work done on optimizing threaded libraries which has not yet been released. I don't know whether your program would show any improvement with this work, so please do submit your test case.

Out of curiosity: you said that I/O using Lahey was faster. Was this also with parallel?

Steve

Wed, 10 Mar 2010 20:39:55 GMT

Al Greynold#13 / 16

Quote:

> > Your multi-threaded runtime must not be as fast as your single-threaded one at this particular I/O. Is there a way to build my application so that some parts are compiled for OpenMP while other parts aren't?

> Ah, I should have asked you what you meant by "optimized". The thread-safe libraries clearly need to protect themselves against operations in other threads. This involves synchronization primitives that do take extra time. I suppose one option is to make sure that all your I/O is done in a single thread and link against the non-thread-safe libraries.

> Please do submit a report to Intel Premier support and provide all of the details that you have listed here. I know there has been some recent work done on optimizing threaded libraries which has not yet been released. I don't know whether your program would show any improvement with this work, so please do submit your test case.

> Out of curiosity: you said that I/O using Lahey was faster. Was this also with parallel?

> Steve

Will do on the Intel Premier submission. One last point: the exact same application processing the exact same large file, but on a Dual PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed difference in the I/O part whether compiled with or without OpenMP (the time is comparable to the non-OpenMP IVF executable on the Dell dual Pentium-4 Xeon workstation). So it's definitely possible to write "fast" multi-threaded I/O libraries.

Al

Wed, 10 Mar 2010 21:13:38 GMT

Steve Lione#14 / 16

Quote:

> Will do on the Intel Premier submission. One last point: the exact same application processing the exact same large file, but on a Dual PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed difference in the I/O part whether compiled with or without OpenMP (the time is comparable to the non-OpenMP IVF executable on the Dell dual Pentium-4 Xeon workstation). So it's definitely possible to write "fast" multi-threaded I/O libraries.

But do you know if the IBM compiler actually has thread safe I/O libraries? Just because OpenMP is supported that doesn't mean that their I/O library protects against I/O from multiple threads.

Steve

Wed, 10 Mar 2010 21:47:55 GMT

Al Greynold#15 / 16

Quote:

> > Will do on the Intel Premier submission. One last point: the exact same application processing the exact same large file, but on a Dual PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed difference in the I/O part whether compiled with or without OpenMP (the time is comparable to the non-OpenMP IVF executable on the Dell dual Pentium-4 Xeon workstation). So it's definitely possible to write "fast" multi-threaded I/O libraries.

> But do you know if the IBM compiler actually has thread-safe I/O libraries? Just because OpenMP is supported, that doesn't mean that their I/O library protects against I/O from multiple threads.

> Steve

I use xlf95_r for OpenMP compiling, which according to the IBM docs uses the "thread-safe" libraries.