Fortran DLL slows down 2 times when called from Visual C++ in comparison to calling from Fortran

I have been given a Fortran subroutine that is actually a simulation model of a physical process, so simulation speed is a criterion. I compiled the Fortran code into a DLL.

When the DLL subroutine is called from a Fortran program, using the Windows API functions LoadLibrary and GetProcAddress, it gives a speed ratio of 770, which means that 1 second of computation simulates 770 seconds of the physical process in real time.

Next, I do the same thing in C++, and this gives a speed ratio of 380, which is 2 times slower than calling from the Fortran program.
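To make the two numbers comparable, the speed ratio can be measured the same way on both sides. The following is only a minimal sketch of such a measurement: `SimulationStep`, `stand_in_step` and `speed_ratio` are hypothetical names, and the real exported routine takes arguments that were not posted in this thread.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical signature. In the real program this pointer would come from
// GetProcAddress and would take the model's actual arguments.
using SimulationStep = void (*)();

// Stand-in for the DLL routine so that the sketch is self-contained.
void stand_in_step() {
    volatile double x = 0.0;
    for (int i = 0; i < 1000; ++i) x = x + 1.0;
}

// Returns simulated seconds per wall-clock second for `steps` calls,
// assuming each call advances the model by `dt_simulated` seconds.
double speed_ratio(SimulationStep step, int steps, double dt_simulated) {
    assert(step != nullptr && steps > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < steps; ++i) step();
    const std::chrono::duration<double> wall = clock::now() - start;
    return (steps * dt_simulated) / wall.count();
}
```

With this definition a ratio of 770 means 770 simulated seconds per second of run time, so a larger value is faster.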

I tried to fix the problem by setting the Runtime Library option of the C++ project to "Multithreaded DLL", the same setting used for the DLL, but it did not resolve the problem.

Please show the Fortran and C++ code used to call the routine. In particular, knowing how the input and output arguments are declared would be useful. Changing the runtime library option is not relevant. My guess is that you're not calling with the same inputs.

I have called simulation DLLs from both Intel Fortran and Microsoft C++ (both 32-bit) and found no significant difference in the performance of the simulation.
My guess is that an input parameter or an option for the simulation differs between your Fortran call and your C++ call.
This would show up as different simulation results between the Fortran call and the C++ call.
Are the simulation results identical between the two calls?
If not, fix things so that you are comparing the same simulation results across the two calls.
If so, then please post the calling routines for Fortran and C++, as they may reveal the reason for the speed degradation.
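One way to answer the "are the results identical" question is to dump the outputs of both runs and compare them numerically. The sketch below assumes the results can be collected into arrays of doubles; `max_abs_difference` is a hypothetical helper, not part of the posted code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest absolute difference between two result sets.
// Returns infinity if the sizes differ or if any difference is NaN,
// so that a mismatch can never pass unnoticed.
double max_abs_difference(const std::vector<double>& from_fortran,
                          const std::vector<double>& from_cpp) {
    const double inf = std::numeric_limits<double>::infinity();
    if (from_fortran.size() != from_cpp.size()) return inf;
    double worst = 0.0;
    for (std::size_t i = 0; i < from_fortran.size(); ++i) {
        const double d = std::fabs(from_fortran[i] - from_cpp[i]);
        if (std::isnan(d)) return inf;
        worst = std::max(worst, d);
    }
    return worst;
}
```

If both callers really pass the same inputs to the same DLL, the difference would be expected to be zero or very close to it; anything larger suggests the two runs are not doing the same work.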

In the absence of compiler options to the contrary, a Fortran default logical in ifort uses four bytes. In the absence of compiler options to the contrary, a MS VC++ bool is one byte. It is quite probable that your Fortran DLL goes and stomps on the C++ program's stack. Your C++ program probably gets quite upset about this. Depending on how the C++ or Fortran compiler feels about packing structures, you may have a related problem with the inlet structure (and the outlet structure too, in a slightly different scenario).

Consider using Fortran 2003's C interoperability features to make the link between Fortran and your C++ code robust to this sort of issue. While you are at it you can get rid of the non standard structure business (make them BIND(C) types).
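On the C++ side, the size mismatch described above could be avoided along these lines. This is only a sketch: the field names are made up because the real inlet structure was not posted, and it assumes the Fortran side uses a REAL(8) followed by a default 4-byte LOGICAL.

```cpp
#include <cstddef>
#include <cstdint>

// Naive translation: `bool` is typically 1 byte in Visual C++, so the
// Fortran side would read and write 3 bytes past the member.
struct InletWithBool {
    double flow;      // hypothetical field
    bool   enabled;   // hypothetical field
};

// Layout intended to match REAL(8) + default LOGICAL (4 bytes in ifort
// unless compiler options change it).
struct InletForFortran {
    double       flow;
    std::int32_t enabled;
};

// Compile-time checks, so a layout change is caught before run time.
static_assert(sizeof(InletForFortran::enabled) == 4,
              "logical member must occupy four bytes");
static_assert(offsetof(InletForFortran, enabled) == sizeof(double),
              "no padding expected between the two members");

// Assumption: the Fortran side reads 1 as true and 0 as false.
inline std::int32_t to_fortran_logical(bool value) { return value ? 1 : 0; }
```

Declaring the Fortran type with BIND(C) and the kinds from ISO_C_BINDING, as suggested above, lets the Fortran compiler enforce the same layout from its side.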

IanH, I tried changing the bool variables in the C++ program to int, which has 4 bytes like the ifort logical type, but nothing changed.

In the Visual Fortran simulation, when I set the runtime library of the DLL project to Multithreaded, the speed ratio drops to about 380 (it was about 780 with the runtime library set to Multithreaded DLL).

Then, still with the Multithreaded option, I added Libifcoremd.lib to the Additional Dependencies, and the simulation speeds up to the maximum again (about 780). Does this make sense to anyone?

You have found that a compiler not so well known for optimization (dev-C) produces code that runs slower than code output by a Fortran compiler known for its optimization (Intel Fortran). The other tweaking that you did, with compiler options, different RTLs, etc., is probably not going to make much of a difference. There is always Amdahl's Law to consider in explaining how multi-threaded programs behave.

In such circumstances, one may rejoice that the Fortran program is "fast", or lament that the C program is "slow", or take a position somewhere in between.

Your quoting "real" time/simulation times is misleading, because it suggests that the ratio is of some significance. In solving a heat diffusion problem, for example, one may change "real" time by changing the diffusion coefficient, without causing any change to the run time of the simulation. Similarly, simple models of climate change can run a simulation of the entire (known) life of the Earth in a few hours.

In fact, the case I am working on involves mathematical optimization with iterative calls to the simulation model, so the simulation speed really matters. I don't know much about numerical modeling, so I cannot tell whether in my case I can change any coefficient to change "real" time as in the example you mentioned.

My last comment was to ask why the simulation speed of my Fortran code dropped by a factor of two just from changing the runtime library option from MD to MT. That would, I think, answer the initial question.

It turned out that the problem comes from one line in the Fortran code: IMPLICIT INTEGER(i-n). I changed it to IMPLICIT INTEGER*2(i-n), and this answers the initial question. I hope this helps someone who has the same problem as mine. Thanks.

I can only repeat the advice to use IMPLICIT NONE and to declare every variable explicitly instead of relying on the implicit typing feature of ancient Fortran. I learned this the hard way some years ago with a similar error.

I do not understand what you mean by "initial question", but if adding IMPLICIT INTEGER*2(..) "answered" the question, your code has serious problems. If this change was required for the program to run correctly, the speed comparisons that you started out with are invalid because you were comparing a correctly running program with an incorrectly running one.

The Fortran program was probably developed to work on a 16-bit CPU, and code with INTEGER*2 is going to run more slowly on today's 64-bit CPUs than code with default INTEGER for the platform.
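Apart from speed, narrowing every implicitly typed integer to 2 bytes also shrinks its range to -32768..32767, which is one way such a change can silently alter results. The sketch below illustrates that range in C++; the helper names are hypothetical, and the wraparound shown is what two's-complement hardware typically produces, not something the Fortran standard guarantees on overflow.

```cpp
#include <cstdint>
#include <limits>

// True if `value` can be stored in a 2-byte integer (INTEGER*2) unchanged.
bool fits_in_integer2(long long value) {
    return value >= std::numeric_limits<std::int16_t>::min() &&
           value <= std::numeric_limits<std::int16_t>::max();
}

// Value that results when only the low 16 bits are kept, computed with
// explicit modular arithmetic so that the sketch has no undefined behaviour.
std::int16_t wrap_to_integer2(long long value) {
    const long long modulus = 65536;
    long long r = ((value % modulus) + modulus) % modulus;  // 0..65535
    if (r > 32767) r -= modulus;                            // -32768..32767
    return static_cast<std::int16_t>(r);
}
```

For example, a loop bound or array index of 40000 does not fit and would come out as -25536 after wrapping.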

mecej4, you're right: there is a bug when I add IMPLICIT INTEGER*2(i-n), and the code gives NaN results. :(

I'd have used the emoticon ":)", instead, since the NaNs in the result give a strong warning that there is something definitely wrong.

Bugs are harder to track down and fix when they do not affect the results so much that questions of plausibility arise. There have been cases where such bugs stayed hidden for decades in highly used and well-reputed software.

The usage of 2-byte integers is a good indication. Such half- (or quarter-) word integers make sense in new code only if either (i) interfacing Fortran to hardware that needs to exchange 16-bit integers, or (ii) to pack more integers into memory than if natural size integers were used.

I stopped programming for 8-bit CPUs over 20 years ago, yet someone decided to invest in 16-bit floating point data types for the Intel architectures which are currently being released, presumably on the basis of demand from influential customers.

The 16-bit data types involve use of 32-bit registers, using the same arithmetic instruction set as for 32-bit data types. As mecej4 says, their use could be justified only if a compensating benefit can be demonstrated in reduced memory usage. Even the benefit from using approximate divide and sqrt is reduced in the Ivy Bridge CPU.

I have made a DLL without any input arguments (something like "void DLL(void)"), and then I simply call the function from both C/C++ and Fortran (there are no loops this time). The C/C++ program is always much slower than the Fortran program.

The fact that the factor between the two cases is very close to two is suspicious. Is the CPU_TIME result printed by the program consistent with the elapsed time measured by your wrist watch?
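The same cross-check can be done on the C++ side by recording processor time and elapsed time around one call. This is a minimal sketch with hypothetical names (`Timing`, `time_both`, `stand_in_work`). Note that with the Microsoft C runtime `clock()` reports elapsed time rather than processor time, so under Visual C++ a platform call such as GetProcessTimes would be needed for the CPU figure.

```cpp
#include <chrono>
#include <ctime>

struct Timing {
    double cpu_seconds;   // from std::clock (see the caveat about MSVC)
    double wall_seconds;  // from a monotonic wall clock
};

// Stand-in for the call into the DLL so that the sketch is self-contained.
void stand_in_work() {
    volatile double x = 0.0;
    for (int i = 0; i < 100000; ++i) x = x + 1.0;
}

// Runs `work` once and reports both timings.
template <typename Work>
Timing time_both(Work work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();
    work();
    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();
    return Timing{
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
        std::chrono::duration<double>(wall_end - wall_start).count()};
}
```

If processor time is about twice the elapsed time, the work is probably spread over two threads and the two programs may simply be reporting different quantities.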

The source code for the DLL is incomplete: you've only provided one subroutine. That means readers of the forum can't compile your code and investigate things. They have to "head compile" what you've provided and hope that the things that are called are innocuous. Speaking for myself, my head compiler is notoriously buggy at the best of times. If you don't want to post the full source, then chop it down yourself to a compilable and linkable subset that still exhibits the problem (as a debugging/diagnostic strategy you should be doing this anyway). Otherwise we're all guessing.