***************
Performance Tools for Software Developers
libmmdd.dll is dependent on msvcrtd.dll which is no longer distributed.

Symptom(s):
Note: This only applies to the compilers for Intel® Extended Memory 64 Technology and for the Itanium® Architecture.
Applications or DLLs that are built with /MDd or that directly link against Intel's libmmdd.dll may emit the runtime error:
This application has failed to start because msvcrtd.dll was not found. Re-installing the application may fix this problem.

Cause:
The Platform SDK distributed with Microsoft* Visual Studio* 2005 does not contain msvcrtd.dll. Using /MDd links against the Intel math library libmmdd.dll which has a dependency on msvcrtd.dll.

Solution:
This is a known issue that may be resolved in a future product release. As a work-around, use the msvcrtd.dll distributed with the Microsoft* Platform SDK, available at http://www.microsoft.com/downloads/details.aspx?FamilyId=0BAF2B35-C656-4969-ACE8-E4C0C0716ADB&displaylang=en
***************

You may need to get msvcrtd.dll from somewhere and put it into c:\windows\system32

To check the stack size of a program:
Run "dumpbin /headers executable_file" and look for "size of stack reserve" under "optional header values".

To enlarge the stack of a program:
Run "editbin /STACK:reserve[,commit] program.exe", where reserve (and optionally commit) is the new stack size in bytes.
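The "size of stack reserve" value that dumpbin prints is the SizeOfStackReserve field of the PE optional header, which is also the field editbin /STACK rewrites. A minimal C sketch that reads it from an image already loaded into memory, assuming a well-formed PE32 or PE32+ file; `pe_stack_reserve` is a hypothetical helper, not part of dumpbin or any Windows API, and it checks only the basic signatures:

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers: PE headers are always little-endian. */
static uint32_t rd16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return SizeOfStackReserve from a PE image held in
   memory, or 0 if the buffer does not look like a PE file. */
uint64_t pe_stack_reserve(const unsigned char *img, size_t size)
{
    if (size < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;

    uint32_t pe = rd32(img + 0x3C);           /* e_lfanew: offset of "PE\0\0" */
    /* Need signature (4) + COFF header (20) + optional header up to byte 80. */
    if (pe > size || size - pe < 4 + 20 + 80)
        return 0;
    if (img[pe] != 'P' || img[pe + 1] != 'E' || img[pe + 2] != 0 || img[pe + 3] != 0)
        return 0;

    const unsigned char *opt = img + pe + 4 + 20;   /* optional header */
    switch (rd16(opt)) {
    case 0x10B:                               /* PE32: 4-byte field at offset 72 */
        return rd32(opt + 72);
    case 0x20B:                               /* PE32+ (64-bit): 8-byte field at offset 72 */
        return (uint64_t)rd32(opt + 72) | ((uint64_t)rd32(opt + 76) << 32);
    default:
        return 0;
    }
}
```

A caller would typically read the whole .exe into a buffer (for example with fread) and pass the buffer and its length.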

Alternatively
http://www.atalasoft.com/cs/blogs/rickm/archive/2008/04/22/increasing-the-size-of-your-stack-net-memory-management-part-3.aspx
The Easiest Way ( .NET 2.0 )
In .NET 2.0 and newer you can simply specify the stack size in a thread's constructor. Unfortunately, this method only works on Windows XP and newer operating systems. On older platforms you can specify this parameter, but it will have no effect; the stack size in the binary header will be used.
using System.Threading;
…
// threadDelegate is a ThreadStart delegate; stackSizeInBytes is the maximum stack size in bytes.
Thread T = new Thread(threadDelegate, stackSizeInBytes);
T.Start();

- Case sensitivity: Fortran is not case sensitive; C/C++ is.
- Arrays are always passed by reference.
- ATTRIBUTES for a routine may be: C, STDCALL, REFERENCE, VARYING
- ATTRIBUTES for an argument may be: VALUE, REFERENCE
- C or STDCALL causes all arguments to be passed by value, except arrays.
- The VALUE or REFERENCE argument options override the routine option
of C or STDCALL.
- On IA-32 systems, a routine to be called by C needs a leading underscore in its name.
- Internal procedures cannot be called from outside the program unit that contains them.
- To pass a variable number of arguments, use C and VARYING, not STDCALL.
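The reason VARYING needs the C convention is visible from the C side: only the caller knows how many arguments it pushed, so only the caller can remove them from the stack. A minimal C sketch of a variadic routine; the name `sum_ints` is hypothetical, and the matching Fortran declaration with `C, VARYING` is not shown:

```c
#include <stdarg.h>

/* Sum `count` int arguments. The callee cannot know the argument count on
   its own, which is why variable argument lists use the C convention
   (caller cleans up the stack) rather than STDCALL (callee cleans up). */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```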

3. Inside the code, in addition to the include directives in step 2, some USE statements are
needed. For example, to use the random number generator rnun, we need one of:
i) use rnun_int; or
ii) use imsl_libraries; or
iii) use numerical_libraries

Option iii is provided for backward compatibility with previous IMSL libraries and the Fortran 77
version of the library. It may not be necessary to use iii; calling the functions as before
will continue to work.

Option ii provides access to all the IMSL functions, so individual USE statements are not needed.
However, some may choose option i because it shows explicitly which functions are called.

Using BLAS
1. The Intel MKL BLAS library is used automatically when IMSL is linked with the
SMP (i.e. multiprocessing) option.
2. See the ia32 or ia64 Readme to link a third-party BLAS with IMSL.

IMSL version 6.0
- IMSL is now THREAD SAFE
- Environment variables: run ia32\bin\fnlsetup.bat.
- MUST remove old references, e.g. include 'link_f90_dll.h' (because the new headers have different names).
- MUST rename the directory of older IMSL installations, so that old environment variables cannot
accidentally point to it.
- Add one of these include statements in the relevant source files:
include 'link_fnl_shared.h' ! for dynamic DLLs
include 'link_fnl_shared_hpc.h' ! for dynamic DLLs and SMP (OpenMP)
- Add include directory in VS.Net
Project - Properties - Fortran - Include Directories: $(FNL_DIR)\ia32\include\dll
- Add library directory in VS.Net
Project - Properties - Fortran - Library Directories: $(FNL_DIR)\ia32\lib
- Run the ASSURANCE tests provided by IMSL in ...\examples\eiat. Note that
in run_test.bat, need to use %LINK_FNL_STATIC%

Setup MKL
==========
Linking to MKL can be done either statically (*.lib) or dynamically (*.dll).

For ia32 apps linking statically, link to mkl_c.lib or mkl_s.lib.
For ia32 apps linking dynamically, link to these STATIC libs:
mkl_c_dll.lib or mkl_s_dll.lib
which provide the interfaces to the correct DLLs.

- To use THREADED / PARALLEL / OPENMP Intel MKL, it is highly recommended to compile your code with the /MT
option. The compiler driver passes the option to the linker, which will then use the
multi-threaded (MT) run-time libraries.
- For multi-threading based on Intel OpenMP:
Interface: lib\mkl_intel_c_dll.lib
Threading: lib\mkl_intel_thread_dll.lib, bin\mkl_intel_thread.dll
Computation: lib\mkl_core_dll.lib, (many many bins....)
RTL: lib\libguide40.lib OR lib\libiomp5md.lib; bin\libguide40.dll OR bin\libiomp5md.dll

Manually add the following to the LIB system variable from the Control Panel:
C:\Program files\MPICH2\LIB;%IFORT_COMPILER10%Ia32\Lib;%MSVS8%\VC\atlmfc\lib;%MSVS8%\VC\lib;%MSVS8%\VC\PlatformSDK\lib;%FNL_DIR%\IA32\lib;

Intel Fortran 11.0
===================
1. New: Floating Point Model options; some are not compatible with Floating Point Speculation.
2. New: OpenMP 3.0 standard supported.
3. New: Fortran 2003 features included.
4. Some functions may fail -> use a macro such as CBAEXPMODTEST=1 to exclude the affected code.
5. See Fortran User / Reference Guide -> Building Apps -> Using Libraries -> Using IMSL.
6. IMSL Readme.txt: KAPMR does not behave in a thread-safe manner.
Use an OpenMP critical region around KAPMR to be safe.

Managed Code
=============
Mixed-Language Programming and Intel Visual Fortran Project Types
This version of Intel Visual Fortran produces only unmanaged code, which is architecture-specific
code. You cannot create an Intel Visual Fortran main program that directly calls a
subprogram implementing managed code. To call managed code, you can call an unmanaged
code subprogram in a different language that does support calling managed code.

BLAS, IMSL and MKL
===================
BLAS is implemented by IMSL; details are in Chapter 9: Basic Matrix/Vector Operations.
BLAS is also implemented by the hardware vendor (in this case Intel) in Intel's MKL library,
which may be hand-written in assembly (machine code).

The BLAS API, i.e. the calling convention of the routines, is the same whether the routines are
implemented by MKL or IMSL. For example, SDOT is the routine that computes the dot product of two
vectors.

To use a different implementation, the program has to link with a different library:
For IMSL: imslblas_dll.dll
For MKL: mkl_p4.dll
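Because the BLAS API is fixed, the meaning of SDOT can be illustrated with a plain reference implementation. A minimal C sketch, assuming single precision and the reference-BLAS stride rules; `sdot_ref` is a hypothetical name and this is neither the IMSL nor the MKL code. Every argument is a pointer, mirroring Fortran's default pass-by-reference:

```c
/* Reference-style SDOT: dot product of n elements of x and y taken with
   strides incx and incy. A negative increment walks the vector backwards,
   starting from its last element, as in reference BLAS. */
float sdot_ref(const int *n, const float *x, const int *incx,
               const float *y, const int *incy)
{
    float sum = 0.0f;
    if (*n <= 0)
        return sum;

    int ix = (*incx < 0) ? (1 - *n) * (*incx) : 0;
    int iy = (*incy < 0) ? (1 - *n) * (*incy) : 0;
    for (int i = 0; i < *n; ++i) {
        sum += x[ix] * y[iy];
        ix += *incx;
        iy += *incy;
    }
    return sum;
}
```

Linking against imslblas_dll.dll or mkl_p4.dll swaps in an optimized SDOT with the same argument list; results should agree up to floating-point rounding.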

By default, when using link_f90_dll.h, IMSL's BLAS is included (see section "Setup IMSL").
By default, when using link_f90_dll_smp.h, MKL's BLAS is included (see section "Setup IMSL").

If we want to use MKL without the SMP (parallel processing) feature, then instead of using
link_f90_dll.h, we have to manually add the directives and point to the correct BLAS, eg:

The DLL (*.dll) can be placed anywhere the system knows of, eg:
c:\windows\system32\ mkl_def.dll, mkl_p3.dll, mkl_p4.dll
(IMSL provides these 3 dlls from the MKL package)

The mkl_ia32.lib contains STATIC INTERFACES to DLLs including BLAS, CBLAS, FFTs, VML.
However, there is no corresponding single mkl_ia32.dll. Instead it is spread over a few DLLs,
such as mkl_def.dll, mkl_vml_def.dll, mkl_lapack32.dll, etc.

If a function (e.g. vsinv from the VML package of MKL) is included in the library mkl_ia32.lib,
but its DLL does not exist, the code WILL STILL COMPILE and link. But at runtime a fatal error
occurs because the DLL cannot be found and loaded.

NOTE: the IMSL dlls and libs are installed in
C:\Program Files\VNI\CTT5.0\CTT5.0\lib\IA32

Building DLLs (Fortran DLLs used in Fortran apps)
=================================================

Note:
When a DLL is built the output are two files:
1) *.dll - has the library's executable code
2) *.lib - the import library providing interface between the program and the dll.

The notes here present two cases:
Case A: DLL created in its own separate VS solution, called solnA, in project projA.
The two generated outputs will be projA.dll and projA.lib.
Case B: DLL created in a project (projB) in the same solution (solnB) as the
application project (projC). (The application project contains the code that uses the DLL.) The two generated outputs will be projB.dll and projB.lib.

1. Build DLL project in its own solution
- Say we call this Solution solnFoo, and Project projFoo

- put this subroutine by itself into a file (eg. hello.f90) or into
a module (eg hello_mod.f90)

- DLLEXPORT needed to expose the name of the routine
- alias is needed for compatibility with Intel Fortran and VS.NET environment

3. Build the DLL in VS.NET by:
- Build (menu) -> Build or Build Solution
- Copy the *.lib and *.dll files and put them into same directory as the
executable code for the application; i.e. same directory as projC.exe

CASE B only:
- Ensure that the dependency (e.g. projB) is UNchecked in the Project Dependencies dialog box of projC.

- in the solution explorer in VS.NET, click on the application's
project name, eg projC.
- From the Project menu or right clicking on the project, go to "Add existing item ..."
- Browse and choose "projB.lib" to add. The lib file should appear under solution explorer.
- From the Project menu or right clicking on the project, go to "Project Dependencies..."
- Alternative to the "Add Existing item..." way is to specify through the linker by:
with the project name highlighted, go to Project menu -> Properties -> Linker
-> "Additional Library Directories" -> type in dir path where *.lib is located.

5. Add interface to DLL routine in the application.
- go into the subroutine of the application and add the following:

- DO NOT ADD the interface on the top level, eg DO NOT add in the starting part of a module. Instead
add the interface inside the module's subroutine that makes the call to the DLL routine.

- compile and run. Ensure that building mode is RELEASE, not DEBUG.

!DEC$ ATTRIBUTES directives
============================
1. C vs STDCALL - controls which routine cleans up the stack of passed arguments.
- Both of these make arguments pass by value, rather than the Fortran default of
passing by reference.
- Arrays are always passed by reference.
- C -> the calling routine cleans up the stack (larger code).
- C -> a variable number of arguments is possible; you MUST use "C, VARYING" to let
Fortran know that a variable number of arguments is passed.
- C -> is the default in C/C++ code. To use it with Fortran code, either
i) change the C code to STDCALL; or
extern void __stdcall foo_f90(int n);
ii) change the f90 code to use the C convention
!DEC$ ATTRIBUTES C :: foo_f90
- STDCALL -> the called routine cleans up the stack.

2. VALUE vs REFERENCE
- For Fortran, C or STDCALL changes the default to passing by value, except arrays, which are
passed by reference.
- But, each argument of the subroutine can be declared with VALUE or REFERENCE to override the
default mode, eg:
subroutine foo(a, b)
!DEC$ ATTRIBUTES VALUE :: a
!DEC$ ATTRIBUTES REFERENCE :: b
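Seen from C, the routine declared above receives `a` as a copy and `b` as an address. A minimal sketch, assuming integer arguments; `foo_c` is a hypothetical C stand-in for the Fortran `foo`, and it returns the modified copy only so the difference is observable:

```c
/* C view of
       subroutine foo(a, b)
       !DEC$ ATTRIBUTES VALUE :: a
       !DEC$ ATTRIBUTES REFERENCE :: b
   Changes to `a` stay local; changes through `b` reach the caller. */
int foo_c(int a, int *b)
{
    a += 10;    /* modifies the local copy only */
    *b += 10;   /* modifies the caller's variable */
    return a;
}
```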

2. Build, then copy the following files from the DLL build directory to the application directory: mod_foo.dll, mod_foo.lib, mod_foo.mod

3. In the application that uses 'foo', add the statement: use mod_foo

This technique is only useful when both the application and the DLL are written in Fortran. The variable
names will have a leading underscore "_". This is transparent to the user who uses "use mod_foo".
Such DLLs are not convenient for use with other languages because of the leading
underscore on variable names.

This technique allows other Fortran projects to make use of both data and functions in DLLs.
However, other languages will not be able to make use of the data directly (may need to have underscore
for variable names in the other languages calling this Fortran DLL).

Cryptic LNK errors
1. When using a function from another place, e.g. a DLL, ensure that an INTERFACE block is written
in the code that calls the function.
2. Ensure the library path is defined, e.g. in VS.Net -> right click project -> Properties -> Linker
-> General -> "Additional Library Directories".

Access Violation
1. Passing an integer*4 into a subroutine whose parameter is declared as integer*8.
2. Subroutine A in a module is DLL exported. Another subroutine in another module within the same project that uses subroutine A WILL cause a CONFLICT. Since it is used within the same project, subroutine A needs a wrapper which is NOT DLL exported. This wrapper can be called by other module subroutines within the same project.
3. When an ARRAY of a derived type contains components which are also derived types, it must be declared with a fixed size (i.e. hardcoded dimension) or as a dynamic array (i.e. declared ALLOCATABLE). It cannot be an automatic array whose size is given by a dummy argument.
e.g. (NestedType stands for any derived type with derived-type components)
function foo(a, b)
integer :: b
type(NestedType) :: arrA(4) ! GOOD: fixed size
type(NestedType), allocatable :: arrB(:) ! GOOD: allocatable
type(NestedType) :: arrC(b) ! BAD: size set by dummy argument b
4. Crash pointing to a problem with allocatable arrays used in an OpenMP region.
Message: "Subscript #x of the array has value xxxx which is greater than the upper bound of ..."
Reason: known bug in the Intel Fortran Compiler that occurs when code is compiled with the /check:pointer option (under the Runtime category in project properties).

Derived Data Type - Nested
1. Complicated derived data types that involve nested derived types cannot be displayed correctly in the debugger / variable watch window. The displayed numbers are grossly in error.

DLL not exposed properly -
When calling a function in a dll, but that function has not been exposed, then the following error may occur:
"The procedure entry point ..... could not be located in the dynamic link library ....dll"

IMSL Errors
Message: Error: There is no matching specific subroutine for this generic subroutine call.
Cause: The IMSL documentation shows the Fortran 90 version, D_RNCHI, but unless the Fortran 90 interface is somehow in use, the call still follows Fortran 77 rules. So use the Fortran 77 name, which is DRNCHI.
Solution:
Instead of the Fortran 90 style -> D_RNCHI, use -> DRNCHI

ThreadChecker Errors:
Problem Description: We recently received several problem reports. If the user's application is extremely large, the application (launched by Thread Profiler) runs slowly.
Cause: Thread Profiler's engine uses 600MB (default) of the heap. If the application also needs to consume a lot of heap memory and the user works on a lower hardware (memory) configuration, this problem occurs.
Resolution: Use Configure -> Modify -> "Execution" tab -> "Limit the size of the heap used by the analysis engine to [ ] MB", and adjust it to a smaller number. Note that when Thread Checker reaches the memory limit, it may discard older statistics, causing some loss of results.

How to Add Version and other Metadata to DLL or EXE
=====================================================
Assume platform is Intel Fortran 8.1 and VS.Net 2003, but may apply to later versions too.
1. Go to Solutions Explorer and right click on the project name.
2. Choose Add New Item. In the Add New Item dialog, choose resource. A resourceX.rc will be created in the "Resource Files" folder directly under the project directory. If this file already exists, we can probably skip to the next step.
3. Double click to open the resourceX.rc file.
4. In the resourceX.rc file, right click on the name resourceX.rc and choose "Add resource..."
5. In the "Add Resource" dialog, choose Version.
6. Fill in the relevant versioning and metadata info that is required.
7. Then build the project.
8. Check by right-clicking on the dll or exe file.

2. Put this "/FIXED:NO" in:
VS.NET -> Project -> Properties -> Linker -> Command Line -> Additional Options
.... this is to ensure that VTune's Call Graph can be used. This only applies to the executable project.

3. Application to Launch - select an app or driver/dll that is already running.
Call Graph - must specify application to Launch.
Sampling and Counter may select "No App to launch"

4. Counter Monitor - Intel recommends using this first.
- uses native Windows performance counters, eg. processor queue, memory, processor time
- Has the following info:
- the Logged Data view
- the Legend
- the Summary view
- the Data Table - click on Logged Data View first to access
- Two main monitors to check are:
- %Processor Time: the closer to 100% the better. This is calculated by taking the amount
of time spent in the Idle thread and subtracting it from 100%.
- System Processor Queue length - There is a single queue for processor time even on
multiprocessor systems. This counter should be less than 2. It measures how many threads are waiting to execute.
- Intel Tuning Advice - to get the advice, from the Logged Data View, highlight the
section of the graph of interest. Then click on the Tuning Assistant button.
- Drill Down to Correlated Sampling Data View.
- To use sampling data, need to collect sampling data when collecting counter data.
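The %Processor Time calculation and the queue-length rule of thumb from the Counter Monitor notes are simple arithmetic. A minimal C sketch with hypothetical function names, assuming the idle percentage and queue length have already been read from the counters:

```c
/* %Processor Time = 100% minus the percentage of time spent in the Idle thread. */
double processor_time_pct(double idle_pct)
{
    return 100.0 - idle_pct;
}

/* Rule of thumb from the notes: System Processor Queue Length should stay below 2. */
int processor_queue_ok(double queue_length)
{
    return queue_length < 2.0;
}
```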

5. Sampling Mode
- Look at Samples or Events of CPU_CLK_UNHALTED.CORE --- CPU cycles when a core is active
This shows where most cpu cycles are used.

Definitions:
CPU_CLK_UNHALTED.CORE
Event Code: Counted by fixed counter number 1
Category: Basic Performance Tuning Events;Multi-Core Events;
Definition: Core cycles when core is not halted.
Description: This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.
In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regard to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency.
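The elapsed-time measurement described above is a single division. A minimal C sketch, assuming a constant core frequency given in Hz; `unhalted_seconds` is a hypothetical name:

```c
#include <assert.h>

/* Time the core spent not halted: CPU_CLK_UNHALTED.CORE count divided by the
   core frequency. Only meaningful when the frequency does not change. */
double unhalted_seconds(unsigned long long unhalted_cycles, double core_hz)
{
    assert(core_hz > 0.0);
    return (double)unhalted_cycles / core_hz;
}
```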

INST_RETIRED.ANY
Event Code: Counted by fixed counter number 0
Category: Basic Performance Tuning Events;
Definition: Instructions retired.
Description: This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.

Clocks per Instructions Retired - CPI
Equation: CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY
Category: Basic Performance Tuning Ratios; Ratios for Tuning Assistant Advice;
Definition: High CPI indicates that instructions require more cycles to execute than they should. In this case there may be opportunities to modify your code to improve the efficiency with which instructions are executed within the processor. CPI can get as low as 0.25 cycles per instruction.

SAV = Sample After Value
This is the sampling frequency used for the sampling process. Typically it is 2,000,000.
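CPI and the relationship between samples and events are both simple ratios. A minimal C sketch with hypothetical function names, assuming the raw counts have already been exported from VTune; it treats samples times SAV as an estimate of the total event count, since each sample is taken after SAV events:

```c
#include <assert.h>

/* CPI = CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY */
double cpi(unsigned long long unhalted_cycles, unsigned long long instructions_retired)
{
    assert(instructions_retired > 0);
    return (double)unhalted_cycles / (double)instructions_retired;
}

/* One sample is taken every SAV events, so samples * SAV approximates
   the number of events that occurred in the sampled code. */
unsigned long long estimated_events(unsigned long long samples, unsigned long long sav)
{
    return samples * sav;
}
```

For example, 150 samples of CPU_CLK_UNHALTED.CORE at an SAV of 2,000,000 correspond to roughly 300,000,000 unhalted cycles.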

The /iface option specifies the type of argument-passing conventions used for general arguments and for hidden-length character arguments.
Possible values are:
/iface:mixed_str_len_arg: The hidden lengths should be placed immediately after their corresponding character argument in the argument list, which is the method used by Microsoft* Fortran PowerStation.
/iface:nomixed_str_len_arg: The hidden lengths should be placed in sequential order at the end of the argument list. When porting mixed-language programs that pass character arguments, either this option must be specified correctly or the order of hidden length arguments changed in the source code.

See also Programming with Mixed Languages Overview and related sections.
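The two /iface settings differ only in where the hidden length lands in the argument list. A minimal C sketch of one hypothetical Fortran routine, `name_len(name, n)` with `character(*) name` and `integer n`, written once for each layout. The C names ignore Fortran name decoration, and `size_t` as the type of the hidden length is an assumption to check against the compiler documentation for your platform:

```c
#include <stddef.h>

/* Fortran strings are blank-padded, not NUL-terminated, so the hidden
   length is the only way to know how long they are. */
static size_t trimmed_len(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        --len;
    return len;
}

/* /iface:mixed_str_len_arg: the length follows its character argument. */
void name_len_mixed(const char *name, size_t name_len, int *n)
{
    *n = (int)trimmed_len(name, name_len);
}

/* /iface:nomixed_str_len_arg: lengths come after all other arguments. */
void name_len_nomixed(const char *name, int *n, size_t name_len)
{
    *n = (int)trimmed_len(name, name_len);
}
```

If the C side assumes the wrong layout, the length and the integer pointer are swapped, which typically shows up as an access violation.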

3. Configuring to use MKL
- At installation time, say yes to adding the variables to PATH, LIB, INCLUDE.
- Alternatively, run mklvars32.bat

4. Using Fortran95 BLAS or LAPACK
- Need to build from Intel's sources: go to mkl\8.1.1\interfaces\blas95,lapack95
- nmake PLAT=win32 lib -> a *.mod file will be created
- or go to the INCLUDE directory and: ifort -c mkl_lapack|blas.f90
- Or to build it in the user's directory:
1. copy mkl\8.1.1\interfaces\blas95,lapack95 into the user's directory
2. copy these files from INCLUDE: mkl_lapack|blas.f90
3. run in the blas, lapack directories: nmake PLAT=win32 INTERFACE=mkl_blas|lapack.f90 lib
For 64 bit:
- nmake can be found in C:\Program Files\Microsoft Visual Studio 8\VC\bin\
- from the Start Menu, open the Intel Visual Fortran Build Environment using Intel 64.
- nmake PLAT=win32e lib
- mod files will be automatically copied to ..../em64t

6. Errors
a) Compile error:
SortProj1 error LNK2019: unresolved external symbol _VSLNEWSTREAM referenced in function _MAIN__.L
Solution:
1. Put the following in the code at the start of a module or program, NOT in a subroutine or function:
use MKL_VSL_TYPE
use MKL_VSL
!dec$objcomment lib:'mkl_c_dll.lib'
!dec$objcomment lib:'mkl_ia32.lib'
2. Sometimes DLLIMPORT may be needed rather than DLLEXPORT, especially in the RELEASE build (unconfirmed).
3. If the function is a Fortran95 function, such as gemv, the solution is to "call dgemv..." rather
than "call gemv...".

7. Prerequisite directories - these need to be set in Project -> Properties, on the command line, etc.:
1. Include Directories: C:\Program Files\Intel\MKL\8.1.1\include
2. Library Directories: C:\Program Files\Intel\MKL\8.1.1\ia32\lib
3. Put the following line at the start of one of the source files, before the program or module keyword:
include 'mkl_vsl.fi' ! This is a full-fledged module by MKL
4. Put the following at the start of a module or program, not within a function or subroutine:
use MKL_VSL_TYPE
use MKL_VSL
!dec$objcomment lib:'mkl_c_dll.lib'
!dec$objcomment lib:'mkl_ia32.lib'
implicit none

DLL
- not needed because we will be using explicit interfaces.
- Also F95 lapack routines have optional arguments which REQUIRE interfaces (eg gesv).

LIB
- mkl_lapack95.lib needed (created once by the administrator or first user)
- Use in the code as:
!dec$objcomment lib:'mkl_lapack95.lib'
!dec$objcomment lib:'mkl_c_dll.lib'
!dec$objcomment lib:'mkl_ia32.lib'
- Not needed: !dec$objcomment lib:'mkl_lapack.lib'
- must be linked at build time, either by
i) ifort ..... mkl_lapack95.lib; or
ii) specifying the path in "Additional Library Directories"

MOD
- mkl95_lapack.mod needed (created once by the administrator or first user from mkl_lapack.f90)
- contains the collection of interfaces to be used in the code by having: USE MKL95_LAPACK
- must be present at compile time in either:
i) the same location as the application source files (.f90); or
ii) the INCLUDE directories specified in VS.Net as "Additional Include Directories"

Mixed language programming
============================
Hi Clinton,
It looks like a library format incompatibility problem. We adhere to the Microsoft format.
Please use the following steps as a work-around.
Once you generate the .dll with the Intel Fortran compiler, follow these steps:

1. D:\>pedump /exp MatlabFunctions.dll > MatlabFunctions.exp

D:\>notepad MatlabFunctions.exp (Edit this file and replace MATEXP with _MATEXP)

The /Gs0 option enables stack checking for all functions.
The /Gsn option enables stack checking only for functions that need more than n bytes of stack; by default n is 4096 (4 KB).
The /Fn option sets the stack reserve amount for the program; it passes /stack:n to the linker.

Enable Vectorization and Report
================================
To enable automatic vectorization, use these switches:
/Qx... or /Qax....
To enable the vectorization report, use:
/Qvec-report....
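What the vectorizer looks for can be illustrated with a loop that is usually easy to vectorize: unit-stride access, no dependence between iterations, and no aliasing between the arrays. A minimal C sketch (`saxpy` here is a hypothetical local function, not the BLAS routine); whether a given compiler actually vectorizes it is what the vectorization report tells you:

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. Unit-stride access, independent iterations and
   `restrict` (x and y do not overlap) make this a good candidate for
   automatic vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In Fortran, the rules on dummy-argument aliasing generally give the compiler the no-overlap guarantee that `restrict` provides here.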

To compile for a single thread, use the preprocessor option /Qfpp, but not the OpenMP option /Qopenmp.

4. DO NOT USE /Qprof-genx with OpenMP - spurious errors like array out of bounds will result.

5. To use OpenMP functions such as omp_get_num_threads(), instead of using
include "omp_lib.h",
better to use:
external omp_get_num_threads
integer omp_get_num_threads
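The single-thread build mentioned above (/Qfpp without /Qopenmp) relies on conditional compilation: OpenMP-enabled compilers define the `_OPENMP` macro, so the same source can call OpenMP routines or fall back to serial behaviour. A minimal C sketch of the pattern; `team_size` is a hypothetical name, and the assumption that Intel's fpp also defines `_OPENMP` for Fortran under /Qopenmp should be checked against the compiler's list of predefined symbols:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Size of the current thread team when built with OpenMP, 1 otherwise.
   Outside a parallel region the OpenMP result is also 1. */
int team_size(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}
```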

Using Thread Profiler
=======================
1. Compiler options to enable Thread Profiling:
a) /Zi - full debugging format
b) /fixed:no - linker option to make code relocatable
c) /MDd - option tells the linker to search for unresolved references
in a multithreaded, debug, dynamic-link (DLL) run-time library.
This is the same as specifying options /libs:dll /threads /dbglibs.
d) /Qopenmp-profile - enable profiling of OpenMP.
WARNING: this option should not be used with IMSL since IMSL will link to libguide or libguide40, but
this option creates code that will link to libguide_stats or libguide40_stats

Using Thread Checker
=====================

Add the following library path:
.....VTune\Analyzer\Lib
Without this, a link error occurs stating that libassuret40.lib was not found.

To run Intel Thread Checker, run VTune first in a NEW project. When the VTune is finished analysing, then run thread checker from the SAME project, by running as a NEW Activity.

Troubleshooting:
- ensure /Qtcheck is only on the EXECUTABLE, not other dlls.
- check working directory is correct.
- when EXECUTABLE has /Qtcheck, it cannot be run from console mode.

Profile Guided Optimization
============================
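This is a 3 step process:
1. Compile with /Qprof-gen. Using /Qprof-genx allows the Code Coverage tool to be used.
DO NOT USE /Qprof-genx WITH OPENMP.
Note: for Code Coverage, the new option is /Qprof-gen:srcpos
2. Run the instrumented application, which writes dynamic profile (*.dyn) files.
3. Recompile with /Qprof-use so the compiler optimizes using the collected profile.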

To use code coverage, which is available for Intel compilers, the code needs to be prepared during compilation, then the application needs to be run. The following is the general method.

1. Compile source code with /Qprof-gen:srcpos option.
By default, pgopti.spi, a static profile file is created. This name can be changed using the -prof-file option.
2. Run the application. This will create multiple dyn files for dynamic profile information.
3. Use the profmerge tool to merge the dyn files into the pgopti.dpi file:
profmerge -prof_dpi pgopti.dpi
4. Run code coverage using both the static and dynamic files:
codecov -spi pgopti.spi -dpi pgopti.dpi
5. The results are published into CODE_COVERAGE.HTML

Note that these commands should run in the same directory as the source code and execution directory.