This option is used to indicate that the host system's integers are 32 bits
wide, and longs and pointers are 64 bits wide (the LP64 data model). Not all
benchmarks recognize this macro, but the preferred practice for data model
selection applies the flags to all benchmarks; this flag description is a
placeholder for those benchmarks that do not recognize this macro.

This macro determines which file system interface will be used. Common file I/O calls like stat()
and readdir() return off_t data that may not fit within a 32-bit data structure if this flag
is not used. With _FILE_OFFSET_BITS=64, types like off_t have a size of 64 bits. The truncation that
happens without _FILE_OFFSET_BITS=64 has been observed to yield intermittent failures.

For example, RHEL7 distributions format partitions using XFS. Runtime errors are observed on such
systems because values returned by the file system sometimes will not fit into 32-bit data types.

Base Optimization Flags

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations, including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing the program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if it is executed on an unsupported
processor.

Pads the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not improve performance unless loop and
memory access transformations take place, and in some cases may slow
down code compared to O2.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Option standard-realloc-lhs (the default) tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the
right-hand side of the assignment before the assignment occurs. This is the current Fortran standard definition. This feature may cause extra overhead at run time. This option has
the same effect as option assume realloc_lhs.

If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 90/95 rules when interpreting assignment statements: the left-hand side is assumed to be already allocated
with the correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.

The align option changes how data elements are aligned: variables and arrays are analyzed, and their memory layout can be altered. Specifying align array32byte makes the compiler look for opportunities to transform and realign arrays to 32-byte boundaries.

Peak Optimization Flags

Instruments the program for profiling as the first phase of
two-phase profile-guided optimization. This instrumentation gathers information
about a program's execution paths and data values, but does not gather
information from hardware performance counters. The profile instrumentation
also gathers data for optimizations that are unique to profile-feedback
optimization.

Instructs the compiler to produce a profile-optimized
executable, merging available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files.

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations, including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing the program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if it is executed on an unsupported
processor.

Pads the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not improve performance unless loop and
memory access transformations take place, and in some cases may slow
down code compared to O2.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Instrument program for profiling for the first phase of
two-phase profile guided otimization. This instrumentation gathers information
about a program's execution paths and data values but does not gather
information from hardware performance counters. The profile instrumentation
also gathers data for optimizations which are unique to profile-feedback
optimization.

Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Instrument program for profiling for the first phase of
two-phase profile guided otimization. This instrumentation gathers information
about a program's execution paths and data values but does not gather
information from hardware performance counters. The profile instrumentation
also gathers data for optimizations which are unique to profile-feedback
optimization.

Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Instrument program for profiling for the first phase of
two-phase profile guided otimization. This instrumentation gathers information
about a program's execution paths and data values but does not gather
information from hardware performance counters. The profile instrumentation
also gathers data for optimizations which are unique to profile-feedback
optimization.

Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Code is optimized for Intel(R) processors with support for AVX2 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.

Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option on a non-compatible processor
to compile the main program (in Fortran) or the function main() in C/C++, the
program will display a fatal run-time error if they are executed on unsupported
processors.

Padding the size of certain power-of-two arrays to allow
more efficient cache use.

On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.

-no-prec-div enables optimizations that give slightly less precise results
than full IEEE division.

When you specify -no-prec-div along with some optimizations, such as
-xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into
multiplication by the reciprocal of the denominator.
For example, A/B is computed as A * (1/B) to improve the speed of the
computation.

However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, do not use -no-prec-div.
This will enable the default -prec-div and the result will be more accurate,
with some loss of performance.

Option standard-realloc-lhs (the default), tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the
right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has
the same effect as option assume realloc_lhs.

If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 90/95 rules when interpreting assignment statements: the left-hand side is assumed to be allocated with the
correct shape to hold the right-hand side, and if it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.

The align toggle changes how data elements are aligned. Variables and arrays are analyzed and memory layout can be altered. Specifying array32byte causes the compiler to look for opportunities to transform and realign arrays to 32-byte boundaries.

When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to
specific processors. This specific submit command, using taskset, is used for Linux64 systems without numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:

/usr/bin/taskset [options] [mask] [pid | command [arg] ... ]:
taskset is used to set or retrieve the CPU affinity of a running
process given its PID or to launch a new COMMAND with a given CPU
affinity. The CPU affinity is represented as a bitmask, with the
lowest order bit corresponding to the first logical CPU and highest
order bit corresponding to the last logical CPU. When the taskset
returns, it is guaranteed that the given program has been scheduled
to a specific, legal CPU, as defined by the mask setting.

[mask]: The bitmask (in hexadecimal) corresponding to a specific
SPECCOPYNUM. The example above computes this mask value in the variable $MYMASK.
The value of this mask for the first copy of a
rate run will be 0x00000001, for the second copy it will
be 0x00000002, and so on. Thus, the first copy of the rate run will have a
CPU affinity of CPU0, the second copy will have the affinity CPU1,
etc.

$command: Program to be started, in this case, the benchmark instance to be started.
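The mask derivation described above can be sketched in shell. This is a hedged illustration only: the copy_mask function name is hypothetical, while SPECCOPYNUM and $MYMASK come from the config-file example.

```shell
# Sketch: derive the taskset affinity bitmask for a given SPEC copy number.
# Copy 0 -> 0x00000001 (CPU0), copy 1 -> 0x00000002 (CPU1), and so on:
# one bit per copy, lowest-order bit first.
copy_mask() {
    printf '0x%08x' $((1 << $1))
}

MYMASK=$(copy_mask 0)   # first rate copy
echo "$MYMASK"          # 0x00000001
MYMASK=$(copy_mask 1)   # second rate copy
echo "$MYMASK"          # 0x00000002
```

The resulting mask would then be passed to taskset as in the syntax above, e.g. `taskset $MYMASK $command`.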

submit= numactl --localalloc --physcpubind=$SPECCOPYNUM $command

When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to
specific processors. This specific submit command is used for Linux64 systems with support for numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:

Launching a process with numactl --interleave=all sets the memory interleave policy so that memory is allocated using round robin across nodes.
When memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
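As a hedged illustration of how the submit line above expands for each copy, the sketch below only constructs and prints the command rather than executing it; the benchmark launch script name is hypothetical.

```shell
# Sketch: how the numactl submit line expands per benchmark copy.
# --localalloc allocates memory on the node of the CPU the copy runs on;
# --physcpubind pins the copy to one physical CPU number.
submit_line() {
    echo "numactl --localalloc --physcpubind=$1 $2"
}

command="./run_base.sh"   # hypothetical benchmark launch script
for SPECCOPYNUM in 0 1 2 3; do
    submit_line "$SPECCOPYNUM" "$command"
done
```

In an actual run, SPEC substitutes the real copy number and benchmark invocation, and the command is executed rather than printed.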

KMP_STACKSIZE

Specifies the stack size to be allocated for each thread.

KMP_AFFINITY

Syntax: KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]
The value of the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors.
It applies to binaries built with -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
modifier: granularity=fine causes each OpenMP thread to be bound to a single thread context.
type: compact assigns OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where OpenMP thread <n> was placed; scatter distributes the threads as evenly as possible across the entire system.
permute: an integer value that controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance.
offset: indicates the starting position for thread assignment.

Please see the Thread Affinity Interface article in the Intel Composer XE Documentation for more details.

Example: KMP_AFFINITY=granularity=fine,scatter
Specifying granularity=fine selects the finest granularity level and causes each OpenMP or auto-par thread to be bound to a single thread context.
This ensures that there is only one thread per core on cores supporting Hyper-Threading Technology.
Specifying scatter distributes the threads as evenly as possible across the entire system.
Hence the combination of these two options spreads the threads evenly across sockets, with one thread per physical core.

Example: KMP_AFFINITY=compact,1,0
Specifying compact will assign the n+1 thread to a free thread context as close as possible to thread n.
A default granularity=core is implied if no granularity is explicitly specified.
Specifying 1,0 sets the permute and offset values of the thread assignment.
With a permute value of 1, thread n+1 is assigned to a consecutive core. With an offset of 0, the process's first thread (thread 0) is assigned to thread context 0.
The same behavior is exhibited in a multisocket system.
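The two examples above translate into environment settings such as the following sketch; the thread count is illustrative only, and the commented-out alternative corresponds to the second example.

```shell
# Sketch: spread one thread per physical core across the whole system
# (the granularity=fine,scatter example above).
export KMP_AFFINITY=granularity=fine,scatter
export OMP_NUM_THREADS=8        # illustrative: set to the machine's core count

# Alternative: pack threads onto adjacent cores (the compact,1,0 example).
# export KMP_AFFINITY=compact,1,0

echo "KMP_AFFINITY=$KMP_AFFINITY OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

These variables must be exported before the OpenMP binary is launched so the runtime sees them at startup.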

OMP_NUM_THREADS

Sets the maximum number of threads to use for OpenMP* parallel regions if no
other value is specified in the application. This environment variable
applies to both -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
Example syntax on a Linux system with 8 cores:
export OMP_NUM_THREADS=8

Set stack size to unlimited

The command "ulimit -s unlimited" is used to set the stack size limit to unlimited.
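A minimal sketch of the command; raising the soft limit to unlimited can fail if the shell's hard limit is lower, so this version falls back to reporting whatever limit is actually in effect.

```shell
# Sketch: raise the stack size soft limit for stack-hungry benchmarks,
# then report the limit that is actually in effect.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit"
STACK_LIMIT=$(ulimit -s)
echo "stack limit: $STACK_LIMIT"
```

Note that ulimit affects only the current shell and its children, so it must be run in the shell that launches the benchmark.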

Free the file system page cache

The command "echo 1 > /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.
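A hedged sketch of the same operation; writing to drop_caches requires root, and running sync first flushes dirty pages so the kernel can discard more of the cache.

```shell
# Sketch: flush dirty pages, then ask the kernel to drop the page cache.
# Writing "1" drops the page cache only (2 = dentries/inodes, 3 = both).
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 1 > /proc/sys/vm/drop_caches 2>/dev/null
    DROP_STATUS="dropped"
else
    DROP_STATUS="skipped (need root)"
fi
echo "$DROP_STATUS"
```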

Red Hat Specific features

Transparent Huge Pages

On RedHat EL 6 and later, Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads.
If memory utilization is too high, or memory is too fragmented for hugepages to be allocated, the kernel will assign smaller 4 KB pages instead.
Hugepages are used by default unless the /sys/kernel/mm/redhat_transparent_hugepage/enabled field is changed from its Red Hat EL 6 default of 'always'.
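To verify the setting described above, the enabled field can be read directly. The sketch below checks the Red Hat EL 6 path first and falls back to the mainline kernel path, since the location varies by kernel; THP_STATUS is a hypothetical variable name.

```shell
# Sketch: report the current transparent hugepage policy, if exposed.
THP_STATUS="unavailable"
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        THP_STATUS=$(cat "$f")   # e.g. "[always] madvise never"
        break
    fi
done
echo "$THP_STATUS"
```

The currently active policy is shown in brackets; on RHEL 6 the default is 'always'.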

C States allow the processor to enter lower power states when idle. When set to Enabled (OS controlled)
or to Autonomous (if hardware control is supported), the processor can operate in all
available power states to save power, but this may increase memory latency and frequency jitter.

C1E:

When set to Enabled, the processor is allowed to switch to minimum performance state when idle.

CPU Interconnect Bus Link Power Management:

When enabled, CPU interconnect bus link power management can reduce overall system power a
bit while slightly reducing system performance.

CPU Performance:

Maximum Performance is typically selected for performance-centric workloads where it
is acceptable to consume additional power to achieve the highest possible performance
for the computing environment. This mode drives processor frequency to the maximum
across all cores (although idled cores can still be frequency reduced by C-state
enforcement through BIOS or OS mechanisms if enabled). This mode also offers the
lowest latency of the CPU Power Management Mode options, so it is always preferred.

Energy Efficient Policy:

The CPU uses this setting to manipulate the internal behavior of the processor and determine
whether to target higher performance or better power savings.

Energy Efficient Turbo:

Enables or disables the Energy Efficient Turbo.
Energy Efficient Turbo (EET) is a mode of operation where a processor's core frequency is adjusted within the turbo range based on workload.

Logical Processor:

Each processor core supports up to two logical processors. When set to Enabled, the BIOS
reports all logical processors. When set to Disabled, the BIOS only reports one
logical processor per core. Generally, higher processor count results in increased
performance for most multi-threaded workloads and the recommendation is to keep this enabled.
However, there are some floating point/scientific workloads, including HPC workloads, where
disabling this feature may result in higher performance.

Memory Patrol Scrub:

Patrol Scrubbing searches the memory for errors and repairs correctable errors to prevent
the accumulation of memory errors. When set to Disabled, no patrol scrubbing will occur.
When set to Standard Mode, the entire memory array will be scrubbed once in a 24 hour period.
When set to Extended Mode, the entire memory array will be scrubbed more frequently to further
increase system reliability.

PCI ASPM L1 Link Power Management:

When enabled, PCIe Advanced State Power Management (ASPM) can reduce overall system power
a bit while slightly reducing system performance.

NOTE: Some devices may not perform properly (they may hang or cause the system to hang)
when ASPM is enabled; for this reason, L1 will only be enabled for validated, qualified cards.

System Profile:

When set to Custom, you can change the setting of each option. Under Custom mode, when C states are enabled,
Monitor/Mwait should also be enabled.

Sub NUMA Cluster:

When enabled, Sub NUMA Clustering (SNC) breaks up the LLC into disjoint clusters based on address range,
with each cluster bound to a subset of the memory controllers in the system.
It improves average latency to the LLC.

Uncore Frequency:

Selects the Processor Uncore Frequency.
Dynamic mode allows the processor to optimize power resources across the cores and uncore during runtime.
The optimization of the uncore frequency to either save power or optimize performance is influenced
by the setting of the Energy Efficiency Policy.

Virtualization technology:

When set to Enabled, the BIOS will enable processor Virtualization features and provide the virtualization
support to the Operating System (OS) through the DMAR table. In general, only virtualized environments
such as VMware(r) ESX (tm), Microsoft Hyper-V(r) , Red Hat(r) KVM, and other virtualized operating systems
will take advantage of these features. Disabling this feature is not known to significantly alter the
performance or power characteristics of the system, so leaving this option Enabled is advised for most cases.

Flag description origin markings:

Indicates that the flag description came from the user flags file.

Indicates that the flag description came from the suite-wide flags file.

Indicates that the flag description came from a per-benchmark flags file.