ABINIT/m_sg2002 [ Modules ]

Copyright (C) 2002-2007 Stefan Goedecker, CEA Grenoble
Copyright (C) 2014-2018 ABINIT group (XG)
This file is distributed under the terms of the
GNU General Public License, see ~abinit/COPYING
or http://www.gnu.org/copyleft/gpl.txt .

m_sg2002/sg2002_accrho [ Functions ]

Accumulates the real space density rho from the ndat wavefunctions zf
by transforming zf into real space and adding all the amplitudes squared
INPUTS:
ZF: input array (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
max1 is positive or zero ; m1 >=max1+1
i1= 1... max1+1 corresponds to positive and zero wavevectors 0 ... max1
then, if m1 > max1+1, one has min1=max1-m1+1 and
i1= max1+2 ... m1 corresponds to negative wavevectors min1 ... -1
i2 and i3 have a similar definition of range
idat=1,ndat
md1,md2,md3: Dimension of ZF
md2proc=((md2-1)/nproc_fft)+1 ! maximal number of small box 2nd dim slices for one proc
weight(ndat)= weight for the density accumulation
OUTPUTS:
RHOoutput(i1,i2,i3) = RHOinput(i1,i2,i3) + sum on idat of (weight_r*Re(FFT(ZF))**2 + weight_i*Im(FFT(ZF))**2)
i1=1,n1 , i2=1,n2 , i3=1,n3
comm_fft: MPI communicator
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] number of processor as returned by MPI_COMM_RANK
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
nd1,nd2,nd3: Dimension of RHO
nd3proc=((nd3-1)/nproc_fft)+1 ! maximal number of big box 3rd dim slices for one proc
NOTES:
PERFORMANCE CONSIDERATIONS:
The maximum number of processors that can reasonably be used is max(n2/2,n3/2)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.
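
The index convention stated for ZF above (i1 = 1 ... max1+1 for wavevectors 0 ... max1, then i1 = max1+2 ... m1 for min1 ... -1) can be sketched as a small Python helper. This is only an illustration of the mapping as documented; the function name wavevector_of_index is hypothetical and is not part of m_sg2002.

```python
def wavevector_of_index(i1: int, max1: int, m1: int) -> int:
    """Map a 1-based storage index i1 to its integer wavevector component.

    Follows the documented rule: max1 >= 0 and m1 >= max1 + 1.
    (Hypothetical helper, not an ABINIT routine.)
    """
    if max1 < 0 or m1 < max1 + 1:
        raise ValueError("need max1 >= 0 and m1 >= max1 + 1")
    if not 1 <= i1 <= m1:
        raise ValueError("i1 must lie in 1..m1")
    if i1 <= max1 + 1:
        # Indices 1 .. max1+1 hold the wavevectors 0 .. max1.
        return i1 - 1
    # Indices max1+2 .. m1 hold the wavevectors min1 .. -1,
    # with min1 = max1 - m1 + 1.
    return i1 - m1 - 1


if __name__ == "__main__":
    # With max1 = 2 and m1 = 5, min1 = 2 - 5 + 1 = -2.
    print([wavevector_of_index(i, 2, 5) for i in range(1, 6)])
```

The same rule applies to i2 and i3 with their own max and m values.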

m_sg2002/sg2002_applypot [ Functions ]

Applies the local real space potential to multiple wavefunctions in Fourier space

INPUTS

ZF: Wavefunction (input/output) (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
max1 is positive or zero ; m1 >=max1+1
i1= 1... max1+1 corresponds to positive and zero wavevectors 0 ... max1
then, if m1 > max1+1, one has min1=max1-m1+1 and
i1= max1+2 ... m1 corresponds to negative wavevectors min1 ... -1
i2 and i3 have a similar definition of range
idat=1,ndat
md1,md2,md3: Dimension of ZF (input as well as output), distributed on different procs
md2proc=((md2-1)/nproc_fft)+1 maximal number of small box 2nd dim slices for one proc
POT: Potential
POT(cplex*i1,i2,i3)
cplex=1 or 2 , i1=1,n1 , i2=1,n2 , i3=1,n3
nd1,nd2,nd3: dimension of pot
comm_fft: MPI communicator
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] number of processor as returned by MPI_COMM_RANK
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
NOTES:
PERFORMANCE CONSIDERATIONS:
The maximum number of processors that can reasonably be used is max(n2/2,n3/2)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.
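
As a starting point for the numerical experimentation mentioned above, the rule "roughly half the size of the physical cache in units of real*8 numbers" can be written as a one-line estimate. The Python sketch below assumes 8 bytes per real*8 number; the name suggested_ncache and the choice of which cache level to pass in are assumptions, and the result is only an initial guess to be tuned by timing runs.

```python
REAL8_BYTES = 8  # assumed size of one real*8 number


def suggested_ncache(cache_bytes: int) -> int:
    """Return about half the cache size, counted in real*8 numbers.

    Hypothetical helper: gives a first guess for NCACHE, not a tuned value.
    """
    if cache_bytes <= 0:
        raise ValueError("cache_bytes must be positive")
    return cache_bytes // REAL8_BYTES // 2


if __name__ == "__main__":
    # Example: a 256 KiB cache holds 32768 real*8 numbers, half of that is 16384.
    print(suggested_ncache(256 * 1024))
```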

m_sg2002/sg2002_applypot_many [ Functions ]

Applies the local real space potential to multiple wavefunctions in Fourier space

INPUTS

ZF: Wavefunction (input/output) (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
max1 is positive or zero ; m1 >=max1+1
i1= 1... max1+1 corresponds to positive and zero wavevectors 0 ... max1
then, if m1 > max1+1, one has min1=max1-m1+1 and
i1= max1+2 ... m1 corresponds to negative wavevectors min1 ... -1
i2 and i3 have a similar definition of range
idat=1,ndat
md1,md2,md3: Dimension of ZF (input as well as output), distributed on different procs
md2proc=((md2-1)/nproc_fft)+1 maximal number of small box 2nd dim slices for one proc
POT: Potential
POT(cplex*i1,i2,i3)
cplex=1 or 2 , i1=1,n1 , i2=1,n2 , i3=1,n3
nd1,nd2,nd3: dimension of pot
comm_fft: MPI communicator
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] number of processor as returned by MPI_COMM_RANK
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
NOTES:
PERFORMANCE CONSIDERATIONS:
The maximum number of processors that can reasonably be used is max(n2/2,n3/2)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.
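
The expressions md2proc=((md2-1)/nproc_fft)+1 and nd3proc=((nd3-1)/nproc_fft)+1 used above are integer ceiling divisions: they give the largest number of slices any one processor has to hold. A minimal Python sketch of that arithmetic follows, assuming positive dimensions so that Python's floor division matches Fortran integer division; the function name is hypothetical.

```python
def max_slices_per_proc(nd: int, nproc_fft: int) -> int:
    """Ceiling of nd / nproc_fft, written as in the documented formula.

    Hypothetical helper mirroring ((nd-1)/nproc_fft)+1 for nd >= 1.
    """
    if nd < 1 or nproc_fft < 1:
        raise ValueError("nd and nproc_fft must be at least 1")
    return (nd - 1) // nproc_fft + 1


if __name__ == "__main__":
    # 10 slices over 4 processors: at most 3 slices on any processor.
    print(max_slices_per_proc(10, 4))
    # The padded total nproc_fft * md2proc is never smaller than md2.
    print(4 * max_slices_per_proc(10, 4) >= 10)
```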

m_sg2002/sg2002_back [ Functions ]

CALCULATES THE DISCRETE FOURIER TRANSFORM in parallel using MPI/OpenMP
ZR(I1,I2,I3)= \sum_(j1,j2,j3) EXP(isign*i*2*pi*(j1*i1/n1+j2*i2/n2+j3*i3/n3)) ZF(j1,j3,j2)
Adopt standard convention that isign=1 for backward transform
INPUTS:
cplex=1 for real --> complex, 2 for complex --> complex
ZF: input array in G-space (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat
OUTPUTS:
ZR: output array in R space.
ZR(1,i1,i2,i3,idat)=real(R(i1,i2,i3,idat))
ZR(2,i1,i2,i3,idat)=imag(R(i1,i2,i3,idat))
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] number of processor as returned by MPI_COMM_RANK
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
nd1,nd2,nd3: Dimension of ZF and ZR
nd2proc=((nd2-1)/nproc_fft)+1 maximal number of 2nd dim slices
nd3proc=((nd3-1)/nproc_fft)+1 maximal number of 3rd dim slices
NOTES:
The maximum number of processors that can reasonably be used is max(n2,n3)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.
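
The sign convention in the formula above can be illustrated with a naive one-dimensional transform. The Python sketch below is not the algorithm used by the module (which is a cache-optimized 3D FFT); it only shows that the sum with isign=+1 undoes the sum with isign=-1 up to a factor n, so one division by the number of grid points is needed somewhere in a round trip. The names dft and roundtrip are hypothetical.

```python
import cmath


def dft(values, isign):
    """Unnormalized 1D discrete Fourier transform with exponent sign isign.

    out[k] = sum_j exp(isign * i * 2*pi * j*k / n) * values[j]
    """
    n = len(values)
    return [
        sum(values[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
            for j in range(n))
        for k in range(n)
    ]


def roundtrip(values):
    """Transform with isign=-1, then isign=+1, then divide by n."""
    n = len(values)
    in_g_space = dft(values, -1)       # forward convention
    back = dft(in_g_space, +1)         # backward convention
    return [z / n for z in back]       # single normalization by n


if __name__ == "__main__":
    data = [1.0, 2.0, 0.0, -1.0]
    print([round(z.real, 12) for z in roundtrip(data)])
```

In three dimensions the same sum is applied along each axis, and the normalization factor becomes n1*n2*n3.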

ZR: input array
ZR(1,i1,i2,i3,idat)=real(R(i1,i2,i3,idat))
ZR(2,i1,i2,i3,idat)=imag(R(i1,i2,i3,idat))
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat
OUTPUTS
ZF: output array (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] number of processor as returned by MPI_COMM_RANK
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
nd1,nd2,nd3: Dimension of ZR and ZF
nd2proc=((nd2-1)/nproc_fft)+1 maximal number of 2nd dim slices
nd3proc=((nd3-1)/nproc_fft)+1 maximal number of 3rd dim slices

NOTES

SHOULD describe nd1eff
SHOULD put cplex and nd1eff in OMP declarations
SHOULD describe the change of value of nd2proc
The maximum number of processors that can reasonably be used is max(n2,n3)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.

m_sg2002/sg2002_mpiback_wf [ Functions ]

Does multiple 3-dim backward FFTs from Fourier into real space
Adopt standard convention that isign=1 for backward transform
CALCULATES THE DISCRETE FOURIER TRANSFORM ZR(I1,I2,I3)=
\sum_(j1,j2,j3) EXP(isign*i*2*pi*(j1*i1/n1+j2*i2/n2+j3*i3/n3)) ZF(j1,j3,j2)
in parallel using MPI/OpenMP.
INPUTS:
icplexwf=1 if wavefunction is real, 2 if complex
ndat=Number of wavefunctions to transform.
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can be found in subroutine CTRIG
nd1,nd2,nd3: Leading Dimension of ZR
nd3proc=((nd3-1)/nproc_fft)+1 maximal number of big box 3rd dim slices for one proc
max1 is positive or zero; m1 >=max1+1
i1= 1... max1+1 corresponds to positive and zero wavevectors 0 ... max1
then, if m1 > max1+1, one has min1=max1-m1+1 and
i1= max1+2 ... m1 corresponds to negative wavevectors min1 ... -1
max2 and max3 have a similar definition of range
m1,m2,m3=Size of the box enclosing the G-sphere.
md1,md2,md3: Dimension of ZF given on the **small** FFT box.
md2proc=((md2-1)/nproc_fft)+1 maximal number of small box 2nd dim slices for one proc
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
comm_fft=MPI communicator for the FFT.
ZF: input array (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
OUTPUTS
ZR: output array
ZR(1,i1,i2,i3,idat)=real(R(i1,i2,i3,idat))
ZR(2,i1,i2,i3,idat)=imag(R(i1,i2,i3,idat))
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat

NOTES

The maximum number of processors that can reasonably be used is max(n2/2,n3/2)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.

m_sg2002/sg2002_mpiforw_wf [ Functions ]

Does multiple 3-dim forward FFTs from real into Fourier space
Adopt standard convention that isign=-1 for forward transform
CALCULATES THE DISCRETE FOURIER TRANSFORM
ZF(I1,I3,I2)=\sum_(j1,j2,j3) EXP(isign*i*2*pi*(j1*i1/n1+j2*i2/n2+j3*i3/n3)) ZR(j1,j2,j3)
in parallel using MPI/OpenMP.
INPUT:
ZR: input array
ZR(1,i1,i2,i3,idat)=real(R(i1,i2,i3,idat))
ZR(2,i1,i2,i3,idat)=imag(R(i1,i2,i3,idat))
i1=1,n1 , i2=1,n2 , i3=1,n3 , idat=1,ndat
NOTE that ZR is changed by the routine
n1,n2,n3: logical dimension of the transform. As transform lengths
most products of the prime factors 2,3,5 are allowed.
The detailed table with allowed transform lengths can
be found in subroutine CTRIG
nd1,nd2,nd3: Dimension of ZR
nd3proc=((nd3-1)/nproc_fft)+1 maximal number of big box 3rd dim slices for one proc
OUTPUT:
ZF: output array (note the switch of i2 and i3)
real(F(i1,i3,i2,idat))=ZF(1,i1,i3,i2,idat)
imag(F(i1,i3,i2,idat))=ZF(2,i1,i3,i2,idat)
max1 is positive or zero ; m1 >=max1+1
i1= 1... max1+1 corresponds to positive and zero wavevectors 0 ... max1
then, if m1 > max1+1, one has min1=max1-m1+1 and
i1= max1+2 ... m1 corresponds to negative wavevectors min1 ... -1
i2 and i3 have a similar definition of range
idat=1,ndat
md1,md2,md3: Dimension of ZF
md2proc=((md2-1)/nproc_fft)+1 maximal number of small box 2nd dim slices for one proc
nproc_fft: number of processors used as returned by MPI_COMM_SIZE
me_fft: [0:nproc_fft-1] rank of the processor in the FFT communicator.
comm_fft=MPI communicator for parallel FFT.

NOTES

The maximum number of processors that can reasonably be used is max(n2/2,n3/2)
It is very important to find the optimal
value of NCACHE. NCACHE determines the size of the work array ZW, that
has to fit into cache. It has therefore to be chosen to equal roughly
half the size of the physical cache in units of real*8 numbers.
The optimal value of NCACHE can easily be determined by numerical
experimentation. Too large a value of NCACHE leads to a dramatic
and sudden decrease of performance, too small a value to a
slow and less dramatic decrease of performance. If NCACHE is set
to a value so small that not even a single one-dimensional transform
can be done in the work array ZW, the program stops with an error message.
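
The statement that "most products of the prime factors 2,3,5 are allowed" as transform lengths suggests a quick pre-check on n1, n2 and n3. The Python sketch below only tests whether a length factors entirely into 2, 3 and 5; since the text says "most" and not "all", passing this check does not guarantee the length is accepted, and the table in subroutine CTRIG remains the reference. The function name is hypothetical.

```python
def has_only_factors_2_3_5(n: int) -> bool:
    """Return True if n >= 1 and n has no prime factor other than 2, 3, 5.

    Hypothetical pre-check; the authoritative list of lengths is in CTRIG.
    """
    if n < 1:
        return False
    for p in (2, 3, 5):
        # Strip out every power of p that divides n.
        while n % p == 0:
            n //= p
    return n == 1


if __name__ == "__main__":
    for length in (360, 128, 14, 77):
        print(length, has_only_factors_2_3_5(length))
```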

m_sg2002/sg2002_mpifourdp [ Functions ]

Conduct Fourier transform of REAL or COMPLEX function f(r)=fofr defined on
fft grid in real space, to create complex f(G)=fofg defined on full fft grid
in reciprocal space, in full storage mode, or the reverse operation.
For the reverse operation, the final data is divided by nfftot.
REAL case when cplex=1, COMPLEX case when cplex=2
Usually used for density and potentials.

INPUTS

cplex=1 if fofr is real, 2 if fofr is complex
nfft=(effective) number of FFT grid points (for this processor)
ngfft(18)=contain all needed information about 3D FFT, see ~abinit/doc/variables/vargs.htm#ngfft
ndat=Number of FFT transforms
isign=sign of Fourier transform exponent: current convention uses
+1 for transforming from G to r
-1 for transforming from r to G.
fftn2_distrib(2),ffti2_local(2)
fftn3_distrib(3),ffti3_local(3)
comm_fft=MPI communicator