GNU Astronomy Utilities

10.3.5.1 Generic data container (gal_data_t)

To be able to deal with any dataset (various dimensions, numeric data
types, units and higher-level structures), Gnuastro defines the
gal_data_t type, which is the input/output container of choice for
many of the Gnuastro library’s functions. It is defined in
gnuastro/data.h. If you are using (‘#include’ing) those library
headers, you don’t need to include this header explicitly: it is already
included by any library header that uses gal_data_t.

Type (C struct): gal_data_t

The main container for datasets in Gnuastro. It can host data of any
dimensionality, with any numeric data type. It is actually a structure, but
typedef’d as a new type to avoid having to write the struct keyword
before every declaration. Each element of the structure is described
below.

void *restrict array

This is the pointer to the main array of the dataset containing the raw
data (values). All the other elements in this data-structure are actually
meta-data enabling us to use/understand the series of values in this
array. It must allow data of any type (see Numeric data types), so it
is defined as a void * pointer. A void * array is not
directly usable in C, so you have to cast it to the proper type before
using it. Please see Library demo - reading a FITS image for a
demonstration.
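
As a minimal sketch of the cast described above (not taken from the Gnuastro sources), the block below uses a hypothetical stand-in structure, demo_data, that only mimics the array and size elements. The names demo_data and demo_sum_float are assumptions made for illustration, and the caller is assumed to know that the values are of type float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with only the two elements used here; the real
   gal_data_t (gnuastro/data.h) has many more. */
struct demo_data
{
  void   *array;   /* Raw values; their type is unknown to the compiler. */
  size_t  size;    /* Total number of elements in 'array'.               */
};

/* Sum all elements, assuming the caller knows they are 'float'. */
double
demo_sum_float(const struct demo_data *d)
{
  size_t i;
  double sum=0.0;
  const float *f;

  assert(d!=NULL && d->array!=NULL);

  /* A 'void *' cannot be dereferenced or indexed: give it a type first. */
  f=(const float *)d->array;
  for(i=0; i<d->size; ++i)
    sum+=f[i];

  return sum;
}
```

In real code the type of the values would come from the dataset's own meta-data rather than being assumed by the caller.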

The restrict keyword was formally introduced in C99 and is used to
tell the compiler that at any moment only this pointer will modify what it
points to (a pixel in an image for example). This extra piece of
information can greatly help in compiler optimizations and thus the running
time of the program. But older compilers might not have this capability, so
at ./configure time, Gnuastro checks this feature and if the
user’s compiler doesn’t support restrict, it will be removed from
this definition.
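
To illustrate what restrict promises to the compiler, here is a small generic C99 sketch. The function demo_add is hypothetical and is not part of Gnuastro; it only shows the kind of loop where the qualifier can help.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add 'in' to 'out', element by element. The
   'restrict' qualifiers are a promise by the caller that the two arrays
   do not overlap, so the compiler is free to reorder or vectorize the
   loads and stores. */
void
demo_add(float *restrict out, const float *restrict in, size_t n)
{
  size_t i;

  assert(out!=NULL && in!=NULL);

  for(i=0; i<n; ++i)
    out[i]+=in[i];
}
```

Calling such a function with overlapping arrays breaks the promise and results in undefined behavior, so the qualifier is only appropriate when the caller can guarantee that the arrays are separate.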

size_t *dsize

The size of the dataset along each dimension. This is an array (with
ndim elements) of positive integers in row-major order (based on C).
When a data file is read into memory with Gnuastro’s libraries, this
array is dynamically allocated based on the number of dimensions that the
dataset has.

It is important to remember that C’s row-major ordering is the opposite of
the FITS standard which is in column-major order: in the FITS standard the
fastest dimension’s size is specified by NAXIS1, and slower
dimensions follow. The FITS standard was defined mainly based on the
FORTRAN language which is the opposite of C’s approach to multi-dimensional
arrays (and also starts counting from 1 not 0). Hence if a FITS image has
NAXIS1==20 and NAXIS2==50, the dsize array must be
filled with dsize[0]==50 and dsize[1]==20.
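
The reversal between the FITS axis order and dsize can be sketched as follows. The function name demo_naxes_to_dsize and the naxes array (holding NAXIS1, NAXIS2, and so on, in that order) are assumptions for illustration; this is not a Gnuastro function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill 'dsize' (C order, slowest dimension first)
   from 'naxes' (FITS order, fastest dimension first: NAXIS1, NAXIS2,
   ...). Both arrays have 'ndim' elements. */
void
demo_naxes_to_dsize(const long *naxes, size_t ndim, size_t *dsize)
{
  size_t i;

  assert(naxes!=NULL && dsize!=NULL);

  /* The last FITS axis is the slowest, so it becomes dsize[0]. */
  for(i=0; i<ndim; ++i)
    dsize[i]=(size_t)naxes[ndim-1-i];
}
```

With NAXIS1==20 and NAXIS2==50 as in the example above, this fills dsize[0] with 50 and dsize[1] with 20.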

The fastest dimension is the one that is contiguous in memory: to increment
by one along that dimension, just go to the next element in the array. As
we go to slower dimensions, the number of memory cells we have to skip for
an increment along that dimension becomes larger.
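
As a sketch of what this means in practice, the hypothetical function below converts a coordinate (given in the same order as dsize) into the position of that element in the one-dimensional array. It is a generic row-major calculation, not a Gnuastro function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position in the flat array of the element at
   'coord', for a row-major dataset with 'ndim' dimensions of sizes
   'dsize'. coord[ndim-1] is along the fastest dimension. */
size_t
demo_flat_index(const size_t *coord, const size_t *dsize, size_t ndim)
{
  size_t i, index=0;

  assert(coord!=NULL && dsize!=NULL);

  /* Each slower dimension skips over all the faster ones below it. */
  for(i=0; i<ndim; ++i)
    index = index*dsize[i] + coord[i];

  return index;
}
```

For a dataset with dsize[0]==50 and dsize[1]==20, a step of one along the fastest dimension moves by one element, while a step of one along the slower dimension moves by 20 elements.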

size_t size

The total number of elements in the dataset. This is the product of all
the values in the dsize array, so it is not an independent parameter.
However, low-level operations with the dataset (irrespective of its
dimensionality) commonly need this number, so this element is stored to
avoid calculating it every time.
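
The relation between size and dsize can be written as the short sketch below. The function demo_total_size is hypothetical and only shows the calculation that this element saves you from repeating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total number of elements of a dataset with
   'ndim' dimensions whose sizes are in 'dsize'. */
size_t
demo_total_size(const size_t *dsize, size_t ndim)
{
  size_t i, size=1;

  assert(dsize!=NULL);

  for(i=0; i<ndim; ++i)
    size*=dsize[i];

  return size;
}
```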

char *mmapname

Name of file hosting the mmap’d contents of array. If the
value of this variable is NULL, then the contents of array
are actually stored in RAM, not in a file on the HDD/SSD. See the
description of minmapsize below for more.

If a file is used, it will be kept in the hidden .gnuastro directory
with a randomly selected name to allow multiple arrays to be kept there at
the same time. When gal_data_free is called the randomly named file
will be deleted.

size_t minmapsize

The minimum size of an array (in bytes) to store the contents of
array as a file (on the non-volatile HDD/SSD), not in RAM. This can
be very useful for large datasets which can be very memory intensive and
the user’s hardware RAM might not be sufficient to keep/process it. A random
filename is assigned to the array which is available in the mmapname
element of gal_data_t (above), see there for more.

When this variable has a value of 0 (zero), any allocated
array will actually be in a file (not in RAM). When the value is
-1 (the largest possible number in the unsigned types, including
size_t), the array will definitely be allocated in RAM.
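
The two special values can be understood through the sketch below. The function demo_use_file is hypothetical, and the exact comparison it uses (an array goes to a file when its size in bytes is at least minmapsize) is an assumption made for illustration; only the behavior at 0 and at -1 is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decision rule: return 1 when an array of 'nelem'
   elements, each 'elsize' bytes wide, would be kept in a file and 0
   when it would be kept in RAM. The '>=' comparison is an assumption. */
int
demo_use_file(size_t nelem, size_t elsize, size_t minmapsize)
{
  assert(elsize>0);

  return nelem*elsize >= minmapsize;
}
```

Since size_t is unsigned, converting -1 to it gives SIZE_MAX (from stdint.h), which no realistic array size can reach. This is why -1 keeps everything in RAM, while 0 sends every array to a file.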

Please note that using a non-volatile file instead of RAM will
significantly increase the program’s running time, especially on HDDs. So
it is best to give this option very large values (depending on how much
memory you will need for a given input). For example, your processing
might involve a copy of the input (possibly to a wider data type which
takes more bytes for each element), so take all such issues into
consideration. minmapsize is actually stored in each
gal_data_t, so it can be passed on to subsequent/derived datasets.

nwcs

The number of WCS coordinate representations (for WCSLIB).

struct wcsprm *wcs

The main WCSLIB structure keeping all the relevant information necessary
for WCSLIB to do its processing and convert data-set positions into
real-world positions. When it is given a NULL value, all possible
WCS calculations/measurements will be ignored.

uint8_t flag

Bit-wise flags to describe general properties of the dataset. The number of
bytes available in this flag is stored in the GAL_DATA_FLAG_SIZE
macro. Note that you should use bit-wise operators to check
these flags. The currently recognized bits are stored in these macros:

GAL_DATA_FLAG_BLANK_CH

Marking that the dataset has been checked for blank values. Therefore, the
value of the bit in GAL_DATA_FLAG_HASBLANK is reliable. Without
this bit, when a dataset doesn’t have any blank values (and this has been
checked), the GAL_DATA_FLAG_HASBLANK bit will be zero so a checker
has no way to know if this zero is real or if no check has been done yet.

GAL_DATA_FLAG_HASBLANK

This bit has a value of 1 when the given dataset has blank
values. If this bit is 0 and GAL_DATA_FLAG_BLANK_CH is
1, then the dataset has been checked and it didn’t have any blank
values, so there is no more need for further checks.

GAL_DATA_FLAG_SORT_CH

Marking that the dataset has already been checked for being sorted, and
thus that any 0 values in GAL_DATA_FLAG_SORTED_I and
GAL_DATA_FLAG_SORTED_D are meaningful.

GAL_DATA_FLAG_SORTED_I

This bit has a value of 1 when the given dataset is sorted in an
increasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH
is 1, then the dataset has been checked and wasn’t sorted
(increasing), so there is no more need for further checks.

GAL_DATA_FLAG_SORTED_D

This bit has a value of 1 when the given dataset is sorted in a
decreasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH
is 1, then the dataset has been checked and wasn’t sorted
(decreasing), so there is no more need for further checks.

The macro GAL_DATA_FLAG_MAXFLAG contains the largest internally used
bit-position. Libraries/programs that depend on Gnuastro can define their
own higher-level flags by applying the bit-wise shift operators to this
macro. Flags defined this way cannot conflict with the internal flags
discussed above, and their values do not have to be checked manually on
every release.

int status

A context-specific status value for this data-structure. This integer will
not be set by Gnuastro’s libraries. You can use it to keep some additional
information about the dataset (with integer constants) depending on your
applications.

char *name

The name of the dataset. If the dataset is a multi-dimensional array and
read/written as a FITS image, this will be the value in the EXTNAME
FITS keyword. If the dataset is a one-dimensional table column, this will
be the column name. If it is set to NULL (by default), it will be
ignored.

char *unit

The units of the dataset (for example BUNIT in the standard FITS
keywords) that will be read from or written to files/tables along with the
dataset. If it is set to NULL (by default), it will be ignored.

char *comment

Any further explanation about the dataset which will be written to any
output file if present.

disp_fmt

Format to use for printing each element of the dataset to a plain text
file. The acceptable values for this element are defined in Table input
output (table.h) and are based on C’s printf standards.

disp_width

Width to use for printing each element of the dataset to a plain text
file. The acceptable values for this element are defined in Table input
output (table.h) and are based on C’s printf standards.

disp_precision

Precision to use for printing each element of the dataset to a plain text
file. The acceptable values for this element are defined in Table input
output (table.h) and are based on C’s printf standards.
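
Since these three elements follow printf, the sketch below shows how a width and precision act on a floating point value in plain C. The function demo_format is hypothetical, and it fixes the conversion to ‘f’ only for illustration; it does not use the display-format values of table.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write 'value' into 'buf' (of 'n' bytes) as a
   fixed-point number, right-aligned in 'width' characters with
   'precision' digits after the decimal point. Returns the number of
   characters that the full output needs, as snprintf does. */
int
demo_format(char *buf, size_t n, int width, int precision, double value)
{
  assert(buf!=NULL && n>0);

  /* The two '*' take the width and precision from the arguments. */
  return snprintf(buf, n, "%*.*f", width, precision, value);
}
```

For example, a width of 8 and a precision of 3 print 3.14159 as 3.142, padded with three leading spaces.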

gal_data_t *next

Through this pointer, you can link a gal_data_t with other related
datasets. For example, the different columns in a dataset each have
one gal_data_t associated with them and they are linked to each other
using this element. There are several functions described below to
facilitate using gal_data_t as a linked list. See Linked lists (list.h)
for more on these wonderful high-level constructs.
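
The way the next element chains datasets together can be sketched with a hypothetical stand-in structure. The names demo_col, demo_prepend and demo_count are assumptions for illustration and are not the functions of list.h; the list is assumed to end with a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table column with only a name and the
   'next' pointer; the real gal_data_t has many more elements. */
struct demo_col
{
  const char      *name;
  struct demo_col *next;
};

/* Put 'node' at the top of the list that '*list' points to. */
void
demo_prepend(struct demo_col **list, struct demo_col *node)
{
  assert(list!=NULL && node!=NULL);

  node->next=*list;
  *list=node;
}

/* Count the nodes by following 'next' until the terminating NULL. */
size_t
demo_count(const struct demo_col *list)
{
  size_t num=0;
  const struct demo_col *tmp;

  for(tmp=list; tmp!=NULL; tmp=tmp->next)
    ++num;

  return num;
}
```

Because new nodes are added at the top of the list, columns have to be added in reverse order (last column first) for the list to end up in the original column order.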

gal_data_t *block

Pointer to the start of the complete allocated block of memory. When this
pointer is not NULL, the dataset is not treated as a contiguous
patch of memory. Rather, it is seen as covering only a portion of the
larger patch of memory that block points to. See Tessellation library (tile.h) for a more thorough explanation and functions to help work with
tiles that are created from this pointer.