What is a GDG?

GDG stands for Generation Data Group. It is a mainframe facility for creating a group of related files that can be referenced either individually or as a group.

To define a GDG you must first create a GDG base. The base defines the common portion of the dataset names used by the files in the GDG, and also defines how many generations (files) can be stored within the GDG. Once this maximum number of generations is reached, creating a new generation will result in the oldest generation being discarded. As many as 255 generations can be held in a GDG.
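The discard-the-oldest behaviour described above can be sketched as a bounded queue. This is only an illustrative model, not how the catalog implements it: the function name make_gdg and the placeholder generation names are assumptions made for this example.

```python
from collections import deque


def make_gdg(limit):
    """Return an empty, bounded list of generations (hypothetical helper).

    `limit` models the maximum number of generations set on the GDG base.
    """
    if not 1 <= limit <= 255:
        raise ValueError("limit must be between 1 and 255")
    return deque(maxlen=limit)


# A GDG that holds at most three generations.
gdg = make_gdg(3)
for name in ["gen1", "gen2", "gen3", "gen4"]:
    # Appending to a full deque silently drops the oldest entry,
    # which mirrors the oldest generation being discarded.
    gdg.append(name)

print(list(gdg))  # ['gen2', 'gen3', 'gen4']
```

Creating the fourth generation pushes the first one out, so only the three most recent remain.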

The files within a GDG are assigned names derived from the name of the base, with a generation number added to the end to give each file a unique name. For example, if the GDG base is called PROD.ACCOUNTS.INPUT then the first file created within this GDG will be given the name PROD.ACCOUNTS.INPUT.G0001V00. Subsequent files are named by incrementing the generation number, giving filenames ending in G0002V00, G0003V00, etc. all the way to G9999V00, after which the numbering starts again from G0001V00. The two zeroes on the end of the name used to represent a volume number, which was used when the file was stored on media requiring multiple volumes. Since multi-volume storage is now rarely required, the last two digits are now usually used for version control instead.
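The naming scheme above can be expressed as a short sketch. The function names generation_name and next_generation are hypothetical, chosen only for this illustration, and the two trailing digits are treated as a plain two-digit number.

```python
def generation_name(base, generation, suffix=0):
    """Build the full name of a generation, e.g. BASE.G0001V00.

    `generation` is the four-digit generation number (1 to 9999) and
    `suffix` is the two-digit number that follows the V (0 to 99).
    """
    if not 1 <= generation <= 9999:
        raise ValueError("generation must be between 1 and 9999")
    if not 0 <= suffix <= 99:
        raise ValueError("suffix must be between 0 and 99")
    return f"{base}.G{generation:04d}V{suffix:02d}"


def next_generation(generation):
    """Return the generation number that follows, wrapping after 9999."""
    return 1 if generation == 9999 else generation + 1


print(generation_name("PROD.ACCOUNTS.INPUT", 1))
# PROD.ACCOUNTS.INPUT.G0001V00
print(generation_name("PROD.ACCOUNTS.INPUT", next_generation(9999)))
# PROD.ACCOUNTS.INPUT.G0001V00
```

The second call shows the wrap-around: the generation after G9999V00 is named G0001V00 again.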

A file within a GDG may be referenced using this external name, but files are more usually referenced using relative generation numbers. Generation 0 represents the current generation within the GDG at the time that the current job began execution [e.g. PROD.ACCOUNTS.INPUT(0)] while -1 represents the immediately previous generation [e.g. PROD.ACCOUNTS.INPUT(-1)]. Even earlier generations can be referenced using -2, -3 etc. up to the maximum number of generations held within the GDG. If a new generation is added to the GDG during execution of the job then it will be the +1 generation. Referencing the files within a GDG by their relative names means that the actual filename does not need to be known.
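As a rough illustration of how a relative reference maps onto an actual filename, the sketch below resolves 0, -1, -2 and so on against a snapshot of the generations that existed when the job started. The function resolve_relative and the sample list are assumptions made for this example; the real lookup is done by the system catalog.

```python
def resolve_relative(generations, relative):
    """Map a relative generation number (0, -1, -2, ...) to a full name.

    `generations` is a snapshot, oldest first, of the names that existed
    when the job began. Positive numbers such as +1 refer to generations
    that do not exist yet, so they cannot be looked up here.
    """
    if relative > 0:
        raise ValueError("positive references name generations not yet created")
    index = len(generations) - 1 + relative
    if index < 0:
        raise LookupError("no generation that far back")
    return generations[index]


snapshot = [
    "PROD.ACCOUNTS.INPUT.G0011V00",
    "PROD.ACCOUNTS.INPUT.G0012V00",
    "PROD.ACCOUNTS.INPUT.G0013V00",
]

print(resolve_relative(snapshot, 0))   # PROD.ACCOUNTS.INPUT.G0013V00
print(resolve_relative(snapshot, -1))  # PROD.ACCOUNTS.INPUT.G0012V00
```

Because the snapshot is fixed for the duration of the job, (0) keeps pointing at the same file even after a +1 generation has been created.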

It is also possible to reference the entire GDG as if it were a single file by using the base name instead of an absolute or relative file reference. All files within the GDG will then be processed, by default starting from the latest generation and ending with the oldest one. Note that to be able to reference all of the generations at once in this way, all of the files need to be defined with common Data Control Block information, e.g. the same record length. If the files within a GDG are never to be referenced this way then there is no requirement that they have any file attributes in common.

You can use the number on the end of the filename to create a replacement dataset within a GDG without overwriting the original file. For example, if PROD.ACCOUNTS.INPUT.G0013V00 exists and you create PROD.ACCOUNTS.INPUT.G0013V01 then both files will exist, with only the last one created being attached to the GDG. This is effectively the same as creating two ordinary datasets except that one of them also belongs to a GDG. Of course, if you then delete and recreate PROD.ACCOUNTS.INPUT.G0013V00 then this will be the one belonging to the GDG (since it was the last one created), so you can't rely on the volume number to tell you which one belongs to the GDG. With a properly created multi-volume GDG (that is, one created using a relative generation number rather than a full name) all of the volumes belong to the GDG. The full name including the volume number is then only used internally by GDG processing when creating the volumes in the first place, and you only see it when you are reading the dataset back in and the system needs to ask you to mount the next volume.