Make a copy of the 'properties' file and edit it to set your desired generation parameters. This includes specifying your subject system and source repository.

Execute ForkSim: "./forksim myProperties myOutputDirectory"

Wait while ForkSim runs; this may take some time.

Your forks, and the generation log, are located in "myOutputDirectory".

Generation Parameters

system - Path to subject system to use as a base of the generated forks.

repository - Path to a collection of systems to extract source artifacts from for injection.

language - The language of the dataset to generate. Can be Java, C or C#.

numForks - The number of forks to generate.

numFiles - The number of files to inject into the forks. Must be >= 0.

numDirectories - The number of directories to inject into the forks. Must be >= 0.

numFragments - The number of functions to inject into the forks. Must be >= 0.

functionFragmentMinSize - Minimum size of functions to inject. Must be >= 1.

functionFragmentMaxSize - Maximum size of functions to inject. Must be >= functionFragmentMinSize.

maxInjectNum - The maximum number of forks to inject a particular function/file/directory into. Must be >= 1, but no greater than 'numForks'.

injectionReptitionRate - Probability that the same injection location is used for all forks a particular function/file/directory is injected into. Specified as % (0-100).

fragmentMutationRate - Probability that a function is mutated before injection.

fileMutationRate - Probability that a file is mutated before injection.

dirMutationRate - Probability that a directory is mutated before injection (that is, the files it contains are mutated).

fileRenameRate - Probability that a file is renamed before injection (including files within an injected directory).

dirRenameRate - Probability that a directory is renamed before injection.

maxFileEdit - Maximum number of edits a file mutation will perform, expressed as a ratio of the size of the file to mutate. Specify as a % (0-100). 0% is interpreted as a maximum of 1 edit.

maxFunctionEdit - Maximum number of edits a function mutation will perform, expressed as a ratio of the size of the function to mutate. Specify as a % (0-100). 0% is interpreted as a maximum of 1 edit.

mutationAttempts - The number of failed mutation attempts allowed before ForkSim gives up on mutating the artifact. Must be >= 1. Best left at the default (10).
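The constraints above can be checked before starting a long run. The following Python sketch is not part of ForkSim. `check_params`, `max_edits`, and the sample values are hypothetical.

The sketch rests on three assumptions:

- Every rate and edit limit is a percentage in 0-100. The text only states this explicitly for some of them.
- Artifact size is measured in lines.
- A fractional edit limit rounds down, with a floor of 1 edit.

```python
# Hypothetical validator for ForkSim generation parameters (not part of ForkSim).
NON_NEGATIVE = ("numFiles", "numDirectories", "numFragments")
# Assumption: all of these are percentages in the range 0-100.
PERCENTAGES = ("injectionReptitionRate", "fragmentMutationRate", "fileMutationRate",
               "dirMutationRate", "fileRenameRate", "dirRenameRate",
               "maxFileEdit", "maxFunctionEdit")


def check_params(p: dict) -> list:
    """Return a list of constraint violations; an empty list means all checks passed."""
    errors = []
    for key in NON_NEGATIVE:
        if p[key] < 0:
            errors.append(f"{key} must be >= 0")
    if p["functionFragmentMinSize"] < 1:
        errors.append("functionFragmentMinSize must be >= 1")
    if p["functionFragmentMaxSize"] < p["functionFragmentMinSize"]:
        errors.append("functionFragmentMaxSize must be >= functionFragmentMinSize")
    if not 1 <= p["maxInjectNum"] <= p["numForks"]:
        errors.append("maxInjectNum must be in [1, numForks]")
    if p["mutationAttempts"] < 1:
        errors.append("mutationAttempts must be >= 1")
    for key in PERCENTAGES:
        if not 0 <= p[key] <= 100:
            errors.append(f"{key} must be in 0-100")
    return errors


def max_edits(size: int, percent: int) -> int:
    """Upper bound on edits for an artifact of `size` units (assumed to be lines).

    Assumption: the ratio rounds down, and 0% (or any result below 1) allows 1 edit.
    """
    return max(1, size * percent // 100)


# Hypothetical example values.
example = {
    "numForks": 5, "numFiles": 10, "numDirectories": 2, "numFragments": 20,
    "functionFragmentMinSize": 15, "functionFragmentMaxSize": 200,
    "maxInjectNum": 3, "mutationAttempts": 10,
    "injectionReptitionRate": 50, "fragmentMutationRate": 50, "fileMutationRate": 50,
    "dirMutationRate": 50, "fileRenameRate": 50, "dirRenameRate": 50,
    "maxFileEdit": 10, "maxFunctionEdit": 15,
}
print(check_params(example))  # prints []
```

Under these assumptions, a 200-line file with maxFileEdit at 10% would allow up to 20 edits, and the same file at 0% would allow 1.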

Output

ForkSim outputs a directory containing the generated dataset. A generated dataset of 5 forks would have the following structure:

output/
    0/                    ## Generated Fork 0
    1/                    ## Generated Fork 1
    2/                    ## Generated Fork 2
    3/                    ## Generated Fork 3
    4/                    ## Generated Fork 4
    dirs/                 ## A copy of the directories injected into the forks, and their mutants.
    files/                ## A copy of the files injected into the forks, and their mutants.
    function_fragments/   ## A copy of the functions injected into the forks, and their mutants.
    log                   ## The generation log.
    originalSystem/       ## A copy of the subject system used.
    sourceRepository/     ## A copy of the source repository used.

The dirs, files, and function_fragments directories contain a folder for each directory/file/function chosen for injection. Each of these folders contains two kinds of entry:

- A file/folder named "original", holding the original state of the source artefact.
- One copy per fork that the artefact was injected into, named after that fork. Each copy holds the version (possibly mutated or renamed) that was injected into that fork.

For example, the 5th file injection will be stored in output/files/5. It will contain a file "original", which is a copy of the original file. If this file was injected into forks 0, 2, and 3 (possibly with mutations), then files "0", "2" and "3" will be in this folder. Their content will be the version of the file injected into their respective forks.
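A generated dataset can be summarized by walking these folders. This Python sketch makes two assumptions:

- Each injection folder is named by its injection number.
- Every entry other than "original" is named after a fork id.

`injected_forks` and `summarize` are hypothetical helpers, not part of ForkSim.

```python
from pathlib import Path


def injected_forks(injection_dir: Path) -> list:
    """Return the sorted ids of the forks an artefact was injected into.

    Assumes every entry other than "original" is named after a fork id.
    """
    return sorted(int(p.name) for p in injection_dir.iterdir() if p.name != "original")


def summarize(output_dir: Path, kind: str = "files") -> dict:
    """Map each injection number under output/<kind>/ to the forks it went into.

    `kind` is one of "dirs", "files" or "function_fragments".
    """
    root = output_dir / kind
    return {int(d.name): injected_forks(d) for d in root.iterdir() if d.is_dir()}
```

For the example above, `summarize(Path("output"))` would include the entry `5: [0, 2, 3]`.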

Generation Log Format

Linked here is a sample log. It reports the generation parameters, then lists the injected files, directories, and functions in injection order.

File Injection Logging

The following is an example of a logged file injection:

The first line is its header, which describes the file selected from the source repository and the general injection parameters chosen. The tab-indented lines describe each injection into the generated forks.

The header has the following generic format:

#FileInjection {U | V} #Injections OriginalFile

Where:

#FileInjection - The number of the file injection.

U or V - Whether the injection locations in the individual forks are uniform (U) or varied (V).

#Injections - The number of forks this file was injected into.

OriginalFile - Path to the original file.

Each injection has the following generic format:

#Fork {O | R} {O | M mutator #edits #type} injectedFile

Where:

#Fork - The id of the fork injected into.

O or R - Whether the file kept its original name (O) or was renamed before injection (R).

O or M - Whether the file was kept in its original state (O) or mutated before injection (M).

mutator - The mutator used for the mutation.

#edits - The number of times the mutator was applied.

#type - The clone type of the mutator (1, 2 or 3).

injectedFile - Path to the injected file.
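A minimal Python sketch for reading these entries follows. It assumes:

- Fields are separated by whitespace.
- The mutator name is a single token.
- Paths contain no spaces.

The class and function names are hypothetical. Adjust the splitting if the real log uses a different separator.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Mutation:
    mutator: str
    edits: int       # number of times the mutator was applied
    clone_type: int  # clone type of the mutator: 1, 2 or 3


@dataclass
class FileInjection:
    fork: int
    renamed: bool                 # True for R, False for O
    mutation: Optional[Mutation]  # None when the file was kept as-is (O)
    injected_file: str


def parse_header(line: str) -> Tuple[int, bool, int, str]:
    """Parse '#FileInjection {U | V} #Injections OriginalFile' into
    (number, uniform, injection count, original path)."""
    number, placement, count, original = line.strip().split(maxsplit=3)
    return int(number), placement == "U", int(count), original


def parse_injection(line: str) -> FileInjection:
    """Parse a tab-indented '#Fork {O | R} {O | M mutator #edits #type} injectedFile' line."""
    tokens = line.split()  # also drops the leading tab
    fork, name_flag, state = int(tokens[0]), tokens[1], tokens[2]
    if state == "M":
        mutation = Mutation(tokens[3], int(tokens[4]), int(tokens[5]))
        rest = tokens[6:]
    elif state == "O":
        mutation = None
        rest = tokens[3:]
    else:
        raise ValueError(f"unexpected state flag {state!r} in {line!r}")
    # Assumption: the remaining token(s) form the injected path.
    return FileInjection(fork, name_flag == "R", mutation, " ".join(rest))
```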

Directory Injection Logging

Please see the sample log linked above for an example (it is too long to include here). Directory injections are logged with the following tab-indented format:

Header for the directory selected for injection
	Header for a particular injection of this directory
		Description of each file injected as part of this directory injection

The first line is its header, which describes the directory selected from the source repository and the general injection parameters chosen. The single-tabbed lines are headers for each injection of the directory into a fork. The double-tabbed lines describe each file injected as part of that directory injection.

The header has the following generic format:

#DirectoryInjection {U | V} #injections OriginalDirectory

Where:

#DirectoryInjection - The number of the directory injection.

U or V - Whether the injection locations in the individual forks are uniform (U) or varied (V).

#injections - The number of forks this directory was injected into.

OriginalDirectory - Path to the original directory chosen from the source repository.

The directory injection headers have the following generic format:

#Fork {O | R} InjectedDirectory

Where:

#Fork - The id of the fork injected into.

O or R - Whether the directory kept its original name (O) or was renamed before injection (R).

InjectedDirectory - Path to the injected directory.

Each file injection has the following generic format:

{O | R} {O | M mutator #edits #type} originalFile;injectedFile

Where:

O or R - Whether the file kept its original name (O) or was renamed before injection (R).

O or M - Whether the file was kept in its original state (O) or mutated before injection (M).

mutator - The mutator used for the mutation.

#edits - The number of times the mutator was applied.

#type - The clone type of the mutator (1, 2 or 3).

originalFile - Path of the original file (from the original directory).

injectedFile - Path to the injected file.
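Because the three levels are distinguished only by leading tabs, a directory entry can be read by counting tabs. This Python sketch assumes:

- Fields are whitespace-separated.
- The mutator name is a single token.
- Paths contain no spaces.
- The only ";" on a file line separates the original path from the injected path.

All names are hypothetical, not part of ForkSim.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DirFile:
    renamed: bool
    mutation: Optional[Tuple[str, int, int]]  # (mutator, #edits, #type); None if kept as-is
    original_file: str
    injected_file: str


@dataclass
class DirInjection:
    fork: int
    renamed: bool
    injected_dir: str
    files: List[DirFile] = field(default_factory=list)


@dataclass
class DirectoryRecord:
    number: int
    uniform: bool
    count: int
    original_dir: str
    injections: List[DirInjection] = field(default_factory=list)


def parse_directory_record(lines: List[str]) -> DirectoryRecord:
    """Parse one directory injection: a header, then 1-tab and 2-tab lines."""
    header, *body = lines
    number, placement, count, original = header.strip().split(maxsplit=3)
    record = DirectoryRecord(int(number), placement == "U", int(count), original)
    for raw in body:
        line = raw.rstrip()
        depth = len(line) - len(line.lstrip("\t"))  # number of leading tabs
        if depth == 1:  # '#Fork {O | R} InjectedDirectory'
            fork, name_flag, path = line.split(maxsplit=2)
            record.injections.append(DirInjection(int(fork), name_flag == "R", path))
        elif depth == 2:  # '{O | R} {O | M mutator #edits #type} originalFile;injectedFile'
            tokens = line.split()
            if tokens[1] == "M":
                mutation = (tokens[2], int(tokens[3]), int(tokens[4]))
                rest = tokens[5:]
            else:
                mutation = None
                rest = tokens[2:]
            original_file, injected_file = " ".join(rest).split(";", 1)
            record.injections[-1].files.append(
                DirFile(tokens[0] == "R", mutation, original_file, injected_file))
        else:
            raise ValueError(f"unexpected indentation in {raw!r}")
    return record
```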

Function Injection Logging

The following is an example of a logged function injection:

The first line is its header, which describes the function selected from the source repository and the general injection parameters chosen. The tab-indented lines describe each injection into the generated forks.