rcf Documentation

rcf is our data format to store and
analyze code clone data. It defines an extendible schema for code clone
data and provides a Java API that eases the development of analysis
code. The main ideas behind rcf:

Use standards: rcf defines a data model that covers the most
common entities reported by clone detectors. Among other entities, it handles
clone pairs, clone classes (also often referred to as clone groups), and fragments
(the cloned code areas). The clone data of multiple versions of a system
can be stored in one rcf file.
The predefined model works with our viewer application
cyclone.

Extend to your needs:
You probably need to store some specific data along with your clone data that
is not covered by the predefined data model. This requirement was one major
reason why we developed rcf. The predefined model
can be extended so that rcf stores additional data
you want to annotate your clones with. Extensions will not break compatibility
with the cyclone viewer.

Focus on your objective:
Analyzing clone data usually means handling I/O, writing a parser for
your clone report, defining a data model, writing analysis code, collecting
further data, etc. rcf was designed to do most
of these things so that you can focus on your objective. Once your data
has been converted to rcf (many
converters are included in the distribution), you can access your
data in an object-oriented fashion, without worrying about technical
issues. The conventions of rcf are built
into the API, so you do not need to worry about adding the data
the right way; rcf will help you with this.
You can also use our cyclone viewer, which reads
rcf clone data.

How rcf organizes clone data

Each rcf file is initialized with a predefined
schema, which models common aspects of code clones. We call this the
core schema. The following UML diagram shows a simplified version
of this schema.

Each entity has an id attribute, which serves as a unique key.
The clientId attributes can be used for your own
id information. A Version has a basepath. All directories
store their path relative to this basepath.

Extending the schema

The core schema can be extended as needed. Conceptually, rcf consists of three
different elements:

Relations

These model the entities that can be stored in an
rcf file.
CloneClass is one example of such a relation.
You can think of a relation as a list of entries.

Attributes

Relations have attributes that can store values of a defined
type. In the core schema, type is an attribute of
the relation CloneClass with the type int.

Entries

An entry is one instance in a relation. While the relation
CloneClass describes the concept of a clone class, an
entry of that relation denotes one specific clone class. An entry
defines a value for each attribute defined by the relation's schema.

In rcf it is possible to add arbitrary
attributes and even relations. Attribute values can be of the primitive
types int, float, boolean and
String or reference another entry of any relation. An attribute
can hold a scalar value or a list of values of the aforementioned types.

The schema is stored in the rcf file itself.
This means that no schema definition or interfaces need to be provided
together with an rcf file that contains an
extended schema.
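
The three concepts above can be pictured with a small, self-contained toy model. It is not the rcf API: the classes ToyRelation and ToyEntry and the extension attribute "author" are hypothetical stand-ins. They only illustrate how a relation declares typed attributes (including one added as an extension) and how each entry carries a value per attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and NOT part of the rcf library.
final class SchemaSketch {

    /** A relation: a named list of entries plus the attributes its schema declares. */
    static final class ToyRelation {
        final String name;
        final Map<String, Class<?>> attributes = new LinkedHashMap<>();
        final List<ToyEntry> entries = new ArrayList<>();

        ToyRelation(String name) { this.name = name; }

        /** Declares an attribute with a value type (core or user-defined extension). */
        void addAttribute(String attrName, Class<?> type) {
            attributes.put(attrName, type);
        }

        /** Creates a new entry with all attribute values unset. */
        ToyEntry append() {
            ToyEntry e = new ToyEntry(this);
            entries.add(e);
            return e;
        }
    }

    /** An entry: one instance of a relation, holding a value per attribute. */
    static final class ToyEntry {
        private final ToyRelation relation;
        private final Map<String, Object> values = new LinkedHashMap<>();

        ToyEntry(ToyRelation relation) { this.relation = relation; }

        /** Stores a value after checking it against the relation's schema. */
        void set(String attrName, Object value) {
            Class<?> type = relation.attributes.get(attrName);
            if (type == null) {
                throw new IllegalArgumentException("Unknown attribute: " + attrName);
            }
            if (!type.isInstance(value)) {
                throw new IllegalArgumentException(attrName + " expects " + type.getSimpleName());
            }
            values.put(attrName, value);
        }

        /** Returns the stored value, or null if the attribute is unset. */
        Object get(String attrName) { return values.get(attrName); }
    }

    /** Builds a CloneClass relation with the core attribute "type" and one extension. */
    static ToyRelation buildExample() {
        ToyRelation cloneClass = new ToyRelation("CloneClass");
        cloneClass.addAttribute("type", Integer.class);   // int attribute from the core schema
        cloneClass.addAttribute("author", String.class);  // hypothetical user extension

        ToyEntry entry = cloneClass.append();             // one specific clone class
        entry.set("type", 2);
        entry.set("author", "alice");
        return cloneClass;
    }

    public static void main(String[] args) {
        ToyRelation r = buildExample();
        System.out.println(r.name + " attributes: " + r.attributes.keySet());
    }
}
```

In this toy model the schema lives in the relation object itself, which loosely mirrors the idea that an rcf file carries its own schema.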

Using the Java API

rcf can be accessed and modified via our Java
library (see downloads section). The library implements classes and
access functions for all relations and attributes defined in the core
schema. It also provides generic functions to access relations and
attributes that were added by the user. See the API
docs for a complete reference.

The following example shows how an existing rcf
can be used to calculate the average token count of all fragments
over all versions.

Each relation in the core schema is represented by its own class in
the API. The call to rcf.getVersions() returns an object of
the type Versions, which represents the version relation.
All relation objects are iterable. Using
for(Version v: rcf.getVersions()) will iterate over all
versions. Objects of the type Version represent one entry
of the version relation, that is, one concrete version.

Accessing the attributes is straightforward. For all attributes in the
core schema, get and set methods exist (e.g. f.getNumTokens()).
It might not always be suitable to set every attribute for every entry.
Therefore, attribute values can be unset. An attempt to access
an unset attribute value will raise a ValueNotSetException,
which is a RuntimeException (which means that it does not have
to be explicitly caught). If it is not clear whether an attribute value is
set for every entry, you can either catch the exception or
use a variant of the get method that takes a default value. In our
example: public int getNumTokens(int defaultValue). This will return
the given default value if the numTokens attribute is not set.
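
The two ways of dealing with unset values can be sketched as follows. The Fragment and ValueNotSetException classes below are local re-creations written for this illustration, not the classes shipped with rcf. Only the method name getNumTokens and the unchecked nature of the exception are taken from the description above.

```java
// Stand-in types only: ValueNotSetException and Fragment here are local
// re-creations for illustration, not the classes shipped with rcf.
final class UnsetValueSketch {

    /** Unchecked, like the exception described in the text. */
    static final class ValueNotSetException extends RuntimeException {
        ValueNotSetException(String attribute) {
            super("Attribute value not set: " + attribute);
        }
    }

    /** Minimal fragment with a single optional attribute, numTokens. */
    static final class Fragment {
        private Integer numTokens; // null means "unset"

        void setNumTokens(int value) { numTokens = value; }

        /** Throws if the value was never set. */
        int getNumTokens() {
            if (numTokens == null) {
                throw new ValueNotSetException("numTokens");
            }
            return numTokens;
        }

        /** Returns the given default instead of throwing. */
        int getNumTokens(int defaultValue) {
            return numTokens == null ? defaultValue : numTokens;
        }
    }

    /** Option 1: catch the exception. Returns -1 as a marker when unset. */
    static int tokensOrMarker(Fragment f) {
        try {
            return f.getNumTokens();
        } catch (ValueNotSetException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        Fragment unset = new Fragment();
        Fragment set = new Fragment();
        set.setNumTokens(42);

        System.out.println(tokensOrMarker(unset)); // option 1: exception is caught
        System.out.println(unset.getNumTokens(0)); // option 2: default value is used
        System.out.println(set.getNumTokens(0));   // value is set, default is ignored
    }
}
```

The default-value variant tends to be the more convenient choice inside loops, because a single unset entry does not interrupt the traversal.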

Besides holding scalar values, attributes can also contain lists of
values of the same type. The call of v.getCloneClasses()
accesses the list attribute cloneclasses of the
Version entry v. This returns a
List<CloneClass>.
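
The traversal described in this section, from versions to clone classes to fragments, might look roughly like the sketch below. It uses small stand-in classes so that it runs without the library. getCloneClasses() and getNumTokens() are named in the text, while getFragments() on CloneClass is an assumption made for this sketch. With the real library, the object returned by rcf.getVersions() would take the place of the hand-built list, since relation objects are iterable.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types only. getCloneClasses() and getNumTokens() are named in the
// text; getFragments() is an assumption made for this sketch.
final class AverageTokensSketch {

    static final class Fragment {
        private final int numTokens;
        Fragment(int numTokens) { this.numTokens = numTokens; }
        int getNumTokens() { return numTokens; }
    }

    static final class CloneClass {
        private final List<Fragment> fragments;
        CloneClass(Fragment... fragments) { this.fragments = Arrays.asList(fragments); }
        List<Fragment> getFragments() { return fragments; }
    }

    static final class Version {
        private final List<CloneClass> cloneClasses;
        Version(CloneClass... cloneClasses) { this.cloneClasses = Arrays.asList(cloneClasses); }
        List<CloneClass> getCloneClasses() { return cloneClasses; }
    }

    /** Averages numTokens over every fragment of every clone class of every version. */
    static double averageTokenCount(Iterable<Version> versions) {
        long sum = 0;
        long count = 0;
        for (Version v : versions) {
            for (CloneClass cc : v.getCloneClasses()) {
                for (Fragment f : cc.getFragments()) {
                    sum += f.getNumTokens();
                    count++;
                }
            }
        }
        // Avoid dividing by zero when there are no fragments at all.
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        List<Version> versions = Arrays.asList(
                new Version(new CloneClass(new Fragment(10), new Fragment(20))),
                new Version(new CloneClass(new Fragment(30))));
        System.out.println(averageTokenCount(versions));
    }
}
```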

Loading an rcf file

The second line will select a suitable PersistenceManager to load the
given file. The third line loads the file.

Adding & changing data

Most relations provide convenience methods to add new entries. You should use
these, because they ensure that your data complies with the
rcf conventions. The methods have the prefix add*.
For instance, to add a fragment it is necessary to find out whether its file and
directory have been added to the rcf file before,
SourcePositions need to be created and linked from the fragment, etc.
All these things will be done automatically by calling Fragments.addFragment():

Nevertheless, you can add entries manually by calling the relation's append() method.
This will return a newly created entry with all values unset. To set
values, call the set methods. The following example shows how a new Version is
added manually.
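
As a rough illustration of the append-then-set pattern, the sketch below re-creates a tiny version relation with stand-in classes. append() and the basepath attribute come from the text. The accessor names setBasepath() and getBasepath(), the size() helper, and the path used are assumptions for this sketch and may differ from the real API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in types only. append() and the basepath attribute come from the text;
// setBasepath()/getBasepath(), size() and the path used are assumptions.
final class AppendSketch {

    static final class Version {
        private String basepath; // unset until a setter is called
        void setBasepath(String basepath) { this.basepath = basepath; }
        String getBasepath() { return basepath; }
    }

    /** The version relation: iterable, and able to create new entries. */
    static final class Versions implements Iterable<Version> {
        private final List<Version> entries = new ArrayList<>();

        /** Creates, stores and returns a new entry with all values unset. */
        Version append() {
            Version v = new Version();
            entries.add(v);
            return v;
        }

        int size() { return entries.size(); }

        @Override
        public Iterator<Version> iterator() { return entries.iterator(); }
    }

    public static void main(String[] args) {
        Versions versions = new Versions();
        Version v = versions.append();      // new entry, nothing set yet
        v.setBasepath("/projects/demo");    // fill in values with the set methods
        for (Version each : versions) {
            System.out.println(each.getBasepath());
        }
    }
}
```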

In general all relations, attributes and entries can be handled using
the generic class types Relation<Entry>, Attribute
and Entry. Accessing attribute values requires one to pass
the attribute as an Attribute object or as the attribute's
name. For relations and attributes that extend the core schema, these
generic classes must be used.
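
The difference between passing an Attribute object and passing the attribute's name can be sketched with the stand-ins below. These Attribute and Entry classes are hypothetical and only mimic the idea; the real rcf classes may look different, and "reviewed" is an invented extension attribute.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; the real rcf classes Relation, Attribute and Entry
// may look different. "reviewed" is an invented extension attribute.
final class GenericAccessSketch {

    /** An attribute descriptor: a name plus the Java type of its values. */
    static final class Attribute<T> {
        final String name;
        final Class<T> type;
        Attribute(String name, Class<T> type) {
            this.name = name;
            this.type = type;
        }
    }

    /** A generic entry whose values are looked up by attribute. */
    static final class Entry {
        private final Map<String, Object> values = new HashMap<>();

        <T> void set(Attribute<T> attribute, T value) {
            values.put(attribute.name, value);
        }

        /** Access via the Attribute object: the result is typed. */
        <T> T get(Attribute<T> attribute) {
            return attribute.type.cast(values.get(attribute.name));
        }

        /** Access via the attribute's name: the caller must know the type. */
        Object get(String attributeName) {
            return values.get(attributeName);
        }
    }

    public static void main(String[] args) {
        Attribute<Boolean> reviewed = new Attribute<>("reviewed", Boolean.class);
        Entry cloneClass = new Entry();
        cloneClass.set(reviewed, true);

        boolean typed = cloneClass.get(reviewed);    // by Attribute object
        Object untyped = cloneClass.get("reviewed"); // by name
        System.out.println(typed + " " + untyped);
    }
}
```

Looking an attribute up once and reusing the descriptor keeps the value type visible to the compiler, whereas access by name is shorter but leaves the cast to the caller.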