Versioning datastore user guide

Introduction

The versioning PostGIS datastore lets you version enable one or more feature types so that all edits are versioned the same way as in a source code version control system. In fact, much of the external interface is inspired by Subversion operations, whilst the internal design is more in line with the ArcSDE way of doing things. The datastore has been designed in such a way that, once a feature type has been versioned, normal operations keep on working as usual: normal reads hit the last version and normal writes create new versions. You can therefore keep using exactly the same code until you need some of the extra datastore capabilities, such as logging, diffing or rollbacks.

Basic interaction with the datastore

VersionedPostgisDataStore works against a PostGIS database. You can attach it to a normal PostGIS database, and it will work just fine in non-versioned mode. The only difference is that a few extra tables will be created and a new feature type, changesets, will appear, giving access to the changelogs as normal features.

Changesets

The changesets features are created at each transaction commit against versioned feature types, and contain the following information:

revision: the revision number of the change;

author: a string that identifies the change author;

message: the commit message;

date: the date and time at which the revision was created;

bbox: the lat/lon bounding box affected by the changes, represented as a polygon (it is always a rectangle).

Version enabling features

A feature type can be version enabled by calling setVersioned(typeName, versioned, author, message) with versioned=true, provided the primary key of the feature type is supported: at the time of writing, single string column, multiple column and serial type keys are the only supported kinds of primary key. Author and message should be provided for the changelog. The same command can be used to version disable a feature type. isVersioned(featureType) can be used to check whether a feature type is versioned or not.

Once a feature type is versioned, any write against it will be versioned.

Reading data

Reading data from version enabled feature types can be performed as usual, and it will return the latest version unless a Query is built with the version parameter populated.

At the moment the only way to express the version is a revision number, but in the future point-in-time extractions may be supported as well, and branches will have to be included in the version specification once the datastore acquires branching capabilities:

Writing

Writing against version enabled features can be performed as usual, and it will create new versions transparently, with one drawback though: no author and commit message will be included in the changelog. To make sure these are populated, create a transaction and load this information as transaction properties:

The same transaction can be used to modify multiple feature types. Once the transaction is committed, the newly generated revision number can be retrieved by querying a transaction property, as well as the version:

At the time of writing, revision and version are the same, but that will change once branching support is added (version will become something like "branchId:5").

Advanced interactions: diffs, logs and rollbacks

The following are brand new operations that cannot be achieved with the normal datastore API. Only the most primitive operation, getModifiedFeatureFIDs, is provided at the datastore level, whilst the others can be thought of as compositions of simpler operations and have been included in the VersionedPostgisFeatureStore instead (at the time of writing, feature locking is not supported by the versioned datastore).

Finding features changed between two versions

The following operation returns a set of feature ids for features that were modified, created or deleted between version1 and version2 and that matched the specified filter in at least one revision between version1 and version2:

This operation can be used to build filters and extract the state of the modified features at version1 or version2, and then render both on a map to show changes, perform diffs or make rollbacks (though the latter two are such common operations that they are provided out of the box in the VersionedPostgisFeatureStore class).

The feature matching semantics are a little complex, so here is a more detailed explanation:

A feature is said to have been modified between version1 and version2 if a new state of it has been created after version1 and before or at version2 (included), or if it has been deleted between version1 and version2 (included).

The filter is used to match every state between version1 and version2, so all new states after version1, but also the states existing at version1, provided they also existed at version1 + 1.

If at least one state matches the filter, the feature id is returned.

The result is composed of three sets of feature ids:

Features created after version1 are included in the created set

Features deleted before or at version2 are included in the deleted set

Features not included in the created/deleted sets are included in the modified set

The following graph illustrates feature matching and set destination. Each line represents a feature lifeline, with different symbols for filter matched states, unmatched states, state creation, expiration, and lack of feature existence.
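
The matching rules and the three result sets described above can be sketched in plain Java. This is only an illustrative model, not the datastore's code: FeatureState, ChangeClassifier and the NOT_EXPIRED sentinel are hypothetical names, a state is assumed to be valid for revisions in [revision, expired), and a feature that is absent at both version1 and version2 is reported as CREATED, a case the text above does not settle.

```java
import java.util.List;
import java.util.function.Predicate;

/** One stored state of a feature, assumed valid for revisions in [revision, expired). */
final class FeatureState {
    static final long NOT_EXPIRED = Long.MAX_VALUE; // assumed marker for "still current"
    final long revision;
    final long expired;
    final String value; // stand-in for the feature's attributes

    FeatureState(long revision, long expired, String value) {
        this.revision = revision;
        this.expired = expired;
        this.value = value;
    }

    boolean existsAt(long r) {
        return revision <= r && r < expired;
    }
}

final class ChangeClassifier {
    enum Change { NONE, CREATED, DELETED, MODIFIED }

    /** Classifies one feature, given all of its states, for version1 < version2. */
    static Change classify(List<FeatureState> states, long v1, long v2,
                           Predicate<FeatureState> filter) {
        boolean changed = false;
        boolean matched = false;
        boolean atV1 = false;
        boolean atV2 = false;
        for (FeatureState s : states) {
            boolean createdInRange = s.revision > v1 && s.revision <= v2;
            boolean expiredInRange = s.expired > v1 && s.expired <= v2;
            changed |= createdInRange || expiredInRange;
            // States tested against the filter: new ones, plus those alive at v1 and v1 + 1.
            boolean candidate = createdInRange || (s.existsAt(v1) && s.existsAt(v1 + 1));
            matched |= candidate && filter.test(s);
            atV1 |= s.existsAt(v1);
            atV2 |= s.existsAt(v2);
        }
        if (!changed || !matched) return Change.NONE; // feature id is not returned
        if (!atV1) return Change.CREATED;             // created after version1
        if (!atV2) return Change.DELETED;             // deleted before or at version2
        return Change.MODIFIED;
    }
}
```

Calling classify once per feature and grouping the ids by result gives the created, deleted and modified sets; features that yield NONE are simply left out.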

Rolling back

Rolling back changes can be performed by calling the VersionedPostgisFeatureStore.rollback method, public void rollback(String toVersion, Filter filter), specifying the target rollback revision and which features have to be included in the rollback operation.

This can be useful to roll back changes performed in a certain area, or to roll back specific feature changes. Rollback creates a new revision and does not remove the rolled back changes from history, just like when rolling back with Subversion (which is done by merging a reverse diff and committing).
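
To make the "rollback is a new revision" idea concrete, here is a minimal in-memory sketch. It is not the VersionedPostgisFeatureStore implementation: ToyVersionedStore is a hypothetical class, it keeps a full id-to-value snapshot per revision purely for simplicity, and it filters on feature ids with a Predicate instead of a GeoTools Filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class ToyVersionedStore {
    // snapshots.get(i) is the full id -> value state at revision i + 1 (toy layout only).
    private final List<Map<String, String>> snapshots = new ArrayList<>();

    private Map<String, String> latest() {
        if (snapshots.isEmpty()) {
            return new HashMap<String, String>();
        }
        return snapshots.get(snapshots.size() - 1);
    }

    /** Applies the given inserts/updates on top of the latest state; returns the new revision. */
    int commit(Map<String, String> changes) {
        Map<String, String> next = new HashMap<>(latest());
        next.putAll(changes);
        snapshots.add(next);
        return snapshots.size();
    }

    /** Returns a copy of the state as it was at the given revision. */
    Map<String, String> stateAt(int revision) {
        return new HashMap<>(snapshots.get(revision - 1));
    }

    int latestRevision() {
        return snapshots.size();
    }

    /** Restores the selected features to their state at toRevision, as a brand new revision. */
    int rollback(int toRevision, Predicate<String> idFilter) {
        Map<String, String> target = snapshots.get(toRevision - 1);
        Map<String, String> next = new HashMap<>(latest());
        Set<String> ids = new HashSet<>(next.keySet());
        ids.addAll(target.keySet());
        for (String id : ids) {
            if (!idFilter.test(id)) {
                continue; // feature not selected: keep its latest state
            }
            if (target.containsKey(id)) {
                next.put(id, target.get(id)); // restore the old value
            } else {
                next.remove(id);              // feature did not exist at toRevision
            }
        }
        snapshots.add(next);                  // history only grows, nothing is removed
        return snapshots.size();
    }
}
```

After a rollback every earlier revision, including the ones that were rolled back, can still be read with stateAt, which is the behaviour described above.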

Logging

An equivalent of svn log can be performed with the following method on VersionedPostgisFeatureStore:

The returned FeatureCollection contains all changeset features that are associated with feature changes between the specified versions and matching the specified filter (with the same matching semantics described for getModifiedFeatureFIDs).

Returned features are sorted on revision number, descending (just like svn log).
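
The ordering and version handling of the log can be illustrated with a small stand-alone sketch. Changeset and ToyLog are hypothetical classes, the feature filter is left out, and the range is assumed to be "after the lower version, up to and including the higher one", following the definition of "modified between version1 and version2" given earlier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal stand-in for a changesets feature (bbox and date omitted). */
final class Changeset {
    final long revision;
    final String author;
    final String message;

    Changeset(long revision, String author, String message) {
        this.revision = revision;
        this.author = author;
        this.message = message;
    }
}

final class ToyLog {
    /** Returns the changesets in (min, max] of the two versions, newest first. */
    static List<Changeset> log(List<Changeset> all, long fromVersion, long toVersion) {
        // Versions are swapped when given in reverse order, as for every command except diff.
        long lo = Math.min(fromVersion, toVersion);
        long hi = Math.max(fromVersion, toVersion);
        List<Changeset> result = new ArrayList<>();
        for (Changeset c : all) {
            if (c.revision > lo && c.revision <= hi) {
                result.add(c);
            }
        }
        // Descending revision order, like svn log.
        result.sort(Comparator.comparingLong((Changeset c) -> c.revision).reversed());
        return result;
    }
}
```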

Diffing

The returned reader provides a stream of FeatureDiff objects describing the differences between the features at the two revisions that match the filter (with the same matching semantics described for getModifiedFeatureFIDs).

A FeatureDiff object provides:

The feature ID

A state change, which can be CREATED, DELETED or MODIFIED, depending on how the feature changed between fromVersion and toVersion;

A map from changed attributes to their value at toVersion. If the state is MODIFIED, the map contains only the modified attributes, whilst if it is CREATED it contains all attributes.

This is the only command that does not swap versions before executing if fromVersion > toVersion. If a feature is created between fromVersion and toVersion, a swapped request will mark it as deleted, and vice versa. Values for modified features will always be those found at toVersion.
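
The content of a diff entry, and the effect of swapping the two versions, can be sketched as follows. ToyFeatureDiff is a hypothetical stand-in, not the real FeatureDiff class: it takes the attribute maps of one feature at fromVersion and toVersion (null meaning "feature absent"), assumes the caller only passes features that actually changed, and assumes a DELETED entry carries an empty map, which the text above does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class ToyFeatureDiff {
    enum State { CREATED, DELETED, MODIFIED }

    final String id;
    final State state;
    final Map<String, Object> changes; // attribute name -> value at toVersion

    private ToyFeatureDiff(String id, State state, Map<String, Object> changes) {
        this.id = id;
        this.state = state;
        this.changes = changes;
    }

    /** Compares one feature's attributes at fromVersion and toVersion (null = absent). */
    static ToyFeatureDiff between(String id, Map<String, Object> from, Map<String, Object> to) {
        if (from == null && to == null) {
            throw new IllegalArgumentException("feature absent at both versions");
        }
        if (from == null) {
            // Created: the map holds every attribute.
            return new ToyFeatureDiff(id, State.CREATED, new LinkedHashMap<>(to));
        }
        if (to == null) {
            // Deleted: empty map (assumption of this sketch).
            return new ToyFeatureDiff(id, State.DELETED, new LinkedHashMap<String, Object>());
        }
        // Modified: keep only attributes whose value differs, using the toVersion value.
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : to.entrySet()) {
            if (!Objects.equals(from.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return new ToyFeatureDiff(id, State.MODIFIED, changed);
    }
}
```

Because the arguments are never reordered, calling between(id, to, from) instead of between(id, from, to) turns a CREATED entry into a DELETED one and reports the other version's values for modified attributes.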

Some under the hood information

Each time a feature type is version enabled, the underlying table structure gets modified. In particular:

a new column, revision, is added to track the revision in which the record has been created;

another one, expired, is added to track the revision in which the record expired (either a new version of the record has been created, or the feature has been deleted);

the primary key is altered: if (pk1, ..., pkN) are the old primary key columns, the new primary key is (revision, pk1, ..., pkN);

a new index, (expired, pk1, ..., pkN), is created as well.

Each time a feature gets modified, the current record is marked as expired and a new one is created; thus, the database contains a full copy of each feature state. In the future the versioned datastore may learn to shave off some of the old states, preserving only tagged revisions, but for the moment this does not seem necessary, as performance is still good even with lots of revisions inside the db. For further details, see the datastore preliminary study and benchmark.
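
The revision/expired bookkeeping described above can be modelled in memory as a rough sketch. ToyVersionedTable is hypothetical and only mirrors the two extra columns; the Long.MAX_VALUE sentinel for "not expired yet" and the one-revision-per-write counter are assumptions of this example, not statements about the actual table contents.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyVersionedTable {
    static final long NOT_EXPIRED = Long.MAX_VALUE; // assumed marker for the current record

    /** One table row: (revision, id) plays the role of the altered primary key. */
    static final class Row {
        final long revision; // revision in which the record was created
        long expired;        // revision in which the record expired
        final String id;
        final String value;

        Row(long revision, long expired, String id, String value) {
            this.revision = revision;
            this.expired = expired;
            this.id = id;
            this.value = value;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    private long lastRevision = 0;

    /** Insert or update: expires the current record, if any, and adds a new one. */
    long write(String id, String value) {
        long rev = ++lastRevision;
        expireCurrent(id, rev);
        rows.add(new Row(rev, NOT_EXPIRED, id, value));
        return rev;
    }

    /** Delete: only expires the current record, no row is removed. */
    long delete(String id) {
        long rev = ++lastRevision;
        expireCurrent(id, rev);
        return rev;
    }

    private void expireCurrent(String id, long rev) {
        for (Row r : rows) {
            if (r.id.equals(id) && r.expired == NOT_EXPIRED) {
                r.expired = rev;
            }
        }
    }

    /** Value of the feature at the given revision, or null if it did not exist then. */
    String read(String id, long revision) {
        for (Row r : rows) {
            if (r.id.equals(id) && r.revision <= revision && revision < r.expired) {
                return r.value;
            }
        }
        return null;
    }

    int rowCount() {
        return rows.size();
    }
}
```

Reading the latest state amounts to selecting the rows that are not expired, which is also why the table can no longer be used as-is by applications unaware of the two extra columns.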

Of course, this way of doing things prevents other applications from working transparently against the data. If reading is all that is needed, the following view may help:

Allowing writes could be done as well, using "instead of" rules against the above view, but that would replicate much of the functionality already available in the datastore (which has been designed this way to make the creation of versioned datastores against other databases easy). If there is sufficient interest and someone sponsoring the work, this could indeed be turned into a PostGIS extension; in that case the datastore could become a thin wrapper over functionality provided by triggers and stored procedures.