
Paraphrasing the Emacs documentation, let us say that hooks are an important
mechanism for customizing an application. A hook is basically a list of
functions to be called on some well-defined occasion (this is called running
the hook).
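The general idea can be sketched in a few lines of plain Python. This is only an illustration of "a list of functions run on a well-defined occasion"; the names `add_hook`, `run_hook` and the `after_save` occasion are hypothetical and are not part of CubicWeb.

```python
from collections import defaultdict

# Hypothetical registry: maps an occasion name to an ordered list of functions.
_hooks = defaultdict(list)


def add_hook(occasion, func):
    """Register func to be called when the occasion happens."""
    _hooks[occasion].append(func)


def run_hook(occasion, *args):
    """Run the hook: call every registered function, in registration order."""
    return [func(*args) for func in _hooks[occasion]]


add_hook('after_save', lambda name: 'indexed %s' % name)
add_hook('after_save', lambda name: 'notified about %s' % name)

results = run_hook('after_save', 'report.txt')
print(results)
```

CubicWeb hooks follow the same principle, but they are classes selected by event and predicates rather than bare functions stored in a list.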

In CubicWeb, hooks are subclasses of the Hook
class. They are selected over a set of pre-defined events (and possibly more
conditions, hooks being selectable appobjects like views and components). They
should implement a __call__() method that will
be called when the hook is triggered.

There are two families of events: data events (before / after any individual
update of an entity / or a relation in the repository) and server events (such
as server startup or shutdown). In a typical application, most of the hooks are
defined over data events.

Hooks may also register Operations, which will be fired when the
transaction is committed or rolled back.

The purpose of data event hooks is usually to complement the data model as
defined in the schema, which is static by nature and only provides a restricted
built-in set of dynamic constraints, with dynamic or value-driven behaviours.
For instance they can serve the following purposes:

A data event hook is functionally equivalent to a database trigger, except that
database trigger definition languages are not standardized, hence not portable
(for instance, PL/SQL works with Oracle, and the closely related PL/pgSQL with
PostgreSQL, but neither works with SQL Server or SQLite).

Operations are subclasses of the Operation class
that may be created by hooks and scheduled to happen on precommit,
postcommit or rollback events (i.e. respectively before a commit, after a
commit, or on a rollback of a transaction).

Hooks are fired immediately on data operations, and it is sometimes
necessary to delay the actual work to a time when we can expect all
information to be there, or when all other hooks have run (though take care,
since operations may themselves trigger hooks). Also, while the order of
execution of hooks is data dependent (and thus hard to predict), it is possible
to force an order on operations.

So, in cases where some information may only be set later in the
transaction, you should instantiate an operation in the hook.

Operations may be used to:

implement a validation check which requires all relations to be already set on
an entity

process various side effects associated with a transaction such as filesystem
updates, mail notifications, etc.

Hooks are mostly defined and used to handle dataflow operations. It
means as data gets in (entities added, updated, relations set or
unset), specific events are issued and the Hooks matching these events
are called.

You can get the event that triggered a hook by accessing its event
attribute.

When called for one of these events, the hook will have an entity attribute
containing the entity instance.

before_add_entity, before_update_entity:

On those events, you can access the modified attributes of the entity using
the entity.cw_edited dictionary. The values can be modified and the old
values can be retrieved.

If you modify the entity.cw_edited dictionary in the hook, that is before
the database operations take place, you avoid the need to process a whole
new RQL query, and the underlying backend query (usually SQL) will contain
the modified data. For example:

    self.entity.cw_edited['age'] = 42

will modify the age before it is written to the backend storage.

Similarly, removing an attribute from cw_edited will cancel its
modification:

    del self.entity.cw_edited['age']

On a before_update_entity event, you can access the old and new values:

    old, new = entity.cw_edited.oldnewvalue('age')
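The three manipulations described above (changing a value, cancelling a modification, reading the old and new values) can be tried out without a CubicWeb instance. The sketch below uses a hypothetical `EditedSketch` dictionary and a hypothetical `write()` helper as stand-ins; they only imitate the behaviour described in this section and are not the CubicWeb implementation.

```python
class EditedSketch(dict):
    """Hypothetical stand-in for entity.cw_edited (not the CubicWeb class).

    Holds the attributes about to be written and remembers the stored values.
    """

    def __init__(self, stored, pending):
        super().__init__(pending)
        self._stored = dict(stored)

    def oldnewvalue(self, attr):
        """Return (old, new) for an attribute that is being edited."""
        return self._stored.get(attr), self[attr]


def write(stored, edited):
    """Simulate the backend write: only pending attributes are updated."""
    result = dict(stored)
    result.update(edited)
    return result


stored = {'name': 'Ann', 'age': 30}
edited = EditedSketch(stored, {'name': 'Anna', 'age': 31})

old, new = edited.oldnewvalue('age')       # inspect the change
edited['name'] = edited['name'].upper()    # adjust a value before it is written
del edited['age']                          # cancel the modification of 'age'

row = write(stored, edited)
print(old, new, row)
```

Because the adjustments are made on the pending values, a single write carries them to the backend; no second query is needed.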

after_add_entity, after_update_entity:

On those events, you can get the list of attributes that were modified using
the entity.cw_edited dictionary, but you cannot modify it or get the old
value of an attribute.

before_delete_entity, after_delete_entity:

On those events, the entity has no cw_edited dictionary.

Note

self.entity.cw_set(age=42) will set the age attribute to
42. But to do so, it will generate an RQL query that will have to be processed,
and hence may trigger some hooks, etc. This could lead to infinitely looping hooks.

Hooks called on server start/maintenance/stop events (e.g.
server_startup, server_maintenance, before_server_shutdown,
server_shutdown) have a repo attribute, but their `_cw` attribute
is None. The server_startup event is fired on regular startup, while
server_maintenance is fired on cubicweb-ctl upgrade or shell
commands. server_shutdown is always fired, but connections to the
native source are no longer possible at that point; use
before_server_shutdown for work that needs such a connection.

Hooks called on backup/restore events (e.g. server_backup,
server_restore) have repo and timestamp attributes, but
their `_cw` attribute is None.

It is sometimes convenient to explicitly enable or disable some hooks, for
instance if you want to disable some integrity checking hook. This can be
controlled more finely through the category class attribute, which is a string
giving a category name. One can then use the
deny_all_hooks_but() and
allow_all_hooks_but() context managers to
explicitly enable or disable some categories.

The existing categories are:

security, security checking hooks

workflow, workflow handling hooks

metadata, hooks setting meta-data on newly created entities

notification, email notification hooks

integrity, data integrity checking hooks

activeintegrity, data integrity consistency hooks, that you should never
want to disable
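The semantics of the two context managers can be illustrated with a small self-contained sketch. The `HookControlSketch` class and its `is_enabled()` method are hypothetical stand-ins written for this example; only the names deny_all_hooks_but() and allow_all_hooks_but() come from the text above, and the real objects exposing them may behave differently in detail.

```python
from contextlib import contextmanager


class HookControlSketch:
    """Hypothetical stand-in for the object exposing the context managers."""

    def __init__(self):
        # mode 'allow': every category runs except those listed
        # mode 'deny': no category runs except those listed
        self.mode, self.categories = 'allow', frozenset()

    def is_enabled(self, category):
        listed = category in self.categories
        return not listed if self.mode == 'allow' else listed

    @contextmanager
    def _switch(self, mode, categories):
        previous = (self.mode, self.categories)
        self.mode, self.categories = mode, frozenset(categories)
        try:
            yield self
        finally:
            # restore the previous setting when leaving the block
            self.mode, self.categories = previous

    def allow_all_hooks_but(self, *categories):
        return self._switch('allow', categories)

    def deny_all_hooks_but(self, *categories):
        return self._switch('deny', categories)


ctl = HookControlSketch()
with ctl.allow_all_hooks_but('integrity'):
    inside_allow = (ctl.is_enabled('integrity'), ctl.is_enabled('metadata'))
with ctl.deny_all_hooks_but('security'):
    inside_deny = (ctl.is_enabled('security'), ctl.is_enabled('metadata'))
after = ctl.is_enabled('integrity')
print(inside_allow, inside_deny, after)
```

Using context managers means the previous setting is restored even if the block raises, which matters when the disabled category is a safety check.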

Accept if the relation type is in one of the sets given as initializer
arguments. This predicate keeps a reference to the original sets, so later
modifications to those sets are taken into account by the predicate. For instance
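The reference-keeping behaviour can be shown with a plain Python callable. `RtypeSetsSketch` is a hypothetical stand-in written for this illustration, not the CubicWeb predicate class, and the relation type names are only examples.

```python
class RtypeSetsSketch:
    """Hypothetical stand-in for the predicate described above."""

    def __init__(self, *rtype_sets):
        # Keep references to the caller's sets; do not copy them.
        self.rtype_sets = rtype_sets

    def __call__(self, rtype):
        # Score 1 if the relation type is in any of the sets, else 0.
        return int(any(rtype in s for s in self.rtype_sets))


watched = {'boss'}
predicate = RtypeSetsSketch(watched)

before = predicate('subsidiary_of')
watched.add('subsidiary_of')          # later modification of the original set
after_add = predicate('subsidiary_of')
print(before, after_add)
```

Had the initializer copied the sets, the relation type added afterwards would not be matched.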

Hooks being appobjects like views, they have a __regid__ and a __select__
class attribute. Like all appobjects, hooks have the self._cw attribute which
represents the current connection. In entity hooks, a self.entity attribute is
also present.

The events tuple is used by the base class selector to dispatch the hook
on the right events. It is possible to dispatch on multiple events at once
if needed (though take care, as hook attributes may vary as described above).

Operations may be instantiated in the hook’s __call__ method. An operation
always takes a connection object as first argument (accessible as .cnx from the
operation instance), and optionally all keyword arguments needed by the
operation. These keyword arguments will be accessible as attributes from the
operation instance.

precommit:

the transaction is being prepared for commit. You can freely do any heavy
computation, raise an exception if the commit can’t go through, or even add
some new operations during this phase. If you do anything which has to be
reverted if the commit fails afterwards (e.g. altering the file system),
you’ll have to support the ‘revertprecommit’ event to revert
things by yourself

revertprecommit:

if an operation failed while being pre-committed, this event is triggered
for all operations which had their ‘precommit’ event already fired to let
them revert things (including the operation which made the commit fail)

rollback:

the transaction has been rolled back, either:

intentionally

because a ‘precommit’ event failed, in which case all operations are rolled
back once ‘revertprecommit’ has been called

postcommit:

the transaction is over. All the ORM entities accessed by the earlier
transaction are invalid. If you need to work on the database, you need to
start a new transaction, for instance using a new internal connection,
which you will need to commit.

For an operation to support an event, one has to implement the <event
name>_event method with no arguments.
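The event sequence described above can be simulated in plain Python. Everything in this sketch is a hypothetical stand-in: `OperationSketch` is not cubicweb's Operation class, `commit()` only mimics the order of events, a plain list plays the role of the connection, and ValueError stands in for whatever exception a real operation would raise.

```python
class OperationSketch:
    """Hypothetical stand-in for an operation (not cubicweb's Operation)."""

    def __init__(self, cnx, **kwargs):
        self.cnx = cnx
        self.__dict__.update(kwargs)   # keyword arguments become attributes

    def handle(self, event):
        # Call <event name>_event if the operation implements it.
        method = getattr(self, event + '_event', None)
        if method is not None:
            method()


def commit(operations):
    """Simulate the commit phases: precommit, then postcommit or rollback."""
    fired = []
    try:
        for op in operations:
            fired.append(op)           # the failing operation is included
            op.handle('precommit')
    except Exception:
        for op in fired:
            op.handle('revertprecommit')
        for op in operations:
            op.handle('rollback')
        return False
    for op in operations:
        op.handle('postcommit')
    return True


class WriteFile(OperationSketch):
    def precommit_event(self):
        self.cnx.append('write %s' % self.path)

    def revertprecommit_event(self):
        self.cnx.append('remove %s' % self.path)


class CheckQuota(OperationSketch):
    def precommit_event(self):
        if self.used > self.limit:
            raise ValueError('quota exceeded')


log = []   # plays the role of the connection in this sketch
ok = commit([WriteFile(log, path='a.txt'), CheckQuota(log, used=5, limit=3)])

log2 = []
ok2 = commit([WriteFile(log2, path='b.txt')])
print(ok, log, ok2, log2)
```

In the failing run, the file-writing operation gets a chance to undo its side effect because its ‘precommit’ event had already fired when the quota check raised.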

The order of operations may be important, and is controlled by the output of
the insert_index() method (whose implementation varies according to the
base hook class used).

Mix-in class to ease applying a single operation to a set of data,
avoiding the creation of as many operations as there are individual
modifications. The body of the operation must then iterate over the values
that have been stored in a single operation instance.

You should try to use this instead of creating one operation for each
value, since handling operations becomes costly on massive data imports.

You can modify the containercls class attribute, which defines the
container class that should be instantiated to hold payloads. An instance is
created on instantiation, and then the add_data() method will add the
given data to the existing container. It defaults to a set. Use list if you
want to keep arrival ordering. You can also use another kind of container
by redefining _build_container() and add_data().

More optional parameters can be given to the get_instance() method, and
will be passed on to the operation constructor (for obvious reasons those
parameters should not vary across different calls to this method for a
given operation).
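A rough, self-contained sketch of how such a mix-in can work is given here. `DataOperationSketch` is a hypothetical stand-in, not the CubicWeb class: it reuses the names containercls, get_instance(), add_data(), get_data() and _build_container() from the text, but a plain dictionary plays the role of the transaction data, and the reset performed by get_data() is an assumption about one way to let later data go to a fresh instance.

```python
class DataOperationSketch:
    """Hypothetical stand-in mimicking the mix-in described in this section."""
    containercls = set          # use list to keep arrival order

    @classmethod
    def get_instance(cls, txdata, **kwargs):
        # One pending instance per transaction; txdata is a plain dict here.
        op = txdata.get(cls.__name__)
        if op is None:
            op = txdata[cls.__name__] = cls(txdata, **kwargs)
        return op

    def __init__(self, txdata, **kwargs):
        self.txdata = txdata
        self.__dict__.update(kwargs)
        self._container = self._build_container()

    def _build_container(self):
        return self.containercls()

    def add_data(self, data):
        if isinstance(self._container, list):
            self._container.append(data)
        else:
            self._container.add(data)

    def get_data(self):
        # Reset: data pushed from now on goes to a fresh instance.
        self.txdata.pop(self.__class__.__name__, None)
        return self._container


class CheckEids(DataOperationSketch):
    def precommit_event(self):
        # Iterate once over everything accumulated during the transaction.
        self.checked = sorted(self.get_data())


txdata = {}
for eid in (12, 7, 12):
    CheckEids.get_instance(txdata).add_data(eid)

op = CheckEids.get_instance(txdata)
op.precommit_event()
late = CheckEids.get_instance(txdata)   # created after processing started
print(op.checked, late is op)
```

Three calls from hooks result in a single operation holding two distinct values, which is the saving the mix-in is meant to provide on large imports.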

Note

For sanity reasons, get_data will reset the operation, so that once
the operation has started its treatment, if some hook wants to push
additional data to this same operation, a new instance will be created
(otherwise that data would most likely never be treated). This implies:

We will use a very simple example to show hooks usage. Let us start with the
following schema.

    from yams.buildobjs import EntityType, Int

    class Person(EntityType):
        age = Int(required=True)

We would like to add a range constraint over a person’s age. Let’s write a hook
(supposing yams cannot handle this natively, which is wrong). It shall be placed
into mycube/hooks.py. If this file were to grow too much, we can easily have a
mycube/hooks/… package containing hooks in various modules.

    from cubicweb import ValidationError
    from cubicweb.predicates import is_instance
    from cubicweb.server.hook import Hook

    class PersonAgeRange(Hook):
        __regid__ = 'person_age_range'
        __select__ = Hook.__select__ & is_instance('Person')
        events = ('before_add_entity', 'before_update_entity')

        def __call__(self):
            if 'age' in self.entity.cw_edited:
                if 0 <= self.entity.age <= 120:
                    return
                msg = self._cw._('age must be between 0 and 120')
                raise ValidationError(self.entity.eid, {'age': msg})

In our example the base __select__ is augmented with an is_instance selector
matching the desired entity type.

The events tuple is used to specify that our hook should be called before the
entity is added or updated.

Then in the hook’s __call__ method, we:

check if the ‘age’ attribute is edited

if so, check the value is in the range

if not, raise a validation error properly

Now let’s augment our schema with a new Company entity type with some relation
to Person (in ‘mycube/schema.py’).

We would like to constrain the company’s bosses to have a minimum (legal)
age. Let’s write a hook for this, which will be fired when the boss relation
is established (still supposing we could not specify that kind of thing in the
schema).

    # match_rtype selects the hook on the relation type being set
    from cubicweb.server.hook import match_rtype

    class CompanyBossLegalAge(Hook):
        __regid__ = 'company_boss_legal_age'
        __select__ = Hook.__select__ & match_rtype('boss')
        events = ('before_add_relation',)

        def __call__(self):
            boss = self._cw.entity_from_eid(self.eidto)
            if boss.age < 18:
                msg = self._cw._('the minimum age for a boss is 18')
                raise ValidationError(self.eidfrom, {'boss': msg})

Like in hooks, ValidationError can be raised in operations. Other
exceptions are usually programming errors.

In the above example, our hook will instantiate an operation each time the hook
is called, i.e. each time the subsidiary_of relation is set. There is an
alternative method to schedule an operation from a hook, using the
get_instance() class method.

Here, we call set_operation() so that we simply accumulate the eids of
entities to check at the end in a single CheckSubsidiaryCycleOp
operation. Values are stored in a set associated with the
‘subsidiary_cycle_detection’ transaction data key. The set initialization and
operation creation are handled by set_operation().

If your application consists of several instances, you may need some means to
communicate between them. CubicWeb provides a publish/subscribe mechanism
using ØMQ. In order to use it, use
add_subscription() on the
repo.app_instances_bus object. The callback will get the message (as a
list). A message can be sent by calling
publish() on repo.app_instances_bus.
The first element of the message is the topic which is used for filtering and
dispatching messages.
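The topic-based dispatch can be illustrated without any network layer. `BusSketch` is a hypothetical in-process stand-in for repo.app_instances_bus: it does not use ØMQ, and the assumption that add_subscription() takes a topic followed by a callback is made for this example only. The topic names are likewise invented.

```python
class BusSketch:
    """Hypothetical in-process stand-in for repo.app_instances_bus (no ØMQ)."""

    def __init__(self):
        self._subscriptions = {}

    def add_subscription(self, topic, callback):
        # Assumed signature: a topic, then the callable receiving messages.
        self._subscriptions.setdefault(topic, []).append(callback)

    def publish(self, msg):
        # msg is a list whose first element is the topic.
        for callback in self._subscriptions.get(msg[0], []):
            callback(msg)


received = []
bus = BusSketch()
bus.add_subscription('cache-reset', received.append)

bus.publish(['cache-reset', 'Person'])
bus.publish(['other-topic', 'ignored'])   # no subscriber for this topic
print(received)
```

Only the message whose first element matches the subscribed topic reaches the callback, and the callback receives the whole message as a list.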

The zmq-address-pub configuration variable contains the address used
by the instance for sending messages, e.g. tcp://*:1234. The
zmq-address-sub variable contains a comma-separated list of addresses
to listen on, e.g. tcp://localhost:1234, tcp://192.168.1.1:2345.

You should never use the entity.foo = 42 notation to update an entity. It will
not do what you expect (updating the database). Instead, use the
cw_set() method or direct access to entity’s
cw_edited attribute if you’re writing a hook for ‘before_add_entity’ or
‘before_update_entity’ event.

When a hook which is responsible for maintaining the consistency of the
data model detects an error, it must use a specific exception named
ValidationError. Raising anything but a (subclass of)
ValidationError is a programming error. Raising it
entails aborting the current transaction.

This exception is used to convey enough information up to the user
interface. Hence its constructor is different from the default Exception
constructor. It accepts, positionally:

Relations which are defined in the schema as inlined (see Relation type
for details) are inserted in the database at the same time as entity attributes.

This may have some side effects. For instance, when creating an entity
and setting an inlined relation in the same RQL query, the relation will
already exist in the database at before_add_relation time (which is
otherwise not the case).