The purpose of the Blackboard is to manage SMILA record data during processing in a SMILA component (Connectivity, Workflow Processor). The complete record data is stored only on the Blackboard, which is not pushed through the workflow engine itself. The Blackboard hides the handling of record persistence from the services.

Clients should manipulate records through the Blackboard API methods wherever possible, so that the records remain completely under the control of the Blackboard.

== Creation and Lifecycle of Blackboard ==

The blackboard holds the records while they are pushed through a pipeline. Pipelets are invoked with a blackboard instance and a list of IDs of the records to process. The pipelet can then access the blackboard to get record metadata and attachments. The blackboard hides the handling of record persistence from the services. For example, it can be configured to hold only the record metadata in memory but to put the attachments in the [[SMILA/Documentation/Binary Storage|BinaryStorage service]] to save memory when large attachments are used or many records are processed at the same time. It could also get a record from a [[SMILA/Documentation/Record Storage|RecordStorage service]] on demand and write it back when processing is done (however, SMILA 1.0 no longer uses the RecordStorage by default).

Blackboards are created by a BlackboardFactory service running as a declarative service.

The blackboard instance is released after the pipeline execution has finished; a new blackboard instance is created for each pipeline execution. Whether the blackboard has storages attached is the choice of the creating component and can be configured there, or it depends on whether storage services are active in the SMILA application. For the user of the blackboard (usually the pipelet), it should not matter whether the blackboard has storages attached.

* The factory can create Blackboard instances that are either "transient" (a pure in-memory implementation that does not use any storages) or "persisting" (linked to binary storage and optionally to record storage). The client selects which kind of blackboard it wants to use. A persisting blackboard can only be created successfully if at least a binary storage is known. Creating a transient blackboard is always possible.
* For each "session", a new blackboard instance is created that manages only the records worked on by this request. A session is, for example:
** a single task list execution of a QueueWorker router or listener (i.e. adding or deleting one record in Connectivity, or processing one input record from a queue message and managing all additional records created by the invoked workflows)
** a single search request in the search service.
* After the session, the blackboard instance is released completely, which frees its memory resources automatically without interfering with other blackboard sessions.
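The lifecycle described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: <tt>SessionBlackboard</tt> and <tt>SessionBlackboardFactory</tt> are hypothetical stand-ins written for this example, not the SMILA interfaces, and record metadata is reduced to a plain string.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a blackboard; holds record metadata as plain strings. */
final class SessionBlackboard {
  private final boolean persisting;
  private final Map<String, String> metadata = new HashMap<>();

  SessionBlackboard(boolean persisting) {
    this.persisting = persisting;
  }

  boolean isPersisting() {
    return persisting;
  }

  void setMetadata(String id, String value) {
    metadata.put(id, value);
  }

  String getMetadata(String id) {
    return metadata.get(id);
  }
}

/** Hypothetical factory: every call returns a fresh blackboard for one session. */
final class SessionBlackboardFactory {
  private final boolean binaryStorageKnown;

  SessionBlackboardFactory(boolean binaryStorageKnown) {
    this.binaryStorageKnown = binaryStorageKnown;
  }

  /** Always possible: a pure in-memory blackboard. */
  SessionBlackboard createTransientBlackboard() {
    return new SessionBlackboard(false);
  }

  /** Fails unless at least a binary storage is known. */
  SessionBlackboard createPersistingBlackboard() {
    if (!binaryStorageKnown) {
      throw new IllegalStateException("no binary storage available");
    }
    return new SessionBlackboard(true);
  }
}
```

Because each session gets its own instance, records put on one blackboard are never visible on another, and dropping the reference at the end of the session is enough to free the memory.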

=== BlackboardFactory interfaces ===

<source lang="java">
interface BlackboardFactory {
  /** ... */
}
</source>

== Blackboard Usage ==

For pipelet programmers, using the blackboard is usually trivial:

* Use <tt>getRecord(id)</tt> or <tt>getMetadata(id)</tt> to get the record metadata. Modify the returned object to change the record metadata.
* To access record attachments, use the <tt>get/setAttachment</tt> methods of the blackboard. The <tt>Attachment</tt> objects of the <tt>Record</tt> object returned by <tt>getRecord(id)</tt> may not allow access to the attachment content if the content has been swapped out to BinaryStorage. Using the streaming methods for attachments is recommended to keep memory consumption low.

A record can be put onto the Blackboard with one of the following operations:

* create(Id);
*: Creates a new record with the given Id. No data is loaded from persistence. If a record with this Id already exists in the storages, it will be overwritten when the created record is committed. Used, for example, by Connectivity to initialize the record from incoming data.
* load(Id);
*: Loads the record data for the given Id from persistence. Used by a client to indicate that it wants to process this record.
* split(Id, String);
*: Creates a fragment of the given record, i.e. the record content is copied to a new Id derived from the given Id by adding a fragment name (see Id Concept for details).
* setRecord(Record);
*: Puts the record on the Blackboard, saves the record attachments to BinStorage, and replaces the actual attachment values in the record with null.
* synchronize(Record);
*: Assumes that a record with the same Id as the given record already exists on the Blackboard or in storage. Loads the record from storage if needed and updates its properties with the properties of the given record.

A record is removed from the Blackboard with one of these operations:

* commit(Id);
*: Saves the record and its attachments to the storages and removes the record from the Blackboard.
* invalidate(Id);
*: Removes the record from the Blackboard without saving it. If the record was newly created on the Blackboard (not overwritten), it is removed completely.
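The difference between <tt>commit</tt> and <tt>invalidate</tt> is easiest to see in a toy model. The sketch below is an assumption-laden simplification, not the SMILA implementation: <tt>RecordLifecycleSketch</tt> is a hypothetical class, record content is a plain string, and two maps stand in for the storages and for the records currently held on the Blackboard.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the record lifecycle; all names are hypothetical. */
final class RecordLifecycleSketch {
  private final Map<String, String> storage = new HashMap<>();      // stand-in for persistence
  private final Map<String, String> onBlackboard = new HashMap<>(); // records currently held

  /** Creates an empty record without loading anything from storage. */
  void create(String id) {
    onBlackboard.put(id, "");
  }

  /** Loads an existing record from storage onto the blackboard. */
  void load(String id) {
    if (!storage.containsKey(id)) {
      throw new IllegalArgumentException("not in storage: " + id);
    }
    onBlackboard.put(id, storage.get(id));
  }

  /** Changes the in-memory copy only; storage is untouched until commit. */
  void setContent(String id, String content) {
    requireOnBlackboard(id);
    onBlackboard.put(id, content);
  }

  /** Writes the record to storage (overwriting any old version) and releases it. */
  void commit(String id) {
    requireOnBlackboard(id);
    storage.put(id, onBlackboard.remove(id));
  }

  /** Releases the record without writing; a newly created record vanishes completely. */
  void invalidate(String id) {
    onBlackboard.remove(id);
  }

  boolean isOnBlackboard(String id) {
    return onBlackboard.containsKey(id);
  }

  String stored(String id) {
    return storage.get(id);
  }

  private void requireOnBlackboard(String id) {
    if (!onBlackboard.containsKey(id)) {
      throw new IllegalStateException("not on blackboard: " + id);
    }
  }
}
```

In this model, changes made after <tt>load</tt> are lost on <tt>invalidate</tt> while the stored version survives, which mirrors the behavior described in the list above.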

=== Attachments management ===

The Blackboard provides the following methods for working with record attachments:

* setAttachment(id, name, byte[]);
* setAttachmentFromStream(id, name, InputStream);
* byte[] getAttachment(id, name);
* InputStream getAttachmentAsStream(id, name);
* boolean hasAttachment(id, name);

Attachments are not stored in the Blackboard itself. They are saved directly to BinStorage, and the actual attachment value in the corresponding Record is replaced with <tt>null</tt>. It is highly recommended to use only the stream methods to manage attachments, because loading whole attachments into memory consumes a lot of memory and can crash the application.
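To illustrate why the stream methods keep memory consumption low, here is a minimal sketch of a chunked copy. <tt>StreamCopySketch</tt> is a hypothetical helper written for this example; it is not part of the Blackboard API. It shows the general technique a caller could apply to the stream returned by <tt>getAttachmentAsStream</tt>: only one fixed-size buffer is held in memory, regardless of the attachment size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: copies a stream in fixed-size chunks. */
final class StreamCopySketch {
  private static final int BUFFER_SIZE = 8192;

  /** Copies everything from in to out and returns the number of bytes copied. */
  static long copy(InputStream in, OutputStream out) {
    byte[] buffer = new byte[BUFFER_SIZE]; // the only attachment data held in memory
    long total = 0;
    try {
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
        total += read;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }
}
```

With <tt>getAttachment(id, name)</tt>, by contrast, the complete content must fit into a single <tt>byte[]</tt>.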

=== Usage of Blackboard Notes ===

Notes are additional temporary data created by pipelets to be used by later pipelets in the same workflow, but not to be persisted in the storages. Notes can be either global or record specific (associated with a record Id). Record-specific notes are copied on record splits and removed when the associated record is removed from the Blackboard. Each note has a String name and a Serializable value.

The following methods are available for working with notes:

* <tt>boolean hasGlobalNote(name);</tt>
* <tt>Serializable getGlobalNote(name);</tt>
* <tt>setGlobalNote(name, value);</tt>
* <tt>boolean hasRecordNote(id, name);</tt>
* <tt>getRecordNote(id, name);</tt>
* <tt>setRecordNote(id, name, value);</tt>
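The note semantics described above (global versus record-specific notes, copying on split, removal together with the record) can be modeled with two maps. The following is a sketch under assumptions, not the SMILA implementation: <tt>NotesSketch</tt>, <tt>copyNotesOnSplit</tt>, and <tt>removeRecord</tt> are hypothetical names, and the return type of <tt>getRecordNote</tt> is assumed to be <tt>Serializable</tt> by analogy with <tt>getGlobalNote</tt>.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Toy model of blackboard notes; all names are hypothetical. */
final class NotesSketch {
  private final Map<String, Serializable> globalNotes = new HashMap<>();
  private final Map<String, Map<String, Serializable>> recordNotes = new HashMap<>();

  void setGlobalNote(String name, Serializable value) {
    globalNotes.put(name, value);
  }

  Serializable getGlobalNote(String name) {
    return globalNotes.get(name);
  }

  boolean hasGlobalNote(String name) {
    return globalNotes.containsKey(name);
  }

  void setRecordNote(String id, String name, Serializable value) {
    recordNotes.computeIfAbsent(id, key -> new HashMap<>()).put(name, value);
  }

  Serializable getRecordNote(String id, String name) {
    Map<String, Serializable> notes = recordNotes.get(id);
    return notes == null ? null : notes.get(name);
  }

  boolean hasRecordNote(String id, String name) {
    Map<String, Serializable> notes = recordNotes.get(id);
    return notes != null && notes.containsKey(name);
  }

  /** On a split, the fragment starts with a copy of the source record's notes. */
  void copyNotesOnSplit(String sourceId, String fragmentId) {
    Map<String, Serializable> notes = recordNotes.get(sourceId);
    if (notes != null) {
      recordNotes.put(fragmentId, new HashMap<>(notes));
    }
  }

  /** Removing a record drops its notes; global notes are unaffected. */
  void removeRecord(String id) {
    recordNotes.remove(id);
  }
}
```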

[[Category:SMILA]]

=== Usage of Path with Blackboard methods ===

Some Blackboard methods accept a Path as an argument, for example ''getAttributeNames(Id, Path)''. A Path represents an attribute path in the Record. The string format of a Path is ''attributeName1[index1]/attributeName2[index2]/...''. Specifying an index is optional; it defaults to 0. Whether an index refers to a literal or to a sub-object depends on the method that receives the argument.

Consider the following example Record structure:

<source lang="xml">
<Record>
  <A n="AccessTreeExpanded">
    <O>
      <A n="account">
        <O>
          <A n="sub">
            <O>
              <A n="sid">
                <L>
                  <V>Value1</V>
                </L>
                <L>
                  <V>Value2</V>
                </L>
              </A>
            </O>
            <O>
              <A n="sid">
                <L>
                  <V>Value3</V>
                </L>
              </A>
            </O>
          </A>
        </O>
      </A>
    </O>
  </A>
</Record>
</source>

The path to access the first MObject (<O>) of the ''sub'' attribute is "''AccessTreeExpanded[0]/account[0]/sub[0]/''". The index in each step is the position of the MObject inside the attribute. That is, to access the second MObject of the ''sub'' attribute, the path is "''AccessTreeExpanded[0]/account[0]/sub[1]/''".

In some cases the index of the last step has a different meaning:

* In the ''getLiteral(Id, Path)'' method, the index of the last step is the position of the literal inside the attribute. That is, the path for accessing the literal of the ''sid'' attribute of the second ''sub'' MObject (the literal with value "Value3") is "''AccessTreeExpanded[0]/account[0]/sub[1]/sid[0]''", and the path for accessing the second literal of the ''sid'' attribute of the first ''sub'' MObject (the literal with value "Value2") is "''AccessTreeExpanded[0]/account[0]/sub[0]/sid[1]''".
* In the ''getLiterals(Id, Path)'' method, the index of the last step is irrelevant: the method returns all literals of the attribute found at the given path.
* In the ''setLiteral(Id, Path, Value)'' and ''addLiteral(Id, Path, Value)'' methods, the index of the last step is irrelevant: the literal is set on or added to the attribute found at the given path.
* In the methods that modify annotations, the path must be null, "" (an empty string), or an empty Path to access the root annotations of the record.
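The string format of a Path can be made concrete with a small parser. This is a sketch only: <tt>PathSketch</tt> and its nested <tt>Step</tt> class are hypothetical names for this example and do not reproduce the SMILA <tt>Path</tt> class. The parser splits the string at "/", reads the optional index in square brackets, and falls back to 0 when no index is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for strings like "a[0]/b[1]/c". */
final class PathSketch {

  /** One step of a path: an attribute name plus an index. */
  static final class Step {
    final String name;
    final int index;

    Step(String name, int index) {
      this.name = name;
      this.index = index;
    }
  }

  // An attribute name, optionally followed by "[digits]".
  private static final Pattern STEP = Pattern.compile("([^/\\[\\]]+)(?:\\[(\\d+)\\])?");

  /** Parses a path string; null or "" yields an empty (root) path. */
  static List<Step> parse(String path) {
    List<Step> steps = new ArrayList<>();
    if (path == null || path.isEmpty()) {
      return steps;
    }
    for (String part : path.split("/")) {
      if (part.isEmpty()) {
        continue; // tolerate leading or doubled slashes
      }
      Matcher matcher = STEP.matcher(part.trim());
      if (!matcher.matches()) {
        throw new IllegalArgumentException("invalid path step: " + part);
      }
      // A missing index defaults to 0, as described above.
      int index = matcher.group(2) == null ? 0 : Integer.parseInt(matcher.group(2));
      steps.add(new Step(matcher.group(1), index));
    }
    return steps;
  }
}
```

How the index of the last step is interpreted (MObject position, literal position, or ignored) is left to the caller here, since it depends on the Blackboard method that receives the path.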

Revision as of 03:57, 9 January 2013
