DiscoveryNet

From the DiscoveryNet overview:
(...)DiscoveryNet is building the software infrastructure
and tools for providing Knowledge Discovery Services allowing scientists
to conduct and manage complex data analysis and knowledge discovery activities
over data generated by modern high throughput sensors.
DiscoveryNet demonstrators include applications in life sciences,
environmental modelling and geo-hazard prediction.
The project aims to design, develop and implement
an advanced infrastructure to support real-time processing,
interpretation, integration, visualization and mining
of massive amounts of time critical data generated by high throughput devices.
(...)

Project home page: http://www.discovery-on-the.net/
Some papers: http://www.discovery-on-the.net/new/documentation.php
(I could not find any downloadable code.)

Integrated Scientific Database Access: allowing the integration of
structured and semi-structured data from different data sources within
a discovery procedure using XML schemas.

Knowledge Discovery Process Management: including
DPML (Discovery Process Markup Language) as
a standard specification language
for constructing and managing knowledge discovery procedures,
as well as recording their history.

In one word, workflow represents the Knowledge of Action.
That is why we call the workflows in discovery informatics as Discovery Plans
(...)

And more on DPML: (...)

Discovery Process Mark-up Language, XML

Data integration and analysis pipeline definition language used for storage, execution and deployment

Directed graph of components or services

Can store execution constraints for scheduling

Not a business process work-flow language, i.e. does not model an activity but a function (deterministic execution).

Concerned with work-flow parameters

Acyclic DPML graph execution

Each component can be bound to a resource (by definition or by constraint)

Placement policy for free components based on parent component location

A component execution can be farmed out to a set of defined
D-Net managed resources or to a managed cluster (work with SGE)

Current work on streaming, cyclic graphs and multiple output components.

(...)
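
The acyclic execution and the parent-based placement policy listed above can be illustrated with a minimal Python sketch. This is not DiscoveryNet code: the component names, the resource name "db-host", and the rule "a free component inherits the location of its first parent" are assumptions made only for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical component graph: each component maps to the set of its parents.
parents = {
    "load": set(),
    "derive_attribute": {"load"},
    "kmeans": {"derive_attribute"},
}

# Components bound to a resource "by definition"; all others are free.
bound = {"load": "db-host"}


def execution_order(parents):
    """Return an order in which every component runs after its parents.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(parents).static_order())


def place(parents, bound, default="local"):
    """Assign a resource to every component, visiting parents first."""
    placement = {}
    for node in execution_order(parents):
        if node in bound:
            placement[node] = bound[node]
        else:
            # Assumed policy: a free component follows its first parent
            # (parents sorted by name so the choice is deterministic).
            ordered = sorted(parents[node])
            placement[node] = placement[ordered[0]] if ordered else default
    return placement
```

Calling place(parents, bound) puts all three components on "db-host", because the two free components follow their parent. A graph with a cycle is rejected by execution_order, which matches the "acyclic graph" restriction in the list.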

DiscoveryNet and workflows: they use the Discovery Process Markup Language (DPML),
"which allows the definition of data analysis tasks to be executed on distributed resources."
However, I could not find more details about DPML, even though they call it a standard
(http://www.bioinformaticsworld.info/biwspr03datamining.html
and http://www.discovery-on-the.net/new/documents/dnet_architecture.pdf).
A short example of DPML is in the paper http://www.discovery-on-the.net/new/documents/kdd-DNET.pdf
DPML looks like a data-flow language, and an interesting question is
how it compares to WSFL. The description in
http://www.discovery-on-the.net/new/documents/DiscoveryProcesses.pdf
does not go into details:
(...)
How scientists make use of computers to explore
data is of central importance. Existing methods are
largely ad-hoc using spreadsheets for data
manipulation and separate algorithms packages for
analysis. Where successful processes need to be
automated, the traditional bioinformatics approach
has been to create bespoke applications using
scripting languages such as Perl. Recent approaches
to representing discovery processes have been
limited to using workflow languages such as
WSFL[3] to define service composition for
execution. These methodologies are labour intensive
and problematic since the designer of the process is
rarely the person who implements it as an
application.
(...)
The example below shows a simple DPML task
where a microarray-generated gene expression data
set has been manipulated to derive a new attribute,
then passed to a K-means clustering node. In
comparison to a traditional workflow language it
does not include any implementation specific details.
How nodes are mapped to actual components is left
as a matter for the execution environment, which
also performs verification of the process. Each
node's operation is uniquely identified by an
element that acts as a parameterisation message. The
node's inputs are determined by connection elements
that define the graph's structure.
(...)
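
To make the quoted description concrete, here is a minimal Python sketch of a task document in which each node carries one parameterising element and separate connection elements define the graph. The XML below is NOT real DPML: every element and attribute name (task, node, connection, load, derive, kmeans, id, from, to) is invented for illustration, since the actual syntax is only shown in the cited paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical DPML-like document; all names here are assumptions.
DOC = """
<task>
  <node id="n1"><load source="expression_data"/></node>
  <node id="n2"><derive name="ratio" expression="a / b"/></node>
  <node id="n3"><kmeans k="3"/></node>
  <connection from="n1" to="n2"/>
  <connection from="n2" to="n3"/>
</task>
"""


def parse_task(text):
    """Return (operations, inputs) for a task document.

    operations maps node id -> (operation name, parameters);
    inputs maps node id -> list of upstream node ids.
    """
    root = ET.fromstring(text.strip())
    operations = {}
    for node in root.findall("node"):
        op = node[0]  # the single child element parameterises the node
        operations[node.get("id")] = (op.tag, dict(op.attrib))
    inputs = {node_id: [] for node_id in operations}
    for conn in root.findall("connection"):
        inputs[conn.get("to")].append(conn.get("from"))
    return operations, inputs
```

Nothing in such a document says which component implements "kmeans" or where it runs. That mapping would be left to the execution environment, which is the property the excerpt contrasts with a traditional workflow language.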

DPML looks like a mix of declarative languages; do they embed SQL in XML?
This example does not look easy to use, so,
as expected, the authors suggest that a tool will hide this complexity:
(...) Effective use of DPML relies on a graphical client
such as Kensington [1] that captures all details that
DPML can represent, allowing users to construct
tasks with a drag and drop visual programming
environment, and interpret results with a rich set of
visualisation modules.
(meta-mining) can assist users by finding common
patterns of activity and identifying useful processes
or relevant experts to deal with a given situation.
DPML and the architecture described above have
been implemented as part of the DiscoveryNet [2]
(...)
However, is the argument for abandoning WSFL still valid
when the workflow language is hidden behind a GUI anyway?