Project description

The noWorkflow project aims to let scientists benefit from provenance
data analysis even when they do not use a workflow system. It also aims
to free them from relying on naming conventions to keep files produced
by previous executions. Without such conventions, the result and
intermediate files are overwritten by every new execution of the
pipeline.

noWorkflow is written in Python and currently captures the provenance of
Python scripts using software engineering techniques such as abstract
syntax tree (AST) analysis, reflection, and profiling. It collects
provenance without requiring a version control system or any other
environment.
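
To make the AST analysis mentioned above more concrete, here is a
minimal sketch that uses only the standard ast module to list the
functions a script defines and the modules it imports. The helper names
collect_definitions and collect_imports and the sample script are
hypothetical and chosen for illustration; this is not noWorkflow's
actual implementation.

```python
import ast

# A small sample script to analyze (hypothetical, for illustration only).
SOURCE = '''
import math

def area(radius):
    return math.pi * radius ** 2

def main():
    print(area(2.0))

main()
'''


def collect_definitions(source):
    """Return {function name: (first line, last line)} for each def found."""
    definitions = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            definitions[node.name] = (node.lineno, node.end_lineno)
    return definitions


def collect_imports(source):
    """Return the sorted names of the modules the script imports."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return sorted(modules)
```

Because the script is only parsed and never executed, this kind of
analysis can run before the trial starts.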

Quick Installation

To install noWorkflow, you should follow these basic instructions:

If you have pip, just run:

$ pip install noworkflow[all]

This installs noWorkflow, PyPosAST, SQLAlchemy, python-future, flask,
IPython, Jupyter and PySWIP. The only requirements for running
noWorkflow are PyPosAST, SQLAlchemy and python-future. The other
libraries are only used for provenance analysis.

If you only want to install noWorkflow, PyPosAST, SQLAlchemy and
python-future, run:

$ pip install noworkflow

If you do not have pip, but already have Git (to clone our repository)
and Python:

Each new run produces a different trial that will be stored with a
sequential identification number in the relational database.
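
As a rough illustration of how sequential trial identifiers can come
from a relational database, the sketch below uses the standard sqlite3
module. The table layout and the functions open_store and record_trial
are assumptions made for this example; noWorkflow itself relies on
SQLAlchemy and its real schema is not shown here.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create a hypothetical trial table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trial ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " script TEXT NOT NULL,"
        " arguments TEXT NOT NULL)"
    )
    return conn


def record_trial(conn, script, arguments):
    """Insert one trial and return the sequential id the database assigned."""
    cursor = conn.execute(
        "INSERT INTO trial (script, arguments) VALUES (?, ?)",
        (script, " ".join(arguments)),
    )
    conn.commit()
    return cursor.lastrowid
```

Calling record_trial twice on a fresh database yields the ids 1 and 2,
which mirrors the way each new run becomes a new trial.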

Verifying the module dependencies is a time-consuming step, and
scientists can bypass it by using the -b flag if they know that no
library or source code has changed. The current trial then inherits the
module dependencies of the previous one.
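
The bypass logic described above can be sketched as follows. The
function resolve_dependencies, its parameters, and the helper
installed_distributions are hypothetical. Falling back to a full scan
when there is no previous trial is also an assumption of this sketch.

```python
from importlib import metadata


def installed_distributions():
    """Expensive step: map every installed distribution to its version."""
    return {dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()}


def resolve_dependencies(previous, bypass, collect=installed_distributions):
    """Return the module dependencies to record for a new trial.

    previous: dependencies of the previous trial, or None if there is none.
    bypass:   True when the user states that nothing changed (the -b flag).
    collect:  zero-argument callable that performs the expensive scan.
    """
    if bypass and previous is not None:
        # Inherit the previous trial's dependencies and skip the scan.
        return dict(previous)
    return collect()
```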

It is possible to collect more information than what is collected by
default, such as variable usages and dependencies. To perform dynamic
program slicing and capture that information, just run

$ now run -e Tracer simulation.py data1.dat data2.dat
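
To give an idea of what variable dependencies look like, the sketch
below derives them statically from assignments with the ast module and
then follows them backwards. This is a simplified static approximation,
not the dynamic slicing performed by the Tracer; the function names are
hypothetical, and names of called functions are treated like any other
variable.

```python
import ast


def assignment_dependencies(source):
    """Map each assigned variable to the names read on its right-hand side."""
    deps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            used = {n.id for n in ast.walk(node.value)
                    if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps.setdefault(target.id, set()).update(used)
    return deps


def slice_of(variable, deps):
    """Return every variable that `variable` transitively depends on."""
    seen, stack = set(), [variable]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

For the script "a = 1; b = a + 2; c = b * a", the slice of c contains
a and b.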

To list all trials, just run

$ now list

Assuming we run the experiment again and then run now list, the
output would be as follows. Note that 9 trials were extracted from the
demonstration.

The now show command displays the details of a trial. It has several
options, such as -m to show module dependencies; -d to show function
definitions; -e to show the environment context; -a to show function
activations; and -f to show file accesses.

Running

$ now show -a 1

would show details of trial 1. Notice that the function name is preceded
by the line number where the call was activated.
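
A function activation paired with the line of its call can be captured
through profiling, one of the techniques named in the project
description. The sketch below uses the standard sys.setprofile hook;
record_activations and the two sample functions are hypothetical, and
the real output format of now show is not reproduced here.

```python
import sys


def record_activations(func, *args):
    """Run func(*args) and return [(caller line, function name), ...]."""
    activations = []

    def profiler(frame, event, arg):
        # "call" fires for Python functions; C functions emit "c_call".
        if event == "call":
            caller = frame.f_back
            line = caller.f_lineno if caller is not None else 0
            activations.append((line, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return activations


def square(x):
    return x * x


def sum_of_squares(values):
    total = 0
    for value in values:
        total += square(value)
    return total
```

Running record_activations(sum_of_squares, [1, 2]) records one
activation of sum_of_squares followed by two activations of square,
both attributed to the same calling line.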

By default, the restore command only restores the script used for the
trial (“simulation.py”), even when it imports modules and reads input
files. Use the option -l to restore imported modules and the option
-i to restore input files. The restore command also tracks the
evolution history. By default, each trial is based on the previous one
(e.g. Trial 2 is based on Trial 1). When you check out a trial, the next
trial will be based on the checked-out trial (e.g. Trial 3 based on
Trial 1).
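
The parent rule in the paragraph above can be modeled with a few lines
of Python. The class TrialHistory and its methods are hypothetical and
only mimic the described behavior; they are not part of noWorkflow.

```python
class TrialHistory:
    """Toy model of how trials are chained to their parent trial."""

    def __init__(self):
        self.parents = {}  # trial id -> parent trial id (None for the first)
        self.head = None   # trial the next run will be based on

    def run(self):
        """Register a new trial based on the current head and return its id."""
        trial_id = len(self.parents) + 1
        self.parents[trial_id] = self.head
        self.head = trial_id
        return trial_id

    def restore(self, trial_id):
        """Check out an earlier trial so the next run is based on it."""
        if trial_id not in self.parents:
            raise KeyError("unknown trial: {}".format(trial_id))
        self.head = trial_id
```

After two runs, restore(1), and a third run, the recorded parents are
{1: None, 2: 1, 3: 1}, which matches the example of Trial 3 being based
on Trial 1.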

The remaining commands of noWorkflow are diff, export and vis. The
diff command compares two trials, and the export command exports the
provenance data of a given trial as Prolog facts, so inference queries
can be run over the database.
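
As a sketch of what exporting to Prolog facts involves, the code below
turns activation records into fact strings. The predicate name
activation and its argument order are assumptions for this example and
do not reflect the schema that now export actually emits.

```python
def prolog_atom(text):
    """Quote a string as a Prolog atom, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def export_activations(trial_id, activations):
    """Return one fact per (activation id, function name, line) record.

    The fact layout activation(Trial, Id, Name, Line) is hypothetical.
    """
    return [
        "activation({}, {}, {}, {}).".format(
            trial_id, activation_id, prolog_atom(name), line)
        for activation_id, name, line in activations
    ]
```

For example, export_activations(1, [(1, "main", 10)]) produces the
single fact activation(1, 1, 'main', 10). which a Prolog interpreter
can then query.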

Included Software

Acknowledgements

We would like to thank CNPq, FAPERJ, and the National Science Foundation
(CNS-1229185, CNS-1153503, IIS-1142013) for partially supporting this
work.

License Terms

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
“Software”), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.