History

- The '$HOME/.config/arvados/settings.conf' file needs to exist with appropriate keys. We need to discuss what kind of account this should be. For now this is my (abram) account.

- Until 'arv-run-pipeline-instance' is updated to use 'settings.conf', the '$HOME/.config/arvados/settings.sh' file needs to exist and export the appropriate Arvados API keys so that 'arv-run-pipeline-instance' picks them up.

- The pipeline consists of two 'legs': the first generates the GFF with the initial annotations and the initial report, and the second 'refreshes' the report. As of 2015-02-20 this takes roughly 30 minutes on Google Compute Platform from start to finish. Since 30 minutes is relatively quick, partial progress reporting is effectively disabled and Tapestry will show 'unknown' until the job has finished (either successfully or unsuccessfully).

- Pipeline submission requires the filename appended to the portable data hash, for example 'cafecafecafecafecafecafecafecafe+255/filetoprocess.tsv.bz2'. Previously it required only the portable data hash and nothing else.

- The source file download functionality now makes two calls to the Arvados API: the first gets the manifest to determine the file length, and the second redirects the 'arv-get' output for the download. If the 'input.locator' symlink file does not have a file name appended to the portable data hash, it's smart enough to find the appropriate file anyway, so it supports the new 'input.locator' symlinks as well as the old ones.

- The '/home/trait/upload/<PDH>-out' directory gets populated on successful pipeline completion and after a 'status' call to GET-Evidence has been issued. This means the first 'status' call after pipeline completion might take a while, since it has to download the data and populate the directory.
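To make the locator handling described above concrete, here is a rough Python sketch (not the actual GET-Evidence code). It splits a 'PDH/filename' locator, sums file lengths from a manifest, and falls back to the only file in the collection when the old-style locator has no file name. The function names are made up for illustration, the fallback rule (use the file only when the collection holds exactly one) is an assumption about what "find the appropriate file" means, and the manifest parsing only handles the plain 'position:size:name' file tokens with '\040' space escapes.

```python
import re

# Portable data hash: 32 hex digits, '+', manifest size in bytes.
PDH_RE = re.compile(r"^[0-9a-f]{32}\+\d+$")
# File token inside a manifest stream line: position:size:name
FILE_TOKEN_RE = re.compile(r"^(\d+):(\d+):(.+)$")


def split_locator(locator):
    """Split 'PDH/filename' into (pdh, filename).

    filename is None for the old form that carries only the PDH.
    """
    pdh, _, filename = locator.strip().partition("/")
    if not PDH_RE.match(pdh):
        raise ValueError("not a portable data hash: %r" % pdh)
    return pdh, filename or None


def file_sizes(manifest_text):
    """Map each file path in a manifest to its total length in bytes."""
    sizes = {}
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        stream = tokens[0]  # '.' or './subdir'
        for token in tokens[1:]:
            match = FILE_TOKEN_RE.match(token)
            if match is None:
                continue  # block locator, not a file token
            name = match.group(3).replace("\\040", " ")
            path = name if stream == "." else stream[2:] + "/" + name
            # A file may be listed in several segments; add them up.
            sizes[path] = sizes.get(path, 0) + int(match.group(2))
    return sizes


def resolve_input(locator, manifest_text):
    """Return (pdh, filename, length) for new-style and old-style locators."""
    pdh, filename = split_locator(locator)
    sizes = file_sizes(manifest_text)
    if filename is None:
        # Assumed fallback: only unambiguous when there is a single file.
        if len(sizes) != 1:
            raise ValueError("locator has no file name and the collection "
                             "does not hold exactly one file")
        filename = next(iter(sizes))
    if filename not in sizes:
        raise KeyError(filename)
    return pdh, filename, sizes[filename]
```

With this, both 'cafecafecafecafecafecafecafecafe+255/filetoprocess.tsv.bz2' and the bare 'cafecafecafecafecafecafecafecafe+255' resolve to the same file, provided the collection contains just that one file.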

In the second line, I'm pretty sure the explicit loading of settings.conf is unnecessary since all that seems to follow are a few calls to arv-get, which will discover that file automatically.

- the hardcoding of the pipeline template uuid in public_html/submit_GE_pipeline seems suboptimal. Maybe that should go in a configuration parameter?

- finally, maybe you should add a small README that explains what the dependencies are for this functionality. It should mention that a .config/arvados/settings.conf file is needed and that the arv tools need to be installed.

In the second line, I'm pretty sure the explicit loading of settings.conf is unnecessary since all that seems to follow are a few calls to arv-get, which will discover that file automatically.

Fixed

- the hardcoding of the pipeline template uuid in public_html/submit_GE_pipeline seems suboptimal. Maybe that should go in a configuration parameter?

So in addition to settings.conf and settings.sh we have yet another config file? It presumably also lives in $HOME/.config/arvados? Is it a shell script? A JSON file that gets parsed? Should we make it more general for when we scrap the current PHP GE for something newer or should it be a one-off?

My opinion is that since the main motivation is to get some small group of genomes through Tapestry/GET-Evidence to give back to participants for approval, making the nice, more general solution can be delayed until we have a clearer vision of how to re-organize GET-Evidence. Until then, keeping a hard-coded pipeline template is not ideal but better than having a config file stuffed in at the last minute.

- finally, maybe you should add a small README that explains what the dependencies are for this functionality. It should mention that a .config/arvados/settings.conf file is needed and that the arv tools need to be installed.
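
As a starting point for that README, the dependencies could also be checked mechanically. The sketch below is Python and only an illustration: it assumes 'settings.conf' is a list of KEY=VALUE lines and that the keys that matter are ARVADOS_API_HOST and ARVADOS_API_TOKEN (my understanding of the usual Arvados client convention, not something verified against this install). The function names are hypothetical; the tool names are the two mentioned in this thread.

```python
import os
import shutil

# Assumed key names; adjust if the install uses different ones.
REQUIRED_KEYS = ("ARVADOS_API_HOST", "ARVADOS_API_TOKEN")
# Command-line tools this functionality calls.
REQUIRED_TOOLS = ("arv-get", "arv-run-pipeline-instance")


def parse_settings(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_dependencies(conf_path=None, which=shutil.which):
    """Return a list of human-readable problems; empty means all good."""
    if conf_path is None:
        conf_path = os.path.join(os.path.expanduser("~"),
                                 ".config", "arvados", "settings.conf")
    problems = []
    if not os.path.isfile(conf_path):
        problems.append("missing " + conf_path)
    else:
        with open(conf_path) as handle:
            settings = parse_settings(handle.read())
        for key in REQUIRED_KEYS:
            if not settings.get(key):
                problems.append("settings.conf lacks " + key)
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(tool + " not found on PATH")
    return problems
```

Running 'missing_dependencies()' as the account that serves GET-Evidence and printing the result would show at a glance what the README needs to tell people to set up.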