Submitting a workflow to Condor via SOAP using Java

In Condor, a workflow is called a DAG. DAG stands for directed acyclic graph. DAGs in Condor are managed by processes called condor_dagman. condor_dagman is a program that takes a description of a DAG as input and walks it, submitting jobs and monitoring their progress. The condor_dagman process itself is often submitted to Condor and managed by the Schedd like any other job. The only thing special about condor_dagman jobs is that they run on the same machine as the Schedd, typically in Condor’s Scheduler Universe. Historical note: the first job submitted to Condor via SOAP was a DAG.
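To make "walks it" concrete, here is a minimal sketch of walking a diamond-shaped DAG so that every parent is handled before its children. It is only an illustration of the idea, not how condor_dagman is implemented. The class name DagWalk, the node names A through D and the shape of the graph are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DagWalk {
    // Returns the nodes in an order where every parent comes before its children.
    // 'children' maps each node to its child nodes; every node must appear as a key.
    static List<String> walk(Map<String, List<String>> children) {
        // Count how many unfinished parents each node still has.
        Map<String, Integer> pendingParents = new LinkedHashMap<>();
        for (String node : children.keySet()) {
            pendingParents.put(node, 0);
        }
        for (List<String> kids : children.values()) {
            for (String kid : kids) {
                pendingParents.merge(kid, 1, Integer::sum);
            }
        }

        // Nodes with no parents are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingParents.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            // A real DAG manager would submit the node's job here and wait for it.
            order.add(node);
            for (String kid : children.get(node)) {
                if (pendingParents.merge(kid, -1, Integer::sum) == 0) {
                    ready.add(kid);
                }
            }
        }
        if (order.size() != children.size()) {
            throw new IllegalArgumentException("graph contains a cycle");
        }
        return order;
    }

    // Hypothetical diamond: A fans out to B and C, which both feed D.
    static Map<String, List<String>> diamond() {
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("A", Arrays.asList("B", "C"));
        children.put("B", Collections.singletonList("D"));
        children.put("C", Collections.singletonList("D"));
        children.put("D", Collections.<String>emptyList());
        return children;
    }

    public static void main(String[] args) {
        System.out.println(walk(diamond())); // prints [A, B, C, D]
    }
}
```

In this sketch B and C become ready at the same time, which is the point where a real DAG manager could run them in parallel.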

The Schedd’s Submit() function takes a ClassAd representing a job. What condor_submit does is take a submit file and convert it into a ClassAd. To look at a submit file for a DAG, run condor_submit_dag -no_submit diamond.dag and read diamond.dag.condor.sub. That will give you a hint of the kind of environment condor_dagman wants to run in. Note in particular remove_kill_sig, arguments, environment and on_exit_remove.

However, diamond.dag.condor.sub is not a ClassAd. To see the ClassAd you can run diamond.dag.condor.sub through condor_submit. Do so with condor_submit -dump diamond.dag.condor.sub.ad diamond.dag.condor.sub. Have a look at diamond.dag.condor.sub.ad and note RemoveKillSig, Arguments, Env and OnExitRemove.
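The four names called out above pair up between the submit file and the ClassAd, and the pairing is not a mechanical case conversion (environment becomes Env). A small lookup table is one way to keep them straight in client code. This is a sketch covering only those four pairs; the class name SubmitToAd and the method adName are hypothetical, and condor_submit itself remains the authority on the full mapping.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SubmitToAd {
    // Submit-file command -> ClassAd attribute, limited to the pairs named in the text.
    static final Map<String, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put("remove_kill_sig", "RemoveKillSig");
        NAMES.put("arguments", "Arguments");
        NAMES.put("environment", "Env");
        NAMES.put("on_exit_remove", "OnExitRemove");
    }

    // Looks up the ClassAd attribute name for a submit-file command (case-insensitive).
    static String adName(String submitCommand) {
        String name = NAMES.get(submitCommand.toLowerCase(Locale.ROOT));
        if (name == null) {
            throw new IllegalArgumentException("no mapping known for: " + submitCommand);
        }
        return name;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : NAMES.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```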

Now you have the basis for what a DAG job needs to run. In the example code I used a little extra knowledge to generate the ClassAd, for brevity. You can use the Schedd’s CreateJobTemplate function to help generate the ClassAd for you instead. Extending arrays is kinda annoying in Java, but a piece of cake in Python.
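If you go the template route and need to add attributes to the array you get back, a small generic helper takes most of the pain out of growing a Java array. The sketch below is self-contained, so Attr is a hypothetical stand-in for whatever attribute class your SOAP stubs generate, and the values are placeholders rather than what condor_dagman actually needs.

```java
import java.util.Arrays;

public class AttrArrays {
    // Hypothetical stand-in for a SOAP-generated ClassAd attribute type.
    static final class Attr {
        final String name;
        final String value;

        Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Returns a new array one element longer than 'array', with 'item' at the end.
    // The input array is left unchanged.
    static <T> T[] append(T[] array, T item) {
        T[] grown = Arrays.copyOf(array, array.length + 1);
        grown[array.length] = item;
        return grown;
    }

    public static void main(String[] args) {
        // Placeholder values; take the real ones from the dumped job ad.
        Attr[] ad = { new Attr("RemoveKillSig", "placeholder") };
        ad = append(ad, new Attr("OnExitRemove", "placeholder"));
        for (Attr a : ad) {
            System.out.println(a.name + " = " + a.value);
        }
    }
}
```

Copying on every append is fine for a job ad with a handful of attributes. For anything larger, building a List and converting once with toArray is the more usual choice.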

This configuration lets anyone talk to your Schedd and submit jobs as any user. For deployment, you should restrict this access by narrowing ALLOW_SOAP and by setting QUEUE_ALL_USERS_TRUSTED to FALSE. Note: changing QUEUE_ALL_USERS_TRUSTED requires that clients be able to authenticate themselves via SSL.

This configuration also gives you a fixed port, 1984, for the condor_schedd. Otherwise the port is ephemeral, and you’ll have to query the Collector to find it.

You can watch the DAG run with condor_q and condor_q -dag. Don’t be afraid of the —????— for the DAG itself; that’s just because I pruned the job ad to the bare minimum needed to run. condor_q expects a few extra attributes to be present. If you use CreateJobTemplate, you’ll get all the attributes condor_q wants.