Universal Worker Service (UWS) RFC (Version 1.0)

This document is a "Request for Comment" (RFC) for the Proposed Recommendation "Universal Worker Service V1.0". The latest version of the specification (10-02-2010) can be found at
http://www.ivoa.net/Documents/UWS/20100210/

IVOA Review Period: 05 Oct 2009 - 05 Nov 2009.

TCG Review Period: 17 Feb 2010 - 19 Mar 2010.

In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your WikiName so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.

Additional discussion about any of the comments or responses can be conducted on the GWS WG mailing list, grid@ivoa.net. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.

Comments from the community

Reusing Roy's (unanswered) comment on the TAP RFC: "Isn't there a requirement for implementations or prototypes before a standard can go to RFC? Please can somebody post the service URLs of these, so that I can try out this new standard for real?" I understand (and apologise for it) that it is very cheap of me (not developing any of that) to say so, but it is important to follow the established process. Thanks, Alberto Micol

I've a few issues that have come up in attempting to implement UWS in TAP. I'll forego the GET/POST issue lest it obscure what I think are simple issues that can easily be clarified. I've rewritten the comments I put on the DAL and GRID lists given the different context here of commenting on the UWS standard rather than reflecting issues implementing it.

Would it be possible to allow the service to be created and started in the same request? I'd expect this to be the normal mode of operation, but currently -- as I read the standard -- they require separate requests. This would considerably simplify clients.

PaulHarrison - 03 Nov 2009 It is central to the design that the two-stage process be supported by compliant services. However, as discussed in the email thread, implementing service descriptions may additionally allow jobs to be created directly in the "EXECUTING" (or possibly QUEUED) phase by specifying PHASE=RUN at the job creation step. The text should be amended to allow this possibility.
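To illustrate the combined create-and-start request discussed in this response, here is a minimal Python sketch. The job-list URL and the QUERY parameter are hypothetical, and PHASE=RUN at creation is the optional extension described above, not something every UWS service is required to accept. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_create_and_run_request(job_list_url, params):
    """Build (but do not send) a POST that creates a job and asks for it to start."""
    form = dict(params)
    # Assumption: the service accepts PHASE=RUN at creation time.
    form["PHASE"] = "RUN"
    body = urlencode(form).encode("ascii")
    return Request(
        job_list_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical job-list URL and service-specific parameter.
req = build_create_and_run_request("http://example.org/uws/jobs", {"QUERY": "demo"})
```

A service that only supports the two-stage process would need a second POST of PHASE=RUN to the job's phase resource after creation.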

The standard should clarify when you can start a request. I believe the idea is that the phases have a specific order but that is never made explicit.

PaulHarrison - 03 Nov 2009 The ordering is as in section 2.1.3, but an extra sentence can be added to make the ordering

It would be desirable if the standard would specify what should happen when I cannot create a job -- not even an error job.

PaulHarrison - 03 Nov 2009 for the REST binding the fallback is always to HTTP conventions - so a 500 status is probably appropriate in this case. The document will

The idea of naming results is unclear. There appears to be contradictory language regarding when this is mandatory. It is also not clear how the name is conveyed in the job description. I assume -- from the example -- that it is in the id of the elements, but I don't think that is stated (maybe it's in the schema?).

PaulHarrison - 03 Nov 2009 agreed the text is unclear on this point - the name is the id, which is separate to the URL for a particular result.

The relationship, if any, between the existence of an error document and the phase of the job should be stated. In what phases is an error document required? In what phases is it allowed? Is it possible to have null placeholder error documents? This would probably be nice for the results document too. Does the existence of a result element in the results document imply that the associated result exists? E.g., in an EXECUTING job, if I point to a URL, is it invalid to give a 404 when the user attempts to retrieve it? Or am I allowed to give a 'forward reference' to a result that I hope will eventually be there?

PaulHarrison - 03 Nov 2009 An error document is never actually required by UWS - it always has an "informational" nature. I think that for the results it is probably good practice to have the results list only show results that are available at the time of the request to see the result list. However, a generic UWS client cannot make any assumptions about the presence of a particular result/error document until the job has reached a "finished" state (COMPLETED, ERROR or ABORTED), and should retry at that stage, so if a client asks for a result/error URL early, then a 404 status is fair enough.
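The client behaviour described in this response can be sketched roughly as follows in Python. The two callables stand in for real HTTP requests and are hypothetical, as are the polling interval and retry limit. The sketch only asks for a result once the job is in one of the three finished phases named above.

```python
import time

FINISHED_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_then_fetch(get_phase, fetch_result, poll_seconds=1.0,
                    max_polls=100, sleep=time.sleep):
    """Poll the job phase and fetch the result only after the job has finished.

    get_phase() returns the current phase string.
    fetch_result() returns an (http_status, body) pair.
    """
    for _ in range(max_polls):
        if get_phase() in FINISHED_PHASES:
            return fetch_result()
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a finished phase")


# Stand-ins for a real service: the job finishes on the third poll.
phases = iter(["PENDING", "EXECUTING", "COMPLETED"])
status, body = wait_then_fetch(
    lambda: next(phases),
    lambda: (200, "result"),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```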

In the example Job section 2.2.2.2 of the UWS proposal, the id attribute is used in the parameter elements where it gives the name of the parameter and in the result elements where it names the results. This seems like an inappropriate use of the id attribute since it's potentially a structural element of the XML (used in pointers) so that overloading it with this semantic content seems dangerous.

E.g., the current usage will cause the document to be malformed if there happens to be a parameter with the same name as a result since (I believe) id's need to be unique throughout the document. I'd recommend using 'name' rather than 'id' in both places. If this is left as is, then the document should at least note the lack of independence of the two name spaces.
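As a rough illustration of the clash described here, the Python sketch below looks for id values that occur more than once in a job document. The sample document is a simplified, hypothetical one without namespaces, not the example from the specification. ElementTree does not validate, so it parses the sample without complaint; a uniqueness error would only be reported by a validating parser, and only if the attribute is declared with type xs:ID.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text):
    """Return the sorted id values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical job document: a parameter and a result share the name "image".
SAMPLE = """<job>
  <parameters><parameter id="image">in.fits</parameter></parameters>
  <results><result id="image"/><result id="log"/></results>
</job>"""

clashes = duplicate_ids(SAMPLE)
```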

Also there has been considerable discussion on the mailing list regarding the safety of GET operations for dynamic URLs (which covers pretty much all UWS resources). The upshot seemed to be that providers need to ensure that GET requests are not cached.
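One common way for a provider to discourage caching of GET responses is to set the standard HTTP caching headers on every UWS resource. The Python sketch below shows the idea. The helper name is hypothetical, and the choice of header values is an assumption rather than anything stated in the UWS document or the mailing list discussion.

```python
def with_no_cache(headers):
    """Return a copy of the response headers with caching discouraged."""
    merged = dict(headers)
    merged["Cache-Control"] = "no-store, no-cache, must-revalidate"
    merged["Pragma"] = "no-cache"  # for HTTP/1.0 intermediaries
    merged["Expires"] = "0"        # treat the response as already expired
    return merged


# Headers a service might send with a job description.
job_headers = with_no_cache({"Content-Type": "text/xml"})
```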

I have found errors in section 2.2.2.2 and appendix B: in both cases there is a typo, "xlmns" instead of "xmlns". The xsd at http://www.ivoa.net/xml/UWS/v1.0 is syntactically OK.

fixed in latest version

Suggestions to facilitate comfortable interaction between a UWS service and a web browser (acting as a client front-end). It would be nice to describe the job with more detailed tags that the client could present either in the single job description (root-URI/jobs/jobid) or in the jobs list (root-URI/jobs). Examples are the name of the user (there is an "owner" in the UML schema and an "ownerId" in the "JobSummary" tag, but this could be e.g. the full name of the user) or an additional description of a particular job (e.g. in the computation of stellar models, the name of a star plus the method of computation and some remarks). The user enters this into the web-based job creation form, and it would be nice to keep it for housekeeping purposes, e.g. in the list of completed jobs as well as in the detailed "root-URI/jobs/jobid" info. So the question is how to extend the tags to allow some additional information to be embedded in the job (e.g. by making "jobinfo" a complexType)? Also, when the service returns the jobs list (the element "jobs"), it is said to return "uws:ShortJobsDescription", but this is in fact only the "phase" plus the "JobIdentifier". Why not return the whole "JobSummary" and let the client extract the information necessary to display a nice job roster (e.g. using an XSLT style sheet)?

there is already the uws:jobInfo element that allows the UWS implementation to add arbitrary information to the uws:job element.

The issue of paging long lists of jobs (as noted in the xsd comments). Here we see the solution as letting the client ask either for the list of active jobs (e.g. root-URI/jobs?phase=executing), for all jobs (the default, root-URI/jobs), for those already completed, etc. The client may simply request a selection for different purposes, and the server should perform it and return what the client wants.
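The selection proposed in this comment could look roughly like the Python sketch below on the server side. The phase query parameter is the commenter's suggestion and not part of UWS 1.0. The URLs, job identifiers and function name are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def select_jobs(jobs, request_url):
    """Filter (job_id, phase) pairs by an optional ?phase=... query parameter."""
    query = parse_qs(urlsplit(request_url).query)
    wanted = query.get("phase", [None])[0]
    if wanted is None:
        return list(jobs)  # default: return every job
    return [(job_id, phase) for job_id, phase in jobs if phase == wanted.upper()]


JOBS = [("job1", "EXECUTING"), ("job2", "COMPLETED"), ("job3", "EXECUTING")]
active = select_jobs(JOBS, "http://example.org/uws/jobs?phase=executing")
everything = select_jobs(JOBS, "http://example.org/uws/jobs")
```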

this will be reconsidered for a future version of UWS. It was decided to avoid the need for paging mechanisms in order to retain simplicity. With the relatively small amount of data for each job in the job list (see last answer), there are of order 100 bytes per job in the list; assuming that ~1 MByte is a reasonable maximum download size for the job list, there can be 10,000 jobs in the list before the listing time starts to become intolerable.
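The estimate in this response can be checked with a few lines of Python. The two constants are the round figures quoted above (taking 1 MByte as 1,000,000 bytes), not measured values.

```python
BYTES_PER_JOB = 100            # "of order 100 bytes per job"
MAX_LISTING_BYTES = 1_000_000  # "~1 MByte" taken as a round decimal figure


def listing_size(n_jobs):
    """Approximate size in bytes of a job list with n_jobs entries."""
    return n_jobs * BYTES_PER_JOB


# Largest job list that still fits in the assumed maximum download size.
max_jobs = MAX_LISTING_BYTES // BYTES_PER_JOB
```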

Another issue is the meaning of "quote". Instead of using a crystal ball to estimate when the job is expected to finish, it would be better to have some idea of the allowed priority on different servers, e.g. like the nice value for different queues. A client checking among servers would then use the one with the higher-priority queue. But it may be the client's decision to modify this behaviour. E.g. the higher-priority queues might impose a shorter "ExecutionDuration".

concepts such as quotas and queues will be considered for inclusion in a future version of UWS.

The concept of Destruction Time is not clear enough. I already understand that this should be the time when the results will be removed from the storage space (it does not concern the allocation of processor), but suppose you want to use the UWS server as the "external notebook" of your work, e.g. in the concept of "PDA-supercomputing". Many of your experiments are stored on the server and you decide to rerun a particular one with different parameters but basically the same data sets (e.g. spectra, images ...). So it's up to you which experiment will be removed (if you go over quota you are not allowed to create new jobs). However, after some warning time the jobs will be removed anyway (maybe you should first receive some warning from the scheduler to allow you to copy the data).

There is nothing stopping you from implementing your UWS based service in a way that allows rather long destruction times to allow for a form of "long term" storage of the results. You could even operate the quota system that you describe with current UWS - all that is missing from the current standard is some form of semantics for expressing in a uniform way why new jobs are not being allowed to be created so that the client understands that they must destroy some jobs to be able to run new ones. Again, I would not want to delay v1.0 to try to sort these issues out, but I think that it would be possible to perhaps add the concept of quotas to a future version of UWS.

The possibility of restarting jobs with the same datasets but different control parameters. The client (in a web browser) might have a button for resubmitting the same computation with a different number of iterations (a change of some control parameters that may be re-edited in a browser). As the job is in fact described by its set of parameters, it might be possible to use the whole "job" element to change particular parameters and resubmit.

There is no problem with a particular UWS implementation providing such functionality as described above, as this is extended behaviour beyond what must be provided to be UWS compliant, as long as such functionality does not break any of the mandatory UWS behaviour.

VO Query Language (Pedro Osuna, Yuji Shirasaki)

VOTable (Francois Ochsenbein)

Standards and Processes (Francoise Genova)

Data Curation & Preservation (Bob Hanisch)

The preamble lacks the usual language about terms “should”, “must”, “may”, etc., and it is not clear in the main document just how the “should”s, etc., are to be interpreted. added

Sections 4 and 5 are labeled as “informative”, suggesting that the rest of the document is “normative”. However, the Introduction (which gives a nice explanation of the general background) would appear to be “informative” as well. altered

In Section 1.1, first item in bulleted list, “...times out at” should be “...times out and” fixed

I suggest removing the Section heading 1.2 and simply merging the text into 1.1, with something like this transition sentence: “The following examples illustrate situations in the VO in which synchronous, stateless services are inadequate.” Done

In item 3 following the above, VOSpace needs a reference. Done

In Section 1.3, 2nd paragraph, “Most of special...” should be “Most of the special...” Fixed

In Section 1.4, change “E.g.” to “For example” (just seems bad form to start a sentence with an abbreviation). And later in that paragraph, CEA appears for the first time and is not referenced. It is referenced two paragraphs later, but should be referenced on the first occurrence. Done

Section 2.1.3, semicolon at end of introductory clause should be a colon. fixed

Section 2.1.5, ditto. fixed

Section 2.1.6, how is a service to supply a “don’t know” answer? How is this to be encoded? with negative or nil value

Section 2.1.7 mentions an optional errorSummary element, but this is not shown in the UML diagram in Section 2.1. the error summary is part of the error object - have changed a word "object" to "element" to remove possible ambiguity

Section 2.1.11, the first sentence is not very clear. Who/what is reading the parameter list? reworded

Section 2.2.1, the UML diagram uses JobList as the outermost object, but now it seems to be called “jobs”. difference between object/uri/xml representation - the equivalences are given in table in 2.2.1

Section 2.2.2.2, the first sentence does not scan.

Section 4.2, last word “emit” might be better as “return”.

Section 4.3, check for missing periods (there are at least two).

Section 5, first sentence should end with a period, not a semicolon.

Appendix B has a number of casual remarks suggesting that the proposal is not very stable.

Primary concern: The first paragraphs of Section 4 note that the document does not define “two essential parts of the service contract.” The examples “are neither formal nor complete. The intention is to show a range of ways that the pattern can be applied without burdening the reader with the level of detail needed for a standard implementation.”

Well, when I read this I wonder just what is being defined at all, and how this document advances the cause of the IVOA. If it does not provide a full definition of how to implement and manage asynchronous jobs, what are software designers and implementers supposed to do with this? What exactly are we recommending, in the sense of promoting this to a REC? How can we judge having interoperable implementations when there is no detailed specification? I read the comment above from Alberto Micol, and do not find the responses from Pat Dowler and Paul Harrison very satisfying in this regard.

rjh, 19 Oct 2009

In response to the primary concern expressed above - the document is describing a pattern of use rather than an actual service. This form of document is in common with several other standard documents from the Grid and Web services group - e.g. SSO, VOSI, where there is no whole service defined, but part of a service behaviour is defined. Perhaps this intention should be made clearer - even renaming the title to be "The Universal Worker Service Pattern" or something similar.

I am not sure what is not satisfying in the response to Alberto - he was merely asking if there were working implementations, which is a condition of going to PR - Pat and I responded with URLs of working services. It might not be so trivial to test these services in a point-and-clicky way since the UWS standard does not say how to create a job (that is specific to the service implementation that is using UWS, and both of the services listed have different job creation mechanisms). However, it is possible to test the job control aspects of existing jobs (querying job metadata, getting results etc.) given the /(joblist) URL of each service. It might be easier to test the UWS aspects of the prototype TAP services as they come on stream, as there will be (hopefully) clients to test the end-to-end aspects of a complete TAP invocation.

Theory (Herve Wozniak, Claudio Gheller)

Comments from TCG during TCG Review (17 Feb 2010 - 19 Mar 2010)

Applications (Tom McGlynn, Mark Taylor)

I rather like the changes to section 2.1 with its more explicit discussion of the progression of states, but I'm not sure I've seen any discussion of the issues that drove identification of three new states (UNKNOWN, HELD and SUSPENDED). What other changes are there in the document? It seems to me that in general it would be desirable that when there are significant changes (anything beyond typo fixes) between the RFC document and the final TCG approval, reviewers be given an easy way of identifying the changes. E.g., the change bars that were available in the TAP documents were very helpful. Alternatively these changes could be enumerated. Appendix A seemed to be set up for that, but I don't see any mention of these issues or other changes made.

The 3 "new" states had in fact been defined in the schema for some time - e.g. http://www.ivoa.net/Documents/UWS/20090827/WD-UWS-1.0-20090827.html and earlier working drafts also - they were carried over from the CEA schema for a similar construct. It was pointed out http://www.ivoa.net/forum/grid/1002/0780.htm that they had not been described in the main body of the document. It was decided that at this late stage it was better simply to add the description to the main text rather than to change the schema, which would have been potentially highly disruptive to implementations that had already used the schema. -- PaulHarrison - 25 Mar 2010

In section 4.3 there is a phrase "parameters are allowed to be name files" which I don't understand. Is that supposed to be "allowed to be file names"? [I'd assume a typo but given the amount of jargon this could easily be another term of art that I'm unfamiliar with.]

It is a typo and the "allowed to be file names" meaning is what is intended. -- PaulHarrison - 25 Mar 2010

Sections 4.1, 4.2, and 4.3 reference "named" and "unnamed" results, but I couldn't find any definition of what this means. I think that should be clarified if possible. [I looked for all references to the string 'name' without finding any clarification. It seems to me this was clearer in earlier versions -- though the requirements on naming were a bit confusing.]

Actually it is a mistake to have any distinction remaining in the text between named and unnamed results - all results must be given an identifier according to the schema - the real distinction here is that an implementation such as TAP may choose to have a fixed identifier string for a particular result. I will try to clear up the text in this regard. -- PaulHarrison - 25 Mar 2010

These last two issues are not serious though I hope they can be addressed and with these minor caveats I approve the document.
The issue of showing changes is more appropriately addressed in the context of the approval process generally than in this particular cycle.

Although there are not any change bars on the document, the edit history can easily be seen as the document is stored in the Volute Google Code svn. For instance the changes that were made as a result of RFC can be seen here -- PaulHarrison - 25 Mar 2010

Several other documents have been authored using this Google Code project, and I believe that it would be good if more were encouraged to do so - in this respect I think that the recent change from html to pdf being the required document format was a rather retrograde step - however, as you say this should be discussed elsewhere.
-- PaulHarrison - 25 Mar 2010

Data Access Layer (Patrick Dowler, Mike Fitzpatrick)

I approve the document.
I suggest summarizing the implementation experience on a wiki page that recapitulates some typical implementations of this service.
The goal here would be to identify contact people with appropriate experience for anyone building a new service.
-- MireilleLouys, 07 June 2010

Citations in the text to references of the form [label] are mostly italicized; however, the following citations are only half-italicized:

s1.1, numbered example list, #1: "[std:adql]"

s1.3, 2nd para:

s3, 1st para: "[std:ssoauth]"

s4.1, 1st para: "[std:siap]"

s4.3, 3rd para: "[harrison05]"

All above fixed

Note that it was a little confusing in the discussion in the Applications section above which comments were Tom's and which were Paul's response. You might try being a bit more explicit in the labeling.

Semantics (Sebastien Derriere, Norman Gray)

I approve the document. SD

VOEvent (Rob Seaman, Alasdair Allan)

We perceive minimal overlap with VOEvent goals for the near and mid-future. No objections otherwise.

VO Query Language (Pedro Osuna, Yuji Shirasaki)

VOTable (Francois Ochsenbein)

Standards and Processes (Francoise Genova)

There are supposed to be strong dependencies between TAP and UWS.

The first PR version of UWS was dated from 9 September 2009. As already remarked, there is no information in the current PR version about the changes between the 9 September 2009 and the current 10 February 2010 versions.

The last TAP PR version is dated 25 February 2010, and if I remember well there have been significant changes in TAP between 9 September 2009 (first version of UWS PR) and the REC. I would like to know whether the TAP updates have affected TAP dependencies with UWS, and if so, whether the required changes in UWS have been made.

Françoise Genova, 20 May 2010

I do not believe that there were any UWS-relevant changes to TAP in the last version, and so there were no requirements on UWS to change.

Again, though not as easy to read as change bars (though it does show deleted items), the changes from the first PR to now can be seen here