Abstract

This proposal provides an HTTP client interface for XPath
2.0. It defines one extension function to perform HTTP
requests, and has been designed to be compatible with XQuery 1.0
and XSLT 2.0, as well as any other XPath 2.0 usage.


1 Introduction

1.1 Namespace conventions

The module defined by this document defines one function in the namespace
http://www.expath.org/mod/http-client. In this document, the
http prefix, when used, is bound to this namespace URI.

1.2 Error management

Error conditions are identified by a code (a QName). When such
an error condition is reached during the evaluation of an expression, a dynamic
error is raised with the corresponding error code (as if the standard XPath
function fn:error had been called). TODO: the error codes have not been defined yet.

2 The http:send-request function

This module defines an XPath extension function that sends an
HTTP request and returns the corresponding response. It supports
HTTP multi-part messages. Here is the signature of this function:

http:send-request($uri as xs:string?,
$request as element(http:request)?,
$content as item()?,
$serial as item()?) as item()+

$uri is the HTTP or HTTPS URI to send the request to. Conceptually it is an
xs:anyURI, but it is declared as a string so that literal strings can be
passed without explicitly casting them to xs:anyURI.

$content is the request body content, for HTTP methods that can
carry a body in the request (POST and PUT). It is an error if
this parameter is not the empty sequence for other methods (DELETE, GET,
HEAD and OPTIONS).
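As a rough illustration of this rule, the check below models the XPath empty sequence as Python None. That modelling and the helper name content_allowed are assumptions of this sketch, not part of the module. Methods that are not named in this section are left unconstrained here.

```python
from typing import Optional

# Methods that this section says must not carry a request body.
NO_BODY_METHODS = frozenset({"DELETE", "GET", "HEAD", "OPTIONS"})

def content_allowed(method: str, content: Optional[object]) -> bool:
    """True if $content (None models the empty sequence) is acceptable for method."""
    if content is None:
        return True  # the empty sequence is always acceptable
    # The method attribute is case-insensitive, so normalise before comparing.
    return method.upper() not in NO_BODY_METHODS
```

For example, content_allowed("post", "<a/>") holds, while content_allowed("get", "x") does not.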

$request contains the various parameters of the request, for instance
the HTTP method to use or the HTTP headers. It can also contain the
values of the other parameters: the URI, the content and the
serialization option. If those are not passed as parameters to the
function, their values in $request, if any, are used instead. See the
following section for the detailed definition of the http:request
element.

$serial defines the serialization option used to serialize the content
into the HTTP request. It can be either a serialization method (a string:
'xml', 'html', 'xhtml' or 'text'), the name of an output definition
(a string, which is the name of a named xsl:output instruction), or an xsl:output
element itself. The content is then serialized according to the chosen method
or xsl:output, as defined in [Serialization].

Besides the four-parameter signature above, there are three other signatures that
are convenient shortcuts (equivalent to the full version in which the
omitted parameters are set to the empty sequence). They are:

http:send-request($request as element(http:request)) as item()+

http:send-request($uri as xs:string?,
$request as element(http:request)?) as item()+

http:send-request($uri as xs:string?,
$request as element(http:request)?,
$content as item()?) as item()+
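The mapping from the shortcut signatures to the full one can be sketched as follows. The helper expand_signature is hypothetical, and the empty sequence is again modelled as None. Each shorter call is padded to the four parameters ($uri, $request, $content, $serial).

```python
from typing import Any, Optional, Tuple

Params = Tuple[Optional[Any], Optional[Any], Optional[Any], Optional[Any]]

def expand_signature(*args: Any) -> Params:
    """Map a 1-, 2-, 3- or 4-argument call onto ($uri, $request, $content, $serial)."""
    if len(args) == 1:
        # The one-argument form takes only $request.
        (request,) = args
        return (None, request, None, None)
    if 2 <= len(args) <= 4:
        # Missing trailing parameters default to the empty sequence.
        padded = list(args) + [None] * (4 - len(args))
        return (padded[0], padded[1], padded[2], padded[3])
    raise TypeError(f"http:send-request takes 1 to 4 arguments, got {len(args)}")
```

Note that the one-argument form is not a prefix of the full parameter list, because its single argument is $request rather than $uri.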

3 Sending a request

The function defined in this module sends a request to an
HTTP server and receives the corresponding response. This section describes
how the request is represented by the parameters of this function, and how
they are used to generate the actual HTTP request to send.

3.1 The request elements

The http:request element carries all the information needed
to send the HTTP request, so it is always possible
to build such an element holding everything required
for a particular request. Some of those values, however,
can be supplied through an additional parameter instead. For instance, some signatures
define the parameter $uri: if the value of this parameter
is not the empty sequence, it is used instead of the value
of the href attribute on the http:request
element.
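The precedence rule can be sketched as below. The Request class is a hypothetical stand-in for the http:request element, reduced to the attributes used here. Raising an error when neither $uri nor href is given is an assumption, since this document does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    """Hypothetical stand-in for an http:request element."""
    method: str
    href: Optional[str] = None

def effective_uri(uri: Optional[str], request: Request) -> str:
    """$uri wins when it is not the empty sequence (None); otherwise use @href."""
    value = uri if uri is not None else request.href
    if value is None:
        # Assumption: a request with no URI at all is treated as an error.
        raise ValueError("no URI given: neither $uri nor @href is set")
    return value
```

The same pattern applies to $content, which overrides the http:body content.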

method is the HTTP verb to use, such as GET or POST. It is
case-insensitive.

href is the URI the request has to be sent to. It can be overridden
by the parameter $uri.

status-only controls what the response looks like: if it is
true, only the status code and the headers are returned, not the content
(no http:body or http:multipart, and no interpreted additional items in the
returned sequence; see section 4.2).

username, password, auth-method
and send-authorization are used for authentication (see section
3.3).

override-content-type is a MIME type. It can only be used on
http:request, and overrides the Content-Type header returned
by the server.

follow-redirect controls whether an HTTP redirect is automatically
followed. If it is false, the HTTP redirect is returned as the response.
If it is true (the default), the function tries to follow the redirect by sending
the same request (including body, headers, and authentication
credentials) to the new address. At most one redirect is followed (there is no
attempt to follow a redirect received in response to the first redirect).
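The following sketch shows the "at most one redirect" behaviour. The send callable, the Response class and the set of status codes treated as redirects are all assumptions, because this document does not list the redirect codes. The callable is expected to re-send the same request (body, headers, credentials) to the URI it is given.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Response:
    """Hypothetical minimal response: status code plus headers."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)

# Assumption: this document does not say which status codes count as redirects.
REDIRECT_STATUSES = frozenset({301, 302, 303, 307, 308})

def send_following_redirect(send: Callable[[str], Response], uri: str,
                            follow_redirect: bool = True) -> Response:
    """Send via `send`, following at most one redirect.

    `send` is a hypothetical callable that re-sends the same request
    (body, headers, credentials) to the given URI.
    """
    response = send(uri)
    location = response.headers.get("Location")
    if follow_redirect and response.status in REDIRECT_STATUSES and location:
        # Follow exactly once; a second redirect is returned as the response.
        response = send(location)
    return response
```

If the first redirect leads to another redirect, the caller receives that second redirect response unchanged.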

http:header represents an HTTP header, either in the
http:request or in the http:response element, as
defined below.

http:multipart represents a multi-part body, either in a request
or a response, as defined below.

http:body represents a single body, either in a request
or a response, as defined below. It can be overridden by the parameter
$content (the way $content is used to build the
body can be controlled by the parameter $serial; see section
3.2 for details).

<http:header name = string
value = string/>

The http:header element represents an HTTP header, either in a request
or in a response.

The http:body element represents the body of either an HTTP request
or an HTTP response (in multi-part requests and responses, it represents
the body of a single part).

The content-type and encoding attributes are used to
control how the content of this element is used to create the HTTP request
(how it is serialized into the request content); see section 3.2 for details.
The id attribute specifies the value of the HTTP header
Content-ID, and description the value of the HTTP header
Content-Description. The href attribute can be used
in a request to set the body content to the content of the linked resource instead
of using the children of the http:body element (children of this
element and the href attribute are mutually exclusive).
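A small sketch of how these attributes could map to part headers, with a check for the href/children exclusivity. Emitting a Content-Type header from the content-type attribute is an assumption of this sketch. The helper names are hypothetical, and the id attribute is passed as id_ so that it does not shadow Python's built-in id().

```python
from typing import Dict, Optional, Sequence

def body_part_headers(content_type: str, id_: Optional[str] = None,
                      description: Optional[str] = None) -> Dict[str, str]:
    """Headers for one body part, built from the http:body attributes.

    id_ stands for the `id` attribute. Emitting Content-Type from
    @content-type is an assumption of this sketch.
    """
    headers = {"Content-Type": content_type}
    if id_ is not None:
        headers["Content-ID"] = id_
    if description is not None:
        headers["Content-Description"] = description
    return headers

def body_source_valid(href: Optional[str], children: Sequence[object]) -> bool:
    """@href and child content are mutually exclusive on http:body."""
    return href is None or len(children) == 0
```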

The http:multipart element represents an HTTP multi-part request
or response. The content-type attribute is the media type of the
whole request or response, and must be a multipart media type (that is, its
main type must be multipart). The boundary attribute
is the boundary marker used to separate the parts of the message (the
value of the attribute is prefixed with "--" to form the actual
boundary marker in the request; conversely, this prefix is removed from
the boundary marker in the response to set the value of the attribute).
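The boundary conversion and the multipart main-type check can be sketched as follows. The helper names are hypothetical. Media-type parameters such as boundary=... are ignored when the main type is checked.

```python
def is_multipart(content_type: str) -> bool:
    """True if the main type of the media type is 'multipart'."""
    essence = content_type.split(";", 1)[0].strip()
    return essence.split("/", 1)[0].lower() == "multipart"

def boundary_marker(attribute_value: str) -> str:
    """Request side: prefix the @boundary value with '--' to get the marker."""
    return "--" + attribute_value

def boundary_attribute(marker: str) -> str:
    """Response side: strip the leading '--' from the marker to get @boundary."""
    if not marker.startswith("--"):
        raise ValueError(f"not a boundary marker: {marker!r}")
    return marker[2:]
```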

3.2 Serializing the request content

If the request can have content (one body or several body parts), it can be
specified by the http:multipart element, the http:body
element, and/or the parameter $content. If $content
is not the empty sequence, it replaces the value of the http:body
element (in a multipart request with several bodies, exactly one http:body
must be empty). For each body, the content of the HTTP body is generated as follows.
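A sketch of where $content ends up, assuming that in the multipart case it fills the single empty http:body. The Body class is a hypothetical stand-in for http:body. Treating a body that has an href attribute as non-empty is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Body:
    """Hypothetical stand-in for an http:body element."""
    media_type: str
    children: List[object] = field(default_factory=list)
    href: Optional[str] = None

    def is_empty(self) -> bool:
        # Assumption: a body pointing to a resource via @href is not empty.
        return not self.children and self.href is None

def bodies_with_content(bodies: List[Body], content: Optional[object]) -> List[Body]:
    """Return the bodies to serialize, with $content (None = empty sequence) applied."""
    if content is None:
        return bodies  # nothing to replace
    if len(bodies) == 1:
        target = 0
    else:
        empty = [i for i, b in enumerate(bodies) if b.is_empty()]
        if len(empty) != 1:
            raise ValueError("with $content, exactly one http:body must be empty")
        target = empty[0]
    result = list(bodies)
    result[target] = Body(bodies[target].media_type, [content])
    return result
```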

The parameter $serial is used to control the way the content is
serialized. This parameter can be an xsl:output element, as defined
in [XSLT 2.0], with the serialization defined in [Serialization].
$serial can also be a string, either 'xml', 'html',
'xhtml' or 'text' (other values are implementation-defined,
as explained in the above-mentioned recommendations). (Note: once EXPath has
defined the corresponding module, $serial should also accept a function item.) If
$serial is the empty sequence, the default value for this parameter depends
on the content type of the body: it is 'xhtml' if it is application/xhtml+xml,
'xml' if it is any other XML media type, 'html' if it is an HTML media type,
or 'text' in any other case.
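A sketch of the default choice for $serial, using the media-type kinds of section 5. Ignoring media-type parameters and comparing case-insensitively are assumptions. Because application/xhtml+xml also ends with +xml, the sketch tests for it before the general XML rule.

```python
XML_TYPES = frozenset({"text/xml", "application/xml",
                       "text/xml-external-parsed-entity",
                       "application/xml-external-parsed-entity"})

def default_serial(content_type: str) -> str:
    """Default serialization method when $serial is the empty sequence."""
    # Assumption: parameters such as charset are ignored; comparison is case-insensitive.
    mt = content_type.split(";", 1)[0].strip().lower()
    if mt == "application/xhtml+xml":  # must be checked before the +xml rule
        return "xhtml"
    if mt in XML_TYPES or mt.endswith("+xml"):
        return "xml"
    if mt == "text/html":
        return "html"
    return "text"  # text, binary and application/xml-dtd all fall through here
```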

3.3 Authentication

HTTP authentication when sending a request is controlled by the attributes
username, password, auth-method and
send-authorization on the element http:request.
If username has a value, password and
auth-method must have a value too. Conversely, if any of the three
other attributes is set, username must be set too.
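The consistency rule can be written as a single predicate. Here an unset attribute is modelled as None, and the helper name is hypothetical.

```python
from typing import Optional

def auth_attributes_valid(username: Optional[str], password: Optional[str],
                          auth_method: Optional[str],
                          send_authorization: Optional[bool]) -> bool:
    """Check the username/password/auth-method/send-authorization constraints."""
    if username is not None:
        # A username requires both a password and an authentication method.
        return password is not None and auth_method is not None
    # Without a username, none of the three other attributes may be set.
    return password is None and auth_method is None and send_authorization is None
```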

The attribute auth-method can be either "Basic" or
"Digest", but other values can also be used, in an
implementation-defined way. Those attributes must be handled
in conformance with [RFC 2617]. If send-authorization
is true (the default value is false) and the authentication method supports
generating the Authorization header without a challenge, the
request contains this header. By default, the request is sent without
credentials, and the credentials are sent in a second request only if the
response is an authentication challenge.
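The sketch below shows both behaviours for the Basic scheme, which can build its Authorization header without a challenge. Digest cannot, because it needs a nonce from the server's challenge. The send callable and its header-dictionary signature are hypothetical. Encoding the credentials as UTF-8 is an assumption, since RFC 2617 does not fix a character set.

```python
import base64
from typing import Callable, Dict

def basic_authorization(username: str, password: str) -> str:
    """Authorization header value for the Basic scheme (RFC 2617, section 2)."""
    # Assumption: credentials are encoded as UTF-8 before base64 encoding.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def send_basic(send: Callable[[Dict[str, str]], int], username: str, password: str,
               send_authorization: bool = False) -> int:
    """Send a request with Basic credentials; `send` returns the status code."""
    credentials = {"Authorization": basic_authorization(username, password)}
    if send_authorization:
        return send(credentials)  # credentials already in the first request
    status = send({})             # default: first request without credentials
    if status == 401:             # challenge received: resend with credentials
        status = send(credentials)
    return status
```

For instance, basic_authorization("Aladdin", "open sesame") gives the value used in the RFC 2617 example, "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==".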

4 Dealing with the response

After sending the request to the HTTP server, the function waits for
the response. It analyses it and returns a sequence representing this
response. This sequence has an http:response element as its
first item, followed by an additional item for each body or
body part in the response.

4.1 The result element

This is the first item returned by the function defined in this module.
The status attribute is the HTTP status code returned by the
server, and message is the reason phrase that comes with the status code on the
status line. The http:header elements are as defined for the
request, but represent the response headers instead. The http:body
and http:multipart elements are also as in the request, except that
http:body elements must be empty.

4.2 Representing the result content

Instead of being inserted within the http:response element, the
content of each body is returned as a separate item in the returned sequence.
These items follow the http:response element, in the same order
as the http:body elements. For each body, the item
is built from the HTTP response as follows.

If the status-only attribute has the value true
(the default is false), the returned sequence contains only the
http:response element (with the headers, but also with the empty
http:body or http:multipart elements, as if
status-only were false), and the items that would represent
the bodies' content are not generated from the HTTP response.
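The shape of the returned sequence can be sketched with one deferred interpreter per body. The interpreters are hypothetical zero-argument callables. In status-only mode none of them is called, so no body is turned into an item.

```python
from typing import Any, Callable, List, Sequence

def result_sequence(response_element: Any,
                    interpreters: Sequence[Callable[[], Any]],
                    status_only: bool = False) -> List[Any]:
    """http:response first, then one item per body, in http:body order."""
    items: List[Any] = [response_element]
    if not status_only:
        # Each interpreter turns one body of the HTTP response into its item.
        items.extend(interpret() for interpret in interpreters)
    return items
```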

For each body that has to be interpreted, the following rules apply in order to
build the corresponding item. If the body media type is a text media type, the
item is a string containing the body content. If the media type is an XML
media type, the content is parsed and the item is the resulting document node.
If the media type is an HTML media type, the content is tidied up and
parsed (this process is implementation-dependent) and the item is the resulting
document node. If it is a binary media type, the content is returned as an
xs:base64Binary item. Consequently, a result item is either a
document node (from XML or HTML), a string, or an xs:base64Binary value.
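A sketch of these interpretation rules, using the kinds defined in section 5. XML is checked before text because text/xml is an XML media type, not a text one. Several parts are assumptions. The root Element from xml.etree stands in for the document node. HTML tidying is delegated to a hypothetical tidy_and_parse hook. The charset is passed in explicitly and defaults to UTF-8.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable, Optional

XML_TYPES = frozenset({"text/xml", "application/xml",
                       "text/xml-external-parsed-entity",
                       "application/xml-external-parsed-entity"})

def interpret_body(content: bytes, content_type: str, charset: str = "utf-8",
                   tidy_and_parse: Optional[Callable[[bytes], object]] = None) -> object:
    """Turn one response body into its result item."""
    mt = content_type.split(";", 1)[0].strip().lower()
    if mt in XML_TYPES or mt.endswith("+xml"):
        # XML: parse; the root Element stands in for the XDM document node.
        return ET.fromstring(content)
    if mt == "text/html":
        # HTML tidying is implementation-dependent: use a caller-supplied hook.
        if tidy_and_parse is None:
            raise NotImplementedError("no HTML tidy/parse hook supplied")
        return tidy_and_parse(content)
    if mt.startswith("text/") or mt == "application/xml-dtd":
        # Text: a string (assumption: the charset is known, UTF-8 by default).
        return content.decode(charset)
    # Binary: the lexical form of an xs:base64Binary value.
    return base64.b64encode(content).decode("ascii")
```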

If the attribute override-content-type is set on the request, its
value is used instead of the Content-Type returned by the HTTP server. (TODO: how
does this fit with multipart responses?)

5 Content types handling

In both requests and responses, MIME type strings are used to choose how the
entity content is serialized or parsed, respectively. Four kinds of media
type are defined here, and are used in the above text about sending requests
and receiving responses. The intent is to convey the spirit of entity content
handling based on its content type, but an implementation is encouraged to deviate
from those rules when a particular type should obviously be treated in a
specific way (normally, that would only be the case to treat a binary type as
another kind of type).

An XML media type has a MIME type of text/xml,
application/xml, text/xml-external-parsed-entity,
or application/xml-external-parsed-entity, as defined in
[RFC 3023] (except that application/xml-dtd is
considered a text media type). MIME types ending with +xml are
also XML media types.

An HTML media type has a MIME type of text/html.

Text media types are the remaining types beginning with text/.

Binary types are all the other types. An implementation can treat some of
those binary types as an XML, HTML or text media type if that is more
appropriate (this is implementation-defined).
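The four kinds above can be summarised in one classifier. Ignoring parameters and comparing case-insensitively are assumptions of this sketch. The implementation-defined reclassification of binary types is not modelled.

```python
XML_TYPES = frozenset({"text/xml", "application/xml",
                       "text/xml-external-parsed-entity",
                       "application/xml-external-parsed-entity"})

def media_type_kind(content_type: str) -> str:
    """Classify a MIME type as 'xml', 'html', 'text' or 'binary'."""
    mt = content_type.split(";", 1)[0].strip().lower()
    if mt == "application/xml-dtd":
        return "text"  # explicit exception to the RFC 3023 XML types
    if mt in XML_TYPES or mt.endswith("+xml"):
        return "xml"
    if mt == "text/html":
        return "html"
    if mt.startswith("text/"):
        return "text"  # remaining text/* types
    return "binary"
```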

A References

The structure of most of the elements and attributes used in this
proposal is inspired by the corresponding step in [XProc].