Forum OpenACS Development: Re: util_httppost

The solutions for http client functions might be divided into the following groups:

Tcl-socket-based

Tcl-implemented ns_socket-based

external

built-in

The Tcl-socket-based solutions (e.g. from tcllib) face the problem that they are select() based, and that Tcl uses its own notification management, which is not integrated with AOLserver/NaviServer. If one uses e.g. Tcl threads, notification management works well there, but that is sometimes more work. One major limitation of select() is that it works only with up to 1024 file descriptors (on Linux; on other systems, the limit is sometimes lower). If one issues a select() with a file descriptor above this limit, one will experience a crash. Raising the limit is out of scope for most applications (it requires one's own C library and kernel). Unfortunately, ::xo::HttpRequest has the same problem; one can say "who cares, we have just a few handles, this is not a real limit", but for scalability it is. Some sites use e.g. NaviServer with WebSockets at large scale; there select() is not an option.

The Tcl-implemented ns_socket-based solutions in AOLserver/NaviServer use the ns_* interface; these are limited to plain socket communication. There is ns_openssl_sockopen, but I can't comment on its state, as I've never tried to use it. AOLserver uses select() in some places, so the benefit over Tcl sockets is somewhat limited.

The external scripts/programs (such as wget/curl/...) have the most features, but they are not well integrated (error handling, encodings, ...), require exec (or better, nsproxy), typically involve file I/O, etc.

The native commands ns_http (AOLserver and NaviServer) and ns_ssl (NaviServer module nsssl) are C-implemented and well integrated with the server. I can't say too much about AOLserver (except that it still uses select() in several places and that its development is more or less frozen), since we switched all our installations to NaviServer several years ago. NaviServer is select()-free and sees active development. Also, ns_http on NaviServer has more features than the AOLserver variant. On NaviServer one can e.g. specify that HTTP client responses above a certain size are spooled to a file, so that it can be used for large files. With that functionality, I implemented some time ago a NaviServer-based reverse proxy, converting https to http, etc. I wouldn't like to write a reverse proxy based on wget/curl.
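As a minimal sketch of the C-implemented interface (assuming NaviServer's ns_http with its queue/wait subcommands; URL and variable names are illustrative):

```tcl
# "queue" issues the request and returns a handle; "wait" collects the
# result into the given variables.
set handle [ns_http queue http://www.example.com/]
ns_http wait -result page -status status $handle
ns_log notice "GET returned $status with [string length $page] bytes"
```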

Concerning your questions: "is there now an interface allowing to upload multiple files and fields, by POST or GET, to a web server requiring authentication and using ssl and/or compression":

multiple files/fields: ns_http/ns_ssl work on the protocol layer, not the content layer. One can pass data to a POST request, but it has to be encoded properly first.
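For example, a sketch of a form POST (assuming NaviServer's ns_http; the URL and values are placeholders), where the caller does the encoding before handing the body to the protocol layer:

```tcl
# Encode the body as application/x-www-form-urlencoded before queueing.
set body "foo=[ns_urlencode {some value}]&x=[ns_urlencode 1]"
set reqHeaders [ns_set create]
ns_set put $reqHeaders Content-Type application/x-www-form-urlencoded
set handle [ns_http queue -method POST -headers $reqHeaders -body $body \
                http://www.example.com/form]
ns_http wait -result page -status status $handle
```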

authentication: if one passes the credentials in the GET/POST/... request, it handles the authentication. I'm not sure how much dialog one wants to have in a connection thread with other servers, since this has potential for a DoS attack.

compression: a server sends content compressed only if the request asks for it. If the server compresses the result, the received content is compressed. For many applications this is the right thing (e.g. for the mentioned reverse proxy, which passes the content through). It is probably useful to add a flag to ns_http/ns_ssl to decompress the content automatically; maybe I can look into this in the next days.
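A sketch of how this looks today without such a flag (assuming NaviServer's ns_http and Tcl 8.6's zlib; URL is a placeholder): the request announces support via Accept-Encoding, and the caller gunzips the reply itself.

```tcl
set reqHeaders [ns_set create]
ns_set put $reqHeaders Accept-Encoding gzip
set replyHeaders [ns_set create]
set handle [ns_http queue -headers $reqHeaders http://www.example.com/]
ns_http wait -result body -status status -headers $replyHeaders $handle
# The body arrives compressed only if the server honored the header.
if {[ns_set iget $replyHeaders Content-Encoding] eq "gzip"} {
    set body [zlib gunzip $body]
}
```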

A common wrapper for ns_http+ns_ssl looks to me like a good idea.

My recommendation for NaviServer-based installations is clear. For AOLserver, I would say that any low-effort solution is fine, e.g. switching in the wrapper to xo::HttpRequest or to external programs.

ps: concerning gzip: strictly speaking, adding code to NaviServer to support gunzipping is not needed. Tcl 8.6 supports gzip/gunzip natively, and there are several add-on libraries for earlier Tcl versions around. However, it is better to combine gzip with the streaming facilities of NaviServer, such that the content can be gunzipped incrementally, similar to what happens when sending compressed content. So adding support sounds useful for symmetry and convenience reasons, and might be useful for other commands as well.
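As an illustration of the incremental variant, a minimal sketch in plain Tcl 8.6 (no NaviServer needed) using a zlib stream, which decompresses chunk by chunk instead of in one piece:

```tcl
set gzipped [zlib gzip "hello world"]
set s [zlib stream gunzip]
# Feed arbitrary-sized pieces as they arrive, e.g. from a spool file.
foreach chunk [list [string range $gzipped 0 4] [string range $gzipped 5 end]] {
    $s put $chunk
}
$s finalize
set content [$s get]    ;# the original "hello world"
$s close
```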

Why is force_ssl here? Using "http" vs. "https" should be enough. I see no big benefit in using e.g. "-url http://foo -force_ssl t" over "-url https://foo".

For large requests, allowing spooling to a file makes sense (otherwise 2 GB would be the upper limit, since every Tcl variable can be at most 2 GB in size). Additionally, this keeps the memory footprint small.

I put the force_ssl option in because I clearly remembered web services having an http URL but requiring SSL. Strange enough...

I enhanced the procs with file spooling, but on my installation it didn't work out... does it require some configuration to be enabled? I left the feature disabled behind a single commented-out line of code.

I have added a util::http::post proc to handle POSTing of form vars and/or files. Many parts of the old util_http_file_upload from Michael Cleverly proved very useful and I could preserve them in the new proc. Some time ago I had already enhanced that very proc for my former company, so it could send more than one file, even allowing multiple values for a single file form field.

This is the new tcl file for http client functionalities. I leave it here for revision and approval.

The switches -files {/path/to/file /path/to/second-file ... } and -datas {$raw_data_1 $raw_data_2 ...}
are mutually exclusive. You can specify one or the other, but not
both. NOTE: it is perfectly valid to specify neither, in which
case no file is uploaded, but form variables are encoded using
multipart/form-data instead of the usual encoding (as
noted above).

If you specify either -files or -datas you
must supply a value for -names, which is
the list of names of the respective <INPUT TYPE="file" NAME="..."> form
tags.

Specify the -base64 switch if the file (or data) needs
to be base64 encoded. Not all servers seem to be able to handle
this. (For example, http://mol-stage.usps.com/mml.adp, which
expects to receive an XML file, doesn't seem to grok any kind of
Content-Transfer-Encoding.)

If you specify -files then -filenames is optional
(it can be inferred from the name of the file). However, if you
specify -datas then it is mandatory.

If -mime_types is not specified then ns_guesstype
is used to try to find a mime type based on the filename.
If ns_guesstype returns */*, the generic value
of application/octet-stream will be used.

Any form variables may be specified in one of four formats:

array (list of key value pairs like what [array get] returns)

formvars (list of url encoded formvars, i.e. foo=bar&x=1)

ns_set (an ns_set containing key/value pairs)

vars (a list of Tcl vars to grab from the calling environment)
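As a hedged sketch of how these four formats could be reduced to one internal representation (the helper name normalize_formvars is hypothetical, not part of the actual API; ns_set and ns_urldecode are the standard server commands):

```tcl
# Normalize any of the four accepted formats into one flat
# key/value list, ready for encoding into the request body.
proc normalize_formvars {format spec} {
    switch -- $format {
        array {                      ;# already a key/value list
            return $spec
        }
        formvars {                   ;# url-encoded string, e.g. foo=bar&x=1
            set pairs {}
            foreach pair [split $spec &] {
                lassign [split $pair =] key value
                lappend pairs [ns_urldecode $key] [ns_urldecode $value]
            }
            return $pairs
        }
        ns_set {                     ;# an ns_set of key/value pairs
            set pairs {}
            for {set i 0} {$i < [ns_set size $spec]} {incr i} {
                lappend pairs [ns_set key $spec $i] [ns_set value $spec $i]
            }
            return $pairs
        }
        vars {                       ;# names of Tcl vars in the caller
            set pairs {}
            foreach name $spec {
                upvar 1 $name value
                lappend pairs $name $value
            }
            return $pairs
        }
    }
}
```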

-headers specifies an ns_set of extra headers to send to
the server when doing the POST.

It is developing nicely. A few comments:
- the spoolsize options were added in August last year, after the release of NaviServer 4.99.5; you can test spooling with the "tip" version of NaviServer from Bitbucket, but one should wait until 4.99.6 is released before general use.
- there is already some redundancy between util::http::get and util::http::post. It would be better to implement a "util::http::request -method GET|POST|..." that does the heavy lifting, and maybe convenience methods for "get", "post" etc. on top of this when needed.
- one should use the Tcl expand operator {*} rather than "eval".
- the result of the queue_cmd is not a queue, but a handle
- without requesting gzipped content (by adding an Accept-Encoding: gzip header), the result will never be gzipped.
- currently, the list of options to post is very long and not orthogonal. The data of the POST request is either attribute/value pairs, or the multipart variants "datas" or "files", if I see this correctly. I think it would be conceptually nicer to have a "-data [util::http::data ...]" which passes the raw data to the request. In many cases, "-data [form_vars -form ....]" will be sufficient, when the default encoding is set depending on the data provided and multipart. Allowing a user to specify raw data is certainly useful (e.g. for PUT requests, DAV*, etc.)
- I am not sure that the many ways of specifying variables are needed (this should not be part of "post" or "request").
- "ns_zlib uncompress" does not perform a gunzip; the proper Tcl command is "zlib gunzip". If no decompressor is available, an error should be raised.
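To illustrate the eval point above, a small sketch of the {*} expansion operator (variable names and the ns_http call are illustrative): it substitutes the list elements as individual words, avoiding the double evaluation and quoting pitfalls of "eval".

```tcl
set extra_args [list -method POST -body $body]

# fragile:  eval ns_http queue $extra_args [list $url]
set handle [ns_http queue {*}$extra_args $url]
```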