Net::Z3950 is a Perl module for writing Z39.50 clients. (If you want to write a Z39.50 server, you want the Net::Z3950::SimpleServer module.)

Its goal is to hide all the messy details of the Z39.50 protocol - at least by default - while providing access to all of its glorious power. Sometimes, this involves revealing the messy details after all, but at least this is the programmer's choice. The result is that writing Z39.50 clients works the way it should according to my favourite of the various Perl mottos: ``Simple things should be simple, and difficult things should be possible.''

If you don't know what Z39.50 is, then the best place to find out is at http://lcweb.loc.gov/z3950/agency/, the web site of the Z39.50 Maintenance Agency. Among its many other delights, this site contains a complete downloadable soft-copy of the standard itself. In briefest summary, Z39.50 is the international standard for distributed searching and retrieval.

The following complete program, presented one statement at a time, retrieves from the database called ``gils'' on the Z39.50 server on port 210 of indexdata.dk the first record matching the search ``mineral'', and renders it in human-readable form.

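use Net::Z3950;

Loads the Net::Z3950 module.

$conn = new Net::Z3950::Connection('indexdata.dk', 210, databaseName => 'gils');

Creates a new connection to the Z39.50 server on port 210 of the host indexdata.dk, noting that searches on this connection will default to the database called ``gils''. A reference to the new connection is stored in $conn.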

$rs = $conn->search('mineral');

Performs a single-word search on the connection referenced by $conn (in the previously established default database, ``gils''). In response, the server generates a result set, notionally containing all the matching records; a reference to the new result set is stored in $rs.

print "found ", $rs->size(), " records:\n";

Prints the number of records in the new result set $rs.

my $rec = $rs->record(1);

Fetches from the server the first record in the result set $rs, requesting the default record syntax (GRS-1) and the default element set (brief, ``b''); a reference to the newly retrieved record is stored in $rec.

print $rec->render();

Prints a human-readable rendition of the record $rec. The exact format of the rendition is dependent on issues like the record syntax of the record that the server sent.

Searches may be specified in one of two different syntaxes, both of which will be familiar to users of the Yaz toolkit. The default syntax is so-called Prefix Query Notation, or PQN, a bespoke format invented by Index Data to map simply to the Z39.50 type-1 query structure. The other is the Common Command Language, or CCL, an international standard query language often used in libraries.

CCL queries may be interpreted on the client side and translated into a type-1 query which is forwarded to the server, or they may be sent ``as is'' for the server to interpret as it sees fit.

The interpretation of the search string may be specified by passing one of the arguments -prefix, -ccl or -ccl2rpn to the search() method, before the search string itself.
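For instance, the same logical search can be expressed in each of the three forms. The sketch below wraps them in a hypothetical helper, search_all_ways(), which takes an existing connection such as the $conn created earlier; the queries are the PQN and CCL examples discussed in this section. The -ccl form relies on the server accepting CCL, and the -ccl2rpn form on the client-side qualifier mapping discussed later in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: run the same logical search three ways on an existing
# Net::Z3950::Connection object, returning the three result sets.
sub search_all_ways {
    my ($conn) = @_;

    # Prefix Query Notation (the default), mapped directly to a type-1 query
    my $rs_pqn = $conn->search(-prefix => '@or rock @attr 1=21 mineral');

    # CCL sent "as is" for the server to interpret
    my $rs_ccl = $conn->search(-ccl => 'rock or su=mineral');

    # CCL compiled into a type-1 query on the client side
    my $rs_rpn = $conn->search(-ccl2rpn => 'rock or su=mineral');

    return ($rs_pqn, $rs_ccl, $rs_rpn);
}
```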

Briefly, in PQN, keywords begin with an @-sign, and all other words are interpreted as search terms. Keywords include the binary operators @and and @or, which join together the two operands that follow them, and @attr, which introduces a type=value expression specifying an attribute to be applied to the following term.

@and fruit @or fish chicken searches for records containing both ``fruit'' and at least one of ``fish'' or ``chicken''.

@or rock @attr 1=21 mineral searches for records either containing ``rock'' or ``mineral'', but with the ``mineral'' search term carrying an attribute of type 1, with value 21 (typically interpreted to mean that the search term must occur in the ``subject'' field of the record.)
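Because each PQN operator comes before its operands, a nested query maps directly onto prefix form with no need for parentheses. The sketch below is a hypothetical helper, to_pqn(), which is not part of Net::Z3950; it turns a nested Perl array structure into a PQN string and reproduces the two examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::Z3950): serialise a nested query
# structure into Prefix Query Notation.
#   - a plain string is a search term;
#   - [ 'and' | 'or', LEFT, RIGHT ] is a binary operator;
#   - [ 'attr', 'TYPE=VALUE', TERM ] applies an attribute to a term.
sub to_pqn {
    my ($node) = @_;
    return $node unless ref $node;    # a bare search term
    my ($op, @args) = @$node;
    if ($op eq 'attr') {
        my ($spec, $term) = @args;
        return "\@attr $spec " . to_pqn($term);
    }
    die "unknown operator '$op'\n" unless $op eq 'and' || $op eq 'or';
    die "operator '$op' needs two operands\n" unless @args == 2;
    # Operator first, then its two operands: grouping is implicit
    return join ' ', "\@$op", map { to_pqn($_) } @args;
}

print to_pqn([ 'and', 'fruit', [ 'or', 'fish', 'chicken' ] ]), "\n";
# prints "@and fruit @or fish chicken"
print to_pqn([ 'or', 'rock', [ 'attr', '1=21', 'mineral' ] ]), "\n";
# prints "@or rock @attr 1=21 mineral"
```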

CCL is formally specified in the international standard ISO 8777 (Commands for interactive text searching) and also described in section 4.1 (Query Syntax Parsers) of the Yaz toolkit documentation, YAZ User's Guide and Reference.

Briefly, however, there is a set of well-known keywords including and, or and not. Words other than these are interpreted as search terms. Operator grouping (precedence) is specified by parentheses, and the semantics of a search term may be modified by prepending one or more comma-separated qualifiers and an equals sign.

So:

fruit searches for the term ``fruit'',

fruit and fish searches for records containing both ``fruit'' and ``fish'',

fish or chicken searches for records containing either ``fish'' or ``chicken'' (or both),

fruit and (fish or chicken) searches for records containing both ``fruit'' and at least one of ``fish'' or ``chicken''.

rock or su=mineral searches for records either containing ``rock'' or ``mineral'', but with the ``mineral'' search term modified by the qualifier ``su'' (typically interpreted to mean that the search term must occur in the ``subject'' field of the record.)

For CCL searches sent directly to the server (query type ccl), the exact interpretation of the qualifiers is the server's responsibility. For searches compiled on the client side (query type ccl2rpn), the interpretation of the qualifiers in terms of type-1 attributes is determined by the contents of a file called ### not yet implemented. The format of this file is described in the Yaz documentation.

Setting Search Defaults

As an alternative to explicitly specifying the query type when invoking the search() method, you can change the connection's default query type using its option() method.
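A minimal sketch follows. It assumes that the option is called querytype and that it accepts the same names as the search() arguments without the leading dash ("prefix", "ccl", "ccl2rpn"). That option name is an assumption here, so check the Net::Z3950::Connection documentation before relying on it. The helper use_ccl_by_default() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: make CCL the default query type for one connection,
# so that later plain $conn->search($query) calls need no -ccl argument.
# ASSUMPTION: the option is named "querytype"; verify against the
# Net::Z3950::Connection documentation.
sub use_ccl_by_default {
    my ($conn) = @_;
    my $old = $conn->option('querytype');   # current value, to restore later
    $conn->option(querytype => 'ccl');
    return $old;
}
```

After this call, $conn->search('rock or su=mineral') would be sent as CCL, and the returned old value can later be passed back to option() to restore the previous default.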

By default, records are requested from the server one at a time; this can be quite slow when retrieving several records. There are two ways of improving this. First, the present() method can be used to pre-load the cache explicitly: its parameters are a start record and a record count. Such a present() call is optional and merely makes the subsequent record() calls run faster.
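The pattern looks something like the following sketch. The hypothetical helper fetch_range() pre-loads a range of records with present(), then reads them one by one with record(), clamping the range to the result set's size().

```perl
use strict;
use warnings;

# Hypothetical helper: fetch records $start .. $start + $count - 1 from the
# result set $rs, clamped to the size of the result set.
sub fetch_range {
    my ($rs, $start, $count) = @_;
    my $last = $start + $count - 1;
    $last = $rs->size() if $rs->size() < $last;
    return () if $last < $start;

    # Optional: one present() request pre-loads the whole range into the
    # cache, so the record() calls below need not each make a request.
    $rs->present($start, $last - $start + 1);

    return map { $rs->record($_) } $start .. $last;
}
```

Typical use would be print $_->render() for fetch_range($rs, 1, 5); which renders up to the first five records.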

The second way is with the prefetch option. Setting this to a positive integer N makes the record() method fetch the next N records and place them in the cache whenever the requested record isn't already there. So, with prefetch set to 10, retrieving records 1 to 20 in order causes just two bouts of network activity, each retrieving 10 records.
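A sketch of the prefetch approach. The helper name fetch_with_prefetch() is hypothetical; the only Net::Z3950 calls it makes are option(), size() and record() on the result set.

```perl
use strict;
use warnings;

# Hypothetical helper: retrieve the first $n records of result set $rs,
# letting the prefetch option batch the network traffic. With $batch = 10,
# a cache miss on record 1 fetches records 1-10, a miss on 11 fetches 11-20.
sub fetch_with_prefetch {
    my ($rs, $n, $batch) = @_;
    $rs->option(prefetch => $batch);
    my $last = $rs->size() < $n ? $rs->size() : $n;
    return map { $rs->record($_) } 1 .. $last;
}
```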

In asynchronous mode, present() and prefetch merely cause the records to be scheduled for retrieval.

Element Set

The default element set is ``b'' (brief). To change this, set the result set's elementSetName option:

$rs->option(elementSetName => "f");

Record Syntax

The default record syntax preferred by the Net::Z3950 module is GRS-1 (the One True Record syntax). If, however, you need to ask the server for a record using a different record syntax, then the way to do this is to set the preferredRecordSyntax option of the result set from which the record is to be fetched.

The record syntaxes which may be requested are listed in the Net::Z3950::RecordSyntax enumeration in the file Net/Z3950.pm; they include Net::Z3950::RecordSyntax::GRS1, Net::Z3950::RecordSyntax::SUTRS, Net::Z3950::RecordSyntax::USMARC, Net::Z3950::RecordSyntax::TEXT_XML, Net::Z3950::RecordSyntax::APPLICATION_XML and Net::Z3950::RecordSyntax::TEXT_HTML.

(As always, option() may also be invoked with no ``value'' parameter to return the current value of the option.)
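For example, to ask for USMARC records from one result set and then read the setting back, something like the following sketch should work. It assumes the program has already loaded the module with use Net::Z3950, which supplies the Net::Z3950::RecordSyntax constants. The helper name prefer_usmarc() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: ask for USMARC instead of the default GRS-1 when
# fetching records from $rs. Assumes the enclosing program has said
# "use Net::Z3950;"; the parentheses make the constant call explicit.
sub prefer_usmarc {
    my ($rs) = @_;
    $rs->option(preferredRecordSyntax => Net::Z3950::RecordSyntax::USMARC());
    # With no value argument, option() returns the current setting
    return $rs->option('preferredRecordSyntax');
}
```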

There are two broad approaches to handling a retrieved record. One is just to display it to the user: this can always be done with the render() method, as used in the sample code above, whatever the record syntax of the record.

The more sophisticated approach is to perform appropriate analysis and manipulation of the raw record according to the record syntax. The raw data is retrieved using the rawdata() method, and the record syntax can be determined using the universal isa() method.
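A sketch of dispatching on record syntax. Only Net::Z3950::Record::GRS1 is named in this tutorial. The class names Net::Z3950::Record::USMARC and Net::Z3950::Record::SUTRS, and the assumption that their raw data is a plain string, are guesses by analogy that should be checked against the Net::Z3950::Record documentation. The helper describe_record() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with a retrieved record $rec
# according to its record syntax, falling back to render() otherwise.
# ASSUMPTION: USMARC and SUTRS records are blessed into
# Net::Z3950::Record::USMARC and Net::Z3950::Record::SUTRS, and their raw
# data is a string, by analogy with Net::Z3950::Record::GRS1.
sub describe_record {
    my ($rec) = @_;
    my $raw = $rec->rawdata();
    if ($rec->isa('Net::Z3950::Record::GRS1')) {
        # GRS-1 raw data is a reference to an array of tagged elements
        return sprintf "GRS-1 record with %d top-level elements", scalar @$raw;
    }
    elsif ($rec->isa('Net::Z3950::Record::USMARC')) {
        # Presumably the MARC record as a string of bytes
        return sprintf "USMARC record, %d bytes", length $raw;
    }
    elsif ($rec->isa('Net::Z3950::Record::SUTRS')) {
        return "SUTRS text: $raw";
    }
    return $rec->render();
}
```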

The raw data of GRS-1 records in the Net::Z3950 module closely follows the structure of physical GRS-1 records - see Appendices REC.5 (Generic Record Syntax 1), TAG (TagSet Definitions and Schemas) and RET (Z39.50 Retrieval) of the standard for more details.

The raw GRS-1 data is intended to be more or less self-describing, but here is a summary.

The raw data is a reference to an array of elements, each representing one of the fields of the record.

Each element is a Net::Z3950::APDU::TaggedElement object. These objects support the accessor methods tagType(), tagValue(), tagOccurrence() and content(); the first three of these return numeric values, or strings in the less common case of string tag-values.

The content() of an element is an object of type Net::Z3950::ElementData. Its which() method returns a constant indicating the type of the content, which may be any of the following:

Net::Z3950::ElementData::Numeric indicates that the content is a number; access it via the numeric() method.

Net::Z3950::ElementData::String indicates that the content is a string of characters; access it via the string() method.

Net::Z3950::ElementData::OID indicates that the content is an OID, represented as a string with the components separated by periods (``.''); access it via the oid() method.

Net::Z3950::ElementData::Subtree indicates that the content is a reference to another Net::Z3950::Record::GRS1 object, enabling arbitrary recursive nesting; access it via the subtree() method.
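Putting these accessors together, a recursive walker over GRS-1 raw data might look like the following sketch. It assumes use Net::Z3950 has already been done. It also assumes that a subtree object can be dereferenced as an array of elements in the same way as the top-level raw data, which this tutorial does not state explicitly. The helper dump_grs1() is hypothetical and returns one line of text per element.

```perl
use strict;
use warnings;

# Hypothetical helper: recursively describe GRS-1 raw data, a reference to an
# array of Net::Z3950::APDU::TaggedElement objects. Assumes "use Net::Z3950;"
# has been done so the Net::Z3950::ElementData constants exist.
# ASSUMPTION: a subtree can be dereferenced as an array of elements.
sub dump_grs1 {
    my ($elements, $depth) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;
    my @lines;
    for my $el (@$elements) {
        my $tag   = sprintf '(%s,%s)', $el->tagType(), $el->tagValue();
        my $data  = $el->content();
        my $which = $data->which();
        # eq works whether the constants are numbers or strings
        if ($which eq Net::Z3950::ElementData::Numeric()) {
            push @lines, "$indent$tag " . $data->numeric();
        }
        elsif ($which eq Net::Z3950::ElementData::String()) {
            push @lines, "$indent$tag \"" . $data->string() . '"';
        }
        elsif ($which eq Net::Z3950::ElementData::OID()) {
            push @lines, "$indent$tag " . $data->oid();
        }
        elsif ($which eq Net::Z3950::ElementData::Subtree()) {
            push @lines, "$indent$tag {";
            push @lines, dump_grs1($data->subtree(), $depth + 1);
            push @lines, "$indent}";
        }
        else {
            push @lines, "$indent$tag (content type $which not handled here)";
        }
    }
    return @lines;
}
```

Called as dump_grs1($rec->rawdata()), it returns lines that can be printed with print "$_\n" for ...; nested subtrees are indented two spaces per level.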

In the future, we plan to take you away from all this by introducing a Net::Z3950::Data module which provides a DOM-like interface for walking hierarchically structured records independently of their record syntax. Keep watchin', kids!

As with customising searching or retrieval behaviour, whole-session behaviour is customised by setting options. However, this needs to be done before the session is created, because the Z39.50 protocol doesn't provide a method for changing (for example) the preferred message size of an existing connection.

In the Net::Z3950 module, this is done by creating a manager - a controller for one or more connections. The manager's options can then be set, and any connections subsequently opened through the manager use the specified values for those options.

As a matter of fact, every connection is made through a manager. If one is not specified in the connection constructor, then the ``default manager'' is used; it's automatically created the first time it's needed, then re-used for any other connections that need it.

Creating a manager with its options already specified is exactly equivalent to creating a ``vanilla'' manager with new Net::Z3950::Manager() and then setting each of those options with the option() method.
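A sketch of the vanilla-manager approach follows. The helper connect_via_manager() is hypothetical. Passing the manager to the Net::Z3950::Connection constructor as its first argument, ahead of the host and port, is an assumption to check against the Net::Z3950::Connection documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: create a vanilla manager, give it session-wide
# options, and open a connection through it. Assumes "use Net::Z3950;".
# ASSUMPTION: Net::Z3950::Connection->new accepts a manager as an optional
# first argument, before the host name and port.
sub connect_via_manager {
    my ($host, $port, %options) = @_;
    my $mgr = Net::Z3950::Manager->new();
    $mgr->option($_ => $options{$_}) for sort keys %options;
    my $conn = Net::Z3950::Connection->new($mgr, $host, $port)
        or die "can't connect to $host:$port\n";
    return ($mgr, $conn);
}
```

For example, connect_via_manager('indexdata.dk', 210, elementSetName => 'f') would make full records the default for every result set found on that connection, unless a result set overrides it.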

Message Size Parameters

The preferredMessageSize and maximumRecordSize parameters can be used to specify values of the corresponding parameters which are proposed to the server at initialisation time (although the server is not bound to honour them.) See sections 3.2.1.1.4 (Preferred-message-size and Exceptional-message-size) and 3.3 (Message/Record Size and Segmentation) of the Z39.50 standard itself for details.

Both options default to one megabyte.

Implementation Identification

The implementationId, implementationName and implementationVersion options can be used to control the corresponding parameters in the initialisation request sent to the server to identify the client. The default values are listed below in the section OPTION INHERITANCE.
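Both the message-size and the identification options are set on a manager before any connection is opened through it. In this sketch the hypothetical helper configured_manager() doubles the one-megabyte size defaults and identifies the client by name and version. The sizes, name and version are illustrative values, and implementationId is left at its default.

```perl
use strict;
use warnings;

# Hypothetical helper: a manager whose connections propose larger message
# and record sizes and identify the client. Assumes "use Net::Z3950;".
sub configured_manager {
    my ($name, $version) = @_;
    my $mgr = Net::Z3950::Manager->new();

    # Proposed at initialisation time; the server is not bound to honour them
    $mgr->option(preferredMessageSize => 2 * 1024 * 1024);
    $mgr->option(maximumRecordSize    => 2 * 1024 * 1024);

    # Sent in the initialisation request to identify this client
    $mgr->option(implementationName    => $name);
    $mgr->option(implementationVersion => $version);
    return $mgr;
}
```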

Authentication

The user, password and groupid options can be specified for a manager so that they are passed as identification tokens at initialisation time to any connections opened through that manager. The three options are interpreted as follows:

If user is not specified, then authentication is omitted (which is more or less the same as ``anonymous'' authentication).

If user is specified but not password, then the value of the user option is passed as an ``open'' authentication token.

If both user and password are specified, then their values are passed in an ``idPass'' authentication structure, together with the value of groupid if it is specified.

By default, all three options are undefined, so no authentication is used.
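A sketch that sets only the authentication options actually supplied, so that the three cases above follow from which options are defined. The helper name auth_manager() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: a manager carrying whichever authentication options
# were supplied. Following the rules above:
#   no user              -> no authentication
#   user only            -> user sent as an "open" token
#   user and password    -> "idPass", plus groupid if given
# Assumes the enclosing program has said "use Net::Z3950;".
sub auth_manager {
    my (%auth) = @_;
    my $mgr = Net::Z3950::Manager->new();
    for my $key (qw(user password groupid)) {
        $mgr->option($key => $auth{$key}) if defined $auth{$key};
    }
    return $mgr;
}
```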

The values of options are inherited from managers to connections, result sets and finally to records.

This means that when a record is asked for an option value (whether by an application invoking its option() method, or by code inside the module that needs to know how to behave), that value is looked for first in the record's own table of options; then, if it's not specified there, in the options of the result set from which the record was retrieved; then if it's not specified there, in those of the connection across which the result set was found; and finally, if not specified there either, in the options for the manager through which the connection was created.

Similarly, option values requested from a result set are looked up (if not specified in the result set itself) in the connection, then the manager; and values requested from a connection fall back to its manager.
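The lookup order can be modelled in a few lines of plain Perl. This is a toy illustration using a hypothetical OptionScope class, not the module's own implementation; only the hard-wired default of ``b'' for elementSetName is taken from this tutorial.

```perl
use strict;
use warnings;

# Toy model (not Net::Z3950's own code) of option inheritance: each object
# has its own option table and a parent (record -> result set -> connection
# -> manager); a lookup walks up the chain, then uses a hard-wired default.
package OptionScope;

my %HARD_WIRED = (elementSetName => 'b');

sub new {
    my ($class, $parent, %options) = @_;
    return bless({ parent => $parent, options => {%options} }, $class);
}

sub option {
    my ($self, $name, @value) = @_;
    if (@value) {                       # setting: store locally only
        $self->{options}{$name} = $value[0];
        return $value[0];
    }
    for (my $scope = $self; $scope; $scope = $scope->{parent}) {
        return $scope->{options}{$name} if exists $scope->{options}{$name};
    }
    return $HARD_WIRED{$name};
}

package main;

my $mgr  = OptionScope->new(undef, preferredRecordSyntax => 'GRS-1');
my $conn = OptionScope->new($mgr);
my $rs   = OptionScope->new($conn, preferredRecordSyntax => 'USMARC');
my $rec  = OptionScope->new($rs);
print $rec->option('preferredRecordSyntax'), "\n";   # USMARC (result set)
print $conn->option('preferredRecordSyntax'), "\n";  # GRS-1 (manager)
print $rec->option('elementSetName'), "\n";          # b (hard-wired default)
```

Setting a value only ever writes to the object's own table, which is why an override on one result set leaves the manager's global default untouched.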

This is why it made sense in an earlier example (see the section Set the Parameters) to specify a value for the preferredRecordSyntax option when creating a manager: the result of this is that, unless overridden, it will be the preferred record syntax when any record is retrieved from any result set retrieved from any connection created through that manager. In effect, it establishes a global default. Alternatively, one might specify different defaults on two different connections.

In all cases, if the manager doesn't have a value for the requested option, then a hard-wired default is used. The defaults are as follows. (Please excuse the execrable formatting - that's what pod2html does, and there's no sensible way around it.)

0 (This and the next four options provide flexible control for run-time details such as what record syntax to use when returning records. See sections 3.2.2.1.4 (Small-set-element-set-names and Medium-set-element-set-names) and 3.2.2.1.6 (Small-set-upper-bound, Large-set-lower-bound, and Medium-set-present-number) of the Z39.50 standard itself for details.)

1 indicating boolean true. This option tells the client to use a new result set name for each new result set generated, so that old ResultSet objects remain valid. For the benefit of old, broken servers, this option may be set to 0, indicating that the same result-set name, ``default'', should be used for each search, so that each search invalidates all existing ResultSets.

I don't propose to discuss this at the moment, since I think it's more important to get the Tutorial out there with the synchronous stuff in place than to write the asynchronous stuff. I'll do it soon, honest. In the meantime, let me be clear: the asynchronous code itself is done and works (the synchronous interface is merely a thin layer on top of it) - it's only the documentation that's not yet here.

This tutorial is only an overview of what can be done with the Net::Z3950 module. If you need more information than it provides, then you need to read the more technical documentation on the individual classes that make up the module - Net::Z3950 itself, Net::Z3950::Manager, Net::Z3950::Connection, Net::Z3950::ResultSet and Net::Z3950::Record.