This module is a SOAP-based (Web Services) client that can talk to, and get data from, an MRS server, a search engine for biological and medical databanks that searches well over a terabyte of indexed text. See details about MRS and its author Maarten Hekkelman in "ACKNOWLEDGMENTS".

Because this module is only a client, you need an MRS server running. You can install your own (see details in the MRS distribution), or you need to know a site that runs it. By default, this module contacts the MRS server at CMBI (http://mrs.cmbi.ru.nl/).

The main module is MRS::Client. It lets the user specify which MRS server to use, and a few other global options. It also has a factory method for creating individual databank objects. Additionally, it allows querying all databanks at once. Finally, it covers all the SOAP communication with the server.

These are the URLs of the individual MRS servers: one providing searches (the main one), one running blast and one running clustal. The default values direct your searches to CMBI. If you have installed MRS servers on your own site, and you are using the default values coming with the MRS distribution, you create a client by (but see the host parameter below for a shortcut):

Technical detail: These URLs will be used in the location field of the WSDL description.

Alternatively, you can specify these parameters by environment variables (because they will probably be the same for most users from the same site). The parameters, however, still take precedence over the values of the environment variables (even if they exist). The variables are: MRS_SEARCH_URL, MRS_BLAST_URL and MRS_CLUSTAL_URL.
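The precedence rule above can be sketched in plain Perl. This is only an illustration, not the module's own code: the helper resolve_url, the argument name search_url and all URLs are assumptions made up for this example; only the environment variable name comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MRS::Client: an explicit argument wins,
# then the environment variable, then the built-in default.
sub resolve_url {
    my ($args, $key, $env_name, $default) = @_;
    return $args->{$key}   if defined $args->{$key};
    return $ENV{$env_name} if defined $ENV{$env_name};
    return $default;
}

# Placeholder values chosen for this example only.
my $default = 'http://default.example.org/search';
$ENV{MRS_SEARCH_URL} = 'http://env.example.org/search';

# The argument name search_url is an assumption made for this sketch.
my $from_arg = resolve_url ({ search_url => 'http://arg.example.org/search' },
                            'search_url', 'MRS_SEARCH_URL', $default);
my $from_env = resolve_url ({}, 'search_url', 'MRS_SEARCH_URL', $default);

# Without the variable, the default is used.
delete $ENV{MRS_SEARCH_URL};
my $from_default = resolve_url ({}, 'search_url', 'MRS_SEARCH_URL', $default);

print "$from_arg\n$from_env\n$from_default\n";
```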

The MRS servers are SOAP-based Web Services. Every Web Service has its own service name (the name used in the WSDL). You can change this service name if you are accessing a site that uses non-default names. The default names (I guess they are almost always used) are: mrsws_search, mrsws_blast, mrsws_clustal.

You can also specify your own WSDL files, one for each set of operations. This is meant more for debugging purposes, because this MRS::Client module understands only the current operations, and adding new ones to a new WSDL does not magically make the module use them. These parameters may be useful when extending the MRS::Client.

The same names as the argument names described above can be used as method names to get or set the parameter value. A method without an argument gets the current value, a method with an argument sets the new value. For example:
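A minimal sketch of this combined getter/setter convention, written as a stand-alone class. The package My::Options and the parameter names used here are hypothetical and serve only to show the idiom; the sketch does not claim to reproduce how MRS::Client implements its accessors.

```perl
use strict;
use warnings;

package My::Options;   # hypothetical class, only for illustration

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# One method per parameter: called without an argument it reads the value,
# called with an argument it stores the new value first.
for my $name (qw(search_url blast_url clustal_url)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$name" } = sub {
        my $self = shift;
        $self->{$name} = shift if @_;
        return $self->{$name};
    };
}

package main;

my $opts   = My::Options->new (search_url => 'http://localhost/search');
my $before = $opts->search_url;                    # get
$opts->search_url ('http://example.org/search');   # set
my $after  = $opts->search_url;                    # get again

print "$before\n$after\n";
```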

The query (method next) returns entries sequentially, one databank after another. As with individual databanks, even here you can select the maximum number of entries to deliver; the number is applied to each databank separately:
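The iteration order and the per-databank limit can be pictured with a small in-memory model. Everything in it (the data, the function multi_iterator, the limit of 2) is invented for this sketch; a real query goes to the MRS server instead.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for two databanks.
my %entries = (
    enzyme  => [ 'E1', 'E2', 'E3' ],
    uniprot => [ 'U1', 'U2', 'U3', 'U4' ],
);

# Return a closure that yields entries one databank after another,
# delivering at most $max entries from each databank.
sub multi_iterator {
    my ($data, $order, $max) = @_;
    my @queue;
    for my $db (@$order) {
        my @list = @{ $data->{$db} };
        splice (@list, $max) if @list > $max;   # cut off the surplus
        push @queue, @list;
    }
    return sub { return shift @queue };
}

my $next = multi_iterator (\%entries, [ 'enzyme', 'uniprot' ], 2);
my @seen;
while (defined (my $entry = $next->())) {
    push @seen, $entry;
}

print "@seen\n";
```

The limit is applied inside the loop over databanks, which is why each databank contributes its own two entries rather than the whole result being cut to two.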

This package represents an MRS databank and allows you to query it. Each databank consists of one or more files (represented by MRS::Client::Databank::File) and of indices (MRS::Client::Databank::Index).

A databank instance can be created by a new method but usually it is created by a factory method available in the MRS::Client:

my $db = $client->db ('enzyme');

The factory method, as well as the new method, creates only a "shell" databank instance. It is good enough for making queries but it does not contain any databank properties (name, indices, etc.). The properties will be fetched from the MRS server only when you ask for them (using the getter methods described below).
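The "shell" behaviour is a form of lazy loading. The following sketch shows the general idea with a hypothetical class My::LazyBank and a simulated remote call; the property value is made up, and the real module talks to the MRS server at this point.

```perl
use strict;
use warnings;

package My::LazyBank;   # hypothetical class, not MRS::Client::Databank

sub new {
    my ($class, $id, $fetcher) = @_;
    return bless { id => $id, fetcher => $fetcher }, $class;
}

sub id { return $_[0]->{id} }

# The first call triggers the (simulated) remote fetch; later calls
# reuse the cached properties.
sub name {
    my $self = shift;
    $self->{props} ||= $self->{fetcher}->($self->{id});
    return $self->{props}{name};
}

package main;

my $remote_calls = 0;
my $db = My::LazyBank->new ('enzyme', sub {
    $remote_calls++;
    return { name => 'Example databank' };   # made-up property value
});

my $calls_before = $remote_calls;   # creating the shell fetched nothing
my $name         = $db->name;       # first getter call fetches
$db->name;                          # second call uses the cache

print "$calls_before $remote_calls $name\n";
```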

The ranked queries (the ones achieved by the and or or arguments) assign a relevance score to their hits. The relevance score depends on the algorithm used. The available values for this argument are defined in MRS::Algorithm:

The default format is 'plain'. The 'fasta' and 'sequence' formats are available only for databanks that have sequence data. For all formats, except for the 'header', the entries are returned as strings. For 'header', the entries are instances of MRS::Client::Hit.

Be aware that format is also a built-in Perl function, so better quote it when used as a hash key (it seems to work also without quotes except the emacs TAB key is confused if there are no surrounding quotes; just a minor annoyance).

This argument (eXtended format) enhances the format argument. It is used (at least at the moment) only for HTML format; for other formats, it is ignored.

Be aware, however, that the xformat depends on the structure of the HTML provided by the MRS. This structure is not defined in the MRS server API, so it can change easily. It even depends on how the authors write their parsing scripts. When the HTML output changes, this module must be changed as well. Caveat emptor.

The xformat is a hashref with keys that change (slightly or significantly) the returned HTML. Here are all possible keys (with randomly picked values):

MRS::XFormat::CSS_CLASS specifies a CSS-class name that will be added to all a tags in the returned HTML. It allows, for example, an easy post-processing by various JavaScript libraries. For example, if the original HTML contains:

<a href="entry.do?db=go&amp;id=0005576"></a>

it will become (using the value shown above):

<a class="mrslinks" href="entry.do?db=go&amp;id=0005576"></a>
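The effect of this option can be imitated with a small regular expression. The helper add_css_class is hypothetical and is only meant to make the transformation concrete; the module's actual implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: add a class attribute to every opening <a ...> tag.
sub add_css_class {
    my ($html, $class) = @_;
    $html =~ s/<a(\s)/<a class="$class"$1/gi;
    return $html;
}

my $original = '<a href="entry.do?db=go&amp;id=0005576"></a>';
my $changed  = add_css_class ($original, 'mrslinks');

print "$changed\n";
```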

MRS::XFormat::URL_PREFIX helps to keep the returned HTML independent of the machine where it was created. This option prepends the given prefix to the relative URLs in the hyperlinks that point to the data in an MRS web application. For example, if the original HTML contains:

Other hyperlinks - those not starting with query or entry - are not affected.
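As a rough illustration of the rule, the following sketch prepends a prefix only to href values that start with query or entry. The helper add_url_prefix and the prefix value are assumptions made for this example; the sample link reuses the one shown earlier in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix only the relative links that start with
# "query" or "entry"; all other links are left untouched.
sub add_url_prefix {
    my ($html, $prefix) = @_;
    $html =~ s/(href=")((?:query|entry)[^"]*")/$1$prefix$2/g;
    return $html;
}

my $html = '<a href="entry.do?db=go&amp;id=0005576"></a>'
         . '<a href="http://www.example.org/">x</a>';

# The prefix is a placeholder, not a real MRS installation.
my $changed = add_url_prefix ($html, 'http://example.org/mrs-web/');

print "$changed\n";
```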

MRS::XFormat::REMOVE_DEAD deals with the fact that the MRS server creates hyperlinks pointing to other MRS databanks without checking that they actually exist in the local MRS installation. This may be fixed later (quoting Maarten), but until then this option (when given a true value) removes from the returned HTML all hyperlinks that point to MRS databanks that are not installed. For example, if the original HTML has these hyperlinks:

There is a small caveat, however. The MRS::Client needs to know what databanks are installed. It finds out by asking the MRS server, using the method db() (explained elsewhere in this document). This method returns much more than is needed, so it can be slightly expensive. Therefore, if your concern is the highest speed, you can help the MRS::Client by providing a list of databanks that you know you have installed. Actually, in most cases, you can create such a list also by calling the db() method but, depending on your code, you can call it just once and reuse it. For example, if you wish to keep hyperlinks only for 'uniprot' and 'embl', you specify:

xformat => { MRS::XFormat::REMOVE_DEAD() => ['uniprot', 'embl'] }

Finally, there is an option MRS::XFormat::ONLY_LINKS. It has a very specific function: to extract and return only the hyperlinks, not the whole HTML. It is, therefore, predestined for further post-processing. Note that all changes in the hyperlinks described earlier are also applied here (e.g. adding an absolute URL or a CSS class).

When this option is used, the whole method "$find->next" (or "db->entry") returns a reference to an array of extracted hyperlinks:

Each databank consists of one or more files. This method returns a reference to an array of MRS::Client::Databank::File instances. Each such instance has properties reachable by the following getter methods:

Each databank is indexed by (usually several) indices. This method returns a reference to an array of MRS::Client::Databank::Index instances. Each such instance has properties reachable by the following getter methods:

This object carries the results of a query; it is returned by the find method, called either on a databank instance or on the whole client. Actually, in the case of the whole client, the returned object is of type MRS::Client::MultiFind, which is a subclass of MRS::Client::Find.

Finally, a tiny object representing a hit, a result of a query before going to a databank for the full contents of a found entry. It contains the databank's ID (where the hit was found), the score that this hit achieved (for boolean queries, the score is always 1) and the ID and title of the entry represented by this hit.

The corresponding getter methods are db, score, id and title.

The next method (as shown above) returns just hits (instead of the full entries) when the format MRS::EntryFormat->HEADER is specified.

The MRS servers provide sequence homology searches, the famous Blast program (namely the blastp program for protein sequences). An input sequence (in FASTA format) is searched against one of the MRS databanks. It can be any MRS databank whose method blastable returns true (e.g. uniprot). An input sequence and a databank are the only mandatory input parameters. Other common Blast parameters are also supported.

The invocation is asynchronous. It means that the run method returns immediately, without waiting for the Blast program to finish, giving back a job ID, a handle that can be used later for polling for the status and, once the status indicates that Blast has finished, for getting the results (or an error message). This is the typical usage:
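The start-then-poll pattern can be sketched without a server by using a mock job. The class Mock::Job and its methods completed and results are hypothetical stand-ins invented for this sketch; they only imitate a job that needs a few status checks before it is done.

```perl
use strict;
use warnings;

package Mock::Job;   # hypothetical stand-in for a remote Blast job

sub new {
    my ($class, %args) = @_;
    return bless { polls_left => 3, %args }, $class;
}

sub id { return $_[0]->{id} }

# Each status check pretends that the remote job made some progress.
sub completed {
    my $self = shift;
    $self->{polls_left}-- if $self->{polls_left} > 0;
    return $self->{polls_left} == 0;
}

sub results { return 'results of job ' . $_[0]->{id} }

package main;

my $job   = Mock::Job->new (id => 'job-42');
my $polls = 0;

# Poll until the job reports completion.
until ($job->completed) {
    $polls++;
    sleep 0;   # a real client would wait a few seconds between polls
}
my $output = $job->results;

print "$output (waited $polls rounds)\n";
```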

This is the main method. It starts Blast with the given parameters and immediately returns an MRS::Client::Blast::Job object that can be used for all other important methods. If you plan to stop your Perl program and start it again later, you need to remember the job ID:

my $job = $blast->run (...);
print $job->id;

The job ID can be later used to re-create the same (well, similar) Job object (see method job below) that again provides all important methods (such as getting results).

The method run has the following arguments (the Job object has the same getter methods), all given as a hash:

The returned Job object can be used to ask for the Job status, or for getting the Job results. There is one caveat, however. The re-created Job object is not as "rich" as the original one: it does not know, for example, what parameters were used to start this blast job. Unfortunately, the MRS server keeps only the Job ID and nothing else. Fortunately, the parameters are needed only for the results in the XML format (see more about available formats below, in the method $job->results), and you can add them (if you still have them), as a hash, to the job method when re-creating a new Job instance:

The Job object represents a single Blast invocation with a set of input parameters and, later, with results. It is also used to poll for the status of the running job. Instances of this object are created by the run or job methods of the blast object. The Job's methods are:

Finally, the more interesting method. It returns an object of type MRS::Client::Blast::Result that can be either used on its own (see its getter methods below), or converted to a string in one of the formats predefined in MRS::BlastOutputFormat:

The format is the only parameter of this method. Default format is HITS. The conversion to the given format is done by overloading the double quotes operator, calling internally the method "as_string". You just print the object:
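The stringification idiom mentioned above is a standard Perl technique. Here is a minimal, self-contained sketch of it; the class My::Result and its fields are hypothetical and only demonstrate how the double quotes operator can be routed to an as_string method.

```perl
use strict;
use warnings;

package My::Result;   # hypothetical class, only to show the overloading idiom

# Stringification ("$obj", print $obj) is routed to as_string.
use overload '""' => sub { $_[0]->as_string };

sub new {
    my ($class, %args) = @_;
    return bless { format => 'HITS', %args }, $class;
}

sub as_string {
    my $self = shift;
    return "format=$self->{format}, hits=" . scalar @{ $self->{hits} };
}

package main;

my $result = My::Result->new (hits => [ 'hit1', 'hit2' ]);
my $text   = "$result";   # same text that "print $result" would produce

print "$text\n";
```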

This module wraps the multiple alignment program clustalw. The program is optional and, therefore, not all MRS servers may have it. Use the factory method for creating instances of MRS::Client::Clustal:

It returns a reference to an array of MRS::Client::Clustal::Sequence instances. Each of them has methods id and sequence. You can also just print the formatted alignment (it uses its own as_string method that overloads double quotes operator):

The MRS distinguishes between so-called ranked queries and boolean queries, and it recognizes also boolean filters. I probably need to learn more about their differences. That's why you may see some differences in query results shown by this module and the mrsweb web application (an application distributed together with the implementation of the MRS servers).

The contents of the search field in the mrsweb are first parsed in order to find out whether they form a boolean expression or not. Depending on the result, it uses either a ranked or a boolean query. It also splits the terms and combines them (by default) with the logical AND. For example, in mrsweb if you type (using the uniprot):

cone snail

you get 134 entries. You get the same number of hits by the MRS::Client module when using an and argument:

But you cannot just pass the whole expression as a query string (as you do in mrsweb):

print $client->db('uniprot')->find ('cone snail')->count;
0

You get zero entries because the MRS::Client considers the above as one term. And if you add a boolean operator:

print $client->db('uniprot')->find ('cone AND snail')->count;
4609

then the boolean query was used and, as explained by the MRS, the "query did not return an exact result, displaying the closest matches". But, fortunately, when you iterate over this result, you will get, correctly, just the 134 entries.
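If you start from free text, as a user of mrsweb would, you can split it into terms yourself before calling find. The sketch below assumes that the and argument accepts a reference to an array of terms; that shape is an assumption of this example, so check it against the description of the find method.

```perl
use strict;
use warnings;

# Split a free-text query into separate terms, similar to what mrsweb
# is described to do before combining them with the logical AND.
my $text  = 'cone snail';
my @terms = split ' ', $text;

# Assumed shape of the ranked-query arguments: 'and' => [ terms... ]
my %find_args = ('and' => \@terms);

# With a client at hand, the call would then look roughly like:
#   print $client->db('uniprot')->find (%find_args)->count;

print join (' AND ', @{ $find_args{'and'} }), "\n";
```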

The MRS servers provide a few more operations that are not yet covered by this module. It would be useful to discuss which of those are worth implementing. They are:

This client module would be useless without having an MRS server (e.g. at http://mrs.cmbi.ru.nl/mrs-web/). The MRS stands for Maarten's Retrieval System and was developed (and is maintained) by Maarten Hekkelman at the CMBI (http://www.cmbi.ru.nl/), with the help and contributions from many others.

The MRS itself has also its own Perl module MRS.pm, called plugin and distributed together with the MRS, that accesses MRS server(s) directly, without using the SOAP Web Services protocol. The plugin was helpful to find out what the server might expect.

Additionally, the MRS distribution has a few testing scripts that use the SOAP protocol to access data in the same way as this MRS::Client module does. Therefore, this module can be seen as an extension of these testing scripts into a slightly more comprehensive and perhaps more documented package.

The MRS server provides Blast results that are not in XML. In order to make an XML output, this module uses, hopefully, the same format and conversion as found in the MRS web application mrsweb.