A shell to query database servers in the Virtual Observatory

TAP ("Table Access Protocol") is a protocol that lets you send database
queries and data to servers within the Virtual Observatory (VO). Like
the WWW, TAP is built on HTTP, but working it with a web browser is
tedious to impossible.

You can use the tapquery module from GAVO's votable package, but for
ad-hoc work and quick queries, something shell-like is much nicer;
therefore, we have wrapped tapquery into the TAP shell. It comes with
job and server management, SAMP integration, and command line
completion.

For installation instructions, refer to the tapsh distribution page.
Since tapsh does not include a results viewer, you should install
topcat alongside.

This document should suffice to get you up to speed with TAP. In
particular, you should never need to read IVOA's TAP specification.
It is aimed at implementors, not users, and cannot be understood without
largish body additional specifications. If you miss information on TAP
in this document, we consider this as a bug and will gladly fix it.

TAP transmits queries in the Astronomical Data Query Language (ADQL).
This document is not an introduction to ADQL. The IVOA ADQL
specification may be helpful to people already proficient in SQL.
Unfortunately, there is not yet a real "introduction" to ADQL; you could
peruse the lecture notes for a talk on ADQL; send us further
questions (and, of course, answers) to expand them into a real
introduction.

To see tapsh in action, open a terminal and type "tapsh" (without the
quotes). If the command cannot be found, you will need to amend your
shell's search path; if you cannot figure out how to do this locally,
you are welcome to ask us. Then, start topcat (the sequence is
not that critical, but it's easier this way).

After these preliminaries, type into the terminal window with tapsh's
prompt:

At any time, tapsh has at most one current server and at most one
current job. Both are persistent over sessions (where we do not
define the behavior if you run several tapshs in parallel; there should
be no reason to do so).

In tapsh, servers are identified by their IVOA identifiers; those are
URIs (in effect, strings) starting with ivo://. To set the current
server, use server command. Of course, nobody expects you to
memorize the rather clumsy identifiers, so tapsh has tab-completion.
Simply hit the tab key after server to see the servers known to tapsh;
type a few characters and hit tab again until the IOVA id is complete.

To create a new job, type in your query ("select ..."). This job then
becomes the current job. The job command lets you switch to another
job (again, you can use tab completion). To see a list of all jobs
known to tapsh at any given moment, use the ls command.

A TAP job has a phase on the server side. This phase is one of

PENDING -- the job has been created and can be manipulated. This is
when you add uploads or change the job's destruction time.

QUEUED -- the server waits for execution. At this point, most TAP
servers will deny attempts to manipulate it; in particular, they may
no longer let you raise the execution time limit or change the
destruction time.

EXECUTING -- the job is running.

ABORTED -- the job was aborted, either because you stopped it or
because the execution time limit has been reached

ERROR -- there was an error during the execution of the job.

COMPLETED -- the job has completed successfully; a result is
available.

TAP is a one-way protocol, meaning that all communication is initiated by the
client. In consequence, tapsh has to explicitely ask the server for the
phase of a job. It does so on certain commands (e.g., job and update).
The result is cached in the last known phase. This is what the ls
command shows, and it may be out of date. Use the job command to
obtain the real remote status at any given time.

The phases ABORTED, ERROR, and COMPLETED are final phases.

Jobs have server-side identifiers. Since these may be weird strings
(and might not be unique across servers), tapsh gives every job a
nickname. These are locally unique strings used as (mostly optional)
arguments to many of tapsh's commands. Tapsh auto-generates a hopefully
cute-sounding six-letter nickname when you create a job. You can use
the nick command to change the current job's nickname. This may be
convenient to assign mnemonic names to jobs, and it is particularly
useful in scripts. Nicknames must be composed of (ASCII) alphabetic
characters exclusively.

VO-enabled applications communicate using a protocol called SAMP
designed to let them exchange tables and other data. tapsh is a SAMP
client (and will start a hub if none is running yet). In conjunction
with tapsh, you should at least run topcat to have a way to
(conveniently) inspect the result tables.

Although many setups are conceivable, for now you should first start the
tapsh and then topcat. You can see everything is working properly if
you see an icon of a tap (ok, faucet) in topcat's SAMP panel. If you
don't, click the connection icon in the lower right corner of topcat's
main window, wait for the disconnection, and then click on it again to
reconnect. If then there's no tap logo, complain to us.

Simple Queries

The first step in crafting a query is figuring out which tables contain
data relevant to what you are interested in. Tapsh relies on a
supporting service in the GAVO data center to do this, the Global TAP
Schema or GloTS for short. The metasearch command opens a web
browser with a form letting you search for columns by UCD and table and column
keywords. The result table shows, among other things, the IVOA id of
the service containing this column. Use tapsh's server command
(either copying and pasting or tab completion) to set the current
server.

The GloTS result page has links to the table metdata that you will need
to craft your query; at any time, you can use the tables command to
get a browser window giving the tables available on the current server.

Simply type in a query to create a new job. This does not start your
query. After a select command, the job is is PENDING. This is the
time to execute the keepfor, limit, and upload commands.

To actually start a job use the start command to immediately regain a
prompt (e.g., to start further queries, and of course to check your
job's progress), or the run command to have tapsh block and poll the
remote status now and then and return when the job has reached a final
state.

When a job has successfully finished (i.e., ends up in the COMPLETED
phase), a result is made available on the server. This result always is
in VOTable format for jobs created by tapsh.

To view it, send it to SAMP client, typically topcat. You can use the
broadcast command to send it to all registered clients, or the send
command to send it to a specific client. In the typical case, you
would say send to topcat.

You can also save the result to a local file using the save command.
Note that the local file name must always be in double quotes; you would
thus say save "result table".

Since the select.../run/send to topcat sequence is so common, there
is a shortcut: Simply append a semicolon to the query, and tapsh will do
the run and send for you.

For simple result sets, changing to topcat (or some other SAMP client)
may be too tedious. In those cases, you can get an ASCII rendition
of the result set within tapsh using the dump command. Again,
there's a shortcut for select. If you append an exclamation mark rather
than a semicolon to the query, the result will be dumped rather than
sent.

Not that sending, saving or dumping a result does not consume it; it
needs to be retransferred from the server for each operation, though.

When done with a job, you should use the delete command to remove it
from both tapsh's job list and the server.

Most of the job manipulation commands optionally accept a nickname
argument and will then operate on the job selected.

Uploads

One of the most powerful features of TAP is the ability to upload data
into (temporary) database tables. Not all servers support this feature,
though.

Tapsh's upload command has the form upload <something> as
<tablename> and is issued after a select and before start or
run; the somethings are discussed below, tablename is an SQL
identifier (i.e., roughly an alphabetic character followed by
alphanumeric characters). The contents of the uploaded file will appear
to the query as tap_upload.<tablename>. You can upload more than
one file.

To upload data, it must be in VOTable format. This is always true for
tapsh's jobs, so -- and this probably is the most common use -- you can
upload the last job's (or the last jobs') result(s). To do this, say
upload result <nickname> as <tablename>.

Another source of VOTables is the interprocess communication SAMP. You
can, for example, come up with some data in VODesktop or manipulate a
table in topcat and then use SAMP to send it to tapsh. After that, you
can say upload samp as <tablename>. tapsh will always use the last
table sent to it in this form.

Many VOTables may have a (http) URL. In particular, all VO cone
searches end up having GETtable URLs. To upload from some URL, just
enter the URL like upload http://xxx as <tablename>. Only http URLs
are permitted at this point.

Finally, you can upload local data, where the file name needs to be
enclosed in double quotes: upload "<path to file>" as <tablename>.
If your data is something other than a VOTable, topcat will help you
convert it to VOTables. Note that no database server will handle
sexagesimal coordinates (minutes and seconds detached). If you have
those, convert them to decimal degrees first -- again, TOPCAT will
usually help you there with its hmsToRadians or dmsToRadians functions.

Since the expression for a simple, positional crossmatch is quite
verbose, you can type Escape-M (for "match") to insert a template. If
you want further macros of this sort, use the readline facilities.

You could, for example, add:

$if Python
Control-n: "select * from "
$endif

to your ~/.inputrc; this would make Control-N insert "insert select *
from into" tapsh's command line. Yes, you cannot now use different
bindings for the tapsh and the python interpreter. We'll retrofit that
on demand.

Tapsh supports a limited form of scripting, with the idea that you can
write your queries in files (e.g., for documentation purposes) and then
execute these files. In scripts, you can use all tapsh commands; in
addition, you can use continuation lines to split long command lines
for readability. Just have a backslash as the last character of
a line, like this:

select field1, field2, field3\
from foo\
join bar\
where field4 between 4 and 5\
and field6<field2

To execute the script, just pass its name to tapsh. If you insist,
you can even write:

#!/usr/bin/env tapsh

in the first line of your script, make it executable and then run the
tapsh script as a command.

Note that we will not add control structures (loops, selection, etc) to
tapsh. If you need this, you should use python. If you run tapsh, you
already have the votable tapquery module on your system that you can
use for that. Further scripting (e.g., of SAMP) could be effected by
using tapsh itself as a python module. We will document this operation
mode on request.

addserver command

To use non-registred servers, you can manually add them using their
access URL, which is the URL of the query endpoint without any sync or
async.

The server id can be any string not containing whitespace.

Note that manually added servers will be lost when you use the refreshServers
command. You could write a short script to add your preferred local
servers if you really wanted to, but the correct way is to bug the
server operators to register their servers. There's no documented way
to remove a server locally added except by removing them all using
refreshServers.

broadcast command

broadcast [<nick>] -- broadcasts a result to all connected SAMP clients.

The URL of the current or selected job's result will be sent to
all SAMP clients connected and listening to table.load.votable.
The function will error out if SAMP has been disabled or if no
result currently exists.

delete command

delete [<nick>] -- destroys a job server-side and locally

Without an argument, the current job is deleted, otherwise the named
job is deleted. If the job is running, it is aborted.

It is generally considered nice to delete a job when done with it. You
can use the purge command to delete many jobs at a time for convenience.

dump command

dump [<nick>] -- dumps the query result to the console

If a job has a result, dump retrieves it and formats it as a rough
ASCII table. The result is not beautiful, and no attempt is made
to accomodate to screen widths. Thus, for nontrivial result sets,
you should really use "send to topcat" or similar.

error command

error [<nick>] -- give error message of current or selected job

TAP servers give error messages under a special resource. The
error command retrieves that resource, tries to parse it as a
VOTable and returns TAP error messages contained in that VOTable
if successful. Maformed responses will be printed verbatim.

Without an argument, the error command shows the message for the
current job, otherwise for the job specified using its nickname.

The behavior of this command is unspecified if the job is not in
the ERROR phase.

exec command

exec <file name> -- creates a query from the content of file name

exec lets you prepare and edit queries in your favourite editor.

help command

help -- gives a short synopsis of available commands

Without an argument, help prints a synopsis of the commands available.
With an argument, it will print a longer explanation of what the
command does.

job command

job [<nick>] -- gets/sets current job

A TAP job has rich server-side metadata, in particular the query
issued. The job command shows a synopsis of this data, retrieved
from the server. It updates the "last known" phase information, too.

Without an argument, job shows the current job's metadata, otherwise
that of the named job.

Certain DSA servers produce malformed responses that lead tapsh to believe
the job was dead on the remote side. Don't worry about this, that's
just a harmless protocol mismatch.

keepfor command

keepfor <num> -- ask the server to keep the job for another <num> days

TAP servers only keep jobs around for a limited period. To find out
when they are scheduled for destruction, just the job command.

The keepfor command requests to change that destruction time to <num>
days in the future. TAP servers are free to ignore or modify your
request, which is why the command outputs the new destruction time.

limit command

limit <num> -- tell the server to let the job run for up to <num> seconds

TAP servers (usually) enforce limits on the execution time of queries,
such that a job is aborted after, say, an hour or a day. For
long-running queries, that period may be too short.

With the limit command, you can ask the server to raise (or lower)
this time limit to the number of seconds specified in the argument.
The server is free to discard or modify this request. Therefore,
the new time limit is printed by the command.

ls command

ls -- lists known jobs

tapsh keeps a list of jobs started but not deleted. The ls command
lets you inspect that list. From this list, you can see what
jobs you can make the current job again using the job command.

The phases given in ls are the phases it found when tapsh last
looked. To have ls show current remote phases, use the update
command.

metasearch by itself just opens a browser window with GloTS' form interface.
If you give argument(s), you will be sent to a result page. A
first argument that contains dots or a semicolon will be interpreted
as a UCD pattern (* and ? work as in shell patterns), anything else
will be column description words.

nick command

nick <nick name> -- sets the current job's nick name

Using nick, you can assign mnemonic names to the current job. This is
particularly useful in scripts, since it lets you refer to jobs previously
created in, e.g., upload commands.

If a job other than the current one already has the given nick, it is deleted
on the server. Again, this is what you want in scripts.

Nicknames must consist exclusively of ASCII alphabetic characters
(a-z, A-Z). The automatically assigned nicknames are always
six lowercase characters long, with consonants and vowels alternating
(in the hope that the result is pronounceable). It is a good idea
to use names constructed differently for your fixed nicks.

Example:

select * from table1
nick FirstRes
run
select * from table1 join tap_upload.uploaded
upload result FirstRes as uploaded
run

purge command

purge |failed|all -- deletes server-gone/failed/all jobs

Manually deleting jobs becomes tedious, in particular when
experimenting. The purge command lets you delete all job
removed on the server side (default; you will need to run update
to make tapsh aware of server-side destructions), or additionally
jobs in error phases (i.e., those in ERROR or ABORTED phases) when
the failed keyword is given.

purge all destroys all known jobs. This, in particular, means
you lose all results still stored server-side.

quit command

quit -- shuts down tapsh

You don't need help for this, do you?

refreshservers command

refreshservers -- update the list of TAP servers from the GAVO DC

tapsh relies on the resources of GAVO's global TAP schema (GloTS),
a registry of TAP services and the tables contained within them.
When run for the first time, it will retrieve a list of TAP
services known to GloTS and keep a local copy.

Since new TAP servers come around now and then, you should rund
refreshservers periodically to keep up to date. A clear sign
you should do so is when the server command complains about
unknown servers.

run command

run [<nick>] -- starts a job and blocks until finished

Without an argument, run starts the current job, otherwise the named
job is run. Tapsh will block until the job has reached a final
phase.

You can abort the job started by run by hitting ^C. There currently
is no way to "background" a running job. Pester us if you want this.

Use the start command to start a job and immediately have a prompt
again.

After run, the job will be in a final state. Use the job command
to see which it is.

save command

save "<file name>" [<nick>] -- save the job's result to a local file.

The save command lets you retrieve a completed job's result and store
it in a local file. You must enclose the local file name in double
quotes, giving full paths is legal.

select command

select <query tail> -- creates a new query on the current server

The select command creates a new job on the current server. This does
not start the execution of the query, it just creates a job. No syntax
checks are performed locally, and the servers will usually only check
the syntax when the query is actually executed.

The job created will be the new current job.

tapsh will automatically make up a nickname for the job. You can change
the local nickname using the nick command.

Since it is such a common operation, there is a shortcut to immediately
process the job and send its result to topcat on successful completion: Just
append a semicolon (;) to the query. This is equivalent to the sequence:

select foo from bar
run
send to topcat

Similarly, if you append a bang (!) to a query, the result is
immediately retrieved and dumped.

Some TAP servers run DSA. In late 2010, those still speak an early dialect
of ADQL. You will notice this when, on executing the select statement,
you will receive some error message talking about ADQL/X for valid ADQL.
Mostly, you can fix this by prefixing all column names with the table
name, like so:

select t.ra from dsaControlledTable as t where t.dec<-89

send command

send [<nick>] to <recpient> -- sends a result to a SAMP client <recipient>

Send works like broadcast, except that you can select the SAMP
client to send to. Try the tab key to see the client names
assigned by tapsh. The first topcat instance will always be
called topcat, though.

The client list is only updated every 10 seconds, so you may
have to wait a bit if you hit Tab and only then started the
destination client.

server command

server [<ivo-id>] -- gets/sets the current server (try TAB for ids)

tapsh almost always has a current server, the one new jobs are
created on. Only TAP servers known to tapsh can be set (i.e.,
you cannot just grab an ivo id from somewhere). Use the refreshservers
command to update tapsh's internal list of TAP servers from GAVO's
global TAP schema.

The ids used by tapsh are IVOA identifiers (except where you assigned
them yourself using addserver). These are, admittedly, somewhat
unwieldy, but that's what tab completion is for.

setpar command

setpar <parName> <value> -- sets a remote parameter.

Internally, TAP has a number of parameters. You can set them using
this command. However, since most parameters are actually managed by
other commands, you can break a lot here. So, only use this if you
know what you're doing.

A useful example could be setPar FORMAT fits to ask a TAP
server to return the result as a fits binary table. However, such
a result would then not work as an upload.

start command

start [<nick>] -- starts a job server-side and immediately returns

Without an argument, the current job is started, otherwise the named
job is started. Contrary to the the run command, you immediately
get a tapsh prompt again. You can check the job's status using the
job command.

tables command

GAVO's global TAP schema offers a web interface for inspecting the
tables on the current server. The table command opens a web browser
window that shows the tables available and lets you inspect the table
schemata.

update command

update [all] -- updates phases for non-finished or all jobs

The update command causes tapsh to query the servers of all jobs
not known to tapsh to be in a final phase (i.e., those in PENDING,
QUEUED, or EXECUTING) for their phases and update their metadata
accordingly.

With the optional all argument, even jobs in final phases are
updated. This is mainly to detect jobs deleted server-side (e.g.,
because their destruction time has passed). These will then go
to the (tapsh-internal) DESTROYED phase.

Note that tapsh will not monitor its jobs automatically, so the output
of the ls command may be out of date. The "known" phase of a job will
be updated by the run and job commands as well.

upload command

upload (result <nick>|samp|"local file"|URL) as <name> -- adds an upload to the current job

The upload command allows you to temporarily ingest your own data or
previous results into the server database. Not all TAP servers
support this.

The data uploaded must be in VOTable format and preferably only
contain a single table.

You can upload a previous job's result by nickname (try upload result
<Tab>). To upload a local file, put it in double quotes. HTTP URLs
can also be uploaded, but they must be accessible to the local host,
since it is downloaded from the source server and uploaded to the
TAP server (this is because it is far more likely that the remote server
cannot access a URL accessible to the client than vice versa).

The special upload samp form will upload the last table sent to
tapsh (or broadcast) by SAMP clients. This is a convenient way to
use data generated by other VO clients (e.g., VODesktop).

You must give the uploaded table a name consisting of alphabetic
characters and an underscore. In your queries, you refer to the
uploaded table as tap_upload.<tablename>.

You can upload more than one file in this way for complex joins.

wait command

wait <seconds> -- just waits for the specified number of seconds

The wait command is a cheap hack to allow scripts to pause a bit before
exiting. The purpose of this is to give SAMP clients a chance to complete
their dialogs with the tapsh. Admittedly a command wait_for_samp_to_complete
would be better, but even for SAMP transactions that tapsh has initiated
that is not trivial. Hence, this hack.