Search Guide

Our search engine aims to offer the familiar web searching
experience of popular search engines such as Google. Bibliographic
searching differs from web page searching, though, so we
provide many extensions that enable complex and precise structured
searches, including a combined metadata, fulltext and reference search
in one go. This page lists several tips and tricks that you may find
useful to this effect.

The default search mode is simple search, which
provides one input box where you can type your
query, followed by a menu for choosing one of the common indexes
to search within. You would usually simply type the keywords you are
interested in and hit return. For example, if you are interested in
documents on the standard model that are written by (or mention)
Ellis, you would type:

and on the search results page you could further add/remove keywords
to get more precisely at what you are looking for, as is mentioned below.

The advanced search interface provides you with
explicit tools to play with: you can change the matching type from the
default word matching to phrase searching or regular expression matching; you
can use boolean queries in several indexes, etc. For example, to find
all the documents written by Ellis, J spelled exactly that
way that contain either of the words muon or
neutrino in the title and that were published in
2001, you would type:

Note that Simple Search can provide essentially the same
functionality if you make use of the special syntax that is explained in
the text below. The simple-versus-advanced distinction refers not to the
functionality provided but to the number of parameters
you can tweak. We conform to the common
use of the simple/advanced terms as found in other search engines.

Much of what follows deals with the question of how a power user
would use the simple search interface. Recall that you can always go
to the Advanced Search for more query assistance.

After you submit your query, the search engine analyzes it and
tries to guide you whenever no exact match can be found.
For example, it prints a list of the closest indexed terms in
case of spelling troubles:

Alternative choices will be printed in red. The search engine
will similarly warn you when your search terms could not be
found, or when they could but your boolean query couldn't be met. The
search engine will also silently try to search for alternative forms
(e.g. remove punctuation), etc.
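
The staged behaviour described above can be pictured with a small Python
sketch. It is an illustration only, not the engine's implementation: the
index contents, the function name staged_lookup and the use of difflib to
pick close terms are all assumptions made for this example.

```python
import difflib
import string

# Hypothetical index: term -> set of record ids (made up for illustration).
INDEX = {
    "ellis": {1, 2, 3},
    "muon": {2, 3, 4},
    "neutrino": {5},
}


def staged_lookup(term, index=INDEX):
    """Return (record_ids, suggestions) for a single word query."""
    word = term.lower()
    # Stage 1: look the word up as typed.
    if word in index:
        return index[word], []
    # Stage 2: silently retry an alternative form without punctuation.
    stripped = word.translate(str.maketrans("", "", string.punctuation))
    if stripped in index:
        return index[stripped], []
    # Stage 3: nothing matched, so offer the closest indexed terms instead.
    return set(), difflib.get_close_matches(stripped, list(index), n=3)


print(staged_lookup("muon"))    # found directly
print(staged_lookup("muon."))   # found after punctuation is removed
print(staged_lookup("elis"))    # not found, close terms are suggested
```

The point of the sketch is the ordering: cheaper, stricter lookups run
first, and suggestions are produced only when every stage came back empty.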

Thanks to multiple search stages and the guidance provided at each
stage, it is usually sufficient to simply type what you are looking
for and see what the system says in return. If you aren't satisfied,
you can then add or remove words from your query until the reply is
satisfactory.

The default search mode is a search for words. This
means that any whitespace you type is not significant, but is rather
interpreted to mean "add an automatic boolean AND between words", as
Google does. For example, to find all records that contain both the
word ellis and the word muon anywhere in the record,
type:

ellis muon

The whitespace would be significant if you include it within quotes.
There are two phrase searching modes:

The double quotes instruct the search engine to search for
an exact phrase. This phrase search mode will match if and
only if the given metadata field is exactly equal to the input
pattern. For example, to find all documents written by Ellis,
J spelled exactly that way, type:

The single quotes instruct the search engine to search for
a partial phrase. Unlike the exact phrase search, this
mode allows extra text to appear before or after the given
pattern. This is somewhat similar to the "phrase search mode"
common on Google and other fulltext engines that search for phrase
expressions inside Web pages. For example, to find all the titles
containing the expression muon decay regardless of the
position of the expression in the title, type:

Now you see how to search for an author spelled sometimes as
Ellis, J and sometimes as Ellis, Jonathan
Richard (and other authors, such as De Lellis, Jim)
at the same time:

The difference between exact and partial phrase searching modes may
not be obvious at first glance. While the latter is closer to
what "phrase search" usually means in the context of web page search
engines, the former is usually an order of magnitude faster if you
know the precise values you are looking for.

Another interesting searching mode besides the word and phrase
searches is the regular expression search, introduced
by slashes instead of quotes. For example, the above partial phrase
query 'muon decay' is fully equivalent to the regular
expression query /muon decay/. The regular expression
syntax is very powerful and permits you to construct very complex
queries. For more information, please consult the regular expression section of this guide.

We have already seen how whitespace adds a silent boolean AND in the
search for words. The other boolean operators include:

Operator   Query            Meaning

+ (AND)    ellis +muon      matches all records that contain both the
                            word ellis and the word muon
           ellis muon       ditto, syntactic sugar
           ellis and muon   ditto, syntactic sugar

- (NOT)    ellis -muon      matches all records that contain the word
                            ellis but that do not contain the word muon
           ellis not muon   ditto, syntactic sugar

| (OR)     ellis |muon      matches all records that contain at least one
                            of the words
           ellis or muon    ditto, syntactic sugar

Logical operations are automatically chained from left to right.
For example, if you want to search for documents written by Ellis on
muons or kaons, write:

muon or kaon and ellis

which looks for (muon or kaon) and ellis. Note that this
gives different results from:

ellis and muon or kaon

which would search for (ellis and muon) or kaon.
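
The effect of left-to-right chaining can be reproduced with a short
Python sketch over sets of record ids. The index contents and the
evaluate function are hypothetical, and the sketch only understands the
word operators and, or and not (plus whitespace as an implicit AND); it
assumes that the query starts with a search term.

```python
# Hypothetical word index: word -> set of record ids (illustrative data only).
INDEX = {
    "ellis": {1, 2, 3},
    "muon": {2, 5},
    "kaon": {3, 6},
    "decay": {3},
}


def evaluate(query, index=INDEX):
    """Evaluate a flat boolean query strictly from left to right."""
    result = None
    operator = "and"  # whitespace alone means an implicit AND
    for token in query.lower().split():
        if token in ("and", "or", "not"):
            operator = token
            continue
        hits = index.get(token, set())
        if result is None:
            result = set(hits)      # first term starts the running result
        elif operator == "and":
            result &= hits
        elif operator == "or":
            result |= hits
        else:                       # "not"
            result -= hits
        operator = "and"            # fall back to the implicit default
    return result or set()


print(sorted(evaluate("muon or kaon and ellis")))  # (muon or kaon) and ellis
print(sorted(evaluate("ellis and muon or kaon")))  # (ellis and muon) or kaon
print(sorted(evaluate("ellis and muon or kaon not decay")))
```

Because there is no operator precedence, each new term simply acts on the
running result, which is why reordering the same three words changes the
answer.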

The left-to-right chaining behaviour permits you to easily refine
your search by adding/removing words with and/not or +/- operators.
For example, to exclude the documents on decay from the above search,
append -decay:

to get a refined list. Keep adding/removing terms until you are
satisfied.

Word truncation is supported via the asterisk (*) wildcard
character. The wildcard instructs the search engine to match any
number of characters in that place. For example, to find records
that contain the words muon, muons, muonic,
etc., type:

muon*

The wildcard query works in both prefix and infix position. For
example, to get all the words that start with CERN-TH and
end with 31, type:

Note that the wildcard will be ignored if you try to apply it to
very short words, such as a*:

The wildcard character can also be used in the phrase searching
mode. For example, to find all the documents whose title starts with
"Neutrino mass", type:

Recall that we have introduced exact and partial phrase search
modes. A partial phrase search actually launches an exact
search enclosed within wildcards: we could say that 'foo bar
baz' is equivalent to "*foo bar baz*". Now you can
see why the partial phrase search is slow: because of the
asterisks before and after the text, each and every title in the
database has to be looked up to determine whether it matches or
not. (There are currently no partial phrase indexes.)
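
The wildcard semantics, and the equivalence between a partial phrase and
an exact phrase wrapped in asterisks, can be sketched in Python. The
helper names are hypothetical and the translation to a regular expression
is only one possible way to model the behaviour, not a description of how
the engine is implemented. The sample values are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Translate an asterisk wildcard pattern into a compiled regex."""
    # Escape everything, then turn each escaped asterisk into "any characters".
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def matches(pattern, value):
    # The whole value has to match, as in an exact phrase search.
    return wildcard_to_regex(pattern).fullmatch(value) is not None


def partial_phrase(phrase, value):
    # A partial phrase behaves like an exact phrase wrapped in wildcards.
    return matches("*" + phrase + "*", value)


words = ["muon", "muons", "muonic", "kaon"]
print([w for w in words if matches("muon*", w)])
print(matches("CERN-TH*31", "CERN-TH-2001-031"))
print(matches("Neutrino mass*", "Neutrino mass and oscillations"))
print(partial_phrase("muon decay", "Rare muon decay modes"))
```

A leading asterisk means that no prefix of the value is known in advance,
which is consistent with the remark above that every title has to be
examined.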

Searching within various bibliographic fields (such as title,
author) is supported via a syntax similar to Google's "site:" operator.
If a search term is preceded by a field name and a colon, then the
term is searched for inside this field only. For example, to find
documents containing the word ellis within the author index,
type:

To select documents written by Ellis that contain words
like muon, muons, muonic within title,
type:

To select documents written by the NA60 experiment from
the year 2001, type:

The most common fields you may want to use are
author, title,
reportnumber, abstract,
keyword, year, experiment,
fulltext, and reference.
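
To make the field syntax concrete, here is a minimal Python sketch that
splits a query into (operator, field, value) triples. The parse function
and its regular expression are assumptions made for illustration; they
cover only the +, - and | prefixes, quoted or slash-delimited values and
bare words, and they do not treat the words and, or and not as operators.

```python
import re

# One query term: optional +, - or | operator, optional "field:" prefix, then
# a double-quoted, single-quoted, slash-delimited or bare value.
TERM = re.compile(
    r"""(?P<op>[+\-|])?
        (?:(?P<field>[a-z]+):)?
        (?P<value>"[^"]*"|'[^']*'|/[^/]*/|\S+)""",
    re.VERBOSE,
)


def parse(query):
    """Split a query into (operator, field, value) triples."""
    terms = []
    for m in TERM.finditer(query):
        op = m.group("op") or "+"   # no prefix means the implicit AND
        field = m.group("field")    # None means no field was given
        terms.append((op, field, m.group("value")))
    return terms


print(parse("author:ellis -title:muon* year:2001"))
print(parse('author:"Ellis, J" muon'))
```

Keeping the quotes or slashes in the value preserves the matching type
(exact phrase, partial phrase or regular expression) for a later step.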

The regular expression searching mode is mostly for the power users
acquainted with the traditional Unix/POSIX regexp syntax. In the
Simple Search interface you can trigger it by using slashes instead of
quotes:

while in the Advanced Search interface you can select the matching
type explicitly by using the selection box menu. The above example
will find all the titles that start with the letter E, followed
by any number of any characters, and end with the letter s.
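
The behaviour just described can be tried out with Python's re module,
whose syntax agrees with POSIX for a pattern this simple. The pattern
below is reconstructed from the description (start with E, any
characters, end with s) and the sample titles are made up, so treat both
as assumptions.

```python
import re

# Pattern reconstructed from the prose description; treat it as an assumption.
pattern = re.compile(r"^E.*s$")

# Made-up titles used only to exercise the pattern.
titles = [
    "Electroweak corrections",
    "Effective field theories",
    "Neutrino masses",
    "Elementary particle",
]

# Keep the titles that start with "E" and end with "s".
selected = [t for t in titles if pattern.search(t)]
print(selected)
```

Note that Python's re does not support POSIX bracket classes such as
[:alnum:], so patterns that rely on them need to be rewritten before being
tested this way.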

Another example could be an author search for an author expressed
in the database as either Ellis, J or Ellis, John:

The regular expression search enables you to formulate very
specific word proximity queries. For example, let us find all titles
containing words dense and matter that are separated
by at most one word that doesn't contain the letter l:

Note that you can also use character intervals such as
[a-k] and occurrence counts such as {3}.
For example, let us find all preprints that do not follow the year
cataloguing policy, that is YYYY to denote year, optionally
followed by ? or by another -YYYY:

You can also use character classes such as [:alnum:], so
that the above query is equivalent to:

It is possible to search the citation network by means
of the citedby and refersto search operators.
For example, to find out who cites hep-th/0201100, you can type:

Similarly, to find out which papers are cited by Klebanov, you can type:

To set up a cite alert for new papers citing author I. Klebanov, you can type:

Note that refersto and citedby search
operators work on any regular query. For example, to find all
papers that cite papers that are tagged with the gravitino keyword,
type:

Note also that these operators can be freely combined with
regular metadata search. For example, to find papers authored by
Klebanov that are cited by Papadimitriou but that do not cite any
of Papadimitriou's papers themselves, type:

All the syntax mentioned above can be combined in one
query. For example, to find documents that have the word
ellis inside author fields, that do not contain words like
muon, muonic, etc. in any field, that contain the phrase
(or the substring, to be more precise) 'dense quark matter' inside
abstract fields, and that were published in a year starting with the
digits '200', type:

Note that the default "any field" global index contains only the metadata terms,
not the citation or fulltext terms. You have to explicitly mention the fulltext
or reference index to search there. For example, to find the term Higgs
in either metadata, references or fulltext files, type:

This permits an interesting combination of metadata, fulltext and citation search in
the same query. For example, to get all documents written by
Lin whose fulltext files contain the words
Schwarzschild and AdS, and that cite the journal
Adv. Theor. Math. Phys., type:

You can search for an author in many ways, each having its own
advantages and disadvantages.

First of all, note that searching for words isn't usually what you
would want here. If you choose to search for the words Ellis
J within the author index, it means that two queries (for the
words Ellis and J) are performed first and a
boolean AND is applied next:

Such a query would also match a document whose first author is
Ellis, R and whose second author is Finch, A J, which is
probably not what you wanted. While the search is very fast and you
would find the results for the author you were looking for, such
a technique could also return many false positives, such as the one
above. Instead of searching for words, a more suitable
technique in this case is to search for phrases, which
permits you to achieve higher search precision.

The author names are usually stored in a form containing initials
only, such as Ellis, J. To get the list of publications of
an author whose name is spelled exactly that way, type:

This way of searching gives you the highest precision and no false
positives, assuming there are no other authors whose names are
spelled Ellis, J (an assumption that is often false*). The search is very fast.

Sometimes an author's first name may be abbreviated on
some documents (such as Ellis, J) and spelled in full on
others (such as Ellis, John; possibly also with the middle
name: Ellis, John Rolfe). To get the list of publications
for all these forms at the same time, you could use a boolean OR
query:

This way of searching still keeps the highest precision and no
false positives, assuming there are no other authors whose names are
spelled Ellis, J or Ellis, John (an assumption that
is often false*). The search is
fast.

To match all of the above forms in a single search term, you can
try to use a wildcard query:

It would match all author names that start with the text
Ellis, J, i.e. not only the wanted forms Ellis,
J and Ellis, John, but also Ellis, Jim, or
Ellis, John Rolfe, or Ellis, Jonathan Richard.

This way of searching returns you more results, which may be
suitable in case you don't know how the names are spelled in the
database. But you also risk the eventuality of getting false
positives. The search is relatively fast.

Yet another alternative, and the most general one, is to use partial
phrase matching:

It would find not only all the authors mentioned above, but also
the ones whose names contain the expression Ellis, J
anywhere inside the name, such as De Lellis, Jim. It thus
gives you the largest possible number of hits at the largest risk of
false positives. The search is relatively slow.

(Note though that this way of searching may be very handy in case
of compound family names such as Pepe-Altarelli, M or 't
Hooft, G where a casual user query for Hooft, G would
match the wanted author, unlike the methods mentioned above.)
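
The trade-off between the four strategies discussed so far (words, exact
phrase, wildcard, partial phrase) can be compared side by side in a small
Python sketch. The records are made up from the names used in this guide,
the helper functions are hypothetical, and case-insensitive matching is an
assumption of the sketch.

```python
import re

# Hypothetical records: record id -> list of author names (illustrative only).
RECORDS = {
    1: ["Ellis, J"],
    2: ["Ellis, John"],
    3: ["Ellis, Jonathan Richard"],
    4: ["De Lellis, Jim"],
    5: ["Ellis, R", "Finch, A J"],
}


def word_search(words):
    """Word mode: each word may come from a different author of a record."""
    found = set()
    for rec_id, authors in RECORDS.items():
        tokens = set(re.findall(r"\w+", " ".join(authors).lower()))
        if all(w.lower() in tokens for w in words):
            found.add(rec_id)
    return found


def author_search(predicate):
    """Phrase modes: the predicate is tested against one author at a time."""
    return {
        rec_id
        for rec_id, authors in RECORDS.items()
        if any(predicate(a.lower()) for a in authors)
    }


words = word_search(["Ellis", "J"])
exact = author_search(lambda a: a == "ellis, j")
either = author_search(lambda a: a in ("ellis, j", "ellis, john"))
prefix = author_search(lambda a: a.startswith("ellis, j"))
partial = author_search(lambda a: "ellis, j" in a)

for name, hits in [("words", words), ("exact", exact), ("either", either),
                   ("prefix", prefix), ("partial", partial)]:
    print(name, sorted(hits))
```

In this toy data set the word search picks up the record by Ellis, R and
Finch, A J, while each phrase mode widens the previous one: exact phrase,
boolean OR of exact phrases, wildcard prefix, then partial phrase.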

Finally, let us note that you can use the regular
expression syntax to construct any complex author query. A simple
example is to search for an author expressed in the database as either
Ellis, J or Ellis, John:

*NOTE:
If you produce your own list of publications and you notice that
your first name is sometimes abbreviated and sometimes spelled in
full, or if you want to identify your publications among several
authors with the same abbreviation, please contact the administrators of Himalayan Document Centre so that
they can work with you on inputting a consistently spelled and
properly formatted first name everywhere. Only consistent
database content will ensure proper author searching behaviour.

You may select a field by which to sort the search
results, for example to sort the results by main title. However,
sometimes you may want to sort by a report number and it happens
that your documents have several of them. For example, the report
numbers hep-ph/0204140, CERN-TH-2002-069 and
RM3-TH-02-4 all denote the
same document. Now if you sort a search results set
containing this document, the system will take into consideration
the first report number, which may be any of these three.
Sometimes you may want to classify this document under its
hep-ph number, sometimes under its CERN number,
depending on whether you produce a list of CERN or hep-ph
publications. How can you influence the search engine to prefer
one report number rather than the other?

In other words, the search engine by default answers a query
like "sort by first author" or "sort by first report number", but
sometimes you may want to ask the search engine to "sort by first
report number that starts with the text CERN-". The latter
possibility is available via a "silent" sort parameter called
sp (for "sort pattern") that sorts preferentially
according to the given textual pattern if it can be found. The
parameter is "silent" in the sense that it is not present in the search
interface; you have to add it manually to your search URL.
For example, to get all CERN-TH publications of the year 2001
sorted by their CERN-TH numbers, you would search for
CERN-TH-2001* within reportnumber index,
and on the search results page, being satisfied with the results,
you would add &sp=CERN-TH to the URL to sort the
results preferentially by CERN-TH report numbers, to get a nicely
sorted list of all CERN-TH 2001 publications.
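
The idea behind the sort pattern can be sketched in Python as follows.
This is an assumption-laden illustration rather than the engine's code:
the functions sort_key and add_sort_pattern are hypothetical, the second
record and its numbers are made up, and the example URL (including its p
and f parameters) is a placeholder; only the sp parameter itself comes
from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_key(report_numbers, pattern=None):
    """Pick the report number a record is sorted by."""
    if pattern:
        # Prefer the first report number that starts with the pattern.
        for number in report_numbers:
            if number.startswith(pattern):
                return number
    # Otherwise fall back to the first report number, if any.
    return report_numbers[0] if report_numbers else ""


def add_sort_pattern(url, pattern):
    """Append (or replace) the sp parameter in an existing search URL."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "sp"]
    params.append(("sp", pattern))
    return urlunsplit(parts._replace(query=urlencode(params)))


records = [
    ["hep-ph/0204140", "CERN-TH-2002-069", "RM3-TH-02-4"],
    ["astro-ph/0201001", "CERN-TH-2002-100"],  # made-up second record
]
by_first = sorted(records, key=sort_key)
by_cern = sorted(records, key=lambda r: sort_key(r, "CERN-TH"))
print([r[0] for r in by_first])
print([sort_key(r, "CERN-TH") for r in by_cern])

# Placeholder URL; only the sp parameter is taken from this guide.
print(add_sort_pattern("https://example.org/search?p=CERN-TH-2001*&f=reportnumber",
                       "CERN-TH"))
```

With the pattern in place, the two records swap order because each one is
now compared by its CERN-TH number instead of whichever report number
happens to come first.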

On the search results page, links to other servers like Google, SPIRES or KEK are
automatically proposed in a box entitled "Try your search on". You
can simply click on the proposed links to run your query on these
search engines.

Note that a link isn't printed if the target search engine doesn't
support the query. For example, SPIRES or KEK cannot search for terms within
"any field", so we don't link to them in these cases.

If a metadata record contains some associated fulltext files, Himalayan Document Centre
tries to extract the textual information from the files and index it into a separate fulltext index.
To search for all records that contain the term e- in their fulltext files,
type:

Recall that fulltext words aren't included in the default global "any field" index,
but that you may freely combine a fulltext and metadata search. For example, to find all
articles written by Ellis that contain the word muon either in the
metadata or in the fulltext, type:

If a metadata record contains an associated fulltext file, Himalayan Document Centre
tries to extract references automatically from that file and index
them into a separate reference index. To search for
all records that cite Ellis in their reference lists,
type:

To search for all records that cite preprint hep-ph/0103062
in their reference lists, type:

To search for all records that cite an article by Giddings
and Ross published in Physical Review D in volume
61 in the year 2000, type:

Recall that citation terms aren't included in the default global "any field" index,
but that you may freely combine a citation search with a metadata search.
For example, to find all articles on the standard model that aren't written by
Ellis but that do cite him, type: