Thanks for the very detailed answer :-)
It seems my question(s) weren't as clear as I thought...
I wrote a spider that produces an abstraction of plain-text pages, HTML
pages, etc. and outputs the XML data I described.
> > MetaNames document url size type date crawldate keywords \
> > description link title content
All these tags are metanames _and_ properties. They are used as fields. The
idea is to be able to search for words in the document content as well as
for documents of a certain size or content type, and to display all of the
data if wanted.
Indexing the data is easy, but when I got to the search interface, I found
it annoying to use
> > swish-e -f index_test -w "content=(harry AND potter)"
because <content> is where all the words are.
The search syntax and my search script get simpler if I can use
> > swish-e -f index_test -w "harry AND potter"
for a simple search (fewer parentheses). Extended searches (like a field search)
that use the other fields are expected to be more complicated.
As you wrote
> MetaNameAlias swishdefault content
solves my problem.
It seems I didn't understand the documentation on that point.
So the meaning of
MetaNameAlias swishdefault content
is something like
swishdefault = swishdefault + content
Greetings
Guido
At 28.08.2002 08:05 -0700, you wrote:
>Sorry, I'm rotten at giving short answers...
>
>At 05:47 AM 08/28/02 -0700, Guido Adam wrote:
> >All tags are defined as meta-tags in the swish.conf:
> >
> > MetaNames document url size type date crawldate keywords \
> > description link title content
> >
> >Problem:
> >If I search, I have to do something like
> >
> > swish-e -f index_test -w "content=harry"
> >
> >I'd like to do
> >
> > swish-e -f index_test -w "harry"
>
>Note that that is the same thing as
>
> swish-e -f index_test -w swishdefault=harry
>
>That means your front-end code can be more generic. Since *everything* is
>a metaname, you can always specify one, and that makes the front end easier
>to program:
>
> $swish_query = "$metaname=($query_words)";
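A minimal Python sketch of the same generic front-end idea. The line above is Perl, and the helper names `build_query` and `swish_argv` are hypothetical. It always prefixes a metaname and defaults to swishdefault, which per the note above is equivalent to a bare query. It also builds an argument list instead of a shell string, so the query needs no extra shell quoting when it is passed to something like subprocess.run.

```python
# Hypothetical front-end helpers; only swish-e's -f and -w flags come from the thread.
def build_query(query_words: str, metaname: str = "swishdefault") -> str:
    """Wrap the user's words as metaname=(words), like the Perl line above."""
    words = query_words.strip()
    if not words:
        raise ValueError("empty query")
    return f"{metaname}=({words})"


def swish_argv(index_file: str, query: str) -> list[str]:
    """Argument list for running swish-e, e.g. with subprocess.run(argv)."""
    return ["swish-e", "-f", index_file, "-w", query]


if __name__ == "__main__":
    # Simple search: the default metaname covers the aliased <content> words.
    print(swish_argv("index_test", build_query("harry AND potter")))
    # Field search on another metaname from the MetaNames list.
    print(swish_argv("index_test", build_query("harry", "title")))
```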
>
> >Is it possible to "define" <content> as swishdefault, <title> as
> >swishtitle, <url> as swishdocpath and <description> as swishdescription? If
> >so, how to do that?
>
>I think you are mixing some concepts here. Or at least you are asking two
>questions.
>
>Swish has properties and metanames. Metanames are used for searching,
>whereas properties are used to store data associated with each file. It's
>kind of backwards, since properties are really metadata.
>
>So, you can alias the meta names while indexing:
>
> http://swish-e.org/2.2/docs/SWISH-CONFIG.html#item_MetaNameAlias
>
>Remove "content" from MetaNames and instead add it as:
>
> MetaNameAlias swishdefault content
>
>Then searching with ./swish -w foo will find "foo" even if it was in the
>tag <content>. Index a single document with the -T indexed_words option
>and you can see how it works.
>
>Now, the other tags you list above sound more like properties. So then you
>would use PropertyNameAlias instead.
>
>So I think those are your answers.
>
>If you are not *mixing* indexing of HTML and XML docs, then there's no need
>to map (alias) your tag names onto the default property names that swish
>uses. Just use your names and use -x to get out the data you want.
>
>That's how swish works internally. It just uses a default -x setting of:
>
> "%r %p \"%t\" %l"
>
>which is in long form:
>
> -x '<swishrank> <swishdocpath> "<swishtitle>" <swishdocsize>\n'
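Following the "just use your names and use -x" advice, here is a small Python sketch that builds the long-form -x string from a list of property names. The helper `x_format` is hypothetical. With swish's own names it reproduces the default shown above. With the names from this thread it asks for the url, title and size properties instead, which assumes those tags were also indexed as properties.

```python
def x_format(props, quoted=()):
    """Build a swish-e -x format string such as
    <swishrank> <swishdocpath> "<swishtitle>" <swishdocsize>\\n
    Names listed in `quoted` are wrapped in double quotes. The trailing \\n is
    a literal backslash-n, as it appears inside the single-quoted -x above."""
    fields = [f'"<{p}>"' if p in quoted else f"<{p}>" for p in props]
    return " ".join(fields) + "\\n"


# Swish's built-in default, spelled out with its own property names.
default = x_format(["swishrank", "swishdocpath", "swishtitle", "swishdocsize"],
                   quoted={"swishtitle"})
# Same layout using the property names from this thread (assumed to exist).
custom = x_format(["swishrank", "url", "title", "size"], quoted={"title"})
print(default)
print(custom)
```

When typed in a shell, keep the resulting string in single quotes as in the example above. When it is passed as one element of an argument list, no extra quoting is needed.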
>
>
>Now, the "title" is a special case, and I'm not really sure what you want
>to do. I'll try to explain below.
>
> >The index contains xml data only.
>
>Just to be clear, HTML and XML parsing are basically the same. There are
>three differences.
>
>1) HTML tags are not added when using "UndefinedMetaTags auto".
>"UndefinedMetaTags auto" might be useful when you are indexing XML and want
>every tag to be automatically created as a Metaname. (My guess is that this
>is not a very useful configuration setting.)
>
>2) HTML tags set flags on the word indicating *where* in the HTML doc a
>word is found, such as in the <head>, <title>, <body>, <strong|b|em|i>,
><h*>. These flags do two things. First, they can be used with the -t
>switch to limit searches to words in those sections of a document (does
>anyone use that feature?). Second, the flags are used in ranking to rank some
>words higher than others, most commonly title words are ranked higher than
>body words.
>
>(BTW -- that flag is called the word's "structure")
>
>3) Text in HTML <title> tags is indexed as swishdefault, so you
>automatically search the title in addition to the body of the document.
>
>The MetaNameAlias thing happens after processing HTML tags. So, although
>you can do:
>
> MetaNameAlias swishdefault title
>
>and get your <title> indexed as swishdefault, it will *not* have the flags
>to indicate that it is a title word and rank higher in search results.
>
>One plan is to be able to set a ranking bias by metaname so that you could,
>say, rank words in <keywords>...</keywords> higher. But that doesn't solve
>the problem of indexing <title> as swishdefault, and also making those
>words rank higher. Plus, that won't work for aliases since alias mapping
>happens at indexing and rank calculation is done at search time.
>
>You can't make the parser assign the flags by simply indexing your .xml
>files as type HTML2 because the tag mapping doesn't happen in the parser --
>that is, the parser doesn't rename the tags before swish sees them (the
>mapping happens when swish looks up the tag's ID number). You wouldn't
>want that because then you couldn't have separate alias mappings for
>metanames and property names.
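To make that ordering concrete, here is a toy Python model. It is not swish-e's actual code, and the flag values, table contents and function names are all made up for illustration. Structure flags are decided from the tag the parser actually saw. The alias lookup happens later, separately for metanames and properties. As a result, an XML <title> aliased to swishdefault gets no title flag, while the same tag can still resolve to different names in the two tables.

```python
# Toy model only: NOT how swish-e is implemented internally.
IN_HEAD = 1   # hypothetical structure bit: word was inside <head>
IN_TITLE = 2  # hypothetical structure bit: word was inside <title>

# Separate alias tables, in the spirit of MetaNameAlias / PropertyNameAlias.
META_ALIASES = {"content": "swishdefault", "title": "swishdefault"}
PROP_ALIASES = {"title": "swishtitle", "url": "swishdocpath"}


def structure_flags(tag, parser):
    """Only the HTML parser knows that <title> is a title (inside <head>)."""
    if parser == "HTML" and tag == "title":
        return IN_HEAD | IN_TITLE
    return 0


def index_words(tag, text, parser):
    """Flags come from the parser first; the metaname alias lookup runs after."""
    flags = structure_flags(tag, parser)
    metaname = META_ALIASES.get(tag, tag)
    return [(word.lower(), metaname, flags) for word in text.split()]


def property_name(tag):
    """Property names are resolved through their own, independent table."""
    return PROP_ALIASES.get(tag, tag)


print(index_words("title", "Harry Potter", "HTML"))  # title words, flagged
print(index_words("title", "Harry Potter", "XML"))   # same metaname, no flags
print(property_name("title"))
```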
>
>It might be possible to add (yet another) config option that allows you
>to set the flags on tags. Something like:
>
> MetaNameAlias swishdefault title
> StructureFlags title in_title in_head
>
>So that would emulate what happens when processing HTML. Words in a
><title> tag get indexed as swishdefault metaname, plus those words are
>flagged as being title words (and in the <head> section, too).
>
>If I wanted that behavior today I'd use -S prog and write a perl program to
>parse my XML and output HTML.
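Here is a rough sketch of that -S prog approach, written in Python rather than the Perl mentioned above. It assumes each XML file holds one record shaped like <document><url>...</url><title>...</title><content>...</content></document>. That layout is hypothetical and is based only on the tag list in this thread. The sketch turns each record into a small HTML page, so the HTML parser can set the title flags, and writes it to stdout after the Path-Name and Content-Length headers that the prog method reads. Check the swish-e documentation for your version to confirm the exact headers it expects and how to make the output parsed as HTML rather than XML.

```python
#!/usr/bin/env python3
"""Sketch of a swish-e -S prog filter: one XML record -> one HTML document.
Assumes a hypothetical <document> record with <url>, <title>, <content>."""
import html
import sys
import xml.etree.ElementTree as ET


def record_to_html(xml_text):
    """Return (path, html_page) for one XML record (str or bytes)."""
    root = ET.fromstring(xml_text)
    url = root.findtext("url", default="").strip()
    title = root.findtext("title", default="")
    content = root.findtext("content", default="")
    # Put the title in <head><title> so the HTML parser flags those words.
    page = (
        "<html><head><title>" + html.escape(title) + "</title></head>"
        "<body>" + html.escape(content) + "</body></html>"
    )
    return url, page


def prog_record(path, page):
    """Headers, a blank line, then the document; Content-Length is in bytes."""
    body = page.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body


if __name__ == "__main__":
    for filename in sys.argv[1:]:
        with open(filename, "rb") as fh:
            path, page = record_to_html(fh.read())
        # Fall back to the file name if the record has no <url>.
        sys.stdout.buffer.write(prog_record(path or filename, page))
```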
>
>Ok, time for another cup of coffee.....
>
>
>--
>Bill Moseley
>mailto:moseley@hank.org