From general-return-412-apmail-lucene-general-archive=lucene.apache.org@lucene.apache.org Tue Feb 13 20:11:59 2007
Return-Path:
Delivered-To: apmail-lucene-general-archive@www.apache.org
Received: (qmail 18958 invoked from network); 13 Feb 2007 20:11:59 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2)
by minotaur.apache.org with SMTP; 13 Feb 2007 20:11:59 -0000
Received: (qmail 78586 invoked by uid 500); 13 Feb 2007 20:12:05 -0000
Delivered-To: apmail-lucene-general-archive@lucene.apache.org
Received: (qmail 78567 invoked by uid 500); 13 Feb 2007 20:12:05 -0000
Mailing-List: contact general-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: general@lucene.apache.org
Delivered-To: mailing list general@lucene.apache.org
Received: (qmail 78549 invoked by uid 99); 13 Feb 2007 20:12:05 -0000
Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133)
by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 13 Feb 2007 12:12:05 -0800
X-ASF-Spam-Status: No, hits=0.0 required=10.0
tests=
X-Spam-Check-By: apache.org
Received-SPF: neutral (herse.apache.org: local policy)
Received: from [203.99.254.144] (HELO rsmtp2.corp.hki.yahoo.com) (203.99.254.144)
by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 13 Feb 2007 12:11:54 -0800
Received: from oftenbreaklx (wlanvpn-mc2e-246-180.corp.yahoo.com [172.21.148.180])
by rsmtp2.corp.hki.yahoo.com (8.13.8/8.13.6/y.rout) with ESMTP id l1DKBOk3012506
for ; Tue, 13 Feb 2007 12:11:24 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; s=serpent; d=yahoo-inc.com; c=nofws; q=dns;
h=from:to:references:subject:date:message-id:mime-version:
content-type:content-transfer-encoding:x-mailer:thread-index:in-reply-to:x-mimeole;
b=OBe/DZZU0IowXIcaQXvFXMXEUQD80NlHx0VSt1oEdqrsjv3Ert0MuexYBMThL2T0
From: "Deepa Paranjpe"
To:
References: <453699EA.3050501@apache.org> <45469084.5070201@apache.org> <765FF919-884A-4999-BE2F-41EFE11D567E@101tec.com>
Subject: Problems with "AND" queries
Date: Tue, 13 Feb 2007 12:11:23 -0800
Message-ID: <001901c74fab$23603ce0$789215ac@ds.corp.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain;
charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
Thread-Index: AccBphkPT3abX83ZRzi6aHajh7j4CROBKjBw
In-Reply-To: <765FF919-884A-4999-BE2F-41EFE11D567E@101tec.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
X-Virus-Checked: Checked by ClamAV on apache.org
I have small documents indexed.
When I query the index using a BooleanQuery containing {why, is, the, sky, blue},
with every clause set to BooleanClause.Occur.MUST, I do not retrieve any
results.
However, when I use only {why, sky, blue}, I get several results, each
containing "Why is the sky blue?".
What is going wrong? Please help.
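A likely cause, assuming the index was built with an analyzer that drops English stop words (Lucene's StandardAnalyzer does this by default, and "is" and "the" are on its stop list): those two words were never indexed as terms, so a required TermQuery on either one matches no document, and with every clause required the whole conjunction matches nothing. The usual remedy is to run the query text through the same analyzer that was used at index time (for example with QueryParser built on that analyzer, with setDefaultOperator(QueryParser.AND_OPERATOR)) instead of building TermQuery clauses from raw words. The Python sketch below only models that behaviour; the small stop-word set, `analyze`, `build_index` and `search_all_must` are hypothetical stand-ins, not Lucene's API.

```python
import re

# Hypothetical stop-word list imitating an English stop filter;
# "is" and "the" are the words that matter for this question.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def analyze(text):
    """Toy analyzer: lower-case, split on non-letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all_must(index, terms):
    """AND query: every term is required, like all-MUST BooleanClauses."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = ["Why is the sky blue?", "The sea is blue", "Why is grass green?"]
index = build_index(docs)

# Raw words: "is" and "the" were never indexed, so the AND query is empty.
raw_terms = ["why", "is", "the", "sky", "blue"]
print(sorted(search_all_must(index, raw_terms)))       # []

# Query words analyzed the same way as the documents.
analyzed_terms = analyze("Why is the sky blue")
print(analyzed_terms)                                   # ['why', 'sky', 'blue']
print(sorted(search_all_must(index, analyzed_terms)))  # [0]
```

The same mismatch would appear with any analyzer that changes tokens at index time (stemming, lower-casing), which is why analyzing both sides with one analyzer is the safer habit.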
-----Original Message-----
From: Stefan Groschupf [mailto:sg@101tec.com]
Sent: Monday, November 06, 2006 5:18 AM
To: general@lucene.apache.org
Subject: Re: [PROPOSAL] index server project
Hi,
Do people think we are already at a stage where we can set up some
basic infrastructure, like a mailing list and a wiki, and move the
discussion to the new mailing list? Maybe set up an Incubator project?
I would be happy to help with such basic tasks.
Stefan
On 31.10.2006, at 22:03, Yonik Seeley wrote:
> On 10/30/06, Doug Cutting wrote:
>> Yonik Seeley wrote:
>> > On 10/18/06, Doug Cutting wrote:
>> >> We assume that, within an index, a file with a given name is
>> >> written only once.
>> >
>> > Is this necessary, and will we need the lockless patch (that avoids
>> > renaming or rewriting *any* files), or is Lucene's current index
>> > behavior sufficient?
>>
>> It's not strictly required, but it would make index synchronization a
>> lot simpler. Yes, I was assuming the lockless patch would be committed
>> to Lucene before this project gets very far. Something more than that
>> would be required in order to keep old versions, but this could be as
>> simple as a Directory subclass that refuses to remove files for a
>> time.
>
> Or a snapshot (hard links) mechanism.
> Lucene would also need a way to open a specific index version (rather
> than just the latest), but I guess that could also be hacked into
> Directory by hiding later "segments" files (assumes lockless is
> committed).
>
>> > It's unfortunate the master needs to be involved on every
>> > document add.
>>
>> That should not normally be the case.
>
> Ahh... I had assumed that "id" in the following method was document
> id:
> IndexLocation getUpdateableIndex(String id);
>
> I see now it's index id.
>
> But what exactly is an index id? Looking at the example API you laid
> down, it must be a single physical index (as opposed to a logical
> index). In which case, is it entirely up to the client to manage
> multi-shard indices? For example, if we had a "photo" index broken
> up into 3 shards, each shard would have a separate index id and it
> would be up to the client to know this, and to query across the
> different "photo0", "photo1", "photo2" indices. The master would
> have no clue those indices were related. Hmm, that doesn't work
> very well for deletes though.
>
> It seems like there should be the concept of a logical index that is
> composed of multiple shards, where each shard has multiple copies.
>
> Or were you thinking that a cluster would only contain a single
> logical index, and hence all different index ids are simply different
> shards of that single logical index? That would seem to be consistent
> with ClientToMasterProtocol.getSearchableIndexes() lacking an id
> argument.
>
>> I was not imagining a real-time system, where the next query after a
>> document is added would always include that document. Is that a
>> requirement? That's harder.
>
> Not real-time, but it would be nice if we kept it close to what Lucene
> can currently provide.
> Most people seem fine with a latency of minutes.
>
>> At this point I'm mostly trying to see if this functionality would
>> meet the needs of Solr, Nutch and others.
>>
>
> It depends on the project scope and how extensible things are.
> It seems like the master would be a WAR, capable of running
> standalone.
> What about index servers (slaves)? Would this project include just
> the interfaces to be implemented by Solr/Nutch nodes, some common
> implementation code behind the interfaces in the form of a library, or
> also complete standalone WARs?
>
> I'd need to be able to extend the ClientToSlave protocol to add
> additional methods for Solr (for passing in extra parameters and
> returning various extra data such as facets, highlighting, etc).
>
>> Must we include a notion of document identity and/or document
>> version in the mechanism? Would that facilitate updates and coherency?
>
> I don't think it needs to be in the interfaces, so it depends
> on the scope of the index server implementations.
>
> -Yonik
>
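Doug's idea of keeping old index versions with "a Directory subclass that refuses to remove files for a time" can be modeled as a wrapper that records delete requests and only carries them out once a grace period has passed, so a reader still holding an older version keeps finding its files. This Python sketch is hypothetical throughout: `DeferredDeleteDirectory`, its methods, the in-memory dict of files and the file names are illustrative only, not Lucene's Directory API.

```python
import time


class DeferredDeleteDirectory:
    """Toy directory that postpones file deletion for `grace_seconds`.

    Hypothetical model: files live in a dict, and delete_file() only
    schedules removal so readers of an older index version still see them.
    """

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock   # injectable clock, handy for testing
        self.files = {}      # name -> bytes
        self.pending = {}    # name -> time the delete was first requested

    def write_file(self, name, data):
        # Write-once assumption from the thread: a name is never rewritten.
        if name in self.files:
            raise FileExistsError(name)
        self.files[name] = data

    def read_file(self, name):
        return self.files[name]

    def delete_file(self, name):
        # Record the request instead of removing the file right away.
        if name in self.files:
            self.pending.setdefault(name, self.clock())

    def purge(self):
        """Actually delete files whose grace period has expired; return their names."""
        now = self.clock()
        expired = [n for n, t in self.pending.items()
                   if now - t >= self.grace_seconds]
        for name in expired:
            del self.files[name]
            del self.pending[name]
        return sorted(expired)


# Usage with a fake clock (file names are illustrative).
now = [0.0]
d = DeferredDeleteDirectory(grace_seconds=60, clock=lambda: now[0])
d.write_file("segments_1", b"v1")
d.write_file("segments_2", b"v2")
d.delete_file("segments_1")
now[0] = 30.0
print(d.purge())                  # [] : still inside the grace period
print(d.read_file("segments_1"))  # b'v1' : an old reader can still open it
now[0] = 61.0
print(d.purge())                  # ['segments_1']
```

A hard-link snapshot, as Yonik suggests, achieves the same goal without a timer: the old files stay reachable for as long as the snapshot exists.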
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com
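Yonik's suggestion that a logical index be composed of multiple shards, each with several copies, and his concern that deletes are awkward when the client has to track "photo0", "photo1", "photo2" on its own, can be illustrated with a small routing table. The Python sketch assumes documents are assigned to shards by a stable hash of their id, which the thread does not specify; the classes, host names and hashing scheme are all hypothetical. zlib.crc32 is used instead of Python's built-in hash() because string hashing is randomized per process.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    index_id: str                                 # physical index id, e.g. "photo0"
    replicas: list = field(default_factory=list)  # hosts holding a copy


@dataclass
class LogicalIndex:
    name: str
    shards: list

    def shard_for(self, doc_id):
        # Hypothetical routing rule: stable hash of the document id.
        return self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]

    def route_add(self, doc_id):
        """An add goes to every replica of exactly one shard."""
        shard = self.shard_for(doc_id)
        return [(host, shard.index_id) for host in shard.replicas]

    def route_delete(self, doc_id):
        """Deterministic routing means a delete by id reaches the same shard."""
        return self.route_add(doc_id)

    def route_search(self):
        """A search must visit one replica of every shard (here: the first)."""
        return [(s.replicas[0], s.index_id) for s in self.shards]


photo = LogicalIndex("photo", [
    Shard("photo0", ["hostA", "hostB"]),
    Shard("photo1", ["hostB", "hostC"]),
    Shard("photo2", ["hostC", "hostA"]),
])

print(photo.route_search())
# [('hostA', 'photo0'), ('hostB', 'photo1'), ('hostC', 'photo2')]
print(photo.route_add("img-42") == photo.route_delete("img-42"))  # True
```

With the shard layout held in one place (the master, in the proposed design), clients no longer need to know that the three physical indices are related. A delete by query rather than by id would still have to be broadcast to every shard.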