From java-dev-return-46456-apmail-lucene-java-dev-archive=lucene.apache.org@lucene.apache.org Sun Feb 07 03:58:53 2010
Message-ID: <1992701879.104381265515107948.JavaMail.jira@brutus.apache.org>
Date: Sun, 7 Feb 2010 03:58:27 +0000 (UTC)
From: "John Wang (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2252) stored field retrieve slow
In-Reply-To: <1045149059.99461265487508284.JavaMail.jira@brutus.apache.org>
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830641#action_12830641 ]
John Wang commented on LUCENE-2252:
-----------------------------------
bq. I still think 4 bytes/doc is too much (it's too much wasted RAM for virtually no gain)
That depends on the application. On modern machines (at least the ones we are using, e.g. a MacBook Pro) we can afford it :) I am not sure I agree with "virtually no gain" if you look at the numbers I posted. IMHO, the gain is significant.
I hate to get into a subjective argument on this, though.
bq. I don't understand why you need something like a custom segment file to do this. Why can't you simply use Directory to load this particular file into memory for your use case?
Having a custom segment means I don't have to get into this subjective argument about what is too much memory or how large the gain is, since that just depends on my application, right?
Furthermore, for the question at hand, even if we use the Directory implementation Uwe suggested, it is not optimal. For my use case, the seek/read for the count on the data file is very wasteful. Also, even for getting the position, I can do a random access into an array rather than an in-memory seek and read/parse.
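To make the array-versus-seek point concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class OffsetTable, its methods, and the layout of one big-endian 4-byte offset per document are all assumptions made for illustration. It decodes the index bytes into an int[] once, so a lookup is a single array access, and shows the in-memory seek and parse path next to it for comparison.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names and file layout are assumptions, not Lucene's format.
final class OffsetTable {
    private final int[] offsets; // one decoded offset per document

    // Decode the whole index once at load time (assumes 4-byte big-endian entries).
    OffsetTable(byte[] indexBytes) {
        ByteBuffer buf = ByteBuffer.wrap(indexBytes); // big-endian by default
        offsets = new int[indexBytes.length / Integer.BYTES];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = buf.getInt();
        }
    }

    // Lookup is a plain random access into the array.
    int offsetOf(int docId) {
        return offsets[docId];
    }

    // Comparison path: position into the raw bytes and parse on every call.
    static int seekAndRead(byte[] indexBytes, int docId) {
        ByteBuffer buf = ByteBuffer.wrap(indexBytes);
        buf.position(docId * Integer.BYTES);
        return buf.getInt();
    }

    // Helper that builds sample index bytes for the demo below.
    static byte[] encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] file = encode(new int[] {0, 17, 42, 130});
        OffsetTable table = new OffsetTable(file);
        System.out.println(table.offsetOf(2));    // prints 42
        System.out.println(seekAndRead(file, 2)); // prints 42
    }
}
```

Both paths return the same value; the difference is only in how much work each lookup does. The sketch makes no claim about the size of the speedup in a real index.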
The very simple store mechanism we have written outside of Lucene shows a gain of >85x, yes, 8500%, over Lucene stored fields. However, we would like to take advantage of some of the good stuff already in Lucene, e.g. the merge mechanism (which is very nicely done), delete handling, etc.
> stored field retrieve slow
> --------------------------
>
> Key: LUCENE-2252
> URL: https://issues.apache.org/jira/browse/LUCENE-2252
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Affects Versions: 3.0
> Reporter: John Wang
>
> IndexReader.document() on a stored field is rather slow. I ran a simple multi-threaded test and profiled it:
> 40+% of the time is spent getting the offset from the index file
> 30+% of the time is spent reading the count (e.g. the number of fields to load)
> I ran it on my laptop, where the disk isn't that great, but there still seems to be much room for improvement, e.g. loading the field index file into memory (for a 5M doc index, the extra memory footprint is 20MB, peanuts compared to the other stuff being loaded)
> On a related note, are there plans to have custom segments as part of the flexible indexing feature?
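The 20MB figure quoted above follows from the 4 bytes/doc discussed in the comment: 5M documents times 4 bytes each. A small sketch of that arithmetic in plain Java is below; the class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for the footprint arithmetic; not part of Lucene.
final class FieldIndexFootprint {

    // Bytes needed to hold one fixed-size entry per document in memory.
    static long bytesFor(long docCount, int bytesPerDoc) {
        return Math.multiplyExact(docCount, (long) bytesPerDoc);
    }

    public static void main(String[] args) {
        long bytes = bytesFor(5_000_000L, 4);          // 5M docs, 4 bytes each
        System.out.println(bytes);                     // prints 20000000
        System.out.println(bytes / 1_000_000 + " MB"); // prints 20 MB
    }
}
```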
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org