Hi,
Keyword extraction from very large files will consume a lot of memory, because all keywords have to be kept in memory (I'm not sure whether this is a Lucene issue or a matter of how Lucene is being used). For this you have three options:
- use all keywords, but live with the memory issue
- restrict the number of keywords, but live with only partially indexed files
- disable keyword extraction by using an index configuration for nt:resource in which only a dummy, non-existent property is indexed
IMHO the second option is the worst, because it is not reliable: text beyond the limit is simply not searchable.
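To make the reliability problem with the second option concrete, here is a toy model of a per-field token limit, like the 10,000-word default discussed in the quoted message below. It is a hypothetical sketch, not Jackrabbit or Lucene code: the class `TruncatedIndexDemo` and method `indexedTerms` are made up, tokenization is simplified to whitespace splitting, and nothing here measures memory. It only shows that a keyword appearing after the limit is never indexed and therefore cannot be found.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical illustration (not Jackrabbit/Lucene code): only the first
 * maxTokens tokens of a document's extracted text end up in the index.
 */
public class TruncatedIndexDemo {

    /** Distinct lower-cased terms among the first maxTokens whitespace-separated tokens. */
    static Set<String> indexedTerms(String text, int maxTokens) {
        Set<String> terms = new LinkedHashSet<>();
        int count = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty input
            }
            if (count++ >= maxTokens) {
                break; // everything after the limit is silently dropped
            }
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        // A "large file": 20 filler words followed by one keyword at the very end.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            doc.append("filler").append(i).append(' ');
        }
        doc.append("jackrabbit");

        Set<String> limited = indexedTerms(doc.toString(), 10);
        Set<String> unlimited = indexedTerms(doc.toString(), Integer.MAX_VALUE);

        System.out.println("limit 10 finds keyword: " + limited.contains("jackrabbit"));
        System.out.println("no limit finds keyword: " + unlimited.contains("jackrabbit"));
        System.out.println("terms kept with limit:    " + limited.size());
        System.out.println("terms kept without limit: " + unlimited.size());
    }
}
```

With the limit, the keyword at the end of the file is missing from the index, so a search for it silently returns nothing; without the limit, every term is kept, which is the memory trade-off described above.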
The second situation where I've seen higher memory consumption was when Lucene index files were merged. I didn't have the time to investigate this further; increasing the memory a bit helped, so I don't know the cause.
Kind regards, Robert
-----Original Message-----
From: pgupta [mailto:pankaj.gupta@ansys.com]
Sent: Friday, 6 September 2013 05:36
To: users@jackrabbit.apache.org
Subject: Re: Huge memory usage while re-indexing
Unfortunately not, as our users can potentially construct a search query using any property.
Do you think it's the number of indexable properties that is causing the memory issues? I was thinking it was perhaps more to do with the keyword extraction from file contents. We ran into a somewhat similar memory issue when we increased the number of words used for indexing from 10,000 to a million.
This again caused a huge memory spike (~2 GB) while importing a large text file (~100 MB). Because of this we had to revert this setting to the default value.
So my initial thinking is that either Lucene indexing (or how it's being used by Jackrabbit) is not scalable, or our configuration is not optimal to handle these cases.
--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Huge-memory-usage-while-re-indexing-tp4659465p4659472.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.