I found it. My reader was returning 0 at the end of the stream instead of -1. Doh.
Thanks again for the suggestions. They did ultimately lead me to the right answer.
Ed
----- Original Message ----
From: Edwin Smith <edwin.smith@yahoo.com>
To: java-user@lucene.apache.org
Sent: Tuesday, October 7, 2008 10:43:06 AM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
Thanks for the tip. I tried your experiment and, sure enough, it works just fine, so it's
not the contents but obviously some other behavior of my custom reader. (Does the analyzer
require that "mark" and "reset" be implemented, for example? I did not implement them.)
The stack trace is as follows:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsExceptionat java.lang.System.arraycopy(
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(
at org.apache.lucene.analysis.standard.StandardTokenizer.next(
at org.apache.lucene.analysis.standard.StandardFilter.next(
at org.apache.lucene.analysis.LowerCaseFilter.next(
at org.apache.lucene.analysis.StopFilter.next(
at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(
at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(
at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(
at org.apache.lucene.index.DocumentsWriter.updateDocument(
at org.apache.lucene.index.DocumentsWriter.addDocument(
at org.apache.lucene.index.IndexWriter.addDocument(
at org.apache.lucene.index.IndexWriter.addDocument(
at com.affinovate.v4.server.search.ServerTest.main(Native Method)StandardTokenizerImpl.java:366)StandardTokenizerImpl.java:573)StandardTokenizer.java:139)StandardFilter.java:42)LowerCaseFilter.java:33)StopFilter.java:118)DocumentsWriter.java:1522)DocumentsWriter.java:1412)DocumentsWriter.java:1121)DocumentsWriter.java:2442)DocumentsWriter.java:2424)IndexWriter.java:1464)IndexWriter.java:1442)ServerTest.java:36)
The error is occuring in the second line of zzRefill():
System.arraycopy(
I set a breakpoint to catch it before it erred, and the value of zzEndRead is 0 and the value
of zzStartRead 1. Thus the error.
I was being clever and made a custom reader using competing threads against a SAX parser.
I probbably did it more to see if I could than for any valid reason, so I will probably just
simplify my approach and use the parser to pull a complete string and use a string reader
like you suggest.
Thanks for the help.
Ed
zzBuffer, zzStartRead,
zzBuffer, 0,
zzEndRead-zzStartRead);
----- Original Message ----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Tuesday, October 7, 2008 5:12:05 AM
Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
If you capture the exact text produced by the reader, and wrap it in a
StringReader and pass that to StandardAnalyzer, do you then see the
same exception?
Can you post the full stack trace on 2.3.2?
Mike
Edwin Smith wrote:
> I upgraded to the latest, 3.3.2 and had the same problem, even
> though it was clearly a different lexer reading the text.
>
> I did find some problems with the reader I was using, and it now
> reads some files that it didn't before, so it may still be some
> reader problem I haven't identified, but the text coming in from it
> looks correct to me, so I don't know.
>
> Very frustrating.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Edwin Smith <edwin.smith@yahoo.com>
> To: java-user@lucene.apache.org
> Sent: Monday, October 6, 2008 3:20:51 PM
> Subject: Re: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> No particular reason. It is just what I had loaded last and hadn't
> upgraded. It sounds like there might be good reason to do that now.
>
> Thanks for the tip.
>
> Ed
>
>
>
> ----- Original Message ----
> From: Steven A Rowe <sarowe@syr.edu>
> To: java-user@lucene.apache.org
> Sent: Monday, October 6, 2008 3:18:20 PM
> Subject: RE: ArrayIndexOutOfBoundsException in FastCharStream.readChar
>
> Hi Edwin,
>
> I don't know specifically what's causing the exception you're
> seeing, but note that in Lucene 2.3.0+, the JavaCC-generated version
> of StandardTokenizer (where your exception originates) has been
> replaced with a JFlex-generated version - see <http://issues.apache.org/jira/browse/LUCENE-966
> >.
>
> FYI, indexing speed was much improved in 2.3.0 over previous
> versions -- up to 10 times faster, according to reports on this list
> -- is there any particular reason you aren't using 2.3.2 (the most
> recent release)?
>
> Steve
>
> On 10/06/2008 at 2:32 PM, Edwin Smith wrote:
>> Oh, and in case it matters, I'm using Lucene 2.2.0.
>>
>> Ed
>>
>>
>>
>> ----- Original Message ----
>>
>>
>> I am stumped and have not seen any other reference to this
>> problem. I am getting the following exception on everything I
>> try to index. Does anyone know what my problem might be?
>>
>> Thanks,
>>
>> Ed
>>
>> java.lang.ArrayIndexOutOfBoundsException at
>> org.apache.lucene.analysis.standard.FastCharStream.readChar( at
>> org.apache.lucene.analysis.standard.FastCharStream.BeginToken( at
>> org.apache.lucene.analysis.standard.StandardTokenizerTokenMana
>> ger.getNextToken( at
>> org.apache.lucene.analysis.standard.StandardTokenizer.jj_ntk( at
>> org.apache.lucene.analysis.standard.StandardTokenizer.next( at
>> org.apache.lucene.analysis.standard.StandardFilter.next( at
>> org.apache.lucene.analysis.LowerCaseFilter.next( at
>> org.apache.lucene.analysis.StopFilter.next( at
>> org.apache.lucene.index.DocumentWriter.invertDocument( at
>> org.apache.lucene.index.DocumentWriter.addDocument( at
>> org.apache.lucene.index.IndexWriter.buildSingleDocSegment( at
>> org.apache.lucene.index.IndexWriter.addDocument( at
>> org.apache.lucene.index.IndexWriter.addDocument( at
>> com.affinovate.v4.server.search.Indexer.index( at
>> com.affinovate.v4.server.search.Indexer.perform( at
>> com.affinovate.v4.server.db.TaskQueue.run( at java.lang.Thread.run(:
>> 2048FastCharStream.java:46)FastCharStream.java:79)StandardToke
>> nizerTokenManager.java:1180)StandardTokenizer.java:158)Standar
>> dTokenizer.java:36)StandardFilter.java:41)LowerCaseFilter.java
>>> 33)StopFilter.java:107)DocumentWriter.java:219)DocumentWriter
>> .java:95)IndexWriter.java:1013)IndexWriter.java:1001)IndexWrit
>> er.java:983)Indexer.java:61)Indexer.java:93)TaskQueue.java:115
>> )Thread.java:619)
>>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org