lucene-java-user mailing list archives

Hi Erick,
> First, consider using your own analyzer and/or breaking the IP addresses
> up by substituting ' ' for '.' upon input.
Do you mean breaking the IP up into one token for each segment, like ["192",
"168", "1", "100"] ?
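
To make the question concrete, here is a rough sketch of the per-segment split I have in mind. It is plain Java rather than a Lucene analyzer, and the class and method names (IpTokens, segments, spaced) are made up for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class IpTokens {
    // Split a dotted IPv4 string into one token per segment.
    static List<String> segments(String ip) {
        return Arrays.asList(ip.trim().split("\\."));
    }

    // The "substitute ' ' for '.'" variant: the result is text that a
    // whitespace-splitting analyzer would break into the same segments.
    static String spaced(String ip) {
        return ip.trim().replace('.', ' ');
    }
}
```

Either form would presumably be applied to the address on input, before it reaches the index.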
> But on to your question. Please post what you mean by
> "a large number". 10,000? 1,000,000,000? we have no clue
> from your posts so far...
I apologize for the lack of details. A large part of the data will be
wireless MAC addresses detected over the air, so it depends on the site. But
I suppose, worst case, we're looking at thousands or tens of thousands.
So I guess that's not such a large number next to some of the other
cases discussed on the list.
> That said, efficiency is hugely overrated at this stage of your
> design. I'd personally use whatever is easiest and run some
> tests.
>
> Just index them as single (unbroken) tokens to start and search
> your partial address with PrefixQuery.
This is what I was thinking originally, too. Although there could be times
where they are searching for a piece at the end of the address, which is why
my original post had me building a WildcardQuery.
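
To show the difference I am worried about, here is a small sketch of the two matching behaviours against a single unbroken token. It only emulates prefix and wildcard semantics in plain Java (with '*' for any run of characters and '?' for exactly one); it does not call Lucene, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mimics the matching semantics, not Lucene code.
final class PartialAddressMatch {
    // Prefix semantics: the indexed token must start with the typed text.
    static boolean prefixMatch(String token, String prefix) {
        return token.startsWith(prefix);
    }

    // Wildcard semantics: '*' matches any run of characters, '?' exactly one.
    static boolean wildcardMatch(String token, String pattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush pending literal text, quoted so '.' stays a plain dot.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.matches(regex.toString(), token);
    }
}
```

A search for the tail of an address, such as "*.1.100", only works under the wildcard form; a prefix match on "1.100" would miss "192.168.1.100".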
The system will be searching log messages, too, and for that I'll use the
more normal StandardAnalyzer/QueryParser approach.
So going forward I am thinking of creating a custom query parser class that
special-cases IP addresses and MAC addresses, where the query must be more
customized, and falls through to the standard QueryParser class in all other
cases. Does this sound like a good idea?
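
Roughly, the dispatch step could look like the sketch below. It only classifies the input; the IP and MAC branches would then build a PrefixQuery or WildcardQuery, and the TEXT branch would hand the string to the standard QueryParser. The class name, the enum, and both patterns are my own assumptions and are deliberately loose (for example, they accept partial addresses and do not range-check octets).

```java
import java.util.regex.Pattern;

// Hypothetical classifier for the front of a custom query parser.
final class QueryKind {
    enum Kind { IP, MAC, TEXT }

    // Whole or partial dotted IPv4 text, e.g. "192.168." or ".1.100".
    // Requires at least one dot so a bare number falls through to TEXT.
    private static final Pattern IP_LIKE =
            Pattern.compile("(?=.*\\.)[0-9.]+");

    // Two to six hex pairs separated by ':' or '-', with an optional
    // trailing separator, e.g. "00:1A:2B" or "00-1a-2b-3c-4d-5e".
    private static final Pattern MAC_LIKE =
            Pattern.compile("[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){1,5}[:-]?");

    static Kind classify(String query) {
        String q = query.trim();
        if (MAC_LIKE.matcher(q).matches()) {
            return Kind.MAC;
        }
        if (IP_LIKE.matcher(q).matches()) {
            return Kind.IP;
        }
        // Everything else is treated as ordinary log-message text.
        return Kind.TEXT;
    }
}
```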
Thanks again for your continued help!