OK, I know lots of great work has been done to reduce the memory
footprint for sorting and faceting, but what I'm seeing is drastic
enough that I want to see if I'm missing something and to ask what
finer-grained tools people are using to answer the question "How much
more memory efficient is the new way of doing things?"
Setup:
I'm indexing 1.9M Wikipedia articles, then firing up a fresh Solr and
firing a relatively insane query at it while monitoring in JConsole.
After the query I force a GC from JConsole and look at the memory used
by Solr. Crude, but I'm trying to get a flavor of what's going on here.
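For anyone who wants the same crude number without clicking around, here
is a minimal sketch using the standard java.lang.management API. The
class name HeapSnapshot is made up for illustration, and note that it
measures the JVM it runs in, so to measure Solr it would have to run
inside Solr's JVM (or be adapted to a remote JMX connection, which is
not shown here).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: request a GC, then report used heap for THIS JVM.
public class HeapSnapshot {

    /** Requests a garbage collection, then returns the used heap in bytes. */
    public static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // a request only; documented as equivalent to System.gc()
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        long usedBytes = usedHeapAfterGc();
        System.out.printf("Heap used after GC: %d MB%n", usedBytes / (1024 * 1024));
    }
}
```

Like the JConsole approach, this only gives a total after a requested
GC; it says nothing about which structures are holding the memory.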
Field       Unique values   Type
id          1,917,727       string
user_sort   62,123          string
text        57,759          text (1.4.1 flavor for all three Solr versions)
user_id     62,122          int
http://localhost:8983/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&sort=user_sort asc, id desc&facet=on&facet.field=text&facet.field=user_id&facet.field=id
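Since the sort clause contains spaces and a comma, the URL gets mangled
easily when pasted. Here is a small sketch that builds the same request
with the parameter values form-encoded. The class name StressQuery is
made up; the host, path and parameters are exactly the ones above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that rebuilds the stress query with encoded values.
public class StressQuery {

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static String buildUrl() {
        StringBuilder url = new StringBuilder("http://localhost:8983/solr/select/");
        url.append("?q=").append(enc("*:*"));
        url.append("&version=2.2&start=0&rows=10&indent=on");
        url.append("&sort=").append(enc("user_sort asc, id desc"));
        url.append("&facet=on");
        // Facet on all three fields, including the unique id, to stress memory.
        for (String field : new String[] {"text", "user_id", "id"}) {
            url.append("&facet.field=").append(enc(field));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```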
Yeah, yeah, yeah, faceting and sorting by a unique ID is silly. But it
*does* stress memory.
Anyway, here are the numbers I'm seeing:
Version   Memory used
1.4.1     328 M
3.2       328 M
trunk      90 M
And it's even more impressive than that when you consider that 20M or
so is just to get in the door.
Is it fair to say that the two big innovations that have reduced the
memory footprint are:
1> going to byte arrays for string storage
2> the FST work?
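To make item 1 concrete, here is a toy sketch of what I mean by byte
arrays for string storage: all terms packed as UTF-8 into one byte[]
with an int[] of offsets, instead of one String object per term. This is
purely illustrative; the class PackedTerms is made up and is not
Lucene's actual code.

```java
import java.nio.charset.StandardCharsets;

// Toy illustration only: terms packed into a single UTF-8 byte array.
public class PackedTerms {
    private final byte[] blob;   // all terms, back to back, as UTF-8
    private final int[] offsets; // term i occupies blob[offsets[i] .. offsets[i + 1])

    public PackedTerms(String[] terms) {
        byte[][] encoded = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        blob = new byte[total];
        offsets = new int[terms.length + 1];
        int pos = 0;
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = pos;
            System.arraycopy(encoded[i], 0, blob, pos, encoded[i].length);
            pos += encoded[i].length;
        }
        offsets[terms.length] = pos;
    }

    /** Decodes term i back into a String on demand. */
    public String get(int i) {
        return new String(blob, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }

    /** Bytes used by the term data itself, excluding the offsets array. */
    public int payloadBytes() {
        return blob.length;
    }
}
```

The point of the layout is that there is no per-term object header and
ASCII text takes one byte per character rather than a two-byte char.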
Final question. It looks like the FST work is back-ported to the
current 3_x code branch, is that true? Anything else back-ported
there? I'll check that branch out and give it a whirl for kicks.
Thanks,
Erick
A novice programmer gets a program to compile and says "I'm sure it'll
run fine now"
A veteran programmer runs a program for the first time, gets the
expected results and says "I must have done something wrong, that
can't *really* be working".