It looks like the index uses two maps: the first has about 20k (String -> Set) pairs, the second about 430k (Binary -> String) pairs.

What bothers me is that while we have only 20k distinct Strings in the system, the number of String instances in the heap is ~430k. I clearly understand how this happened, but there is definitely room for improvement here.
Also, the hash map and hash set implementations waste too much memory (look at the average size of com.tangosol.util.SafeHashMap$Entry[], and the memory consumed by SafeHashMap$Entry).

Is it possible to plug my own index implementation into a Coherence cache somehow, or at least to use custom map and set classes for the index?

If you interned the String instances in the value extractor that is used when creating the index, then the number of String instances in the forward map (Binary -> String) would be 20k. You wouldn't need to change the filtering logic.

Of course, this comes at a small performance cost when updating entries, plus some memory overhead for maintaining the set of interned String instances in the permanent generation space (which can possibly cause OutOfMemoryErrors).
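To illustrate the suggestion above, here is a minimal sketch of an interning wrapper. It is not Coherence code: java.util.function.Function stands in for the real extractor interface, and the class name InterningExtractor and its static pool are hypothetical. It uses a heap-based pool instead of String.intern(), so the pool holds strong references and never shrinks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical wrapper that canonicalizes the String returned by a delegate
 * extractor. Function is only a stand-in for the real extractor interface.
 */
final class InterningExtractor<T> implements Function<T, String> {
    // Assumption: a strongly referenced pool is acceptable; entries are never evicted.
    private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

    private final Function<T, String> delegate;

    InterningExtractor(Function<T, String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public String apply(T target) {
        String value = delegate.apply(target);
        if (value == null) {
            return null;
        }
        // putIfAbsent returns the instance already pooled, or null if 'value' was just added.
        String canonical = POOL.putIfAbsent(value, value);
        return canonical != null ? canonical : value;
    }
}
```

A real extractor used for a Coherence index would additionally have to be serializable and implement equals() and hashCode() consistently, which this sketch leaves out.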

Yes, I'm aware of such techniques. String.intern() is not an option for performance reasons (at least on the Sun JVM, it has O(ln(n)) lookup time and O(n) insertion time). Of course, I can use a WeakHashMap to organize my own string table.

But I already have this table in the first map (String -> Binary). In a former project we used a handmade Map implementation with an internKey() method to avoid duplicating data.
I'm looking for a way to reuse our old and proven techniques with Coherence caches.
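For reference, the WeakHashMap-based string table mentioned above could look roughly like the sketch below. The class name WeakStringTable is hypothetical, and this is not the handmade internKey() map from the former project. The value is held through a WeakReference because a strong reference from the value back to its key would prevent WeakHashMap from ever releasing the entry.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical string table: returns one canonical instance per distinct String. */
final class WeakStringTable {
    // Key and value refer to the same String; both references are weak,
    // so an entry disappears once nothing else uses that String.
    private final Map<String, WeakReference<String>> table = new WeakHashMap<>();

    synchronized String intern(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = table.get(s);
        String canonical = (ref != null) ? ref.get() : null;
        if (canonical == null) {
            // First time this value is seen (or the old instance was collected).
            table.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```

This costs an extra map entry and a WeakReference object per distinct String, which is exactly the duplication of the reverse index that the question is trying to avoid.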

Thank you,
Alexey

Unfortunately, there is no way for the extractor to get hold of the already existing key reference in the reverse index. I agree that Coherence should reuse the extracted value reference from the reverse index entry (the key of that entry) as the forward index value whenever a reverse index entry for the same extracted value exists, but apparently it does not.

Try to submit an enhancement request for this (and please share the ticket number for it so we can also look for it in the release notes).