Normally I use instances of class Set to maintain a set of domain objects. These instances behave pretty well as long as there are no more than about 30000 entries; beyond that, performance drops off.

Another solution I have found is the class EsIdentitySet, which is considerably faster than the normal Set class: with 90000 entries it takes 69 seconds against 235 seconds (roughly a third of the time), and with 40000 entries it takes 11 seconds against 19 seconds.

Look at AbtHighCapacityDictionary and AbtHighCapacityLookupTable. They both take advantage of #abtHash32. The performance hit you see occurs because the standard #hash wraps at 32767: in a large collection many objects end up with the same hash value, and the resulting collisions turn lookups into what is effectively a linear search. The #abtHash32 method generates a 32-bit hash, which is better suited to large collections.
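To make the collision effect concrete, here is a scaled-down Python model. It is not VA Smalltalk code: it assumes an open-addressing table with linear probing, mix32 is a hypothetical hash standing in for a full 32-bit hash such as #abtHash32, and the hash is truncated to 10 bits rather than 15 so that the blow-up finishes in a couple of seconds.

```python
def mix32(key: int) -> int:
    """Hypothetical 32-bit hash (Knuth-style multiplicative hashing)."""
    return (key * 2654435761) & 0xFFFFFFFF


def average_probes(keys, hash_bits: int, capacity: int) -> float:
    """Build a linear-probing table, then return the mean probes per lookup.

    hash_bits limits the hash to 2**hash_bits distinct values, modelling a
    hash that wraps (15 bits would correspond to wrapping at 32767).
    """
    mask = (1 << hash_bits) - 1
    slots = [None] * capacity

    def home(key):
        return (mix32(key) & mask) % capacity

    # Insert: probe forward from the home slot until an empty slot is found.
    for key in keys:
        i = home(key)
        while slots[i] is not None:
            i = (i + 1) % capacity
        slots[i] = key

    # Look up every key, counting how many slots each search inspects.
    total = 0
    for key in keys:
        i = home(key)
        total += 1
        while slots[i] != key:
            i = (i + 1) % capacity
            total += 1
    return total / len(keys)


keys = range(2000)
print(average_probes(keys, hash_bits=32, capacity=4096))  # → 1.0
print(average_probes(keys, hash_bits=10, capacity=4096))  # hundreds of probes
```

With the full-width hash every key lands in its own slot, so each lookup inspects exactly one slot. With the truncated hash only 1024 home slots exist, so every key after the 1024th joins one growing contiguous run, and each lookup scans hundreds of slots, which is the "linear search" behaviour described above.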

That said, I don't believe there is a high capacity set, but how hard could it be to implement one?
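As a rough answer to that question, here is a minimal Python sketch of such a set. It assumes open addressing with linear probing and a table that doubles whenever it is more than half full; HighCapacitySet and its hash32 parameter are hypothetical names, and the default hash32 simply masks Python's built-in hash to 32 bits as a stand-in for something like #abtHash32.

```python
class HighCapacitySet:
    """Hypothetical set using open addressing keyed on a full 32-bit hash."""

    def __init__(self, hash32=None, capacity=16):
        # Default stand-in for a 32-bit hash such as #abtHash32 (assumption).
        self._hash32 = hash32 or (lambda obj: hash(obj) & 0xFFFFFFFF)
        self._slots = [None] * capacity  # None marks an empty slot
        self._size = 0

    def _index_of(self, obj):
        """Index of obj, or of the empty slot where it would be stored."""
        cap = len(self._slots)
        i = self._hash32(obj) % cap
        # Linear probing: step forward until we hit obj or an empty slot.
        while self._slots[i] is not None and self._slots[i] != obj:
            i = (i + 1) % cap
        return i

    def add(self, obj):
        if obj is None:
            raise ValueError("None is reserved as the empty-slot marker")
        i = self._index_of(obj)
        if self._slots[i] is None:
            self._slots[i] = obj
            self._size += 1
            # Keep the table at most half full so probe runs stay short.
            if self._size * 2 > len(self._slots):
                self._grow()

    def _grow(self):
        old = self._slots
        self._slots = [None] * (len(old) * 2)
        self._size = 0
        for obj in old:
            if obj is not None:
                self.add(obj)

    def __contains__(self, obj):
        return self._slots[self._index_of(obj)] is not None

    def __len__(self):
        return self._size


s = HighCapacitySet()
for n in range(90000):
    s.add(n)
s.add(42)  # duplicates are ignored
print(len(s), 89999 in s, 90000 in s)  # → 90000 True False
```

The only essential difference from an ordinary hashed set is that the slot index comes from a full 32-bit hash, so the number of distinct home slots keeps growing with the table instead of stopping at 32768.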

koschate wrote: Look at AbtHighCapacityDictionary and AbtHighCapacityLookupTable. They both take advantage of #abtHash32.

Based on AbtHighCapacityLookupTable, I made a high capacity identity lookup table and would like to make two points.

1. It would be nice if such a dictionary were in the base image.
2. There is quite a lot of code duplication in the various dictionary classes, which could be reduced by extracting the hash and compare operations.
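Point 2 can be sketched in Python as a single table class in which hashing and key comparison are the only overridable methods, so an identity table becomes a small subclass rather than a copy of the whole algorithm. HashedLookupTable, IdentityLookupTable, hash_of and same_key are hypothetical names, and open addressing with linear probing is an assumption, not a description of the VA Smalltalk classes.

```python
class HashedLookupTable:
    """Hypothetical open-addressing lookup table. Hashing and key comparison
    are extracted into hash_of and same_key; nothing else needs overriding."""

    def __init__(self, capacity=16):
        self._keys = [None] * capacity  # None marks an empty slot
        self._values = [None] * capacity
        self._size = 0

    # The two extracted operations.
    def hash_of(self, key):
        return hash(key) & 0xFFFFFFFF  # equality-based 32-bit hash

    def same_key(self, a, b):
        return a == b

    # The shared algorithm, written once.
    def _index_of(self, key):
        cap = len(self._keys)
        i = self.hash_of(key) % cap
        while self._keys[i] is not None and not self.same_key(self._keys[i], key):
            i = (i + 1) % cap  # linear probing
        return i

    def __setitem__(self, key, value):
        if key is None:
            raise ValueError("None is reserved as the empty-slot marker")
        i = self._index_of(key)
        if self._keys[i] is None:
            self._keys[i] = key
            self._size += 1
        self._values[i] = value
        if self._size * 2 > len(self._keys):
            self._rehash()

    def __getitem__(self, key):
        i = self._index_of(key)
        if self._keys[i] is None:
            raise KeyError(key)
        return self._values[i]

    def __len__(self):
        return self._size

    def _rehash(self):
        pairs = [(k, v) for k, v in zip(self._keys, self._values) if k is not None]
        self._keys = [None] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in pairs:
            self[k] = v


class IdentityLookupTable(HashedLookupTable):
    """Identity variant: only the hash and the comparison change."""

    def hash_of(self, key):
        return id(key) & 0xFFFFFFFF

    def same_key(self, a, b):
        return a is b


eq = HashedLookupTable()
eq[(1, 2)] = "first"
eq[(1, 2)] = "second"  # equal key: value replaced
ident = IdentityLookupTable()
a, b = [1, 2], [1, 2]  # equal but distinct (and unhashable) lists
ident[a] = "a"
ident[b] = "b"
print(len(eq), eq[(1, 2)], len(ident), ident[a], ident[b])  # → 1 second 2 a b
```

Growth, probing and lookup live in one place; a further variant (for example one using a different 32-bit hash) would again override only hash_of and same_key.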

This is an example of how the class hierarchy of VA Smalltalk has grown in unanticipated, and not necessarily good, ways over the years. The original Abt<collection class name> classes came about because of a division of labor between two development groups (one inside IBM and one outside). The AbtHighCapacity<collection class name> classes came about in the same way: they were developed by IBM Consulting as part of the ObjectExtender work and moved wholesale into the base.

So what can, or should, be done at this point? It's time for a refactoring of, at least, the collection class hierarchy: pick out the essential differences between the classes and move those differences into their own methods, so that common code can stay common. This might even give us the opportunity to parameterize algorithms; after all, you don't need a different subclass of SortedCollection for each different sort algorithm.
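As an illustration of parameterizing the algorithm instead of subclassing, here is a Python sketch of a sorted collection that takes both the ordering predicate (much like a Smalltalk sortBlock) and the sort algorithm as arguments. The class, its algorithm parameter, and the two sort functions are hypothetical, written only for this example.

```python
def insertion_sort(items, before):
    """Stable insertion sort; before(a, b) is true when a may precede b."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and not before(result[j], current):
            result[j + 1] = result[j]  # shift larger element right
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items, before):
    """Stable top-down merge sort using the same predicate convention."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid], before), merge_sort(items[mid:], before)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if before(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


class SortedCollection:
    """One class; ordering and sort algorithm are both plain parameters."""

    def __init__(self, before=lambda a, b: a <= b, algorithm=merge_sort):
        self._before = before
        self._algorithm = algorithm
        self._items = []

    def add_all(self, items):
        self._items = self._algorithm(self._items + list(items), self._before)
        return self

    def as_list(self):
        return list(self._items)


by_merge = SortedCollection().add_all([5, 3, 9, 1])
by_insertion = SortedCollection(lambda a, b: a >= b, insertion_sort).add_all([5, 3, 9, 1])
print(by_merge.as_list(), by_insertion.as_list())  # → [1, 3, 5, 9] [9, 5, 3, 1]
```

Swapping the sort algorithm is then a constructor argument rather than a new subclass, in the same way that the ordering already is.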