Details

Description

set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.

Proposed: Check for set? in set and vec? in vec and return the coll as is if already of the requested type.

Performance:

See attached clj1410-bench.txt for test details :

Input/size

Function

Original

Patched

Comment

set/10

set

1.452 µs

0.002 µs

726x faster

set/1000

set

248.842 µs

0.006 µs

41473x faster

vector/10

set

1.288 µs

1.323 µs

slightly slower

vector/1000

set

222.992 µs

221.116 µs

~same

set/10

vec

0.614 µs

0.592 µs

~same

set/1000

vec

56.876 µs

55.920 µs

~same

vector/10

vec

0.252 µs

0.007 µs

36x faster

vector/1000

vec

24.428 µs

0.007 µs

3500x faster

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

Alex Miller
added a comment - 26/Apr/14 10:18 AM Re:
*Open question*
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.

Peter Taoussanis
added a comment - 26/Apr/14 11:39 AM Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup.
Google Doc times: http://goo.gl/W7EACR
The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible.
The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.

I have incorporated a variant of this change into more extensive changes to set (CLJ-1546) and vec (CLJ-1618) slated for 1.7.

The variation included is that instead of just returning the same instance, I return the instance with-meta nil. This better matches current behavior (always get a new instance, meta is lost) which might potentially be useful in some cases.

Alex Miller
added a comment - 28/Dec/14 11:10 AM I have incorporated a variant of this change into more extensive changes to set (CLJ-1546) and vec (CLJ-1618) slated for 1.7.
The variation included is that instead of just returning the same instance, I return the instance with-meta nil. This better matches current behavior (always get a new instance, meta is lost) which might potentially be useful in some cases.