If the cache grows beyond the capacity limit, the least frequently used item is invalidated. If
there is a tie, the least recently accessed key will be evicted.

Track the accesses with a doubly linked list

It is pretty obvious that we MUST use a dict as the internal data store to achieve O(1)
complexity for data access, and somehow keep the cache items sorted by access frequency
at all times. Otherwise, the eviction has to iterate over all cache items (i.e., O(n) complexity)
to find the least frequently used one.

Inspired by the LRU cache implementation, the access frequency is tracked with a doubly linked list
(see here for the linked list operation details, such as dll_append). Without loss of generality,
we store each item's (key, value) pair along with an access counter, sorted by the number of accesses
in descending order.
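As a rough sketch of that layout (the field names here are my assumption, not necessarily the post's actual code), each list node could carry the key, value, and access counter alongside its neighbour links:

```python
class Node:
    """One cache entry in the doubly linked list, kept sorted by access counter."""
    __slots__ = ("key", "value", "freq", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.freq = 0      # number of accesses so far
        self.prev = None   # towards the head: more frequently accessed
        self.next = None   # towards the tail: less frequently accessed
```

The dict then maps each key to its node, so lookup is O(1) and only the reordering touches the list.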

For any operation with a cache hit, the node is moved towards the list head to maintain the order.

For a set operation with a cache miss, the tail node will be evicted if the cache has reached the
capacity limit. A new node is then appended and moved accordingly.
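The miss path can be sketched in isolation (the node shape, sentinel layout, and function name below are my assumptions; the reordering itself is left to update_freq):

```python
class Node:
    __slots__ = ("key", "value", "freq", "prev", "next")
    def __init__(self, key=None, value=None):
        self.key, self.value, self.freq = key, value, 0
        self.prev = self.next = None

def evict_and_append(cache, tail, capacity, key, value):
    """On a set() miss: drop the tail node when full, then append the new entry.

    The tail node is the least frequently used (least recently accessed on a
    tie) only because update_freq keeps the list sorted at all times.
    """
    if len(cache) >= capacity:
        victim = tail.prev
        victim.prev.next, tail.prev = tail, victim.prev   # unlink the victim
        del cache[victim.key]
    node = Node(key, value)
    last = tail.prev
    last.next = tail.prev = node                          # link just before the tail
    node.prev, node.next = last, tail
    cache[key] = node
    return node
```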

The preliminary profiling shows that the majority of CPU time is spent on the linked list reordering,
so the LFUCache.update_freq method is deliberately extracted as a standalone function to test different
policies. For example, the first attempt takes a bubble-sort-esque approach:

If the access counter of the node is no less than its predecessor's, swap them.
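A reconstruction of that swap policy (the post does not show the real LFUCache.update_freq body, so the names and sentinel convention here are mine):

```python
class Node:
    __slots__ = ("freq", "prev", "next")
    def __init__(self, freq=0):
        self.freq, self.prev, self.next = freq, None, None

def update_freq(node):
    """Bubble the node towards the head while its counter ties or beats its
    predecessor's; stops at the head sentinel (whose prev is None)."""
    while node.prev.prev is not None and node.freq >= node.prev.freq:
        before, left, right = node.prev.prev, node.prev, node.next
        before.next, node.prev = node, before   # before <-> node
        node.next, left.prev = left, node       # node  <-> left
        left.next, right.prev = right, left     # left  <-> right
```

Each swap is O(1) pointer surgery, but a node whose counter keeps growing can still crawl linearly towards the head, which is exactly the traversal cost the profiling flags below.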

This performs significantly better, 8.45s in the synthetic benchmark with profiling enabled, but is still
way too slow. The profiling shows that 99.4% of CPU time is spent on the doubly linked list traversal,
and we do it linearly. Can we leverage the sorted linked list to make it faster?

Put it in the bucket

The LFU paper presents a neat solution to address the performance issue: the single doubly linked list
is segmented into multiple buckets, and nodes with the same access counter are put into the same bucket
in order of recency. When the access counter is updated, we can simply pop the node from its current
bucket and place it into the new bucket. Illustrated below:

I took a simplified detour to explore the idea while avoiding the hassle of wrangling the doubly doubly
linked list: a lookup table is used to map the access counter to the bucket.
This may incur O(n) complexity for key eviction in the worst case, but it turns out to perform very well:
0.35s with profiling enabled, a 24x performance boost.
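A hedged sketch of that simplified detour, assuming a plain dict as the lookup table from access counter to an ordered bucket (the class and method names are mine, not the post's):

```python
from collections import OrderedDict

class BucketLFU:
    """LFU sketch: counter -> OrderedDict of keys, insertion order = recency."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}   # key -> value
        self.freqs = {}    # key -> access counter
        self.buckets = {}  # counter -> OrderedDict(key -> None), the lookup table

    def _touch(self, key):
        """Pop the key from its current bucket and place it in the next one."""
        freq = self.freqs[key]
        bucket = self.buckets[freq]
        del bucket[key]
        if not bucket:
            del self.buckets[freq]
        self.freqs[key] = freq + 1
        self.buckets.setdefault(freq + 1, OrderedDict())[key] = None

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def set(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            min_freq = min(self.buckets)  # O(n) scan in the worst case
            victim, _ = self.buckets[min_freq].popitem(last=False)  # oldest first
            if not self.buckets[min_freq]:
                del self.buckets[min_freq]
            del self.values[victim]
            del self.freqs[victim]
        self.values[key] = value
        if key not in self.freqs:
            self.freqs[key] = 0
            self.buckets.setdefault(0, OrderedDict())[key] = None
        self._touch(key)
```

The `min(self.buckets)` scan on eviction is where the worst-case O(n) hides; every other operation is a constant number of dict lookups.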

It took many hours to get the doubly doubly linked list solution right due to its complexity; I had to
use a namedtuple to sort out the list indexing. And it performed just as well as the simplified version.

Check for memory leaks

A memory leak is probably the biggest concern in a cache implementation, especially since we are dealing with circular references.
Before declaring success, I’d like to run a benchmark to check the memory consumption first: