ConcurrentDictionary in Caching Implementation

The .NET class ConcurrentDictionary<TKey, TValue> provides thread-safe methods to store and read objects by a unique key. This makes it a good candidate for implementing a memory-based "level 1 cache" that gives fast access to the results of long-running database requests.

When implementing such a cache with this class, you should be aware that methods like GetOrAdd really are thread-safe, but that does not mean they behave the way you might expect.

For demonstration purposes I will replace the long-running database request with a simple Thread.Sleep(500); followed by returning a unique value.
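As a minimal sketch, such a stand-in could look like the following (the class name SlowBackend and its counter are my own illustrative choices, not from the original post; the unit test below inlines the same idea as lambdas):

```csharp
using System.Threading;

static class SlowBackend
{
    private static int counter;

    // Simulates a long-running database request: blocks for 500 ms
    // and returns a unique value each time it is invoked.
    public static string LoadValue(string key)
    {
        Thread.Sleep(500);
        var id = Interlocked.Increment(ref counter);
        return key + id;
    }
}
```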

I've written this little unit test to demonstrate the first issue with the ConcurrentDictionary:

[TestMethod]
public void OverlappingCallsToGetOrAddDontBlock()
{
    var target = new ConcurrentDictionary<string, string>();
    var code1 = false;
    var code2 = false;
    var code2ExecutesAfterCode1 = false;

    var thread = new Thread(
        () => target.GetOrAdd(
            "Hello",
            s =>
            {
                code1 = true;
                Thread.Sleep(500);
                return s + "1";
            }));

    thread.Start();
    Thread.Sleep(100);

    var start = DateTime.Now;
    var result = target.GetOrAdd(
        "Hello",
        s =>
        {
            code2ExecutesAfterCode1 = code1;
            code2 = true;
            Thread.Sleep(500);
            return s + "2";
        });
    var end = DateTime.Now;

    // the result is the value produced by the first call
    Assert.AreEqual("Hello1", result);

    // the first value factory has been executed
    Assert.IsTrue(code1);

    // the second value factory has been executed as well (this
    // would be "false" if the second invocation of GetOrAdd
    // waited for the first one to complete)
    Assert.IsTrue(code2);

    Assert.IsTrue(code2ExecutesAfterCode1);
    Assert.IsTrue((end - start).TotalMilliseconds > 450);
}

As you can already see from the comments and asserts, GetOrAdd executes the value generation for the key "Hello" twice. This is because the first result has not yet been added to the internal value collection when the second call starts. Even more disturbing to me: the second call provides a delegate that returns "Hello2", that delegate is executed, but the return value is "Hello1" (even though "Hello2" is the "newer" value).

It all depends on the timing:

0 ms

We start a thread whose value factory takes 500 ms to generate a value for "Hello"; only after that will the value be added to the internal collection of the ConcurrentDictionary.

100 ms

While that value generation is still running, we request the value for the key "Hello" again – this looks up the internal collection, finds nothing for the key, and therefore starts the value creation a second time.

500 ms

The value generated by the first call, inside the additionally started thread, is added to the internal collection.

600 ms

The second call finishes generating its value and tries to insert it into the internal collection – but a value from the first call is already there. In this case GetOrAdd does NOT update the value inside the ConcurrentDictionary; it discards the freshly generated value and returns the existing one ("Hello1").
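This "first value wins, later values are ignored" rule can be seen without any threads at all, using the overload of GetOrAdd that takes a plain value instead of a factory (a minimal sketch of my own):

```csharp
using System.Collections.Concurrent;

var dict = new ConcurrentDictionary<string, string>();

var first = dict.GetOrAdd("Hello", "Hello1");  // key absent: adds and returns "Hello1"
var second = dict.GetOrAdd("Hello", "Hello2"); // key present: "Hello2" is ignored

// GetOrAdd never overwrites an existing value, so both calls return "Hello1"
```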

In this case the generation of the second value was a total waste of time. Had the second call finished before the first one, both calls would have returned the second value … which value ends up being "the one" depends on when each call starts (before or after a value for the key has been added) and when it ends.

If you use the ConcurrentDictionary as simple storage for a server-side cache, you might get into trouble when your application starts up and receives multiple requests that need potentially cached information. They will all arrive at nearly the same time, so a page requested 10 times a second that needs 2 seconds to fetch some "normally cached" data will cause 20 concurrent queries for that data – which is probably not what you want. You should carefully design and test your caching – there might be an open source caching library that already implements what you need, so you don't risk wasting that amount of CPU time and I/O load.

I've talked about the "first issue" with using this class in a self-implemented cache – so what's the "second issue"? The second issue is security-related, and I will post about it in a week or two (depending on my other workload).


One Response to ConcurrentDictionary in Caching Implementation

Regarding ConcurrentDictionary, this works as designed: "If you call GetOrAdd simultaneously on different threads, addValueFactory may be called multiple times, but its key/value pair might not be added to the dictionary for every call." (http://msdn.microsoft.com/en-us/library/ee378677.aspx)
(The overload taking a value instead of a callback does not suffer from this problem.)

This is, however, not an issue specific to ConcurrentDictionary; it is the general issue of caching information that takes time to acquire. I would also point out that this is not only the case "when you start up your application", since reasonable (sorry ;-)) cache implementations should at some point clean up their data to avoid growing indefinitely.

One solution would be to block within the value factory method. A better solution might be based on Tasks: instead of caching the result of the long-running factory method directly, create a task that calls the actual factory method, and put that task in the cache.
The first caller will create the task, while subsequent ones will simply get it and wait for it to finish. (There is just a small time frame with similar concurrency issues, but this can be solved with standard means, if it actually is an issue.)
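A closely related variant of this idea caches a Lazy<T> instead of a Task. The following is my own hedged sketch of that pattern (class and member names are made up, and the Thread.Sleep again stands in for the slow database request):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class LazyCache
{
    private readonly ConcurrentDictionary<string, Lazy<string>> cache = new();
    private int factoryRuns;

    public int FactoryRuns => factoryRuns;

    public string GetOrLoad(string key)
    {
        // Losing the GetOrAdd race only wastes a cheap Lazy allocation.
        // The expensive factory below runs at most once per key, because
        // Lazy<T> uses LazyThreadSafetyMode.ExecutionAndPublication by
        // default: all callers block on the same pending computation.
        var lazy = cache.GetOrAdd(key, k => new Lazy<string>(() =>
        {
            Interlocked.Increment(ref factoryRuns);
            Thread.Sleep(500); // stand-in for the slow database request
            return k + "1";
        }));
        return lazy.Value;
    }
}
```

With this wrapper, the overlapping-call scenario from the unit test above would execute the value factory only once, and both callers would receive the same value.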