
At the time of this writing, popular key/value servers include
Memcached, Redis and many others.
While these tools all have different areas of focus, they share a common storage model:
a value is retrieved based on a key. As such, they are all potentially
suitable for caching, particularly Memcached, which is first and foremost designed for
caching.

With a caching system in mind, dogpile.cache provides an interface to a particular Python API
targeted at that system.

A dogpile.cache configuration consists of the following components:

A region, which is an instance of CacheRegion, and defines the configuration
details for a particular cache backend. The CacheRegion can be considered
the “front end” used by applications.

A backend, which is an instance of CacheBackend, describing how values
are stored and retrieved from a backend. This interface specifies only
get(), set() and delete().
The actual kind of CacheBackend in use for a particular CacheRegion
is determined by the underlying Python API being used to talk to the cache, such
as Pylibmc. The CacheBackend is instantiated behind the scenes and
not directly accessed by applications under normal circumstances.

Value generation functions. These are user-defined functions that generate
new values to be placed in the cache. While dogpile.cache offers the usual
“set” approach of placing data into the cache, the usual mode of usage is to only instruct
it to “get” a value, passing it a creation function which will be used to
generate a new value if and only if one is needed. This “get-or-create” pattern
is the entire key to the “Dogpile” system, which coordinates a single value creation
operation among many concurrent get operations for a particular key, eliminating
the issue of an expired value being redundantly re-generated by many workers simultaneously.

In this section, we illustrate Memcached usage
using the pylibmc backend, which is a high-performing
Python library for Memcached. It can be compared to the python-memcached
client, which is also an excellent product. Pylibmc is written against Memcached’s native API
and so is markedly faster, though it might be considered to have rougher edges. Its API is also
somewhat more verbose, in order to allow for correct multithreaded usage.
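A minimal region setup along these lines might look like the following (a sketch; it assumes a memcached server running on 127.0.0.1 and the pylibmc library installed, and the 3600-second expiration time is purely illustrative):

```python
from dogpile.cache import make_region

region = make_region().configure(
    'dogpile.cache.pylibmc',
    expiration_time=3600,
    arguments={
        'url': ["127.0.0.1"],
    }
)
```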

Above, we create a CacheRegion using the make_region() function, then
apply the backend configuration via the CacheRegion.configure() method, which returns the
region. The name of the backend is the only argument required by CacheRegion.configure()
itself, in this case dogpile.cache.pylibmc. However, in this specific case, the pylibmc
backend also requires that the URL of the memcached server be passed within the arguments dictionary.

The configuration is separated into two sections. Upon construction via make_region(),
the CacheRegion object is available, typically at module
import time, for usage in decorating functions. Additional configuration details passed to
CacheRegion.configure() are typically loaded from a configuration file and therefore
not necessarily available until runtime, hence the two-step configuration process.

Key arguments passed to CacheRegion.configure() include expiration_time, which is the expiration
time passed to the Dogpile lock, and arguments, which are arguments used directly
by the backend - in this case we are using arguments that are passed directly
to the pylibmc module.

function_key_generator – Optional. A
function that will produce a “cache key” given
a data creation function and arguments, when using
the CacheRegion.cache_on_arguments() method.
The structure of this function
should be two levels: given the data creation function,
return a new function that generates the key based on
the given arguments. Such as:
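A sketch of such a two-level generator follows; the function and key format here are illustrative, not part of the API:

```python
def my_key_generator(namespace, fn, **kw):
    """Outer level: receives the namespace and the data creation function."""
    namespace = namespace or ""
    fname = fn.__name__

    def generate_key(*args):
        # inner level: builds the key from the actual call arguments
        return "_".join([namespace, fname] + [str(a) for a in args])

    return generate_key

# hypothetical wiring:
# region = make_region(function_key_generator=my_key_generator)
```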

The namespace is that passed to
CacheRegion.cache_on_arguments(). It’s not consulted
outside this function, so in fact can be of any form.
For example, it can be passed as a tuple, used to specify
arguments to pluck from **kw:
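For instance, a hypothetical generator that treats the namespace as a tuple of keyword-argument names to include in the key might look like this (a sketch; the names are illustrative):

```python
def tuple_key_generator(namespace, fn, **kw):
    # here namespace is treated as a tuple of keyword-argument
    # names to pluck from **kwargs when building the key
    fname = fn.__name__

    def generate_key(**kwargs):
        return "_".join([fname] + [str(kwargs[name]) for name in namespace])

    return generate_key
```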

key_mangler – Function which will be used on all incoming
keys before passing to the backend. Defaults to None,
in which case the key mangling function recommended by
the cache backend will be used. A typical mangler
is the SHA1 mangler found at sha1_mangle_key()
which coerces keys into a SHA1
hash, so that the string length is fixed. To
disable all key mangling, set to False. Another typical
mangler is the built-in Python function str, which can be used
to convert non-string or Unicode keys to bytestrings, which is
needed when using a backend such as bsddb or dbm under Python 2.x
in conjunction with Unicode keys.

async_creation_runner – A callable that, when specified,
will be passed to and called by dogpile.lock when
there is a stale value present in the cache. It will be passed the
mutex and is responsible for releasing that mutex when finished.
This can be used to defer the computation of expensive creator
functions to later points in the future by way of, for example, a
background thread, a long-running queue, or a task manager system
like Celery.

For a specific example using async_creation_runner, new values can
be created in a background thread like so:

import threading

def async_creation_runner(cache, somekey, creator, mutex):
    ''' Used by dogpile.core:Lock when appropriate  '''
    def runner():
        try:
            value = creator()
            cache.set(somekey, value)
        finally:
            mutex.release()

    thread = threading.Thread(target=runner)
    thread.start()

region = make_region(
    async_creation_runner=async_creation_runner,
).configure(
    'dogpile.cache.memcached',
    expiration_time=5,
    arguments={
        'url': '127.0.0.1:11211',
        'distributed_lock': True,
    }
)

Remember that the first request for a key with no associated
value will always block; async_creator will not be invoked.
However, subsequent requests for cached-but-expired values will
still return promptly. They will be refreshed by whatever
asynchronous means the provided async_creation_runner callable
implements.

local_region = make_region()
memcached_region = make_region()

# regions are ready to use for function
# decorators, but not yet for actual caching

# later, when config is available
myconfig = {
    "cache.local.backend": "dogpile.cache.dbm",
    "cache.local.arguments.filename": "/path/to/dbmfile.dbm",
    "cache.memcached.backend": "dogpile.cache.pylibmc",
    "cache.memcached.arguments.url": "127.0.0.1, 10.0.0.1",
}
local_region.configure_from_config(myconfig, "cache.local.")
memcached_region.configure_from_config(myconfig, "cache.memcached.")

The CacheRegion object is our front-end interface to a cache. It includes
the following methods:

CacheRegion.get(key, expiration_time=None, ignore_expiration=False)

Return a value from the cache, based on the given key.

If the value is not present, the method returns the token
NO_VALUE. NO_VALUE evaluates to False, but is distinct from
None, so that a cached value of None can be distinguished from a cache miss.

By default, the configured expiration time of the
CacheRegion, or alternatively the expiration
time supplied by the expiration_time argument,
is tested against the creation time of the retrieved
value versus the current time (as reported by time.time()).
If stale, the cached value is ignored and the NO_VALUE
token is returned. Passing the flag ignore_expiration=True
bypasses the expiration time check.

Changed in version 0.3.0: CacheRegion.get() now checks the value’s creation time
against the expiration time, rather than returning
the value unconditionally.

The method also interprets the cached value in terms
of the current “invalidation” time as set by
the invalidate() method. If a value is present,
but its creation time is older than the current
invalidation time, the NO_VALUE token is returned.
Passing the flag ignore_expiration=True bypasses
the invalidation time check.

key – Key to be retrieved. While it’s typical for a key to be a
string, it is ultimately passed directly down to the cache backend,
before being optionally processed by the key_mangler function, so can
be of any type recognized by the backend or by the key_mangler
function, if present.

expiration_time – Optional expiration time value
which will supersede that configured on the CacheRegion
itself.

Note

The CacheRegion.get.expiration_time
argument is not persisted in the cache and is relevant
only to this specific cache retrieval operation, relative to
the creation time stored with the existing cached value.
Subsequent calls to CacheRegion.get() are not affected
by this value.

If the value does not exist or is considered to be expired
based on its creation time, the given
creation function may or may not be used to recreate the value
and persist the newly generated value in the cache.

Whether or not the function is used depends on if the
dogpile lock can be acquired or not. If it can’t, it means
a different thread or process is already running a creation
function for this key against the cache. When the dogpile
lock cannot be acquired, the method will block if no
previous value is available, until the lock is released and
a new value is available. If a previous value
is available, that value is returned immediately without blocking.

If the invalidate() method has been called, and
the retrieved value’s timestamp is older than the invalidation
timestamp, the value is unconditionally prevented from
being returned. The method will attempt to acquire the dogpile
lock to generate a new value, or will wait
until the lock is released to return the new value.

Changed in version 0.3.0: The value is unconditionally regenerated if the creation
time is older than the last call to invalidate().

Parameters

key – Key to be retrieved. While it’s typical for a key to be a
string, it is ultimately passed directly down to the cache backend,
before being optionally processed by the key_mangler function, so can
be of any type recognized by the backend or by the key_mangler
function, if present.

should_cache_fn – optional callable function which will receive
the value returned by the “creator”, and will then return True or
False, indicating if the value should actually be cached or not. If
it returns False, the value is still returned, but isn’t cached.
E.g.:
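One common use is to avoid caching a result that indicates "not found"; a sketch, with hypothetical function names:

```python
def dont_cache_none(value):
    # cache the result only when the creator returned something
    return value is not None

# hypothetical usage:
# @region.cache_on_arguments(should_cache_fn=dont_cache_none)
# def lookup_user(user_id):
#     ...
```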

The decorated function can then be called normally, where
data will be pulled from the cache region unless a new
value is needed:

result = generate_something(5, 6)

The function is also given an attribute invalidate(), which
provides for invalidation of the value. Pass to invalidate()
the same arguments you’d pass to the function itself to represent
a particular value:

generate_something.invalidate(5, 6)

Another attribute set() is added to provide extra caching
possibilities relative to the function. This is a convenience
method for CacheRegion.set() which will store a given
value directly without calling the decorated function.
The value to be cached is passed as the first argument, and the
arguments which would normally be passed to the function
should follow:

generate_something.set(3, 5, 6)

The above example is equivalent to calling
generate_something(5, 6), if the function were to produce
the value 3 as the value to be cached.

New in version 0.4.1: Added set() method to decorated function.

Similar to set() is refresh(). This attribute will
invoke the decorated function and populate the cache
with the newly generated value, as well as returning that value:

newvalue = generate_something.refresh(5, 6)

New in version 0.5.0: Added refresh() method to decorated
function.

original(), on the other hand, will invoke the decorated function
without any caching:

newvalue = generate_something.original(5, 6)

New in version 0.6.0: Added original() method to decorated
function.

Lastly, the get() method returns either the value cached
for the given key, or the token NO_VALUE if no such key
exists:

value = generate_something.get(5, 6)

New in version 0.5.3: Added get() method to decorated
function.

The default key generation will use the name
of the function, the module name for the function,
the arguments passed, as well as an optional “namespace”
parameter in order to generate a cache key.

Given a function one inside the module
myapp.tools:

@region.cache_on_arguments(namespace="foo")
def one(a, b):
    return a + b

Above, calling one(3, 4) will produce a
cache key as follows:

myapp.tools:one|foo|3 4

The key generator will ignore an initial argument
of self or cls, making the decorator suitable
(with caveats) for use with instance or class methods.
Given the example:

Above, the namespace parameter disambiguates
between somemethod on MyClass and MyOtherClass.
Python class declaration mechanics otherwise prevent
the decorator from having awareness of the MyClass
and MyOtherClass names, as the function is received
by the decorator before it becomes an instance method.

namespace¶ – optional string argument which will be
established as part of the cache key. This may be needed
to disambiguate functions of the same name within the same
source file, such as those
associated with classes - note that the decorator itself
can’t see the parent class on a function as the class is
being declared.

May be specified as a callable, taking no arguments, that
returns a value to be used as the expiration_time. This callable
will be called whenever the decorated function itself is called, in
caching or retrieving. Thus, this can be used to
determine a dynamic expiration time for the cached function
result. Example use cases include “cache the result until the
end of the day, week or time period” and “cache until a certain date
or time passes”.
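For the "cache until the end of the day" case, a callable along these lines could be passed as expiration_time (a sketch; the function name and decorator usage are hypothetical):

```python
import datetime

def seconds_until_midnight():
    # expire the cached value at the end of the current day
    now = datetime.datetime.now()
    tomorrow = datetime.datetime.combine(
        now.date() + datetime.timedelta(days=1),
        datetime.time.min,
    )
    return (tomorrow - now).total_seconds()

# hypothetical usage:
# @region.cache_on_arguments(expiration_time=seconds_until_midnight)
# def daily_report():
#     ...
```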

to_str – callable, will be called on each function argument
in order to convert to a string. Defaults to str(). If the
function accepts non-ascii unicode arguments on Python 2.x, the
unicode() builtin can be substituted, but note this will
produce unicode cache keys which may require key mangling before
reaching the cache.

Backends are located using the setuptools entrypoint system. To make life easier
for writers of ad-hoc backends, a helper function is included which registers any
backend in the same way as if it were part of the existing sys.path.

For example, to create a backend called DictionaryBackend, we subclass
CacheBackend:
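A sketch of such a backend follows; the try/except fallback is only a convenience so the snippet runs standalone, and real code would import CacheBackend and NO_VALUE from dogpile.cache.api directly:

```python
try:
    from dogpile.cache.api import CacheBackend, NO_VALUE
except ImportError:
    # stand-ins so this sketch runs without dogpile.cache installed
    CacheBackend = object
    NO_VALUE = object()

class DictionaryBackend(CacheBackend):
    def __init__(self, arguments):
        self.cache = {}

    def get(self, key):
        return self.cache.get(key, NO_VALUE)

    def set(self, key, value):
        self.cache[key] = value

    def delete(self, key):
        self.cache.pop(key, None)
```

The helper function mentioned earlier, register_backend(), could then register this class by name, e.g. register_backend("dictionary", "mymodule", "DictionaryBackend"), where "mymodule" is a hypothetical module path.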

The values we receive for the backend here are instances of
CachedValue. This is a tuple subclass of length two, of the form:

(payload, metadata)

Where “payload” is the thing being cached, and “metadata” is information
we store in the cache - a dictionary which currently has just the “creation time”
and a “version identifier” as key/values. If the cache backend requires serialization,
pickle or similar can be used on the tuple - the “metadata” portion will always
be a small and easily serializable Python structure.

The ProxyBackend is a decorator class provided to easily augment existing
backend behavior without having to extend the original class. Using a decorator
class is also advantageous as it allows us to share the altered behavior between
different backends.

Proxies are added to the CacheRegion object using the CacheRegion.configure()
method. Only the overridden methods need to be specified and the real backend can
be accessed with the self.proxied object from inside the ProxyBackend.

For example, a simple class to log all calls to .set() would look like this:
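A sketch of such a proxy follows; the try/except fallback is only so the snippet runs standalone, and real code would simply import ProxyBackend from dogpile.cache.proxy:

```python
import logging

try:
    from dogpile.cache.proxy import ProxyBackend
except ImportError:
    # minimal stand-in so the sketch runs without dogpile.cache installed
    class ProxyBackend(object):
        def __init__(self, *args, **kw):
            self.proxied = None

log = logging.getLogger(__name__)

class LoggingProxy(ProxyBackend):
    def set(self, key, value):
        # log the key, then delegate to the real backend
        log.debug("Setting cache key: %s", key)
        self.proxied.set(key, value)
```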

ProxyBackend can be configured to optionally take arguments (as long as the
ProxyBackend.__init__() method is called properly, either directly
or via super()). In the example
below, the RetryDeleteProxy class accepts a retry_count parameter
on initialization. In the event of an exception on delete(), it will retry
this many times before returning:
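A sketch of such a retrying proxy follows; again, the import fallback is only so the snippet runs standalone:

```python
try:
    from dogpile.cache.proxy import ProxyBackend
except ImportError:
    # minimal stand-in so the sketch runs without dogpile.cache installed
    class ProxyBackend(object):
        def __init__(self, *args, **kw):
            self.proxied = None

class RetryDeleteProxy(ProxyBackend):
    def __init__(self, retry_count=5):
        super(RetryDeleteProxy, self).__init__()
        self.retry_count = retry_count

    def delete(self, key):
        # retry the delete up to retry_count times before re-raising
        retries = self.retry_count
        while True:
            retries -= 1
            try:
                self.proxied.delete(key)
                return
            except Exception:
                if retries < 0:
                    raise
```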

The wrap parameter of CacheRegion.configure() accepts a list
which can contain any combination of instantiated proxy objects
as well as uninstantiated proxy classes.
Putting the two examples above together would look like this:
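A configuration sketch follows; it assumes LoggingProxy and RetryDeleteProxy classes as described above, along with a running memcached server and the pylibmc library:

```python
from dogpile.cache import make_region

# an instantiated proxy, carrying its own retry_count
retry_proxy = RetryDeleteProxy(5)

region = make_region().configure(
    "dogpile.cache.pylibmc",
    expiration_time=3600,
    arguments={"url": ["127.0.0.1"]},
    # mix of an uninstantiated proxy class and a proxy instance
    wrap=[LoggingProxy, retry_proxy],
)
```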

In the above example, the LoggingProxy object would be instantiated by the
CacheRegion and applied to wrap requests on behalf of
the retry_proxy instance; that proxy in turn wraps
requests on behalf of the original dogpile.cache.pylibmc backend.

CacheRegion includes logging facilities that will emit debug log
messages when key cache events occur, including when keys are regenerated as
well as when hard invalidations occur. Using the Python logging module, set the log level of the
dogpile.cache logger to logging.DEBUG:
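A minimal setup using only the standard library would be:

```python
import logging

# send log output to the console and enable debug-level
# messages for the dogpile.cache logger
logging.basicConfig()
logging.getLogger("dogpile.cache").setLevel(logging.DEBUG)
```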