The Performance Impact of Using dict() Instead of {} in CPython 2.7

I’ve been reviewing a lot of code lately for various open source and
internal projects written in Python. As part of those reviews, I
have noticed what I think is a trend toward using dict()
instead of {} to create dictionaries. I don’t know exactly why
this trend has emerged. Perhaps the authors perceive dict() as
more readable than {}. Whatever the reason, my intuition told
me calling the function version of the constructor for a dictionary
would impose a performance penalty. I studied what happens in both
cases to understand how significant that penalty is, and the
results confirmed my intuition.

tl;dr

With CPython 2.7, using dict() to create dictionaries takes up to
6 times longer and involves more memory allocation operations than the
literal syntax. Use {} to create dictionaries, especially if you
are pre-populating them, unless the literal syntax does not work for
your case.

Initial Hypothesis

I wanted to study the performance difference between the literal
syntax for creating a dictionary instance ({}) and using the name
of the class to create one (dict()). I knew that the Python
interpreter is based on opcodes and that there are opcodes dedicated to
creating a dictionary that would not be invoked when the dict()
form was used instead of the literal form. I suspected that the extra
overhead for looking up the name “dict” and then calling the
function would make the “function” form slower.

Measuring Performance

I began my analysis by applying timeit to see if the
performance difference was even measurable.
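
One way to run the comparison is directly from the command line, timing each way of creating an empty dictionary (the absolute numbers vary from machine to machine; the ratio is what matters):

$ python2.7 -m timeit 'dict()'
$ python2.7 -m timeit '{}'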

Immediately I could see that not only was there a difference, it was
even more significant than I expected. Using dict() to create an
empty dictionary took 6 times longer than using the literal
syntax. Would the difference be the same if the dictionary had
members?

Passing a few members to the dictionary narrowed the gap, but the
function form still took three times as long as the literal.
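
The expressions being compared look like:

$ python2.7 -m timeit 'dict(a="A", b="B", c="C")'
$ python2.7 -m timeit '{"a": "A", "b": "B", "c": "C"}'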

What is going on?

After establishing the performance difference, I asked myself what was
going on to cause such a significant slowdown. To answer that
question, I needed to look more deeply into what the interpreter was
doing as it processed each expression. I wanted to see which (and how
many) opcodes were being executed. I used dis to disassemble the
Python expressions to see which opcodes implement each.

To use dis from the command line, I needed input files containing the
different expressions I was studying. I created func.py:

dict()

and literal.py:

{}

The output of dis is arranged in columns with the original source line
number, the instruction “address” within the code object, the opcode
name, and any arguments passed to the opcode.
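
For func.py, the output should look something like this (reconstructed, so treat the exact offsets as approximate):

$ python2.7 -m dis func.py
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 POP_TOP
              7 LOAD_CONST               0 (None)
             10 RETURN_VALUE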

The function form uses two separate opcodes: LOAD_NAME to find the
object associated with the name “dict”, and CALL_FUNCTION to
invoke it. The last three opcodes are not involved in creating or
populating the dictionary, and appear in both versions of the code, so
I ignored them for my analysis.
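
The literal version is shorter, something like:

$ python2.7 -m dis literal.py
  1           0 BUILD_MAP                0
              3 POP_TOP
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE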

The BUILD_MAP opcode creates a new empty dictionary instance and
places it on the top of the interpreter’s stack.

After comparing the two sets of opcodes, I suspected that the CALL_FUNCTION operation was the culprit, since calling functions
is relatively expensive in Python. However, these were trivial
examples and did not look like what I was seeing in code reviews. Most
of the actual code I had seen was populating the dictionary as it
created it, and I wanted to understand what difference that would
make.

Examining More Complex Examples

I created two new source files that set three key/value pairs in the
dictionary as it is created. I started with func-members.py, which
instantiates the dictionary using the dict() function:

dict(a="A",
     b="B",
     c="C",
     )

The disassembled version of func-members.py started the same way
as the earlier example, looking for the dict() function:

$ python2.7 -m dis func-members.py
1 0 LOAD_NAME 0 (dict)

Then it showed key/value pairs being pushed onto the stack using LOAD_CONST to create named arguments for the function, along these lines (reconstructed, so treat the exact offsets as approximate):
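
              3 LOAD_CONST               0 ('a')
              6 LOAD_CONST               1 ('A')
  2           9 LOAD_CONST               2 ('b')
             12 LOAD_CONST               3 ('B')
  3          15 LOAD_CONST               4 ('c')
             18 LOAD_CONST               5 ('C')
             21 CALL_FUNCTION          768

For the literal form I created literal-members.py, reconstructed here to match the disassembly fragments below:

{"a": "A",
 "b": "B",
 "c": "C",
 }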

The disassembled version of this example showed a few differences from
the empty literal example. First, the argument to BUILD_MAP was 3
instead of 0, indicating that three key/value pairs would be added to
the dictionary as it was created.

$ python2.7 -m dis literal-members.py
1 0 BUILD_MAP 3

It also showed the values and then keys being pushed onto the stack
using LOAD_CONST.

3 LOAD_CONST 0 ('A')
6 LOAD_CONST 1 ('a')

Finally, a new opcode, STORE_MAP, appeared after each key/value
pair was pushed.

After looking at the output more closely, I noticed that there were
actually fewer opcodes in the function form than the literal
form. There were no STORE_MAP opcodes, just the CALL_FUNCTION
after all of the items were on the stack. At this point I realized
that in order to really understand what was going on, I would have to
look at the interpreter implementation.

Interpreter Source

Up to now, I had been examining the behavior of the interpreter by
feeding it inputs and seeing what it did with them. To examine it at a
deeper level, I had to download the source following the instructions
in the Dev Guide.

The interpreter evaluates opcodes in a loop defined in
PyEval_EvalFrameEx() in Python/ceval.c. Each opcode name
corresponds to an entry in the switch statement. For example, the
POP_TOP opcode that appears near the end of each disassembled
example is implemented as:

case POP_TOP:
    v = POP();
    Py_DECREF(v);
    goto fast_next_opcode;

The top-most item is removed from the stack and its reference count is
decremented to allow it (eventually) to be garbage collected. After
orienting myself in the source, I was ready to trace through the
opcodes used in the examples above.

What Happens When You Call dict()?

The disassembly above shows that the opcodes used to call dict()
to create a dictionary are LOAD_NAME, LOAD_CONST, and CALL_FUNCTION.

The LOAD_NAME opcode finds the object associated with the given
name (“dict” in this case) and puts it on top of the stack.
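
Its case in ceval.c reads approximately as follows; in the worst case the name is looked up in three different scopes (locals, then globals, then builtins) before it is found:

case LOAD_NAME:
    w = GETITEM(names, oparg);
    if ((v = f->f_locals) == NULL) {
        PyErr_Format(PyExc_SystemError,
                     "no locals when loading %s",
                     PyObject_REPR(w));
        why = WHY_EXCEPTION;
        break;
    }
    if (PyDict_CheckExact(v)) {
        /* check the local scope first */
        x = PyDict_GetItem(v, w);
        Py_XINCREF(x);
    }
    else {
        x = PyObject_GetItem(v, w);
        if (x == NULL && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(PyExc_KeyError))
                break;
            PyErr_Clear();
        }
    }
    if (x == NULL) {
        /* fall back to globals, then builtins */
        x = PyDict_GetItem(f->f_globals, w);
        if (x == NULL) {
            x = PyDict_GetItem(f->f_builtins, w);
            if (x == NULL) {
                format_exc_check_arg(PyExc_NameError,
                                     NAME_ERROR_MSG, w);
                break;
            }
        }
        Py_INCREF(x);
    }
    PUSH(x);
    continue;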

The LOAD_CONST opcode is much cheaper. Its oparg value indicates
which constant to take out of the set of constants found in the code
object. The constant’s reference count is increased and then it is
pushed onto the top of the stack. This is an inexpensive operation
since no name look-up is needed. The case reads approximately:
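
case LOAD_CONST:
    x = GETITEM(consts, oparg);
    Py_INCREF(x);
    PUSH(x);
    goto fast_next_opcode;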

The portion of the implementation of CALL_FUNCTION in the case
statement looks similarly simple. Approximately:
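
case CALL_FUNCTION:
{
    PyObject **sp;
    PCALL(PCALL_ALL);
    sp = stack_pointer;
#ifdef WITH_TSC
    x = call_function(&sp, oparg, &intr0, &intr1);
#else
    x = call_function(&sp, oparg);
#endif
    stack_pointer = sp;
    PUSH(x);
    if (x != NULL)
        continue;
    break;
}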

The function is called and its return value is pushed onto the
stack. (The WITH_TSC conditional compilation instruction controls
whether the Pentium timestamp counter is used, and can be ignored for
this general analysis.)

The implementation of call_function() starts to expose some of
the complexity of calling Python functions.

static PyObject *
call_function(PyObject ***pp_stack, int oparg
#ifdef WITH_TSC
              , uint64* pintr0, uint64* pintr1
#endif
              )
{
    int na = oparg & 0xff;
    int nk = (oparg>>8) & 0xff;
    int n = na + 2 * nk;
    PyObject **pfunc = (*pp_stack) - n - 1;
    PyObject *func = *pfunc;
    PyObject *x, *w;

    /* Always dispatch PyCFunction first, because these are
       presumed to be the most frequent callable object.
    */
    if (PyCFunction_Check(func) && nk == 0) {
        int flags = PyCFunction_GET_FLAGS(func);
        PyThreadState *tstate = PyThreadState_GET();

        PCALL(PCALL_CFUNCTION);
        if (flags & (METH_NOARGS | METH_O)) {
            PyCFunction meth = PyCFunction_GET_FUNCTION(func);
            PyObject *self = PyCFunction_GET_SELF(func);
            if (flags & METH_NOARGS && na == 0) {
                C_TRACE(x, (*meth)(self, NULL));
            }
            else if (flags & METH_O && na == 1) {
                PyObject *arg = EXT_POP(*pp_stack);
                C_TRACE(x, (*meth)(self, arg));
                Py_DECREF(arg);
            }
            else {
                err_args(func, flags, na);
                x = NULL;
            }
        }
        else {
            PyObject *callargs;
            callargs = load_args(pp_stack, na);
            READ_TIMESTAMP(*pintr0);
            C_TRACE(x, PyCFunction_Call(func, callargs, NULL));
            READ_TIMESTAMP(*pintr1);
            Py_XDECREF(callargs);
        }
    }
    else {
        if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
            /* optimize access to bound methods */
            PyObject *self = PyMethod_GET_SELF(func);
            PCALL(PCALL_METHOD);
            PCALL(PCALL_BOUND_METHOD);
            Py_INCREF(self);
            func = PyMethod_GET_FUNCTION(func);
            Py_INCREF(func);
            Py_DECREF(*pfunc);
            *pfunc = self;
            na++;
            n++;
        }
        else
            Py_INCREF(func);
        READ_TIMESTAMP(*pintr0);
        if (PyFunction_Check(func))
            x = fast_function(func, pp_stack, n, na, nk);
        else
            x = do_call(func, pp_stack, na, nk);
        READ_TIMESTAMP(*pintr1);
        Py_DECREF(func);
    }

    /* Clear the stack of the function object.  Also removes
       the arguments in case they weren't consumed already
       (fast_function() and err_args() leave them on the stack).
    */
    while ((*pp_stack) > pfunc) {
        w = EXT_POP(*pp_stack);
        Py_DECREF(w);
        PCALL(PCALL_POP);
    }
    return x;
}

The number of arguments passed to the function is given in oparg. The
low byte is the number of positional arguments, and the high byte is
the number of keyword arguments (the na and nk values computed at the
top of call_function()). The value 768 in the example above translates
to 3 keyword arguments and 0 positional arguments, since 768 is 3
shifted left by 8 bits.

21 CALL_FUNCTION 768

There are separate cases for built-in functions implemented in C,
functions written in Python, and methods of objects. All of the cases
eventually use load_args() to pull the positional arguments off
of the stack as a tuple. That helper looks roughly like:
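
static PyObject *
load_args(PyObject ***pp_stack, int na)
{
    PyObject *args = PyTuple_New(na);
    PyObject *w;

    if (args == NULL)
        return NULL;
    while (--na >= 0) {
        /* pop each positional argument into the tuple */
        w = EXT_POP(*pp_stack);
        PyTuple_SET_ITEM(args, na, w);
    }
    return args;
}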

In this case, a set of keyword arguments is passed, so the very last
case is triggered and PyDict_Merge() is used to copy the keyword
arguments into the dictionary. There are a couple of cases for
merging but, from what I can tell, because there are two dictionaries
involved the first case applies. The target dictionary is resized to
be big enough to hold the new values, and then the items from the
merging dictionary are copied in one at a time.

int
PyDict_Merge(PyObject *a, PyObject *b, int override)
{
    register PyDictObject *mp, *other;
    register Py_ssize_t i;
    PyDictEntry *entry;

    /* We accept for the argument either a concrete dictionary object,
     * or an abstract "mapping" object.  For the former, we can do
     * things quite efficiently.  For the latter, we only require that
     * PyMapping_Keys() and PyObject_GetItem() be supported.
     */
    if (a == NULL || !PyDict_Check(a) || b == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
    mp = (PyDictObject*)a;
    if (PyDict_Check(b)) {
        other = (PyDictObject*)b;
        if (other == mp || other->ma_used == 0)
            /* a.update(a) or a.update({}); nothing to do */
            return 0;
        if (mp->ma_used == 0)
            /* Since the target dict is empty, PyDict_GetItem()
             * always returns NULL.  Setting override to 1
             * skips the unnecessary test.
             */
            override = 1;
        /* Do one big resize at the start, rather than
         * incrementally resizing as we insert new items.  Expect
         * that there will be no (or few) overlapping keys.
         */
        if ((mp->ma_fill + other->ma_used)*3 >= (mp->ma_mask + 1)*2) {
            if (dictresize(mp, (mp->ma_used + other->ma_used)*2) != 0)
                return -1;
        }
        for (i = 0; i <= other->ma_mask; i++) {
            entry = &other->ma_table[i];
            if (entry->me_value != NULL &&
                (override ||
                 PyDict_GetItem(a, entry->me_key) == NULL)) {
                Py_INCREF(entry->me_key);
                Py_INCREF(entry->me_value);
                if (insertdict(mp, entry->me_key,
                               (long)entry->me_hash,
                               entry->me_value) != 0)
                    return -1;
            }
        }
    }
    else {
        /* Do it the generic, slower way */
        PyObject *keys = PyMapping_Keys(b);
        PyObject *iter;
        PyObject *key, *value;
        int status;

        if (keys == NULL)
            /* Docstring says this is equivalent to E.keys() so
             * if E doesn't have a .keys() method we want
             * AttributeError to percolate up.  Might as well
             * do the same for any other error.
             */
            return -1;

        iter = PyObject_GetIter(keys);
        Py_DECREF(keys);
        if (iter == NULL)
            return -1;

        for (key = PyIter_Next(iter); key; key = PyIter_Next(iter)) {
            if (!override && PyDict_GetItem(a, key) != NULL) {
                Py_DECREF(key);
                continue;
            }
            value = PyObject_GetItem(b, key);
            if (value == NULL) {
                Py_DECREF(iter);
                Py_DECREF(key);
                return -1;
            }
            status = PyDict_SetItem(a, key, value);
            Py_DECREF(key);
            Py_DECREF(value);
            if (status < 0) {
                Py_DECREF(iter);
                return -1;
            }
        }
        Py_DECREF(iter);
        if (PyErr_Occurred())
            /* Iterator completed, via error */
            return -1;
    }
    return 0;
}

Creating a Dictionary with {}

The opcodes used to implement the literal examples are BUILD_MAP, LOAD_CONST, and STORE_MAP. I started with the first opcode, BUILD_MAP, which creates the dictionary instance. Its case in the evaluation loop is short, approximately:
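
case BUILD_MAP:
    x = _PyDict_NewPresized((Py_ssize_t)oparg);
    PUSH(x);
    if (x != NULL) continue;
    break;

The real work happens in _PyDict_NewPresized():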

/* Create a new dictionary pre-sized to hold an estimated number of
   elements.  Underestimates are okay because the dictionary will resize
   as necessary.  Overestimates just mean the dictionary will be more
   sparse than usual. */

PyObject *
_PyDict_NewPresized(Py_ssize_t minused)
{
    PyObject *op = PyDict_New();

    if (minused > 5 && op != NULL &&
        dictresize((PyDictObject *)op, minused) == -1) {
        Py_DECREF(op);
        return NULL;
    }
    return op;
}

The argument for BUILD_MAP is the number of items that are going
to be added to the new dictionary as it is created. The disassembly
for literal-members.py showed that value as 3 earlier.

$ python2.7 -m dis literal-members.py
1 0 BUILD_MAP 3

Specifying the initial number of items in the dictionary is an
optimization for managing memory, since it means the table size can be
set ahead of time and it does not need to be reallocated in some
cases.

Each key/value pair is added to the dictionary using three
opcodes. Two instances of LOAD_CONST push the value, then the key,
onto the stack. Then a STORE_MAP opcode adds the pair to the
dictionary.

2 10 LOAD_CONST 2 ('B')
13 LOAD_CONST 3 ('b')
16 STORE_MAP

As we saw earlier, LOAD_CONST is fairly straightforward and
economical. STORE_MAP looks for the key, value, and dictionary on
the stack and calls PyDict_SetItem() to add the new key/value
pair; the STACKADJ(-2) line removes the key and value from the
stack. The case reads approximately:
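
case STORE_MAP:
    w = TOP();                      /* key */
    u = SECOND();                   /* value */
    v = THIRD();                    /* dict */
    STACKADJ(-2);
    assert (PyDict_CheckExact(v));
    err = PyDict_SetItem(v, w, u);  /* v[w] = u */
    Py_DECREF(u);
    Py_DECREF(w);
    if (err == 0) continue;
    break;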

The functions it calls handle resizing the internal data structure
used by the dictionary, if that’s necessary.

Extreme Tests

I now had an answer explaining why the literal syntax was so much more
efficient than using the name to instantiate a dictionary. I had
noticed earlier, though, that the performance benefits were reduced
when I added a few key/value pairs, so I wanted to see if that trend
continued as I added more arguments.

I decided to jump right to the maximum point, and create dictionaries
with 255 members. Because the number of keyword arguments passed to a
function is represented in a byte, a function may be passed at most
255 literal keyword arguments (you can pass more arguments if you
populate a dictionary and use the syntax callable(**kwds), but 255
seemed like a reasonable number to test).

Not wanting to type out all of those arguments, I used a small script
to generate a call to dict() with 255 keyword arguments. A sketch of
such a generator (the exact key names are unimportant):
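
# Sketch of the generator script: emit a dict() call with 255
# keyword arguments, e.g. dict(a0="0", a1="1", ..., a254="254").
args = ", ".join('a%d="%d"' % (i, i) for i in range(255))
print "dict(%s)" % args

A similar script produced the literal form, and I timed both
expressions with timeit as before.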

The difference has narrowed, but using dict() still takes 1.6
times as long as {}.

Conclusions

In summary, calling dict() requires these steps:

1. Find the object associated with the name “dict” and push it
   onto the stack.
2. Push the key/value pairs onto the stack as constant values.
3. Get the key/value pairs off of the stack and create a dictionary
   to hold the keyword arguments to the function.
4. Call the constructor for dict to make a new object.
5. Initialize the new object by passing the keyword arguments to its
   initialization method.
6. Resize the new dict and copy the key/value pairs into it
   from the keyword arguments.

Whereas using {} to create a dictionary uses only these steps:

1. Create an empty but pre-allocated dictionary instance.
2. Push the key/value pairs onto the stack as constant values.
3. Store each key/value pair in the dictionary.

The times involved here are pretty small, but as a general principle I
try to avoid code constructs that I know introduce performance
penalties. On the other hand, there may be times when using dict()
is necessary, or easier. For example, in versions of Python earlier
than 2.7, creating a dictionary from an iterable of pairs required
using a generator expression as an argument to dict():

d = dict((k, v) for k, v in some_iterable)

With 2.7 and later, though, dictionary comprehensions are built into
the language syntax:

{k: v for k, v in some_iterable}

So that eliminates one common case for calling dict(). Using
the literal dictionary syntax feels more “pythonic” to me, so I try to
just do it anyway.