dumpdata/loaddata with MySQL and ForeignKeys

InnoDB tables in MySQL cannot defer foreign-key checks until the end of a transaction. As a result, most dumpdata/loaddata cycles fail unless the dump order happens to place referenced models before the models that depend on them.

This code uses Ofer Faigon's topological sort to sort the models so that any models with a ForeignKey relationship are dumped after the models they reference.
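The ordering idea can be sketched without Django. The following is a minimal illustration only, not the snippet's own code: it uses plain strings in place of model classes, and the function name `dependency_order`, the input shape (a dict mapping each name to the set of names it references), and the example model names are all assumptions made for the sketch. It also assumes every referenced name is itself a key in the dict.

```python
from collections import deque


def dependency_order(deps):
    """Order names so each one appears after everything it references.

    deps maps a name to the set of names it references (a hypothetical
    input shape chosen for this sketch).
    """
    # Count unmet references per name and record who depends on whom.
    pending = dict((name, 0) for name in deps)
    dependents = dict((name, []) for name in deps)
    for name, refs in deps.items():
        for ref in refs:
            pending[name] += 1
            dependents[ref].append(name)

    # Names with no references can be emitted immediately.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    ordered = []
    while ready:
        name = ready.popleft()
        ordered.append(name)
        # Emitting a name may free the names that were waiting on it.
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(ordered) != len(deps):
        raise ValueError("circular dependency detected")
    return ordered


# Hypothetical models: Book references Author and Publisher,
# Review references Book.
example = {
    "Author": set(),
    "Publisher": set(),
    "Book": set(["Author", "Publisher"]),
    "Review": set(["Book"]),
}
order = dependency_order(example)
```

Here `order` places `Author` and `Publisher` before `Book`, and `Book` before `Review`, which is the property a fixture needs in order to load cleanly into InnoDB.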

Fixtures are an important part of the Django unit testing framework, so I really needed this in order to test my more complicated models.

Caveats

Use this snippet to dump the data, then use the built-in manage.py loaddata to load the fixture it produces. A similar solution could be applied to the XML processing on the loaddata side, but this sufficed for my situation.

This code does not handle circular references or self-references. Loading those would require a much smarter loaddata.
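To see why such models cannot be ordered, consider this rough sketch. It is an illustration under stated assumptions, not part of the snippet: the helper name `unorderable`, the dict-of-sets input shape, and the model names are hypothetical. It repeatedly removes names whose references are all satisfied and reports whatever is left over.

```python
def unorderable(deps):
    """Return the names that can never be emitted because of a cycle.

    deps maps a name to the set of names it references; this shape and
    the model names below are hypothetical, chosen for the sketch.
    """
    remaining = dict((name, set(refs)) for name, refs in deps.items())
    progress = True
    while progress:
        # A name is safe to dump once everything it references is gone.
        free = [n for n, refs in remaining.items() if not refs]
        progress = bool(free)
        for name in free:
            del remaining[name]
        for refs in remaining.values():
            refs.difference_update(free)
    return sorted(remaining)


cyclic = {
    "Category": set(["Category"]),      # self-reference, e.g. a parent field
    "Employee": set(["Department"]),
    "Department": set(["Employee"]),    # two-model cycle
    "Tag": set(),
}
stuck = unorderable(cyclic)
```

In this example `stuck` contains `Category`, `Department` and `Employee`: no dump order can put each of them after everything it references, so sorting the models is not enough to load them.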

#!/usr/bin/env python
# Author: <[email protected]>
#
# Purpose: Given a set of classes, sort them such that ones that have
#          ForeignKey relationships show up after
#          the classes they depend on
#
# Created: 12/27/07
#
# Modified: 3/20/08
#
# Graham King added the ability to walk other ManyToMany
# relationships as well as handling fixtures such as content types

from django.core.management import setup_environ
import settings
setup_environ(settings)

import sys

from django.db import models
from django.core.management.base import CommandError
from django.core import serializers


# Original topological sort code written by Ofer Faigon
# (www.bitformation.com) and used with permission
def foreign_key_sort(items):
    """Perform topological sort.

    items is a list of django classes.
    Returns a list of the items in one of the possible orders.
    Raises CircularReferenceException if the input contains a loop.
    """

    def add_node(graph, node):
        """Add a node to the graph if not already exists."""
        if node not in graph:
            graph[node] = [0]  # 0 = number of arcs coming into this node.

    def add_arc(graph, fromnode, tonode):
        """Add an arc to a graph. Can create multiple arcs.
        The end nodes must already exist."""
        graph[fromnode].append(tonode)
        # Update the count of incoming arcs in tonode.
        graph[tonode][0] = graph[tonode][0] + 1

    # Step 1 - create a directed graph with an arc a->b for each input
    # pair (a,b).
    # The graph is represented by a dictionary. The dictionary contains
    # a pair item:list for each node in the graph. /item/ is the value
    # of the node. /list/'s 1st item is the count of incoming arcs, and
    # the rest are the destinations of the outgoing arcs. For example:
    #     {'a':[0,'b','c'], 'b':[1], 'c':[1]}
    # represents the graph: c <-- a --> b
    # The graph may contain loops and multiple arcs.
    # Note that our representation does not contain reference loops to
    # cause GC problems even when the represented graph contains loops,
    # because we keep the node names rather than references to the nodes.
    graph = {}

    # Iterate once for the nodes
    for v in items:
        add_node(graph, v)

    # Iterate again to pull out the dependency information
    for a in items:
        rel_lst = related_field_filter(a._meta.fields)  # Add foreign keys
        rel_lst.extend(a._meta.many_to_many)            # Add many to many
        for b in rel_lst:
            # print "adding arc %s <- %s" % (b.rel.to, a)
            add_arc(graph, b.rel.to, a)

    # Step 2 - find all roots (nodes with zero incoming arcs).
    roots = [node for (node, nodeinfo) in graph.items() if nodeinfo[0] == 0]

    # Step 3 - repeatedly emit a root and remove it from the graph. Removing
    # a node may convert some of the node's direct children into roots.
    # Whenever that happens, we append the new roots to the list of
    # current roots.
    ordered = []
    while len(roots) != 0:
        # If len(roots) is always 1 when we get here, it means that
        # the input describes a complete ordering and there is only
        # one possible output.
        # When len(roots) > 1, we can choose any root to send to the
        # output; this freedom represents the multiple complete orderings
        # that satisfy the input restrictions. We arbitrarily take one of
        # the roots using pop(). Note that for the algorithm to be efficient,
        # this operation must be done in O(1) time.
        root = roots.pop()
        ordered.append(root)
        for child in graph[root][1:]:
            graph[child][0] = graph[child][0] - 1
            if graph[child][0] == 0:
                roots.append(child)
        del graph[root]

    if len(graph.items()) != 0:
        # There is a loop in the input.
        raise CircularReferenceException("Circular Dependency Detected in Input")

    return ordered


# Raised by foreign_key_sort when the input contains a dependency loop.
class CircularReferenceException(Exception):
    pass


def isclass(obj):
    if str(obj).find("<class") == 0:
        return True
    return False


def find_classes(module):
    classes = []
    for k, obj in module.__dict__.iteritems():
        if isclass(obj):
            # print k, "is a class!"
            classes.append(obj)
    return classes


def model_filter(lst):
    """
    Given a list of classes, filter out everything that's not a
    subclass of models.Model
    """
    return filter(lambda x: issubclass(x, models.Model), lst)


def related_field_filter(lst):
    """
    Given a list of fields, return the ones that are related
    """
    ret = []
    for f in lst:
        s = str(f)
        if s.find('django.db.models.fields.related') >= 0:
            ret.append(f)
    return ret


def main():
    from optparse import OptionParser, make_option
    from django.db.models import get_app, get_apps, get_models

    option_list = (
        make_option('--format', default='json', dest='format',
            help='Specifies the output serialization format for fixtures.'),
        make_option('--indent', default=None, dest='indent', type='int',
            help='Specifies the indent level to use when pretty-printing output'),
        # Boolean flag: re-raise the original exception instead of
        # wrapping it in a CommandError.
        make_option('--traceback', default=False, dest='traceback',
            action='store_true',
            help='Print the full traceback when serialization fails'),
    )
    help = 'Output the contents of the database as a fixture of the given format.'
    args = '[appname ...]'

    parser = OptionParser(option_list=option_list)
    (options, app_labels) = parser.parse_args()

    if len(app_labels) == 0:
        app_list = get_apps()
    else:
        app_list = [get_app(app_label) for app_label in app_labels]

    # Check that the serialization format exists; this is a shortcut to
    # avoid collating all the objects and _then_ failing.
    if options.format not in serializers.get_public_serializer_formats():
        raise CommandError("Unknown serialization format: %s" % options.format)

    try:
        serializers.get_serializer(options.format)
    except KeyError:
        raise CommandError("Unknown serialization format: %s" % options.format)

    objects = []
    model_list = []
    for app in app_list:
        model_list.extend(get_models(app))

    # Dump referenced models before the models that depend on them.
    model_list = foreign_key_sort(model_list)

    for model in model_list:
        objects.extend(model._default_manager.all())
    try:
        print serializers.serialize(options.format, objects, indent=options.indent)
    except Exception, e:
        if options.traceback:
            raise
        raise CommandError("Unable to serialize database: %s" % e)


if __name__ == '__main__':
    main()