That works, but with a big file Django will issue a separate database query for each row, and that can take a long time to finish.

Lucky for us, PostgreSQL has a built-in tool called COPY that moves data in and out of the database with one quick query.

This package tries to make using COPY as easy as any other database routine supported by Django. It is implemented as a custom model manager.

Here’s how it imports a CSV to a database table.

```python
from myapp.models import MyModel

MyModel.objects.from_csv(
    "./data.csv",  # The path to a source file (a Python file object is also acceptable)
    dict(name='NAME', number='NUMBER')  # A crosswalk of model fields to CSV headers.
)
```

```python
from myapp.models import Person
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        # Since the CSV headers match the model fields,
        # you only need to provide the file's path (or a Python file object)
        insert_count = Person.objects.from_csv('/path/to/my/import.csv')
        print("{} records inserted".format(insert_count))
```

```python
from myapp.models import Person
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        # All this method needs is the path to your CSV.
        # (If you don't provide one, the method will return the CSV as a string.)
        Person.objects.to_csv('/path/to/my/export.csv')
```

Run your exporter and that’s it.

$ python manage.py myexportcommand

You can even export your queryset after applying filters or other tricks.
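For instance, a hedged sketch (the model, the `exclude` filter, and the path are illustrative):

```python
from myapp.models import Person

# Filter the queryset however you like, then export the result.
Person.objects.exclude(name='BARACK OBAMA').to_csv('/path/to/my/export.csv')
```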

The first argument is the path to the delimited data file, or a Python file object containing delimited data.

The keyword arguments are:

mapping
    An optional dictionary: keys are strings corresponding to
    the model fields, and values are the string field names from
    the CSV header. If not provided, the mapping is generated
    from the CSV file's header.

drop_constraints
    A boolean indicating whether constraints on the table and
    fields should be dropped prior to loading, then restored
    afterward. The default is True. This is done to boost speed.

drop_indexes
    A boolean indicating whether indexes on the table and
    fields should be dropped prior to loading, then restored
    afterward. The default is True. This is done to boost speed.

delimiter
    The character that separates values in the data file.
    By default it is ",". This must be a single one-byte
    character.

quote_character
    The quoting character to be used when a data value is
    quoted. The default is the double-quote. This must be a
    single one-byte character.

null
    The string that represents a null value. The default is an
    unquoted empty string.

force_not_null
    Specifies which columns should ignore matches against the
    null string. Empty values in these columns will remain
    zero-length strings rather than becoming nulls. The default
    is None. If passed, this must be a list of column names.

force_null
    Specifies which columns should register matches against the
    null string, even if it has been quoted. In the default case
    where the null string is empty, this converts a quoted empty
    string into NULL. The default is None. If passed, this must
    be a list of column names.

encoding
    Specifies the character-set encoding of the strings in the
    CSV data source. For example, 'latin-1', 'utf-8' and 'cp437'
    are all valid encoding parameters.
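Putting several of those options together, a hedged sketch of loading a pipe-delimited file that marks missing values with "NA" (the model, column headers, and path are hypothetical):

```python
from myapp.models import Person

Person.objects.from_csv(
    '/path/to/my/import.csv',
    dict(name='NAME', number='NUMBER'),
    delimiter='|',       # pipe-delimited source file
    null='NA',           # "NA" values become NULL
    encoding='latin-1',  # source file is not UTF-8
)
```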

The COPY command cannot transform data on the fly as it is loaded into the database.

This library first loads the data into a temporary table before inserting all records into the model table. So it is possible to use PostgreSQL’s built-in SQL methods to modify values during the insert.

As an example, imagine a CSV that includes a column of yes and no values that you want to store in the database as 1 or 0 in an integer field.

You can provide a SQL statement for how to transform the data during the insert into the model table. The transformation must include a string interpolation keyed to “name”, where the name of the database column will be slotted.
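The mechanics can be sketched in plain Python (the variable names here are illustrative, not part of the library's API): the transformation is just a SQL fragment containing a "%(name)s" placeholder, which is filled with the column name via ordinary string interpolation.

```python
# A SQL fragment with a "%(name)s" placeholder where the
# database column name will be slotted in.
value_template = """
CASE
    WHEN "%(name)s" = 'yes' THEN 1
    WHEN "%(name)s" = 'no' THEN 0
END
"""

# During the INSERT from the temporary table, the column name
# is interpolated into the fragment like so:
sql = value_template % {'name': 'value'}
print(sql)
```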

A second approach is to provide a SQL string for how to transform a field during the insert on the model itself. This lets you specify different transformations for different fields of the same type.

You must name the method so that the field name is sandwiched between copy_ and _template. It must return a SQL statement with a string interpolation keyed to “name”, where the name of the database column will be slotted.

For the example above, the model might be modified to look like this.

```python
from django.db import models
from postgres_copy import CopyManager


class Person(models.Model):
    name = models.CharField(max_length=500)
    value = models.IntegerField()
    objects = CopyManager()

    def copy_value_template(self):
        return """
            CASE
                WHEN "%(name)s" = 'yes' THEN 1
                WHEN "%(name)s" = 'no' THEN 0
            END
        """
```

And that’s it.

Here’s another example of a common issue, transforming the CSV’s date format to one PostgreSQL and Django will understand.

It’s important to handle empty strings (by converting them to NULL) in this example. PostgreSQL will accept empty strings, but Django won’t be able to ingest the field and you’ll get a strange “year out of range” error when you call something like MyModel.objects.all().
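A sketch of such a transform (the field name `mydate` and the MM/DD/YYYY source format are assumptions for illustration). On the model this would be a `copy_mydate_template` method; it is shown here as a plain function so the interpolation is visible:

```python
def copy_mydate_template():
    # Hypothetical template for a model method named copy_mydate_template.
    # Assumes the CSV stores dates as MM/DD/YYYY. Empty strings are
    # converted to NULL so Django can ingest the field.
    return """
        CASE
            WHEN "%(name)s" = '' THEN NULL
            ELSE to_date("%(name)s", 'MM/DD/YYYY')
        END
    """

# The column name is slotted in during the insert:
sql = copy_mydate_template() % {'name': 'mydate'}
print(sql)
```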

If your model has columns that are not in the CSV, you can set static values for what is inserted using the static_mapping keyword argument. It will insert the provided values into every row in the database.

An example could be if you want to include the name of the source CSV file along with each row.
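A sketch of that example (it assumes the model has a `source_csv` field that does not appear in the CSV; the paths and headers are illustrative):

```python
from myapp.models import Person

Person.objects.from_csv(
    '/path/to/my/import.csv',
    dict(name='NAME'),
    # Inserted into every row, since the CSV has no such column.
    static_mapping={'source_csv': 'import.csv'},
)
```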

The from_csv method connects with a lower-level CopyMapping class that offers optional hooks that run before and after the COPY statement, when the CSV is loaded into a temporary table, and again before and after the INSERT statement that then slots the data into your model’s table.

If you have extra steps or more complicated logic you’d like to work into a loading routine, CopyMapping and its hooks provide an opportunity to extend the base library.

To try them out, subclass CopyMapping and fill in as many of the optional hook methods below as you need.

```python
from postgres_copy import CopyMapping


class HookedCopyMapping(CopyMapping):
    def pre_copy(self, cursor):
        print("pre_copy!")
        # Doing whatever you'd like here

    def post_copy(self, cursor):
        print("post_copy!")
        # And here

    def pre_insert(self, cursor):
        print("pre_insert!")
        # And here

    def post_insert(self, cursor):
        print("post_insert!")
        # And finally here
```

Now you can run that subclass directly rather than via a manager. The only differences are that the model is the first argument to CopyMapping, and that instantiating it creates an object you execute with a call to its save method.

```python
from myapp.models import Person
from myapp.loaders import HookedCopyMapping
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        # Note that we're using HookedCopyMapping here
        c = HookedCopyMapping(
            Person,
            '/path/to/my/data.csv',
            dict(name='NAME', number='NUMBER'),
        )
        # Then save it.
        c.save()
```

The to_csv manager method takes one main argument: the path where the CSV should be exported (if omitted, the CSV is returned as a string). It also allows users to optionally limit or expand the fields written out by providing them as additional parameters. Other options allow for configuration of the output file.
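A hedged sketch of limiting the export to particular fields (the model, field names, and path are illustrative):

```python
from myapp.models import Person

# Write out only the `name` and `number` fields.
Person.objects.to_csv('/path/to/my/export.csv', 'name', 'number')
```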