Advanced Usage of the Bulkloader

A property that is a list of dates: write your own transform functions to convert a delimited string to a list of dates and vice versa.

It is straightforward to write your own config.yaml for the bulkloader to import or export simple entities with simple properties. It is not so when it comes to relationships and list properties. Insufficient documentation from Google is one reason; lack of Python skill is the other. It took me hours upon hours to figure them out. If you do not want to spend your precious time repeating what I went through, you are in the right place. Relax and read on.

As of this writing, the bulk loader is implemented only in Python, not in Java, but it works with any app deployed on GAE, whether that app is written in Python or Java.

Before we go on, I assume you have already installed Python 2.5 or above. If you have not, please download it from here. I also assume you already know the bulkloader YAML configuration file. You can either create one from scratch or generate it automatically from your datastore with the create_config_file command. Please refer to the Python section of the App Engine documentation for details.

The class diagram below shows sample entities used throughout this post.

The House is the parent of Deal; in other words, it owns Deal.
The Booking is the parent of Guest, i.e., it owns Guest. It also refers to House and Deal through unowned relationships.

import/export owned entities

From the code's point of view, it is easy to think that the parent entity carries the child and thus the relationship:
public class Booking {
    private List<Guest> guests;
}

But from the datastore's point of view, the opposite is the case: the child carries the relationship information. The relationship is burned into the child's key, which carries the keys of its parent and all of its ancestors. The relationship therefore travels with the child entity when it comes to import or export.

On export, we break the child's key into two parts: the parent key and its own key. On import, we do the opposite: create a complete key by merging its own key with the keys of all its ancestors. Here is the data file:
firstName,middleName,lastName,age,bookingKey,guestKey
GTom Jr.,,Yong,12,BTom20110702,Tom12
GTom Sr.,,Yong,62,BTom20110702,Tom62
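
The split and merge described above can be sketched in plain Python, independent of the App Engine SDK. Here a key is modeled as a list of (kind, name) pairs running from the root ancestor down to the entity itself. split_key and merge_key are hypothetical names used only for illustration; they are not bulkloader functions. In a real config.yaml, I believe the transform module's create_deep_key (on import) and key_id_or_name_as_string_n (on export) play these roles, but please check them against your SDK version.

```python
# A key path is modeled as [(kind, name), ...] from root ancestor to the entity.
# split_key and merge_key are illustrative names, not part of the bulkloader.

def split_key(path):
    """Export direction: return (parent_name, own_name) for a two-level key."""
    (_, parent_name), (_, own_name) = path
    return parent_name, own_name


def merge_key(row, columns):
    """Import direction: build the full key path from CSV columns.

    columns is a list of (kind, csv_column) pairs, ancestors first.
    """
    return [(kind, row[column]) for kind, column in columns]


# One row of the data file above, reduced to the columns that matter here.
row = {"firstName": "GTom Jr.", "bookingKey": "BTom20110702", "guestKey": "Tom12"}

guest_path = merge_key(row, [("Booking", "bookingKey"), ("Guest", "guestKey")])
print(guest_path)
print(split_key(guest_path))
```

The point of the sketch is that the Guest row alone is enough to rebuild the relationship: the Booking never needs a column listing its guests.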

To make the relationships easier to track, use application-assigned key names for all entities:
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
@Extension(vendorName="datanucleus", key="gae.encoded-pk", value="true")
private String encodedKey;
/**
 * App-assigned key name for this instance; it becomes part of the complete key.
 * Without this property, App Engine assigns each new entity a numeric ID.
 */
@Persistent
@Extension(vendorName="datanucleus", key="gae.pk-name", value="true")
private String keyName;

import/export list of dates

With the CSV connector, if a property is a list, its values have to be concatenated into a single string on export and split back into a list on import. Assuming the data type of the elements does not change during the transformation (it stays a string), the bulk loader has a simple way to accomplish this task:
- kind: Booking
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: bookingKey
      export_transform: transform.key_id_or_name_as_string
    - property: keys2deals
      external_name: keys2deals
      # A semicolon-separated list.
      import_transform: "lambda x: x.split(';')"
      export_transform: "';'.join"

The unowned relationship is implemented by holding keys to the referenced entities. In the case of Booking, keys2deals is a list of strings where each element is the key of a Deal entity. Because the data type stays a string, the values are easy to import and export with the in-line functions shown above.
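
To see what the two in-line transforms do, here is a small round trip in plain Python that runs without the SDK. The key names are made-up sample values. One edge case to be aware of: splitting an empty cell yields a list containing one empty string rather than an empty list, so a guard such as the hypothetical split_nonempty below may be useful when the column can be blank.

```python
# The same in-line functions used in the property_map above.
import_transform = lambda x: x.split(';')
export_transform = ';'.join

cell = 'DealA;DealB;DealC'  # made-up sample keys
keys = import_transform(cell)
assert keys == ['DealA', 'DealB', 'DealC']
assert export_transform(keys) == cell

# An empty cell becomes [''] and not [], which would store one blank key.
assert import_transform('') == ['']


def split_nonempty(delimiter):
    """Hypothetical guard: return [] for an empty or missing cell."""
    def split(value):
        if not value:
            return []
        return value.split(delimiter)
    return split


assert split_nonempty(';')('') == []
assert split_nonempty(';')('DealA;DealB') == ['DealA', 'DealB']
```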

The Booking entity also has a property that is a list of dates. Because the data type is not a string, a data type conversion needs to be performed in addition to the format transformation. On export, first convert the date list to a string list, then join the string list into a single delimited string. On import, first split the delimited string into a string list, then convert the string list to a date list. Because the google.appengine.ext.bulkload.transform module does not have functions for the job, we have to write our own:
#!/usr/bin/python2.5
from datetime import datetime


# Convert a single date string to a datetime.
def toDate(datestring, fmt):
    # datetime.date is rejected ("Unsupported type datetime.date"),
    # so return the full datetime instead of calling .date().
    return datetime.strptime(datestring, fmt)


# Build the import transform: delimited string -> list of datetimes.
# The format is captured by the closure rather than stored in a global,
# so two properties with different formats do not overwrite each other.
def toDateList(format, delimiter):
    def to_date_list(value):
        return [toDate(s, format) for s in value.split(delimiter)]
    return to_date_list


# Convert a single datetime to a string.
def dateToString(dt, fmt):
    return dt.strftime(fmt)


# Build the export transform: list of datetimes -> delimited string.
def dateListToString(format, delimiter):
    def date_list_to_string(value):
        return delimiter.join([dateToString(dt, format) for dt in value])
    return date_list_to_string

The top-level function toDateList is used on import and dateListToString on export; both take a format and a delimiter as arguments:
....
- import: transformers
....
- kind: Booking
  ...
    - property: bookedDates
      external_name: bookedDates
      import_transform: transformers.toDateList('%Y-%m-%d',';')
      export_transform: transformers.dateListToString('%Y-%m-%d',';')
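
Before running a real import or export, it may be worth checking that the two transforms are inverses of each other. The sketch below is plain Python and does not need the SDK. make_import and make_export are stand-ins with the same shape as toDateList and dateListToString (a factory that takes a format and a delimiter and returns the transform), and round_trips is a hypothetical helper name.

```python
from datetime import datetime


def make_import(fmt, delimiter):
    """Stand-in for toDateList: delimited string -> list of datetime."""
    def to_date_list(value):
        return [datetime.strptime(s, fmt) for s in value.split(delimiter)]
    return to_date_list


def make_export(fmt, delimiter):
    """Stand-in for dateListToString: list of datetime -> delimited string."""
    def date_list_to_string(value):
        return delimiter.join([dt.strftime(fmt) for dt in value])
    return date_list_to_string


def round_trips(import_fn, export_fn, cell):
    """True if exporting the imported value reproduces the CSV cell."""
    return export_fn(import_fn(cell)) == cell


# Same arguments as in the config fragment above.
to_dates = make_import('%Y-%m-%d', ';')
to_cell = make_export('%Y-%m-%d', ';')

dates = to_dates('2011-07-02;2011-07-03')
print(dates)
print(round_trips(to_dates, to_cell, '2011-07-02;2011-07-03'))
```

To test the real module instead, replace the two stand-ins with the functions imported from transformers.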