Migrating the Drupal way. Part I: creating a node.

My position with Acquia will find me helping out with a lot of migrations and upgrades. I'm going to embark on a multi-part blog series to discuss some of the common techniques that I use when moving clients to Drupal.

Migrating to Drupal can seem intimidating if you already maintain a database-driven website. However, populating a Drupal site with your current content might be easier than you think. Whether you are migrating from a popular CMS or a fully custom application, you can easily use Drupal modules to mimic your current data structures and migrate your data using a simple custom PHP script. I should note that while there are several different methods to accomplish this task, this happens to be my favorite.

When interacting with Drupal, it's a good idea to do things the Drupal way. Fortunately, Drupal core allows you to bootstrap Drupal and use all of its API functionality outside of a normal Drupal page request, for example from a standalone script. For yours truly, learning about this has been a godsend because it provides a fast, simple way to migrate data.

Creating a basic node

When writing an import script, you will need to bootstrap Drupal to use the API functions. drupal_bootstrap($phase) loads Drupal up to the phase you pass as $phase, so the argument controls how much gets loaded: the site configuration, the database layer, the modules and other requisite functionality. For our purposes, we will use drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) to make sure that we have access to the whole API.

Note: Make sure that you create this script in the root of your Drupal installation.
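
Here is a minimal sketch of such a script, assuming a Drupal 6 site with the default story content type and user 1 as the author. The file name import.php, the helper build_basic_node() and the sample title and body are hypothetical. The two bootstrap lines are the same ones Drupal's own index.php uses; the file_exists() and function_exists() guards are only there so the node-building part can be tried outside a Drupal root, and you can drop them in a real import.

```php
<?php
// Hypothetical import.php, saved in the root of a Drupal 6 site.

/**
 * Builds a bare-bones node object with a title and body.
 * The 'story' type and uid 1 are assumptions for this sketch.
 */
function build_basic_node($title, $body) {
  $node = new stdClass();
  $node->type = 'story';   // machine name of an existing content type
  $node->title = $title;
  $node->body = $body;
  $node->uid = 1;          // author (user 1 assumed)
  $node->status = 1;       // published
  $node->promote = 1;      // promoted to the front page
  return $node;
}

// Load Drupal fully so the whole API, node_save() included, is available.
// The guard only lets this file run outside a Drupal root for testing.
if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

$node = build_basic_node('Hello from my import script', 'Imported body text.');

if (function_exists('node_save')) {
  // The node form normally generates the teaser; a direct node_save() does not.
  $node->teaser = node_teaser($node->body);
  node_save($node);        // assigns $node->nid on insert
  print "Created node {$node->nid}\n";
}
else {
  print "Drupal is not bootstrapped; nothing was saved.\n";
}
```

Run it from the Drupal root (for example with php import.php) so that the relative include path resolves.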

Creating more complex nodes

A script that bootstraps Drupal, sets a title and body, marks the node as published and promoted to the homepage, and passes it to node_save() will create a new node. However, the process becomes slightly more complicated if you have more data than simple title and body fields. The CCK module is a popular way to extend your nodes by adding any number of custom fields. When Drupal loads your content, CCK adds your custom fields to the node object using hook_nodeapi(). Luckily, you can replicate this by adding your own fields in the import script. So, how can you find out the structure of these fields? One really easy method is to use the Devel module.

[Figure: the Devel module showing how Drupal sees your node object]

Using the Devel module

The Devel module is a great way to see, among other things, the structure of the node object, which is invaluable in this case. After installing the module and viewing a node, you will see two new tabs: Dev load and Dev render. Click the Dev load tab, then click the "... (Object) stdClass" header to expand the node object definition. Here you will find some familiar data like nid, type, etc. Near the bottom, you will see some other definitions that begin with "field_". These should resemble the CCK fields that you created for your node type.

Depending on your CCK definitions, the assignments in your import script will mirror the "field_" arrays that Dev load shows.

[Figure: examples of how CCK has added fields to the node object]
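
As a sketch, assume a content type with three hypothetical CCK fields: a single-value text field field_subtitle, a multi-value text field field_keywords and a node reference field_related. In Drupal 6 CCK, each field is a numerically indexed array with one item per value, and each item is keyed by its column: value for text and number fields, nid for node references. The helper add_cck_fields() is also hypothetical.

```php
<?php
// Sketch of CCK field assignments on a node object (Drupal 6 CCK assumed).
// Field names are hypothetical; use the ones listed under "field_" in Dev load.

function add_cck_fields($node, $subtitle, array $keywords, $related_nid) {
  // Single-value text field: item 0, column 'value'.
  $node->field_subtitle[0]['value'] = $subtitle;

  // Multi-value text field: one item per value.
  $node->field_keywords = array();
  foreach ($keywords as $keyword) {
    $node->field_keywords[] = array('value' => $keyword);
  }

  // Node reference field: column 'nid' holds the referenced node ID.
  $node->field_related[0]['nid'] = (int) $related_nid;

  return $node;
}

$node = new stdClass();
$node->type = 'story';
$node->title = 'Example';
add_cck_fields($node, 'A subtitle', array('drupal', 'migration'), 58);
print_r($node->field_keywords);
```

In an import script, these assignments go on the node object before it is passed to node_save().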

Add the field assignments to your import script and you will start to see the power of the Drupal API. Let's say you are migrating from another CMS with a number of related fields, categories, images, etc. You could expand this script to iterate through your old database and map all of the related elements to a corresponding node object. Execute your script, and all of your old data will now become Drupal data! The best part about using the API is that it takes care of everything from search indexing to path aliases and all of the other little things we might overlook.
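
The loop described above might look like this sketch. The legacy column names (id, headline, content, published_on), the helper legacy_row_to_node() and the CCK field field_legacy_id are hypothetical, and the hard-coded $legacy_rows array stands in for rows you would actually fetch from your old database.

```php
<?php
// Sketch (Drupal 6): map rows from a legacy database onto node objects.
// Column names and the field_legacy_id CCK field are hypothetical.

function legacy_row_to_node(array $row) {
  $node = new stdClass();
  $node->type = 'story';
  $node->title = $row['headline'];
  $node->body = $row['content'];
  $node->uid = 1;
  $node->status = 1;
  $node->created = strtotime($row['published_on']);  // keep the original date
  // Remember where the node came from (integer CCK field assumed).
  $node->field_legacy_id[0]['value'] = (int) $row['id'];
  return $node;
}

if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

// Stand-in for rows fetched from the old database.
$legacy_rows = array(
  array('id' => 10, 'headline' => 'About us', 'content' => 'Who we are.',
        'published_on' => '2008-05-01 09:30:00'),
  array('id' => 11, 'headline' => 'Contact', 'content' => 'How to reach us.',
        'published_on' => '2008-05-02 14:00:00'),
);

foreach ($legacy_rows as $row) {
  $node = legacy_row_to_node($row);
  if (function_exists('node_save')) {
    node_save($node);
  }
}
```

Keeping the mapping in its own function makes it easy to check a few rows before running the full import.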

Migrating to Drupal can seem like a daunting task, but when you do things the Drupal way it's quite straightforward. Whether you are planning a migration of 100 nodes or 100,000 nodes, proper scripting can make it seem like a breeze!

Comments

node_save is one way to create the node, but my preference is
drupal_execute which has the benefit of creating the node in a more
Drupalish way (i.e. executing the validation from modules that care
about the node prior to it being saved). node_save is probably
faster, but I'd rather have valid data than fast data.
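
For comparison, here is a minimal sketch of the drupal_execute() approach on Drupal 6, assuming the story content type and that running the import as user 1 is acceptable. The form ID follows Drupal 6's <type>_node_form pattern, and the node form code has to be loaded from node.pages.inc first. The helper build_node_form_state() is hypothetical, and the guards only let the form-state builder run without Drupal.

```php
<?php
// Sketch: create a node through its form with drupal_execute() (Drupal 6) so that
// validation runs. The 'story' type and running as user 1 are assumptions.

function build_node_form_state($title, $body) {
  $form_state = array('values' => array());
  $form_state['values']['title'] = $title;
  $form_state['values']['body'] = $body;
  $form_state['values']['status'] = 1;   // publishing options only appear on the
  $form_state['values']['promote'] = 1;  // form for users with 'administer nodes'
  return $form_state;
}

if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

if (function_exists('drupal_execute')) {
  // Act as user 1 so the author and publishing fields exist on the form.
  global $user;
  $user = user_load(array('uid' => 1));

  // node_form() and its submit handler live in node.pages.inc in Drupal 6.
  module_load_include('inc', 'node', 'node.pages');

  $form_state = build_node_form_state('Imported title', 'Imported body text.');
  $form_state['values']['op'] = t('Save');
  drupal_execute('story_node_form', $form_state, (object) array('type' => 'story'));

  if ($errors = form_get_errors()) {
    print_r($errors);                    // validation failed, nothing was saved
  }
  else {
    print "Created node {$form_state['nid']}\n";
  }
}
```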

Calling node_object_prepare() usually sets defaults for most items,
and you can always override the uid, date or other items later. The
biggest benefit of this is the invocation of hook_prepare and
hook_nodeapi (with op 'prepare'). If you have other modules that take
advantage of these hooks and want them to work on your imported nodes,
then you will need to call node_object_prepare with your node at some
point in your import script.

Note that you could just as easily call this at the end of your
import script, but then you wouldn't have the chance to override its
values.
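
A minimal sketch of that ordering on Drupal 6: prepare first, then override. In Drupal 6, node_object_prepare() sets status, promote and sticky from the content type's default options and, for a new node, sets uid to the current user and created to the current time, which is why the overrides come afterwards. The helper apply_import_overrides() and the author and date values are hypothetical.

```php
<?php
// Sketch (Drupal 6): let node_object_prepare() fill in defaults, then override them.
// The type, author ID and date are hypothetical import values.

function apply_import_overrides($node, $uid, $created) {
  // Applied after node_object_prepare() so its defaults don't replace these.
  $node->uid = (int) $uid;
  $node->created = (int) $created;
  return $node;
}

if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

$node = new stdClass();
$node->type = 'story';
$node->title = 'Prepared node';
$node->body = 'Body text';

if (function_exists('module_load_include')) {
  // In Drupal 6, node_object_prepare() is defined in node.pages.inc.
  module_load_include('inc', 'node', 'node.pages');
  // Sets default options, uid and created, then invokes hook_prepare()
  // and hook_nodeapi() with op 'prepare'.
  node_object_prepare($node);
}

apply_import_overrides($node, 3, mktime(9, 30, 0, 5, 1, 2008));

if (function_exists('node_save')) {
  node_save($node);
}
```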

I'm having trouble using node_save with a cck nodereference field
(on Drupal 5.x). Anyone know of further documentation on that?

A number of entries at drupal.org mention the problem (e.g.
http://drupal.org/node/275754) but haven't helped. It seems that using
a select list on a custom form to set the value will work, but
programmatically setting the value with the syntax suggested above
($node->field_nodereference[0]['nid'] = 58;) will fail.
Advice?

The code above works fine in 6.x, but I didn't test it in 5. In the
past, I've done nodereference imports for 5.x by populating the
database field manually after creating the node. You could
use something like the above script and then add an UPDATE query that
writes the referenced nid into the field's database column.
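
A hedged sketch of that workaround. It assumes the reference field is single-valued and not shared between content types, so CCK keeps it in the per-type table as a <field>_nid column. The table content_type_story, the field field_related and the helper nodereference_update_sql() are hypothetical, so check the real table and column names in your database before running anything like this.

```php
<?php
// Sketch (Drupal 5 or 6): write a node reference directly into CCK's storage
// after node_save(). Table and field names are hypothetical; verify your own.

function nodereference_update_sql($table, $field_name) {
  // {braces} let Drupal apply any table prefix; db_query() fills the %d placeholders.
  // $table and $field_name come from this script, never from imported data.
  return 'UPDATE {' . $table . '} SET ' . $field_name . '_nid = %d WHERE vid = %d';
}

if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

$sql = nodereference_update_sql('content_type_story', 'field_related');

if (function_exists('node_save')) {
  $node = new stdClass();
  $node->type = 'story';
  $node->title = 'Node with a reference';
  $node->body = '';
  node_save($node);
  // Assumes CCK has already written this revision's row in the per-type table.
  db_query($sql, 58, $node->vid);
}
```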

Thanks for the summary, Kevin: the little fiddly bits like
status can cause node_save to fail silently,
and it's really hard with just the Devel module alone to work out
precisely what's the bare minimum needed to save a node.

If anyone's interested, Node factory is
really good at handling the bare bones of node creation (basically the
second PHP block in your post). It's still considered bleeding-edge by
its maintainer, because I think he wants to nail CCK support, but for
one-time imports I'd happily use it to set up basic nodes without
reservation.

Another big point, in the drupal_execute() vs. node_save()
decision, is that using node_save() will bypass most validation for
nodes such as required and non-required fields, allowed values for CCK
fields, length of fields and any custom validation you've added in a
custom module.

This can be a big advantage or a huge shock depending on how you
look at it. I tend to prefer using node_save() over drupal_execute()
for this reason. I don't particularly care for sanitizing my clients'
data for them, unless we agree to not import anything that doesn't
validate, but many times in the course of importing there is an
occasional missing field which can throw errors with
drupal_execute().
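
Since node_save() skips these checks, you may still want to know which legacy rows are incomplete without blocking the import. This sketch swaps in a simple check of our own, not anything Drupal provides: the helper missing_fields() and the list of required columns are hypothetical, and it only detects empty values, not CCK allowed values or field lengths.

```php
<?php
// Sketch: report missing values before node_save(), which performs no form
// validation. The required column names are hypothetical.

function missing_fields(array $row, array $required) {
  $missing = array();
  foreach ($required as $column) {
    if (!isset($row[$column]) || trim((string) $row[$column]) === '') {
      $missing[] = $column;
    }
  }
  return $missing;
}

$required = array('headline', 'content');
$rows = array(
  array('id' => 1, 'headline' => 'Complete row', 'content' => 'Text'),
  array('id' => 2, 'headline' => 'No body', 'content' => ''),
);

foreach ($rows as $row) {
  $missing = missing_fields($row, $required);
  if ($missing) {
    // node_save() will not complain, so keep a list to review with the client.
    print "Row {$row['id']} is missing: " . implode(', ', $missing) . "\n";
  }
  // ... build the node from $row and call node_save() as usual ...
}
```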

Thanks for these great tips. I was wondering if there's a good way
to import pages and map them to a menu item programmatically (e.g.
import an 'about us' page, then map that page/path to the primary
menu). I can see how the path is set, but not how mapping to a
particular menu item is accomplished. Any thoughts?

For my imports, I always have to check whether the node has already
been imported, since our import data may have been edited. In that
case, if I've found an existing node (by checking the cck field where
I stored the imported data's 'original' id), I load that node and set
$node['updated'] to time() and $node['revision'] to 1 (create
revision) or 0 (do not).

There is no reason to ever manually specify a node['nid'] (other
than updates, as stated above). In D6 it's an auto-increment field. If
you need to store an original id (as we do), you'll have to put it in
a cck field.
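
The update-or-insert check described above might be sketched like this for Drupal 6. It assumes the original id lives in a single-value integer CCK field, hypothetically named field_legacy_id, on the story type, stored in the per-type table content_type_story with a field_legacy_id_value column; check the real names in your database. The helpers find_nid_by_legacy_id() and import_record() are also hypothetical.

```php
<?php
// Sketch (Drupal 6): update a node if this legacy record was imported before,
// otherwise create it. Table, column and field names are hypothetical.

function find_nid_by_legacy_id($legacy_id) {
  if (!function_exists('db_query')) {
    return FALSE;                          // not bootstrapped: treat as new
  }
  // db_result() returns FALSE when no row matches.
  return db_result(db_query(
    'SELECT nid FROM {content_type_story} WHERE field_legacy_id_value = %d',
    $legacy_id));
}

function import_record(array $row, $create_revision) {
  $nid = find_nid_by_legacy_id($row['id']);
  if ($nid) {
    $node = node_load($nid);               // existing node keeps its nid
    $node->revision = $create_revision ? 1 : 0;
    $node->log = 'Updated by import script.';
  }
  else {
    $node = new stdClass();                // new node: never set nid by hand
    $node->type = 'story';
    $node->uid = 1;
    $node->status = 1;
    $node->field_legacy_id[0]['value'] = (int) $row['id'];
  }
  $node->title = $row['headline'];
  $node->body = $row['content'];
  if (function_exists('node_save')) {
    node_save($node);
  }
  return $node;
}

if (file_exists('./includes/bootstrap.inc')) {
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}

import_record(array('id' => 10, 'headline' => 'About us', 'content' => 'Who we are.'), TRUE);
```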

I'm trying to import data using an "insert if it doesn't already
exist, update if it does" strategy. Luckily, the record id of the
import data file is the same as the nid of the existing node (if
any).

I perform a full bootstrap, but when I call node_load($nid) to
determine whether the first node exists, I get garbage back. I stepped
into the code and the problem seems to be related to Drupal not
properly unserializing the schema data after retrieving it from the
cache.