Headless Drupal: Using Drupal’s API to Batch Script Your Drupal Site.

January 20, 2009

Whenever I work with a significant framework or off-the-shelf software, I invariably encounter situations in which I need to do “one-off” programmatic batch tasks outside the normal flow of the application.

Of course, you can look at the database structure and manipulate the data directly in a database client or through your favorite programming language, but this can actually be less convenient (and less safe) than directly using the application’s API, which encapsulates and abstracts away the underlying data structure.

And often we are already familiar with this API anyway as a result of using the framework or customizing the software. Unfortunately, it’s not always obvious how to invoke the application in an entirely programmatic way to perform these types of tasks. These methods usually exist, but they are often not well documented.

Today, I will explore how to do some programmatic manipulation of Drupal (specifically Drupal 6, although this approach is very similar in Drupal 5) showing specific examples to get you started creating your own scripts.

Invoking Drupal Programmatically

Invoking Drupal programmatically is surprisingly simple, with just a few lines of code:
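A minimal sketch of that bootstrap, assuming the script lives in the Drupal root directory so that the relative path resolves (Drupal 6's own cron.php starts the same way):

```php
<?php
// Assumes this file sits in the Drupal root, next to cron.php.
require_once './includes/bootstrap.inc';

// Run every bootstrap phase so the full API (nodes, users, modules) is loaded.
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
```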

The drupal_bootstrap() function can take other constants to load certain parts of Drupal. I usually just fully load Drupal to ensure that I have access to the full API and can perform any task I require.
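As a rough illustration of a partial bootstrap, the sketch below stops at the database phase. That should be enough for raw queries, but modules are not loaded at that point, so functions such as node_load() are unavailable. The count query is only an example.

```php
<?php
// Assumes this file sits in the Drupal root, next to cron.php.
require_once './includes/bootstrap.inc';

// Stop once the database layer is initialized; later phases are skipped.
drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);

// db_query() and db_result() come from includes/database.inc in Drupal 6.
$count = db_result(db_query('SELECT COUNT(*) FROM {node}'));
print 'Nodes in this site: ' . (int) $count;
```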

And because we are the only ones who will see this execute, and we want to know if anything goes wrong, we may want to set PHP’s error reporting manually by adding this just before the bootstrapping process:

error_reporting(E_ALL);

We now have access to Drupal’s API and can use it to manipulate our Drupal site programmatically.

But before we do anything interesting with this foundation, you may be wondering how to actually execute this code. The easiest way is to just create a PHP script in your Drupal root directory alongside Drupal’s cron.php script, add the code you want, and navigate to it in your browser. So, if Drupal is installed in a subdirectory called ‘drupal_example’ and we added this code to a file in that subdirectory called ‘batch_example.php’, we would simply visit this URL to invoke it:

http://www.example.com/drupal_example/batch_example.php

This may seem like an odd way to invoke a batch processing script, especially if you are coming from another language. But as I said, this is the easiest approach, which pretty much eliminates any possibility of things like path errors, and it allows you to spit out nicely formatted HTML.

If you really want to invoke this script on the command line, I would even suggest that you not do this by calling the PHP binary, but instead pass the URL to wget, which will have the same effect as loading the script in your browser:

wget http://www.example.com/drupal_example/batch_example.php

In fact, this is exactly how you typically invoke Drupal’s cron.php script from your system’s cron, and we can do the same with these batch scripts to run periodic tasks that don’t logically fit inside a hook_cron() implementation in one of our custom modules.

With all that out of the way, let’s start actually doing something useful.

Querying Drupal, Outputting HTML, Email

As the name suggests, the db_query() function in the Drupal API allows you to send a query to the underlying database without having to manage the database connection yourself. You can then use db_fetch_object() to access the data. For illustration purposes, I am going to use a trivial query that will list the page nodes that have been created in the site:
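A sketch of such a listing script, assuming Drupal 6 and a file in the Drupal root. The HTML layout and the particular format_date() arguments are illustrative choices rather than the only options:

```php
<?php
// Assumes this file sits in the Drupal root, next to cron.php.
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// {node} lets Drupal apply any table prefix; '%s' is a Drupal 6 placeholder.
$result = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'page');

print '<ul>';
while ($row = db_fetch_object($result)) {
  // Turn the bare nid into a full node object.
  $node = node_load($row->nid);

  // Two uses of format_date(): a built-in format and a custom PHP date string.
  $created = format_date($node->created, 'small');
  $changed = format_date($node->changed, 'custom', 'Y-m-d H:i');

  print '<li>' . check_plain($node->title)
    . ' (created ' . $created . ', changed ' . $changed . ')</li>';
}
print '</ul>';
```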

I simply constructed a query for the nodes I wanted to find, in this case, pages only, and returned the “nid” or node identifier field. With this, I can iterate over the results, passing the nid to node_load() to instantiate the basic node object.

Once we have an instantiated node object, say, in a variable called “$node” we can access fields like the following that might be of interest:

$node->nid: the node’s ID.

$node->vid: the version ID for the node.

$node->type: basically, the content type, such as a ‘page’ or ‘blog’.

$node->uid: the author’s user ID.

$node->created: the date the node was created, stored as a UNIX timestamp.

$node->changed: the date the node was last updated, stored as a UNIX timestamp.

$node->title: the title assigned to the node.

$node->body: the node’s body text; once the node has been rendered (for example with node_view()), the entire rendered representation of the node.

$node->content['body']['#value']: the actual value assigned to the body.

Notice that in the example above, I used the format_date() function in the Drupal API to convert the date fields to something human readable. The two examples of format_date() suggest its flexibility.

Now that we know how to access node fields, we can easily update these fields.

Batch Updating Drupal Nodes

Now things are starting to get interesting. Once you have a node object, simple assignment can be used to change its values. In the following example, I will disable commenting on all page nodes in a site:
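A minimal sketch of that update, assuming Drupal 6 with the core comment module enabled (it defines the COMMENT_NODE_DISABLED constant) and a script placed in the Drupal root:

```php
<?php
// Assumes this file sits in the Drupal root, next to cron.php.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$result = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'page');
while ($row = db_fetch_object($result)) {
  $node = node_load($row->nid);

  // Simple assignment changes the value; 0 means comments are disabled.
  $node->comment = COMMENT_NODE_DISABLED;

  // node_submit() prepares the node and marks it as validated.
  $node = node_submit($node);
  if ($node->validated) {
    node_save($node);
  }
}
```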

The call to node_submit() allows installed modules to act on this node before it is saved. So for example, the core Drupal node module sets the creation date of the node. The $node->validated check makes sure the node has finished this process successfully, and of course, node_save() actually saves the node back to the database.

Batch Creating Drupal Nodes

As long as Drupal’s machinery has something that looks and acts like a node, it will treat it as a node. So we can simply create a generic object by calling new StdClass(), make the necessary assignments and save it as we did before. In the following example, I will create ten story nodes:
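One way this could look, as a sketch: the function names example_build_node() and example_create_stories() and all default values are hypothetical, chosen only for illustration. The builder is plain PHP; the second function relies on Drupal 6's node_submit() and node_save() and so must run after a full bootstrap.

```php
<?php
/**
 * Build a bare-bones node object (hypothetical helper, not part of Drupal).
 */
function example_build_node($title, $body = '', $type = 'story', $uid = 1, $status = 1) {
  $node = new stdClass();
  $node->type = $type;      // content type machine name
  $node->title = $title;
  $node->body = $body;
  $node->uid = $uid;        // author; 1 is assumed to be the admin account
  $node->status = $status;  // 1 = published
  $node->promote = 0;       // not promoted to the front page
  $node->sticky = 0;
  return $node;
}

/**
 * Create $count story nodes. Requires a fully bootstrapped Drupal 6.
 */
function example_create_stories($count = 10) {
  for ($i = 1; $i <= $count; $i++) {
    $node = example_build_node('Example story ' . $i, 'Body text for example story ' . $i . '.');
    $node = node_submit($node);
    if ($node->validated) {
      node_save($node);
    }
  }
}
```

Once Drupal is fully bootstrapped, calling example_create_stories(10) would create the ten story nodes.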

To make this easier, I moved the node creation to a function with some sensible default arguments that can be overridden as needed. This function populates a very bare-bones node, and most likely you will want to expand on what I’ve provided here.

Batch Deleting Drupal Nodes

As you may have guessed by now, to delete a node, we call the node_delete() function, passing it a node identifier. So, to clean up the batch creation script we just ran, let’s delete all story nodes created within the past hour:
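A sketch of such a cleanup, assuming Drupal 6 and a script in the Drupal root. In Drupal 6, node_delete() checks that the current user may delete the node, so this example switches to user 1 first; that switch is an assumption about how you want to run the script, and a file like this should not be left on a public server.

```php
<?php
// Assumes this file sits in the Drupal root, next to cron.php.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Assumption: act as user 1 so the access check inside node_delete() passes.
global $user;
$user = user_load(array('uid' => 1));

// Everything created after this UNIX timestamp is at most one hour old.
$cutoff = time() - 3600;

// Collect the nids first, then delete, so the result set is not in use
// while node_delete() runs its own queries.
$nids = array();
$result = db_query("SELECT nid FROM {node} WHERE type = '%s' AND created > %d", 'story', $cutoff);
while ($row = db_fetch_object($result)) {
  $nids[] = $row->nid;
}

foreach ($nids as $nid) {
  node_delete($nid);
}
```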

50 Comments

Comment by Jason

2009-01-28 10:37:09

Thanks so much for this tutorial! I’ve searched high and low for useful details and examples.

I’m going to need to programmatically create, update, and delete nodes from a cron job. Unfortunately, it involves several custom CCK content types, so I would be very, very interested in seeing a similar tutorial for that task.

Thanks for the info. I had something similar, but you nailed down some loose ends I had not gotten around to.

But a warning – be careful of the Update script! You mention that calling node_save will call the Drupal core. It will change your created and changed dates in the node table. Suddenly, all your posts are out of order. I have not yet determined if that was a side-effect of one of the other installed optional modules that I have, or if it would do the same on a pure, clean, fresh Drupal install. I’ll let you know what I discover.

Yes, the node_submit() function described above will set these dates. If this is causing you problems, you may be able to do away with this step, and the subsequent validation check. These exist mainly to play nicely with other modules.

I’m having to do some batch processing on several nodes. These nodes have been created on a local Drupal installation and are images, videos or audio files (which make them sometimes big). My batch work has to send these nodes to another Drupal installation, located on the web. The work would be something like this.

for each local node
1) get local node infos (attached files, taxonomy, etc)
2) send all the data (images, videos, …) by FTP to the remote Drupal files directory.
3) Call several XMLRPC services on the remote Drupal to create the node, attach the sent data to it, set taxonomy, …
4) Log result.
end for each

But the problem is that calling a PHP script through a web server often means that it will have a maximum execution time (perhaps 15 seconds).

I was thinking about using Drupal’s built-in batch functions, but the timeout problem would be the same. And even if I tried to cut the job into small jobs handled by some Ajax calls, sending a 10 MB video file would take too much time as well.

Do you know if I can call the batch script directly with the PHP CGI? How will Drupal handle the domain name, as there will be no URL? Do you have any idea how to do that?

I don’t really want to disable the max execution time on the local server. I was also thinking about creating another Virtual Host with some other PHP settings, just to handle the long batch works.

I’m not sure I have an exact answer that will fit your situation, but I could suggest some alternatives to consider that might work well for you. It occurs to me that this response might be long enough to warrant its own blog post (and I should put something up again soon anyway!) so I might do that. Either way, look for my $0.02 soon.

[…] as well as CCK and Views Definitions, Between Drupal Instances March 6, 2009 In my previous Headless Drupal post, I proposed ways to work with Drupal content programmatically, particularly for bulk tasks like […]

Thanks very much for this clear and helpful tutorial. I needed to store outputs from an application in a manageable way and Drupal as a CMS seemed ideal – -the only problem was doing it programmatically. Your tutorial has made this possible for me . Thanks

[…] i found this excellent blog post on creating drupal nodes programmatically and this useful resource on drupal node fields. i knocked up the following script that does the job. Just copy this to s9y-import.php and place it […]

This is great work. Very impressive, indeed. I myself am involved in developing a couple of measurement concepts and should use your work as a reference, I shall certainly cite you. Have you made any updates to these? Also, is there a way you could either possibly post a video taking us through this line by line.? Perhaps even on YouTube? Thanks and great work!

This problem is driving me up the freaking wall… files with the two lines of code (the require_once deal and the function under it) only work in the root folder. If I move the file to some other folder and also change the require_once file path, it still doesn’t work. I’m wondering what the hell’s going on. It only behaves correctly if the PHP file is in the root folder.

I am looking for a way to bootstrap Drupal outside of the root directory, any thoughts – already tried setting the include path, but since bootstrap includes files with a ‘./’ prepended, I’m getting nowhere.

It should be noted that unless an anonymous user has access to delete, update, or create nodes, then these scripts will not work. You can use wget’s cookie handling to work around this.

From the wget manual:

This example shows how to log to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server. This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

If the server is using session cookies to track user authentication, the above will not work because --save-cookies will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use --keep-session-cookies along with --save-cookies to force saving of session cookies.

Your batch delete script does not delete the nodes created in your batch create script. You are creating pages in the create script, but deleting stories in the delete script. Just a word of warning to anyone trying to use them both for some reason – make sure the items created in the batch are the same node type as those you are trying to delete.
