I recently worked on a Drupal 8 site where we had to create blog posts (nodes) from an Atom feed that updates regularly. In earlier versions of Drupal, this would have been a job for the Feeds module, but in Drupal 8 we use the core Migrate API instead.

Here are some of the features of this project.

Handle an Atom feed with the Migrate framework.

Download images and create File and Media entities in Drupal.

Skip some entries based on a custom field.

Split the main body into separate image and text paragraphs.

The last part is the most fun, but I will postpone that to Part 2 of this post.

Getting Started

I also added a custom module, atom_migrate, to hold the configuration and a little bit of custom code for the migration.

This is pretty standard for using the Migrate API in Drupal 8. The Migrate Plus module adds several features, including the framework for handling XML sources. The Migrate Tools module provides drush commands for managing migrations.

I use the Features module to update the site configuration after making updates to my module. I declare the module as a feature, and then run

drush fim atom_migrate

after making changes. Another method is to uninstall and re-install the module (ugh). I am told that you can put your YAML files under atom_migrate/migrations/ instead of atom_migrate/config/install/, and then clear caches (the plugin cache, to be specific), but I have not tested this. I have also seen the Configuration development module recommended, but I have not tested that, either.

Finally, this migration is supposed to run periodically, so we need a way to trigger that. One way would be to set up a cron job on the server that invokes

drush migrate-import --group=atom --update

On this project, we decided it would be more portable to avoid server dependencies, so I implemented hook_cron to run the migrations from code instead.

Room for improvement

I do not think this code will update existing articles, as drush migrate-import --update would.

I should get the migration IDs programmatically instead of hard-coding the list. As it is, if I add a new migration, then I will have to update this code.

Maybe Migrate Tools should add API functions so that it is easier to get the same effect as the drush commands from custom code.

Configuring for an Atom feed

The Migrate Plus module provides plugins for downloading a file from an external URL (or from a local file, useful for testing purposes) and for parsing XML, so this is mostly just a question of configuration. I have the following configuration in migrate_plus.migration_group.atom.yml:

The one difficulty I had is that the XML I got had some elements qualified by namespaces and some (including everything from the Atom spec) without. Thanks to @robcast on the Drupal #migration Slack channel for pointing out the namespaces option, which has the effect of doing

$xpath->registerNamespace('atom', "http://www.w3.org/2005/Atom");

in PHP. This lets me use selectors like atom:id to target XML tags like <id>, which carry no prefix because Atom is the feed's default namespace. Curiously, this option does not seem to apply to the item_selector key.

Before using the namespaces key, I found a work-around with a little help from Google and Stack Overflow:

selector: '*[local-name()="author"]/*[local-name()="name"]'
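To make the namespaces option concrete, here is a minimal sketch of what a migration group for an Atom feed might look like. It is not the configuration from this project: the feed URL, the labels, and the field names (guid, title, author_name) are placeholders I made up for illustration. Only the namespace URI and the plugin keys come from Migrate Plus and the discussion above.

```yaml
# Hypothetical sketch of migrate_plus.migration_group.atom.yml.
# The URL and the field names below are assumptions, not the real feed.
id: atom
label: Atom feed
source_type: Atom feed
shared_configuration:
  source:
    plugin: url
    data_fetcher_plugin: http
    data_parser_plugin: xml
    urls:
      - 'https://example.com/feed.atom'
    # Register a prefix for the Atom namespace so field selectors can use it.
    namespaces:
      atom: 'http://www.w3.org/2005/Atom'
    # No prefix here: the namespaces option did not seem to apply to this key.
    item_selector: /feed/entry
    fields:
      - name: guid
        label: Entry ID
        selector: 'atom:id'
      - name: title
        label: Title
        selector: 'atom:title'
      - name: author_name
        label: Author
        selector: 'atom:author/atom:name'
    ids:
      guid:
        type: string
```

With the prefix registered, the author selector reads atom:author/atom:name instead of the longer local-name() work-around.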

Skip some entries based on a custom field

In order to use the Migrate API effectively, it helps to get some practice with chaining multiple process plugins. The building blocks are there, and a few examples go a long way in learning how to use them.

For example, the feed I was using had some articles that I wanted to import and also some items that I wanted to ignore. In the source section of my migration, I defined the content_format key and the XPath that selects it. Then I chained the static_map and skip_on_empty plugins in the process section of the migration:

The static_map plugin converts “Blog” to “Blog” (not much of a conversion) and anything else to NULL. In the latter case, the skip_on_empty plugin cancels processing of the current item.

There is not actually a field called blog_type. The Migrate API lets you make up a destination field like this and ignore it, or use it as an intermediate result in other fields. There is an example of this in the next section.
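As a rough sketch of the chain described above (not the original listing), the process section could look something like this. I am assuming that the source field is named content_format as in the text, and that an explicit default_value of null is how unmatched values are turned into NULL.

```yaml
# Hypothetical sketch: skip every feed item whose format is not "Blog".
process:
  # blog_type is a made-up destination; nothing on the node uses it.
  blog_type:
    - plugin: static_map
      source: content_format
      map:
        Blog: Blog
      # Assumption: anything that is not in the map becomes NULL.
      default_value: null
    - plugin: skip_on_empty
      # "row" abandons the whole item, not just this one destination.
      method: row
```

The order matters: each plugin in the list receives the output of the one before it, so skip_on_empty sees either "Blog" or NULL.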

Creating File and Media entities

There are two ways to manage complex migrations. The first is to have a separate migration for each step of the process, and that is how I managed the “featured image” for my migration. (This is not part of the Atom specification. It is a custom field on the feed I was using.)

The second method is to create the intermediate entities in the “process” phase of the migration. My next blog post will give an example of this method.

Here is an example of using separate migrations. The first migration uses the download process plugin to fetch images referenced in the feed and create file entities of type image:

The settings key is a fake destination. It is just there so that the migration will quit early if the url for this row is empty.

First look at how I create the destination file name.

I have a fake source key called constants. (You can call it whatever you want, but constants is the convention.) It has several sub-keys, which are referenced as constants/image_base_dir, constants/image_name, and so on.

I set one of my fake destination keys, temp_date, using the callback plugin: this has the effect of setting this intermediate result to date('Y-m'), or something like 2017-12.

I set the next fake destination key, temp_image_uri, using the concat plugin to paste together 'public:/', the date string I just created, and 'post.jpg', using / as glue. That gives something like public://2017-12/post.jpg.

Next I use the download plugin. The first argument is the URL of the image file, which I have defined in the source section of the migration. The second argument is the destination file name. Since I supply the optional rename: true key, Drupal will add _0, _1, and so on in order to create distinct file names.

The download plugin returns the URI of the created file, something like public://2017-12/post_17.jpg. I assign this to the uri property of the File entity that this migration creates.
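Putting those steps together, a sketch of the file migration might look like the following. This is my reconstruction under a few assumptions, not the project's file: image_url is a placeholder name for the source field holding the image URL, and constants/date_format is a hypothetical constant I added because the callback plugin passes its source value as the argument to the callable.

```yaml
# Hypothetical sketch of the image (File entity) migration.
source:
  constants:
    image_base_dir: 'public:/'
    image_name: post.jpg
    date_format: 'Y-m'        # assumption: argument passed to date()
process:
  # Fake destination: skip the row when there is no image URL.
  settings:
    plugin: skip_on_empty
    method: row
    source: image_url          # placeholder source field name
  # Intermediate result, something like 2017-12.
  temp_date:
    plugin: callback
    callable: date
    source: constants/date_format
  # 'public:/' + '/' + '2017-12' + '/' + 'post.jpg'
  temp_image_uri:
    plugin: concat
    delimiter: /
    source:
      - constants/image_base_dir
      - '@temp_date'
      - constants/image_name
  # Download the image and store the resulting URI on the File entity.
  uri:
    plugin: download
    source:
      - image_url
      - '@temp_image_uri'
    # As described in this post. I believe later core releases
    # spell this option as file_exists: rename.
    rename: true
destination:
  plugin: 'entity:file'
```

The '@' prefix is how one process step refers to a destination key set earlier in the same migration, which is what makes the fake keys useful as intermediate results.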

Now that we have the migration creating File entities, we could attach those files to a content type with a file-reference field, using the migration_lookup plugin. In fact, we did something a little different on this project.

We decided to use the core Media module. This is a little aggressive with Drupal 8.4, but the Media module is developing quickly. It looks as though everyone will be using it once Drupal 8.5 is released, and we hope to be one step ahead of the crowd.

Since we are using Media entities, we have another migration that deals with them. Here is the interesting part of this migration:

The Migrate API keeps track of the association between guid (the source key) and file ID in the blog_image migration (the one before this). The migration_lookup plugin translates the guid to the file ID, and that gets assigned to the target_id property of field_media_image on the Media entity.

The alt property is populated by description: one of the source fields that I left out because there is nothing new about it.

The migration_dependencies key tells the Migrate API that this migration should not be run until the blog_image migration is complete. We can override that by adding the --force option to drush migrate-import.
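For reference, the pieces described above could be sketched like this. The migration ID blog_media and the bundle name image are assumptions of mine. blog_image, guid, description, and field_media_image come from the text.

```yaml
# Hypothetical sketch of the Media migration.
id: blog_media               # made-up ID for illustration
migration_group: atom
process:
  # Assumption: the Media bundle is called "image".
  bundle:
    plugin: default_value
    default_value: image
  # Translate the feed's guid into the file ID created by blog_image.
  field_media_image/target_id:
    plugin: migration_lookup
    migration: blog_image
    source: guid
  field_media_image/alt: description
destination:
  plugin: 'entity:media'
migration_dependencies:
  required:
    - blog_image
```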

The migration that creates nodes uses these Media entities, translating the feed’s guid into a Media entity ID using the migration_lookup plugin. There is nothing new in this step, but you can see the full code in the GitHub repository.

References

In the spirit of Open Source, I have borrowed heavily from blog posts, Slack messages, documentation, and other forms of support. Here are links to some of the sources I found helpful.

Benji Fisher

Senior Developer

At Isovera, I build complex web sites using Drupal. Sometimes that means putting together the pieces supplied by Drupal and contributed modules, and sometimes that means writing a custom module for the project at hand.