Drupal 8 has a great Plugin API that not only lets you write plugins that modify Drupal core’s behavior, but also lets you define your own pluggable systems that you and others can plug into as well. The Plugin API documentation has a number of examples that cover many common use cases for plugins and configuration entities, but none quite covered my use case, so hopefully this article can help fill some of the gaps. I’m going to use an actual example that I built to ground the discussion of abstract concepts in something more concrete: abstractions like Annotations, EntityWithPluginCollectionInterface, PluginFormInterface, and Derivatives are powerful, but it can be tough to pin down how and when to actually use one versus another.

The Challenge

My end goal is a module that creates Drupal nodes of a custom “Event” content-type based on a variety of external calendaring systems. My campus uses the Resource25 calendaring system, which will be the source of most events, while others may be sourced from Outlook calendars, Google Calendars, or other data sources. The set of source systems and the ways we connect to them will never be completely unified and will change over time, so I want to create a flexible, pluggable system in which new plugins only need to cover how to fetch from a new source system, without reinventing the whole thing every time we add a data source.

There are two complications that lift this implementation beyond what the Drupal documentation covers in a straightforward manner:

Each plugin has both standard configuration options (such as “Enabled”) as well as plugin-specific configuration options.

For a given Drupal site, there might be multiple instances of each source plugin that are configured to pull from different calendar feeds using the same protocols and methods.

Relationship between plugins and Configuration Entity instances

This system is very similar to Drupal-core’s Block system. The Block system defines an API for “Block Plugins” where each block-plugin has PHP code that drives it. Multiple instances of each block-plugin can be placed in the page, each with their own settings. For example, the System Menu Block Plugin in the system module can have multiple instances added to the block-layout, each configured to display a different menu with different settings for initial menu-level and depth of menu-levels visible as well as the standard settings for title and display of the title.

In fact, the Plugin API documentation specifically recommends using the Block system as an example. Unfortunately this is a complicated example to follow because the Block system is scattered between Drupal-core library files and the Block module, rather than all contained in one place.

What is needed?

I’m going to skip over the basics of the module skeleton, as these are already well documented, in order to focus on the parts needed to achieve the system described above.

Here are the pieces that we need to get functioning and which I will dive further into:

Defining the plugins themselves with a Plugin Manager and discovery mechanisms so that Drupal knows what plugins are available.

Defining the Configuration Entity for our plugin system so that instances can be configured and exported along with the rest of our site configuration.

Defining dynamic links so that we can create new instances for each plugin installed.

Allowing plugins to define extra configuration fields/properties that are particular to their needs.

A snapshot of the resulting module is available on GitHub at adamfranco/middlebury_event_sync. I’ll be linking to code in the module rather than copying it here.

Defining the Plugins

For this system, we are going to define a single Plugin Manager that will provide discovery and loading of our Event Source plugins and a Plugin Interface that all of our Event Source plugins will have to implement.

Next up is an Annotation class at src/Annotation/EventSource.php which defines the parameters used in each plugin class’s doc-block to identify that class as one of our plugins. This is the key to the discovery mechanism we’re using: the annotation in the doc-block registers the plugin so that it is available through our Plugin Manager.

We also define a Plugin Interface that defines common methods for all of our Event Source plugins. This isn’t strictly required for all plugin systems, since plugins can implement the PluginInspectionInterface and/or ConfigurablePluginInterface directly, but I wanted all plugins to implement a few common methods for standard settings and actual operations.

The manager, service, annotation, and plugin interface are the things we need to define to allow our plugins in src/Plugin/EventSource/ to be discovered and available to Drupal. Now that we have defined our plugins, we can use \Drupal::service('plugin.manager.event_source')->getDefinitions() to access all of the Event Source plugins installed in our site, both those provided by our module as well as those provided by other modules.
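As a sketch of what the manager can look like, here is a condensed plugin manager following Drupal core’s standard DefaultPluginManager pattern. The class and interface names are my assumptions based on conventions and on the service ID plugin.manager.event_source mentioned above; see the module on GitHub for the real code.

```php
<?php
// src/EventSourceManager.php (sketch): discovers Event Source plugins.

namespace Drupal\middlebury_event_sync;

use Drupal\Core\Cache\CacheBackendInterface;
use Drupal\Core\Extension\ModuleHandlerInterface;
use Drupal\Core\Plugin\DefaultPluginManager;

class EventSourceManager extends DefaultPluginManager {

  public function __construct(\Traversable $namespaces, CacheBackendInterface $cache_backend, ModuleHandlerInterface $module_handler) {
    parent::__construct(
      // Scan every enabled module's src/Plugin/EventSource directory...
      'Plugin/EventSource',
      $namespaces,
      $module_handler,
      // ...for classes implementing our interface (name assumed)...
      'Drupal\middlebury_event_sync\EventSourceInterface',
      // ...that carry our @EventSource annotation.
      'Drupal\middlebury_event_sync\Annotation\EventSource'
    );
    $this->alterInfo('event_source_info');
    $this->setCacheBackend($cache_backend, 'event_source_plugins');
  }

}
```

The manager is then registered as the plugin.manager.event_source service in the module’s services.yml file (with parent: default_plugin_manager), which is what makes the \Drupal::service() call above work.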

Configuration Entities

The next task is to define a configuration entity, the instances of which will hold the configuration for each service we are connecting to and reference the plugin that will be used to do the connecting. Configuration in Drupal 8 is a great new system that allows configuration to be modified in a development environment, exported, version-controlled, and then imported in a production environment. Configuration Entities allow units of configuration to be created, listed, edited, and deleted like other entities in Drupal (nodes, users, taxonomy-terms, menu items, etc). For this module, I need to use config entities because we don’t have just a single global setting, but rather zero or more configuration sets for each plugin.

Our config entity is a single class that extends ConfigEntityBase and has a number of annotations that point to the Controllers and Forms used to list and edit the instances of the Entity. Our config entity class also implements an interface I created that defines how to access the plugin associated with the config entity, not something available by default.

If we just wanted a single config entity for each plugin, this would all be enough. However, we want to be able to have multiple config entity instances associated with each plugin. To achieve this, our config entity must also implement the confusingly named EntityWithPluginCollectionInterface, which provides access to an EventSourcePluginCollection that we define. This “Plugin Collection” handles the mapping between config entity instances and the plugin instances that house our actual data-fetching code. Via this Plugin Collection we can pass a config entity (or its settings) off to the plugin instance, so that the plugin instance has access to the entity’s data in addition to its own methods.

After defining the config entity and getting it working, it is a good idea to define a Configuration Schema in YAML. The primary purpose of the schema is to allow translation of your configuration; it doesn’t seem to affect the actual export of the configuration.
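A schema for a config entity of this shape might look like the following sketch. The file path and key names are assumptions based on Drupal conventions, not the module’s actual schema file.

```yaml
# config/schema/middlebury_event_sync.schema.yml (sketch)
middlebury_event_sync.event_source.*:
  type: config_entity
  label: 'Event source'
  mapping:
    id:
      type: string
      label: 'ID'
    label:
      type: label
      label: 'Label'
    plugin:
      type: string
      label: 'Plugin ID'
    settings:
      type: ignore
      label: 'Plugin settings'
```

Using `type: ignore` for the settings container sidesteps having to declare a schema for every plugin’s arbitrary settings, at the cost of translatability for those keys.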

Our config entity is pretty light-weight — it stores its id (a.k.a. machine-name), label, which plugin-id it is associated with, which module is providing that plugin, and finally an arbitrary set of plugin-settings defined by the plugin. These plugin settings include some common ones such as enabled, ttl, and time-shift, as well as plugin-specific settings like URIs, username, and password. There are other ways that the plugin settings could be injected into the config entity, but I felt that putting them in a single container called “settings” was flexible and simple and avoided potential conflicts between the keys needed by the config entity and the plugin.
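The plugin-collection wiring can be sketched like this. Class and property names are illustrative (the collection class would typically extend core’s DefaultSingleLazyPluginCollection); the real code is in the module on GitHub.

```php
<?php
// src/Entity/EventSource.php (sketch): a config entity that exposes its
// plugin through a lazy plugin collection.

namespace Drupal\middlebury_event_sync\Entity;

use Drupal\Core\Config\Entity\ConfigEntityBase;
use Drupal\Core\Entity\EntityWithPluginCollectionInterface;
use Drupal\middlebury_event_sync\EventSourcePluginCollection;

class EventSource extends ConfigEntityBase implements EntityWithPluginCollectionInterface {

  protected $plugin;           // The plugin ID this instance uses.
  protected $settings = [];    // Arbitrary plugin-specific settings.
  protected $pluginCollection;

  public function getPluginCollections() {
    if (!$this->pluginCollection) {
      // The collection lazily instantiates the plugin with our settings.
      $this->pluginCollection = new EventSourcePluginCollection(
        \Drupal::service('plugin.manager.event_source'),
        $this->plugin,
        $this->settings
      );
    }
    // The 'settings' key ties the plugin's configuration to our property,
    // so changes made through the plugin are saved with the entity.
    return ['settings' => $this->pluginCollection];
  }

  public function getPlugin() {
    return $this->getPluginCollections()['settings']->get($this->plugin);
  }

}
```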

Dynamic Links to add instances

One piece still missing is the links to create new config entity instances. Because new plugins may be provided by other installed modules, we can’t hard-code the links to add new instances in our action links. Instead, we implement a Derivative plugin that generates a list of links based on the available plugins, and point to this Deriver from our module’s module.links.action.yml.

Configuration Entity “add” links provided by our Deriver plugin.
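A deriver of this kind might be sketched as follows; the class name, route name, and label key are assumptions. The entry in module.links.action.yml then references this class in its deriver key.

```php
<?php
// src/Plugin/Derivative/EventSourceAddLinks.php (sketch): generates one
// "add" action link per discovered Event Source plugin.

namespace Drupal\middlebury_event_sync\Plugin\Derivative;

use Drupal\Component\Plugin\Derivative\DeriverBase;
use Drupal\Component\Plugin\PluginManagerInterface;
use Drupal\Core\Plugin\Discovery\ContainerDeriverInterface;
use Symfony\Component\DependencyInjection\ContainerInterface;

class EventSourceAddLinks extends DeriverBase implements ContainerDeriverInterface {

  protected $manager;

  public function __construct(PluginManagerInterface $manager) {
    $this->manager = $manager;
  }

  public static function create(ContainerInterface $container, $base_plugin_id) {
    return new static($container->get('plugin.manager.event_source'));
  }

  public function getDerivativeDefinitions($base_plugin_definition) {
    // One action link per installed Event Source plugin, pointing at the
    // add-form route with the plugin ID as a route parameter.
    foreach ($this->manager->getDefinitions() as $plugin_id => $definition) {
      $this->derivatives[$plugin_id] = [
        'title' => t('Add @label', ['@label' => $definition['label']]),
        'route_parameters' => ['plugin_id' => $plugin_id],
      ] + $base_plugin_definition;
    }
    return $this->derivatives;
  }

}
```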

Extra configuration fields per plugin

To assist with common settings and functionality, I created an abstract base class that all of my plugins can inherit. This base class implements the PluginFormInterface to provide form fields on the config entity’s editing screens. Each plugin can override these methods to provide and handle additional fields while leaving the common fields to the base class. These fields are handled as a sub-form loaded in our config-entity form class.
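Sketched in code, the base class might look like the following. The enabled and ttl field names come from the settings mentioned earlier; everything else is illustrative rather than copied from the module.

```php
<?php
// src/Plugin/EventSource/EventSourceBase.php (sketch): common form fields
// shared by all plugins. Concrete plugins call parent:: and add their own.

namespace Drupal\middlebury_event_sync\Plugin\EventSource;

use Drupal\Core\Form\FormStateInterface;
use Drupal\Core\Plugin\PluginBase;
use Drupal\Core\Plugin\PluginFormInterface;

abstract class EventSourceBase extends PluginBase implements PluginFormInterface {

  public function buildConfigurationForm(array $form, FormStateInterface $form_state) {
    $form['enabled'] = [
      '#type' => 'checkbox',
      '#title' => $this->t('Enabled'),
      '#default_value' => isset($this->configuration['enabled']) ? $this->configuration['enabled'] : TRUE,
    ];
    $form['ttl'] = [
      '#type' => 'number',
      '#title' => $this->t('Refresh interval (minutes)'),
      '#default_value' => isset($this->configuration['ttl']) ? $this->configuration['ttl'] : 60,
    ];
    return $form;
  }

  public function validateConfigurationForm(array &$form, FormStateInterface $form_state) {
    // Nothing to validate for the common fields.
  }

  public function submitConfigurationForm(array &$form, FormStateInterface $form_state) {
    $this->configuration['enabled'] = $form_state->getValue('enabled');
    $this->configuration['ttl'] = $form_state->getValue('ttl');
  }

}
```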

The Result

With the items above in place, I now have a system that lets me create as many instances of each plugin as needed. The buttons to add a new instance are provided by the Deriver, and the listing itself by a controller.

Admin UI for adding Event sources. Note that only two plugins are currently installed.

Hopefully this overview can point you enough in the right direction to implement your own Drupal plugin systems for other purposes.


As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Kind of frustrating, right? Regular expressions are often relatively easy to write, but pretty hard to read, even if you know what they are supposed to do. Then there is this bugger:

Let’s say you are working on a large feature or update that requires a bunch of commits to complete. You finish up with your work and are then ready to merge it onto your master branch.

For example, here is the history of my drupal repository after some work updating the cas module to the latest version (and to support the new version of phpCAS):

As you can see, I have a number of commits, followed by a merge that brings in the new module code, followed by some more commits.

Now, if I merge my feature branch (master-cas3-simple) into the master via

git merge master-cas3-simple

then the history will look like this:

While the history is all there, it isn’t obvious that all of the commits beyond “Convert MS Word quote…” are a single unit of work. They all kind of blend together because git performed a “fast-forward” merge. Usually fast-forwards are helpful, since they keep the history from being cluttered with hundreds of unnecessary merge commits, but in this case we are losing the context of these commits being a unit of work.

To preserve the grouping of these commits together I can instead force the merge operation to create a merge commit (and even append a message) by using the --no-ff option to git merge:

As you can see, merging with the --no-ff option creates a merge commit that clearly delineates the work on this feature. If we later decided to roll back this feature, it would be much easier to find the point where the feature work began.
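The effect is easy to reproduce in a throwaway repository. This is a minimal sketch with illustrative branch names, not the repository from the screenshots above:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

git commit -q --allow-empty -m "initial commit"
# The default branch may be "master" or "main" depending on git config.
main=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
git commit -q --allow-empty -m "feature work"

git checkout -q "$main"
# A plain "git merge feature" would fast-forward here and leave no trace
# of the branch; --no-ff forces a real merge commit.
git merge --no-ff -m "Merge branch 'feature'" feature

git log --merges --oneline
```

The final log command lists the merge commit that marks the feature as a unit, which a fast-forward merge would not have created.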

For the past few months I have been doing a lot of work on the phpCAS library, mostly to improve the community trunk of phpCAS so that I wouldn’t have to maintain our own custom fork with support for the CAS attribute format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has been great to work with and I’ve had a blast helping out with the project.

The tooling has involved a few challenges however, since Jasig (the organization that hosts the CAS and phpCAS projects) uses Subversion for its source-code repositories and we use Git for all of our projects. Now, I could just suck it up and use Subversion when doing phpCAS development, but there are a few reasons I don’t:

We make use of Git submodules to include phpCAS along with the source-code of our applications, necessitating the use of a public Git repository that includes phpCAS.

The git-svn tools allow me to use git on my end to work with a Subversion repository, which is great because…

I find that Git’s fast history browsing and searching make troubleshooting and bug fixing much easier than any other tools I’ve used.

For the past two years I have been using git-svn to work with the phpCAS repository and every so often pushing changes up to a public Git repository on GitHub. Our applications reference this repository as a submodule when they need to make use of phpCAS. Now that I’ve been doing more work on phpCAS (and am more interested in keeping our applications using up-to-date versions), I’ve decided to automate the process of mirroring the Subversion repository on GitHub. Read on for details of how I’ve set this up and the scripts for keeping the mirror in sync.

Why use reverse-proxy caching?

For most public-facing web applications, the significant majority of their traffic is anonymous, non-authenticated users. Even with a variety of internal data-cache mechanisms and other good optimizations, a large amount of code execution goes into executing a PHP application to generate a page even if the content of this page will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP — many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy-server that can run on the same host as the webserver, can cache pages in memory and can serve them at rates of more than 10k requests/second with thousands of concurrent connections.

While the point of web-applications is to have content be dynamic and easily changeable, for most applications and most of the anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.

Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.
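A minimal Varnish VCL sketch of this split, passing authenticated users through while caching anonymous pages, might look like the following. The cookie pattern, backend address, and TTL are assumptions for illustration (written for Varnish 4.x syntax), not a production configuration:

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";    # Apache/PHP listening behind Varnish.
}

sub vcl_recv {
    # Authenticated users (identified here by a session cookie) bypass
    # the cache and always reach the back-end.
    if (req.http.Cookie ~ "SESS") {
        return (pass);
    }
    # Anonymous requests: strip cookies so responses are cacheable.
    unset req.http.Cookie;
}

sub vcl_backend_response {
    # Serve anonymous pages from cache for 5 minutes.
    if (!bereq.http.Cookie) {
        set beresp.ttl = 5m;
    }
}
```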

For the past 6 months our Web Application Development work-group has been using Bugzilla as our issue tracker, with quite a bit of success. While it has its warts, Bugzilla is a pretty decent issue-tracking system and is flexible enough to fit into a variety of different work-flows. One very important feature of Bugzilla is support for LDAP authentication. This enables any Middlebury College user to log in and report a bug using their standard campus credentials.

While LDAP authentication works great, there is one problem: If a person has never logged into our Bugzilla, we can’t add them to the CC list of an issue. This is important for us because issues usually don’t get submitted directly to the bug tracker, but rather come in via calls, emails, tweets, and face-to-face meetings. We are then left to submit issues to Bugzilla ourselves to keep track of our to-do items. Ideally we’d add the original reporter to the bug’s CC list so that they will automatically be notified as we make progress on the issue, but their Bugzilla account must exist before we can add them to the bug.

Searching around the internet, I wasn’t able to find anything about how to import LDAP users (or any kind of users) into Bugzilla, though I did find some basic instructions on how to create a single user via Bugzilla’s Perl API. To fill this gap, I’ve created a Perl script that creates users from lines in a tab-delimited text file (create_users.pl), as well as a companion PHP script that exports an appropriately formatted list of users from an Active Directory (LDAP) server (export_users.php).

One of the requirements in the migration of our websites to Drupal is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near those of a top-1000 site, and the sites could comfortably run off of one machine hosting both the database and the web server.

Single Machine Configuration

This simple configuration however has the major weakness that any hiccups in the hardware or software of the machine will likely take the site offline until the issues can be addressed. In order to give our site a better chance at staying up as failures occur, we separate some of the functional pieces of the site onto discrete machines and then ensure that each function is redundant or fail-safe. This post and the next will detail a few of the techniques we have used to build a robust site.

Central Authentication Service (CAS) is a single-sign-on system for web applications, written in Java, that we have begun to deploy here at Middlebury College. Web applications communicate with it by forwarding users to the central login page and then checking the responses via a web-service protocol.

A few months ago Ian and I got CAS installed on campus and began updating applications to work with it rather than maintaining their own internal connections to the Active Directory server. Throughout this process we ran into a few challenges (such as returning attributes with the authentication-success response) and a bug in CAS, but we worked through these and got CAS up and running successfully.

We are now at a point where we need to do some customizations to our CAS installation to deal with changes to the group structure in the Active Directory. As well, the bug I reported was apparently fixed in a new CAS version, an improvement I need to test before we update our production installation. Both of these require a bit more poking at CAS than we can do safely in our production environment, so I am now embarking on the process of setting up a Java/Tomcat development environment on my PC. I’m documenting this process here both for my own benefit (when I have to set this up again on my laptop) and in case it helps anyone else.

Read on for my step-by-step instructions for setting up a CAS development environment on OS X.

One of the great things about the Git version-control system is the ability to incrementally commit your changes on a private branch to keep a step-by-step record of your thought and writing process on a fix or a feature, and then merge the completed work onto your main [or public] branch after your feature or fix is all done and tested. By keeping an incremental log of your changes — rather than just committing one giant set of code with changes to 30 files — it becomes much easier to know why a certain line was changed in the future when bugs are discovered with it.

One thing that often happens to me though, is that I work for about a half hour to an hour trying to get a new piece of code working and in the process make several sets of changes to one file that are only loosely related.

Let’s say that I am fixing a bug in my ‘MediaLibrary’ class and while doing so notice some spelling mistakes in comments, which I fix. Now my one file has two changes: the bug fix and the spelling fix. Rather than committing both changes together with one comment describing both, I can highlight one of the changes in git-gui and select the “Stage Hunk for Commit” option.

With that one hunk staged I can now commit with a message applicable to that change. Other changes can then be staged and committed with their own messages resulting in a very understandable history of changes.

“Stage Hunk for Commit” can also be used to commit important changes while not including debugging lines inserted in your code.