Jump to:

We at the AcquiaDrupalGardens team were recently thinking about how to solve the problem with style attributes being blacklisted by core's HTML filter. It seems to be a choice between wysiwyg_filter and HTMLPurifier. We'd prefer to use your code as it seems more robust, tested and applied elsewhere.

Indeed, I'd like to see it replace the filter we have in core. I think the fear is performance. Do you have any benchmarks on using HTMLPurifier? Do you feel like it is performant in varied situations (lots of text, many transformations per page load, etc)?

We'll probably need to start on a 7.x port ASAP if we can prove the performance will be good enough. Also, are you interested in doing any (paid) work on this? Please get in touch if so via my contact form or here on this issue.

I have not made any official benchmarks, however, it appears to be anecdotally the case that most servers can't really handle ~100 transformations per page load. The Drupal filter does have an aggressive caching policy which should alleviate this.

Most things seem to behave themselves, but there are TODOs in the patch noting some areas that still need work (some might be existing bugs in D6 though). It also seems like integration with the non-Drupal form isn't working totally right in all cases yet.

Other notes:

This patch moves the HTML Purifier cache settings from the per-format configuration pages to their own dedicated settings page. Because of the way the filter administration form works in D7, it would require some real gymnastics to keep making these available where they were before, and wouldn't really make sense to anyway - if they affect all text formats on the site, then they shouldn't be attached to a single text format's configuration form because it is confusing to save one text format and have the settings automatically applied to other text formats also.

Re moving the cache settings, it's a good change that probably should have been in D6, except I couldn't figure out how to make a global settings page, so into the per-filter it went. It may still be useful to provide a link to the global settings page.

I'll take the patch for a spin soon; it might not be too early to set up a branch either. David, would you at all be interested in commit access?

I'm not sure I'll have a lot of time to work on this module beyond the initial effort of porting it for (possible use in) Drupal Gardens, so you might want to hold off on giving me commit access... or, rather, you're certainly welcome to give me commit access, but I'm not sure how useful it will wind up being to you in the long run :) It's a very cool module and a neat library, though.

Either way, a branch sounds like it might be a good idea if you feel this is getting close - that would help other people more easily test it out with D7 and help fix the remaining issues.

Here is an updated patch that adds a link to the global settings page as described in #4.

It also fixes a bug with the default filter settings - I was previously using something that made no sense at all (a full HTMLPurifier_Config object when we actually want a settings array). For now, I've fixed this by just removing the 'htmlpurifier_config' default setting altogether, since the module itself works fine without it (see the attached "interdiff" file for the change).

However, I ideally want to figure out how to do it correctly because it's needed for anyone else who wants to write code that cleanly modifies the filter's default behavior. The requirement is that any element which exists in the 'default settings' array in hook_filter_info() should have the same exact structure as what is saved to the database when you submit that part of the filter settings form. So in particular, if you manually save the filter settings form, you wind up with an array of data in 'htmlpurifier_config' with elements that look like this:

[URI.AllowedSchemes] => httphttpsmailtoftpnntpnews

This is the format expected by $config->mergeArrayFromForm(), and that is what we want. The problem is I can't figure out how to use the library to recreate this array structure automatically in 'default settings'. The closest I could get was with code like this:

Does the library provide a method that I am not finding which outputs the configuration in an array like the first one - i.e. in an format that is suitable to feed back in to $config->mergeArrayFromForm()? Just wondering if I'm missing something obvious :)

Interesting question. mergeArrayFromForm is intended to deal with data supplied by an HTML form, so the closest thing you can get to that is the code that generates the HTML form, which is in HTMLPurifier_Printer_Config, but you still need a browser.

What you're probably actually looking for is HTMLPurifier_Config->loadArray(), which loads in the result of getAll, or perhaps even HTMLPurifier_Config::inherit($parent), which creates a config object that inherits the settings of another object, but can override them.

JacobSingh: I don't think an upgrade hook is necessary, based on reviewing the filter code you cite, and looking at some other ported modules. Shouldn't the filter upgrade hook handle all the cases we're interested in? I guess I should run an upgrade and find out.

Closed #1359418: Schema issue upon upgrade to D7 as duplicate and merging into here the patch for the D6 upgrade. One difference is it checks if the schema fields actually exists before droping it for those who have native installed the module in D7.