So, if you want to use JavaScript for the task, you don't need HTML Purifier at all; after loading the page, use jQuery to grab all A tags in the page and add the appropriate behavior (you can filter on class to only do this to user links or something).

If you want HTML Purifier to do this, it would certainly be possible, but it would have to be coded. It actually wouldn't be that complicated. Would you be interested in helping? I can get you set up and tell you what you need to do.

Sorry, let me explain more. I just added HTML Purifier to Maia Mailguard, to sanitize email when displaying it to the end user. Since we're displaying spam, I consider it one of the most hostile of all inputs HTML Purifier may see. :) I configured it to block all URLs, but otherwise it's a default install so far. The results have been just wonderful; it's impossible to slip in tracking images or trick the user into going someplace bad, at least not through our interface. But we had just one complaint: it completely removes the actual URL from links, so it may be hard to tell a scam message from a legit one. The scam may be a copy of a legit message with only one URL changed.

So I'd like to keep all the safety that HTML Purifier provides, but have some way to still *see* what URL the link originally pointed to.

It could be just by changing

<a href="foo">bar</a>

to

bar (foo)

Or I could even envision rewriting it to allow for a jquery script to put it in a tooltip:

<a class="HelpTipAnchor">bar</a>
<span class="HelpTip">foo</span>

(A jquery call later looks for the classes and does the tooltip magic)

So I guess all I need to do is take the A tag and rewrite it and its attributes a little. I suspect it can be done, but I haven't been able to figure out the docs for HTML Purifier yet. ;)

Ok. So the first step is to set up the development environment (this is doubly important, since some of the features we'll be using haven't been released yet.) Check out this document for instructions, and check back here when you've got a working checkout. Trust me; it will be very nice to have when working on the feature.

You will need to allow a tags within your AllowedElements set. They will be removed once they hit the Injector execution phase; if we get rid of them any earlier, there's no way of telling what the link was by the time the Injector runs.

Aha, it's an a close tag. Using the $token->start reference (which refers to the a tag we passed up), grab the URL, and then modify the stream (we do this by setting $token to the replacement; it's a reference) with the text token we want: " ($url)" (note, no need to escape anything). Then, using $this->backward(), rewind to the original a tag.

Ok, so I lied: we're on handleElement: a, not handleText: bar. Delete the a tag by setting $token = false

Etc. Our job here is done.

Use of $this->backward() is a little involved: check out AutoParagraph for examples of usage. If it needs more explaining, I can do so here.

If we don't care about getting rid of the a tags completely, we can simplify this process a little:

Do nothing (or optionally add the class tag you want)

Do nothing

Aha, it's an a close tag. Using the $token->start reference (which refers to the a tag we passed up), grab and delete the URL (as simple as unsetting $token->start->attr['href']), and then modify the stream (we do this by setting $token to an array; it's a reference) with the array of tokens we want: the closing end tag (you need to put it back in $token), and " ($url)" (note, no need to escape anything).

We're done
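The simplified close-tag step above can be sketched in isolation. This is only an illustration under stated assumptions: the Sketch* classes and sketchHandleEnd() are hypothetical stand-ins that carry just the fields mentioned above (name, attr, start), not HTML Purifier's own token classes, so the snippet runs without the library. In a real injector the same logic would live in the method that handles end tags, with $token passed by reference.

```php
<?php
// Stand-in token classes for illustration only. They are NOT HTML Purifier's
// real token classes; they just carry the fields the steps above mention.
class SketchStartToken
{
    public $name;
    public $attr;

    public function __construct($name, array $attr = array())
    {
        $this->name = $name;
        $this->attr = $attr;
    }
}

class SketchEndToken
{
    public $name;
    public $start; // the matching start token we passed earlier

    public function __construct($name, SketchStartToken $start)
    {
        $this->name = $name;
        $this->start = $start;
    }
}

class SketchTextToken
{
    public $data;

    public function __construct($data)
    {
        $this->data = $data;
    }
}

// $token is taken by reference, so assigning to it replaces the token
// (here with an array of tokens) in the caller's stream.
function sketchHandleEnd(&$token)
{
    if ($token->name !== 'a' || !isset($token->start->attr['href'])) {
        return; // not a link, or no URL to display
    }
    $url = $token->start->attr['href'];
    unset($token->start->attr['href']); // the a tag stays, but is no longer active
    // Put the end tag back, followed by the URL as plain text.
    $token = array($token, new SketchTextToken(" ($url)"));
}

$start = new SketchStartToken('a', array('href' => 'http://example.com/'));
$token = new SketchEndToken('a', $start);
sketchHandleEnd($token);
```

After the call, $token holds the original end tag plus a text token, and the start tag no longer carries an href.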

You'll probably want to set up unit tests; use the other injectors as examples. I bet you can find where the test files are ;-)

I do like leaving the a tag there, so it underlines like a link... but we certainly want to make sure the href, onclick, etc. are removed, i.e., make sure it's otherwise safe. Do I need to include some other cleaning actions that would otherwise have been done for me?

I have to say, I'm impressed with the design I see in HTMLPurifier, this has been pretty easy to jump into and understand, once I got pointed to the right spot.

Looking at this feature, I'm trying to figure out how to make it extendable for several different output types.

In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:

In the constructor for the injector, pass along either text parameters to add to the tags, or even a reference to a function that will return the text to put in the attributes. Or in other terms, instantiate the Injector with callbacks to specify the modified attributes. If there's a better pattern, let me know; I'm still working on the GoF book. :)

Another option might be to have more configuration items, but that seems like clutter.

You should add a configuration directive for it, since I intend on adding this into the core. ;-)

I had not previously set AllowedElements, but when I do (to allow a tags), it holds back a lot of other elements. Do I need to specify all of them?

Ah, that's interesting. If you have not specified AllowedElements, a tags will be allowed automatically, so nothing needs to be set. I forgot you're using the URI configuration directive to exclude links. Disregard that point.

I have to say, I'm impressed with the design I see in HTMLPurifier, this has been pretty easy to jump into and understand, once I got pointed to the right spot.

Glad to hear it! At some point I'll write documentation and a tutorial on making Injectors. I think it's one of the neatest and most under-utilized features in HTML Purifier.

In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:

So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue, you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens on the step after injectors!

However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.
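As a rough sketch of that principle (not HTML Purifier's built-in ID handling), a small registry can hand out prefixed IDs and remember the ones already taken. The class name SketchIdRegistry, its methods, and the 'linkuri-' prefix are all hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: hands out namespaced IDs and remembers every ID it
// has assigned or been told about, so nothing is issued twice.
class SketchIdRegistry
{
    private $prefix;
    private $assigned = array();
    private $counter = 0;

    public function __construct($prefix)
    {
        $this->prefix = $prefix;
    }

    // Record an ID that is already in use (for example by the application).
    public function reserve($id)
    {
        $this->assigned[$id] = true;
    }

    // Return a fresh prefixed ID, skipping any that are already taken.
    public function next()
    {
        do {
            $this->counter++;
            $id = $this->prefix . $this->counter;
        } while (isset($this->assigned[$id]));
        $this->assigned[$id] = true;
        return $id;
    }
}

$ids = new SketchIdRegistry('linkuri-');
$ids->reserve('linkuri-2'); // pretend the application already uses this one
$first = $ids->next();
$second = $ids->next();     // skips the reserved ID
```

The prefix keeps injected IDs out of the application's own namespace, and the lookup table keeps them unique within one document.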

I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.

I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.

So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue, you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens on the step after injectors!

I noticed. :) And it gets really fun, because my implementation of the tooltip needs a prefix on the id, so using the id prefix in Purifier would break it.

However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.

I don't mind filtering out the original IDs too much, but a namespace that keeps the ones we inject would be nice.

I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.

I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.

Ideally the hooks could be a string or a callback, and the receiving code could act accordingly. I guess a string is just a short circuit of a callback that returns a string anyway.
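A minimal sketch of that idea, assuming a hypothetical helper named sketchResolveHook(): a plain string is returned as-is, and anything else that is callable is invoked with the URL. Strings are checked first on purpose, since a string that happens to match a function name would otherwise be treated as a callback.

```php
<?php
// Hypothetical helper: resolve a hook that is either a literal string
// or a callback that computes the string from the link's URL.
function sketchResolveHook($hook, $url)
{
    if (is_string($hook)) {
        return $hook; // literal value, used unchanged
    }
    if (is_callable($hook)) {
        return call_user_func($hook, $url);
    }
    throw new InvalidArgumentException('Hook must be a string or a callable');
}

// A fixed class name, and an attribute value computed per link.
$fixed = sketchResolveHook('HelpTipAnchor', 'http://example.com/');
$computed = sketchResolveHook(function ($url) {
    return 'tip-' . strlen($url);
}, 'http://example.com/');
```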

I'm just brainstorming here, but I think the parameters needed are:

An array of attributes to put in the anchor tag, and their callbacks. Another structure to pass in the additional text to append... and that one could be complex, with variable attributes, text, and parameters.

I would prefer something a little simpler: the anchor start token itself, and then a text format in form "(%s)", where %s is substituted with the URL text. But it's up to you to code, so it's your call.
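For illustration, the text-format idea maps directly onto PHP's sprintf(). The function name sketchFormatUri() is hypothetical; the only assumption is that the format contains a single %s placeholder.

```php
<?php
// Hypothetical helper: substitute the URL into a display format.
// The URL is passed as an argument, not as part of the format string,
// so percent signs inside the URL are left untouched.
function sketchFormatUri($format, $url)
{
    return sprintf($format, $url);
}

$plain = sketchFormatUri(' (%s)', 'http://example.com/');
$escaped = sketchFormatUri(' (%s)', 'http://example.com/a%20b');
```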

As for IDs, at this point I'm not sure I completely understand the subtleties of the issue at hand. Could you describe in more detail how your tooltips work?

I set up a version to try in my app, and ran into another snag - the tooltip pops up under the lightbox the message is viewed in... so I might have to come up with another method. Not related to purifier, but a snag until I figure out what direction I want to go in....

Did patch review, everything looks good. I'm going to apply this to my master, set up a configuration directive, and then commit and push. You'll have to do a git reset --hard remotes/origin/master to update your branch when I'm done if you didn't create a topic branch for your commit.

Oh yeah, that's true. See, Linkify is run on the configuration documentation, so I had to wrap it with a tags to make it clear that the right link isn't active. Not ideal, but whatever. They'll figure it out when they run the code, and we're going to make it customizable anyway.

OK, for the next step... I'd like to make it more flexible in what it outputs.

First question: is there a routine somewhere that can read a small string and tokenize it?

I was thinking of making a helper class to go with this injector, which will have its methods called to populate the new link. The default class could do the output as we have it, and then a configuration item could be used to override with a subclass.

Is this the Strategy pattern?

Anyway, if there's a procedure to parse a small amount of HTML and return an array of tokens to put back into the stream, it would be very simple to override the class.

interface HTMLPurifier_Injector_DisplayLinkURI_Strategy
{
    // Called with the text of the link;
    // returns the attributes to put on the anchor tag.
    public function LinkAttributes($linktext);

    // Called with the URI of the link;
    // returns what to display for it.
    public function URIDisplay($uri);
}

If not, then the subclass has to create the tokens directly. Anyway, see where I'm going with this?