Got feedback about the code or how the package works once it’s installed in WordPress? Let’s hear it.

I should have realized that otherwise, the comments would turn into an argument about comment spam, fighting it, ways the general idea could be defeated, and more. Which they did.

Look, folks, despite what some people might tell you, I’m not so arrogant as to think that I could single-handedly solve the comment spamming problem for all time. Even if I were, I very much doubt I’d be so clueless as to think that WP-Gatekeeper was that solution. And if both those things were the case, I’m pretty darned near certain I would have very explicitly made the claim of having beaten the spammers. Likely in big, boldfaced, red, capitalized, blinking letters, plus a background MIDI of “We Are The Champions”.

WP-Gatekeeper is not going to stop every possible comment spam attack, human or automated, for the rest of time. Neither is any other defense you can name, without exception. There may be measures that currently have 100% resistance to scripted attacks. They will one day fail—I can pretty much guarantee it. Even today, they are defeatable by actual humans sitting at computers and posting comment spam on every site they find. That kind of spamming is very, very rare, but it happens. I had such an incident within the last month. If I hadn’t been keeping a close eye on new comments just then, I’d likely have missed it completely.

I’m fully aware that there are ways a spambot could defeat WP-Gatekeeper. At the moment, none of them can. That will one day change, of course, assuming challenges become at all popular. Comment spam and the fighting thereof is a dance, a tennis match, an arms race. Neither side will ever win. As one side adopts a new tactic, the other side will move to counter it. The countermeasure will itself be countered. And so it goes. Eventually, either spambots or spam defenses (or the two in combination) will become so advanced that they’ll gain self-awareness, and then we’ll all be royally hosed.

I know this. You know this. Let’s move on from there, okay?

In the end, the goal is to add another arrow to the quiver at the disposal of spam fighters. Think this approach is wrongheaded, annoying, or otherwise pointless? Fine. Don’t use it. For those who want to add this kind of capability, the package is there. Since I instituted it on meyerweb, I’ve had not a single piece of spam make it onto the site or hit the moderation queue, whereas in my pre-defense days, I’d get at least twenty every day. You can combine it with other defenses, if you like, for even more coverage. I may upgrade it in the future, depending on how far I get in learning PHP, MySQL, and form handling, and what feedback I get from people who know PHP better than I do. I may not, in which case the system as it stands is effective, and probably will be for a while. Even if I do one day abandon further development, the code is out there for someone else to improve if they so choose.

In the meantime, if there’s anyone who is using WP-Gatekeeper or has looked at the code, and has feedback on the coding or the way it works for the administrator of a WP blog, please feel free to share. Also, if anyone can point me to an example of PHP code for collecting all of the HTTP_VARS that are returned by an XHTML form and then looking through them, even when the variable names aren’t necessarily known ahead of time, I’d really like to see it. Thanks.

Todd Roberts wrote in to say...

Thanks, Todd, but I need a little more explanation before I’ll be able to “get it”, especially since I’m new to both PHP and the whole “foreach blah as foo” approach. A URL to a demonstration, or a tutorial, is fine.

Aaron Tate wrote in to say...

Basically, it’s a loop structure that iterates through a set of something (in this case, the page’s submitted POST variables).

$_POST is an associative array (a set of variables). There are two types of arrays: numerical arrays, which are referenced by number, and associative arrays, which are referenced by name or number. The foreach construct steps through an array (of either type). Within each step, the variable $foo will contain the form field name or ‘key’, and $bar will contain its contents or ‘value’.

Let’s take this form, for example.

In the first iteration the key will be ‘author’, and the value will be ‘Aaron Tate’.
Second: key will be ‘comment_post_ID’ and value will be ‘552’ (hidden fields are included obviously.)
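For anyone who wants to try the loop without building a form first, here is a minimal sketch. The $post array is a hand-made stand-in for $_POST (an assumption, so the example runs on its own), and describe_fields() is a hypothetical helper name, not anything from WP-Gatekeeper.

```php
<?php
// Hypothetical stand-in for $_POST; a real request would fill $_POST itself.
$post = [
    'author' => 'Aaron Tate',
    'comment_post_ID' => '552',
];

// Build a "name = value" line for every submitted field, whatever its name.
function describe_fields(array $fields): array
{
    $lines = [];
    foreach ($fields as $foo => $bar) {
        // $foo is the field name (key); $bar is what was submitted (value).
        $lines[] = $foo . ' = ' . $bar;
    }
    return $lines;
}

echo implode("\n", describe_fields($post)), "\n";
```

The loop never needs to know the field names ahead of time. It assumes every value is a plain string, which holds for the two fields shown here.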

The above code, given a form submission with, say, a field named “solemnity” into which someone has entered “durian,” and a field “email” into which someone has entered their email address, would loop twice. The first time, $foo would = “solemnity” and $bar would = “durian.” Next time, $foo would = “email” and $bar would = the email address.

To explain Comment #1 a little more, the “foreach” keyword is an iterator shortcut common in some languages. It enumerates all of the elements of a data structure in a loop. Every time through the loop, it assigns one of them to a variable:

foreach ($_POST as $var) {
    // do things with $var, which holds one submitted value per pass
    echo $var, "\n";
}

This works if all you need are the values, but $_POST is not a plain numbered list. It’s associative with keys, and in the case of $_POST, those keys are the field names. In PHP, arrays with keys have syntax like key => value, so instead of taking:

Update: Oops, I ran into another problem, which is that given a set of checkboxes, or several text inputs with the same name, only the last instance of that input passes its value through. So, for example, if I set up a “check the boxes next to the green and red squares” type challenge, it would always fail because only the last of the two checks would come back. My attempts to parse values to see if they’re arrays seem not to work. Any suggestions/pointers?

Robert Dean wrote in to say...

Michael: Except when I try it, with both boxes checked, $bar returns just green. No red. That’s kind of the problem.

Also, when I looked through http://w3.org/TR/html4/interact/forms.html , I didn’t see any reference to text inputs having to have unique names. Since name, as an attribute of input, can’t be unique (otherwise checkboxes and radio buttons would fail), I’m not sure text inputs are required to be unique either. Of course, if I missed an explanatory line in the text of the specification, it wouldn’t be the first time.

Neal Lindsay wrote in to say...

The complex part of this issue is that HTTP only sends the info for the checked boxes. That means you can’t have a bunch of value-less checkboxes like this:
<input type="checkbox" name="inputname[]" />
<input type="checkbox" name="inputname[]" />
<input type="checkbox" name="inputname[]" />
<input type="checkbox" name="inputname[]" />
and know in the end which boxes have been checked. (You would only end up knowing how many had been checked, but not which ones exactly.)

You need to either give each of them a unique value or explicitly say their location in the array within the name field like so:
<input type="checkbox" name="inputname[0]" />
<input type="checkbox" name="inputname[1]" />
<input type="checkbox" name="inputname[2]" />
<input type="checkbox" name="inputname[3]" />
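To see the difference between the two naming schemes, here is a small sketch. It uses parse_str() to decode a query string in place of a real form submission (an assumption made so it runs on its own), and it assumes the second and fourth boxes were checked and that the browser submits a value-less checkbox as "on".

```php
<?php
// parse_str() stands in for a real form submission here (an assumption made
// so the example runs on its own). Boxes 1 and 3 were checked; a checkbox
// with no value attribute is normally submitted with the value "on".
parse_str('inputname[1]=on&inputname[3]=on', $indexed);
parse_str('inputname[]=on&inputname[]=on', $unindexed);

// With explicit positions, the array keys say which boxes were checked.
$checked = array_keys($indexed['inputname']);
echo implode(',', $checked), "\n";         // 1,3

// With empty brackets, only the number of checked boxes survives.
echo count($unindexed['inputname']), "\n"; // 2
```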

Michael Duff wrote in to say...

But if you have a question like “check blue and red”, you count the array; if it’s less or greater than 2, then deny it. Then do a loop to make sure both values actually match one of the answers. This might prove easier or harder depending on how you are used to coding.
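The count-then-match idea might look something like the sketch below. The function name challenge_passed() and the field name "colors[]" are hypothetical, and the final duplicate check is an extra guard that is not part of the suggestion above.

```php
<?php
// Hypothetical helper: $submitted is what $_POST['colors'] would hold for
// checkboxes named "colors[]"; $answers are the values that should be checked.
function challenge_passed($submitted, array $answers): bool
{
    // A missing field or a plain string means we did not get an array at all.
    if (!is_array($submitted)) {
        return false;
    }
    // Step 1: too few or too many checked boxes fails immediately.
    if (count($submitted) !== count($answers)) {
        return false;
    }
    // Step 2: every submitted value must be one of the expected answers.
    foreach ($submitted as $value) {
        if (!in_array($value, $answers, true)) {
            return false;
        }
    }
    // Extra guard (an addition): a forged request could repeat one answer.
    return count(array_unique($submitted)) === count($submitted);
}

var_dump(challenge_passed(['red', 'blue'], ['blue', 'red'])); // bool(true)
var_dump(challenge_passed(['red'], ['blue', 'red']));         // bool(false)
```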

You’ll note that it contains form elements that have defined values, some of which share names, and others of which do not. When the form is submitted, the result is that the last value in each name-group that doesn’t have brackets is read; anything before it in that group is dropped. Now, if I add the square brackets to the names in the XHTML, PHP suddenly wakes up and handles things correctly. Without them, it doesn’t.
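The behavior is easy to reproduce outside a form. In this sketch, parse_str() decodes a query string in place of a real submission, and the field name "square" with the values "red" and "green" is made up for illustration.

```php
<?php
// The same two checked boxes, sent once under a plain name and once with brackets.
parse_str('square=red&square=green', $plain);
parse_str('square[]=red&square[]=green', $bracketed);

var_dump($plain['square']);     // a single string: the last value sent
var_dump($bracketed['square']); // an array holding both values
```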

So does anyone know how to make PHP handle HTTP form data correctly without the square-bracket hack, which should not be needed? No guesses; I said does anyone know either how to fix PHP sans hacks, or else exactly why it’s acting this way.

Obviously, since scanned fingerprint identification is not an option at this point, any method used to separate visitors from spammers and spambots is going to be dependent upon the dedication of the spammers, no? They’ll just figure out a way around any blockade we put up.

Plus, it’s just plain silly to go to the effort of installing a biometric ID system on your server, and even sillier to expect your readers to purchase a thumb scanner (priced anywhere from $57.00 to $460.00 depending on the quality) and hook it up just to comment.

However, for little or no money per person, the problem could be solved immediately and forever. Here’s what you need:

Ziploc baggy (preferably one of those with the little square zip thingy on top – I hear they close with less effort)

Pre-printed, liquid resistant ID sticker with a name and number 1-10, i.e.: Keith Burgin – 1

Padded mailing envelope (you know, the puffy ones)

Propane torch

Kitchen knife (sharp – believe me, you’re gonna want it sharp)

Okay. Eric makes his comment area with one extra text input field for the name and ID number (the aforementioned “Keith Burgin – 1”) corresponding to your comment.

You look at the ID number on your current label, type that in, and send your comment. Don’t look for it to be up for a couple of days, this takes a bit of time to sort out.

Once you’ve sent your comment, take the ID sticker and wrap it around the finger of your choice. Hack that baby off with the kitchen knife, seal the stub with the propane torch, seal the finger in the Ziploc bag, slide that in the padded envelope and mail it off to Eric.

Once Eric gets the finger, he simply matches the ID tag to your comment and approves the comment. Simple, no?

Spammers are not dedicated enough to use this system – I can guarantee that. Of course it will cut down on the number of legitimate comments, since an individual will only be able to do this ten times. Of course, you could always use toes, but let’s be a little sympathetic to Eric, people. Who wants to get a toe in the mail? I mean, really…

Hadley Wickham wrote in to say...

Ok let’s go back to basics – what does your browser send to meyerweb.com when you click submit on that form? In the form tag, you’ve said use method “post”, so the browser is going to create an HTTP POST request that looks something like this: (many fields omitted for brevity, and defaults used)

checker=cb02
checker=cb04
checker=cb05

checker2[]=cb02
checker2[]=cb04
checker2[]=cb05

This is what PHP gets. Now PHP has to decide how to take this POST data and convert it into PHP variables. It does this in a pretty straightforward manner, and will execute PHP code something like this:

Now do you see why you need those []? Without them, PHP assigns each checked value from ‘checker’ to $checker, each value overwriting the previous. When you add the [], PHP assumes you want an array that stores them all.
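As a rough illustration of that overwrite-versus-append difference, the assignments would behave like the sketch below. This is not PHP’s actual internal code, and the standalone variables $checker and $checker2 are a simplification (in a script the values would normally be read from $_POST).

```php
<?php
// Plain name: each assignment replaces the one before it.
$checker = 'cb02';
$checker = 'cb04';
$checker = 'cb05';

// Bracketed name: each assignment appends to an array.
$checker2 = [];
$checker2[] = 'cb02';
$checker2[] = 'cb04';
$checker2[] = 'cb05';

var_dump($checker);  // only "cb05" is left
var_dump($checker2); // all three values, in order
```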

I hope this explains more clearly what’s going on. The details may not be exactly correct, but I hope you get the gist of it.

There are two reasons I wanted the real information, and wasn’t satisfied to settle for just knowing that something worked without any information on why.

First, as you say, personal annoyance; HTML forms have long passed multiple-value results, and every other scripting language I’ve ever used has been able to deal with that or at least make it possible to deal with it. In fact, they would, absent any special handling, generally return a value something like ‘cb02:cb04:cb05’ (using some kind of separator). PHP appears to be either too dumb to understand that, or too “smart” to let it pass through untouched. The latter approach, that of trying to be smarter than the user, being what Microsoft does all the time.
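For comparison, the separator-style result can be rebuilt by hand from the raw urlencoded request body. This is only a sketch: collect_repeated() is a hypothetical helper, the body is hard-coded so the example runs on its own, and reading it from php://input in a live script would not work for multipart/form-data submissions.

```php
<?php
// Hypothetical helper: group every value sent under the same name,
// working from the raw urlencoded body instead of $_POST.
function collect_repeated(string $raw): array
{
    $fields = [];
    foreach (explode('&', $raw) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing "&"
        }
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = urldecode($parts[1] ?? '');
        $fields[$name][] = $value; // append, so nothing is overwritten
    }
    return $fields;
}

// In a live script the body would come from file_get_contents('php://input').
$fields = collect_repeated('checker=cb02&checker=cb04&checker=cb05');
echo implode(':', $fields['checker']), "\n"; // cb02:cb04:cb05
```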

Second, if I’m going to hack, I want to know exactly why the hack is necessary. I don’t use CSS hacks I don’t understand, and I avoid using CSS hacks whenever possible. The same goes for any other hack in any other language. I will avoid hacks whenever I can, and only use them when I understand them.

A group of identically-named checkbox form elements returning an array is a pretty standard feature of HTML forms. It would seem that, if the only way to get it to work is a non-HTML-standard-compliant workaround, it’s a problem with PHP. Since the array is passed in the request body in a post, or the URL in a get, it’s the PHP interpretation of those values that’s failing.

He was wrong about the square-bracket workaround being non-compliant; it’s acceptable to throw brackets into name values. Other than that, I agree with him 110%.

Regardless, that closes that case: if I want to use checkboxes or have the same name applied to multiple text boxes, I’m going to have to use the square brackets. Now—for anyone who is using WP-Gatekeeper or has looked at the code, and has feedback on the coding or the way it works for the administrator of a WP blog, I’d like to hear from you.

There is another way, potentially, if you’ll pardon my limited (but now growing) understanding of PHP: why not set, perhaps, 7 checkboxes, each with a designated color. Only 4 would have the proper colors. Have the user select only the green and the blue. Set the array with PHP — then implode the array using the implode() command to a single string and send that string to the handling script. Let PHP then parse the string. Right string: post; wrong string: send them to another site (maybe even their own IP).
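The implode() idea could be sketched roughly as follows. The helper name selection_string() and the field name "colors[]" are hypothetical, and sorting before joining is an added assumption so that the order in which the boxes arrive does not matter.

```php
<?php
// Hypothetical helper: $selected is the array PHP builds from checkboxes
// named "colors[]".
function selection_string(array $selected): string
{
    sort($selected);                // make the result independent of order
    return implode(',', $selected); // e.g. "blue,green"
}

$expected = 'blue,green';           // the right answer, already sorted

var_dump(selection_string(['green', 'blue']) === $expected); // bool(true)
var_dump(selection_string(['green', 'red']) === $expected);  // bool(false)
```

What happens on a wrong string (rejecting the comment, redirecting, and so on) is left out here.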
