Spammers and hackers target WordPress web sites

If you have a WordPress blog, you have probably been the target of spam attacks.
Spammers hope to get more visitors or increase their page rank by putting links to their web sites in
the comment fields of your blog articles. Various plugins can be used to block WordPress
spam, including Akismet, wp-reCaptcha, si-captcha-for-wordpress, and others.

However, even with captcha methods that require a user to type a matching word to prove
that he or she is not a spambot, spam still gets through. How is this possible?
The problem is that wp-comments-post.php, which is a core module of WordPress
for processing comments, provides an action hook for pre-processing comments that
is exploited by spammers. A hack targeted toward a WordPress theme, such as the following, adds
filters that make it possible to add comments programmatically, bypassing both the comment
dialog of the blog and any captchas that may have been set up to reduce spam.

It is possible to prevent the hack above by modifying wp-comments-post.php to disallow the
use of action hooks for comment handling. Eliminating the following line from the code, as illustrated,
prevents hackers and spammers from sending comments directly and forces all users
to use the WordPress comment dialog.

The following is a discussion about a previous problem in wp-comments-post.php.
After posting an entry to my WordPress blog, I noticed that my RSS subscription service did not
list the new entry for several days. When I examined the server files, I found that my blog had been hacked.
The wp-blog-header.php file had been replaced, and the hacked header file was designed to redirect
referrals from search engines to other web sites. Various WordPress sources recommended
updating to the latest version of the software, but some people reported being hacked even after updating.
Neither the new WordPress code nor anti-spam plugins such as Akismet and image captchas
prevented this problem.

When I analyzed the WordPress code, it appeared that the hacking was made possible through buffer overflows
because the code did not validate the comment data before using it. Even the "comment blacklist"
specified through the WordPress administration panel was invoked by
options-discussion.php only after a potentially harmful comment had already been processed by
the PHP compact() function.

I modified wp-comments-post.php to analyze the raw comment data and go to an error page when
anything was invalid or unacceptable. My error page has some counters that let me monitor hacking attempts. Below
is a listing of wp-comments-post.php with an example of the code that I inserted. The code restricts
the length of the comments, and checks the comment contents, author names, author IP, and author domain.
Since I implemented this code, I have monitored several hacking attempts that have been thwarted.
You can customize your own blog code using these basic ideas.
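
As a rough illustration of the kinds of checks described above, here is a minimal sketch. It is written in Python rather than PHP only to keep it short and self-contained, and it is not the code that I inserted into wp-comments-post.php. Every limit, block-list entry, and function name in it is a hypothetical placeholder that you would replace with your own.

```python
from collections import Counter
from urllib.parse import urlparse

# All limits and block lists are hypothetical placeholders, not recommendations.
MAX_COMMENT_LENGTH = 2000
BLOCKED_WORDS = {"viagra", "casino"}
BLOCKED_AUTHORS = {"admin"}
BLOCKED_IPS = {"203.0.113.7"}          # address from a documentation-only range
BLOCKED_DOMAINS = {"spam.example"}

# Counts of rejected comments per reason, similar in spirit to error-page counters.
rejection_counts = Counter()


def validate_comment(content, author, author_ip, author_url):
    """Return None if the raw comment looks acceptable, else a short reason string."""
    text = content.strip()
    if not text:
        return "empty"
    if len(text) > MAX_COMMENT_LENGTH:
        return "too_long"
    lowered = text.lower()
    if any(word in lowered for word in BLOCKED_WORDS):
        return "blocked_word"
    if author.strip().lower() in BLOCKED_AUTHORS:
        return "blocked_author"
    if author_ip in BLOCKED_IPS:
        return "blocked_ip"
    # hostname is None when the URL is empty or has no scheme; treat that as no domain.
    host = (urlparse(author_url).hostname or "").lower()
    if host in BLOCKED_DOMAINS or any(host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return "blocked_domain"
    return None


def handle_comment(content, author, author_ip, author_url):
    """Return True to continue normal processing, False to divert to an error page."""
    reason = validate_comment(content, author, author_ip, author_url)
    if reason is not None:
        rejection_counts[reason] += 1
        return False
    return True
```

The point of the design is that every check runs on the raw input before anything else touches it, and each rejection is counted by reason so that attack patterns can be monitored over time.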
The comment validation can be extended further by performing a basic linguistic analysis to verify
that the average word length of the comments and the ratios of function words like prepositions, articles, and
conjunctions are within acceptable ranges for the language of the blog. In English, for example,
the 10 most frequent words (the, be, of, and, a, in, he, to, have, it) account for about 25% of the text.
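
The linguistic check can be sketched in the same way. The following Python example is only an illustration of the idea, not code from my blog. The function-word list is deliberately tiny, and the thresholds are hypothetical values that would need tuning against real comments in the language of your blog.

```python
import re

# A small, hypothetical sample of English articles, prepositions, and conjunctions.
FUNCTION_WORDS = {
    "the", "a", "an",
    "of", "in", "on", "to", "for", "with", "at", "by", "from",
    "and", "or", "but",
}

# Hypothetical thresholds; tune them against real comments.
MIN_WORDS = 8                    # shorter comments are too small a sample to judge
MIN_AVG_WORD_LENGTH = 2.5
MAX_AVG_WORD_LENGTH = 8.0
MIN_FUNCTION_WORD_RATIO = 0.10


def comment_stats(text):
    """Return (word count, average word length, function-word ratio)."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        return 0, 0.0, 0.0
    avg_length = sum(len(w) for w in words) / len(words)
    ratio = sum(1 for w in words if w in FUNCTION_WORDS) / len(words)
    return len(words), avg_length, ratio


def looks_like_english(text):
    """Return False only when a long-enough comment falls outside the ranges."""
    count, avg_length, ratio = comment_stats(text)
    if count < MIN_WORDS:
        return True
    return (MIN_AVG_WORD_LENGTH <= avg_length <= MAX_AVG_WORD_LENGTH
            and ratio >= MIN_FUNCTION_WORD_RATIO)
```

A comment that is just a run of keywords contains almost no function words, so its ratio falls below the minimum. Very short comments are let through because the statistics are unreliable on a handful of words.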

The latest version of WordPress has fixed the problem with hacking through comment overflows, but
the following code is still applicable for cases where you want to take specific action for
particular types of comments.