When post is released I'll be in an exam, but I wanted to post again about the perfectly fascinating spam situation here on my blog. I've blogged about fending off spam on here before (exhibits a, b, c), but I find the problem is detecting it in a transparent manner that you as the reader don't notice very interesting, so I think I'll write another post on the subject. I could use a service like Google's ReCAPTCHA, but that would be boring :P

Recently I've had a trio of spam comments make it all the way through my (rather extensive) checks and onto my blog here. I removed them, of course, but it still baffled me as to why they made it through.

It didn't take long to find out. When I was first implementing comments on here, I added a logger specifically for purposes such as this that saves everything about current environmental state to a log file for later inspection - for both comments that make it through, and those that don't. It's not available publically available, of course (but if you'd like to take a look, just ask and I'll consider it). Upon isolating the entries for the spam comments, I discovered a few interesting things.

The comment keys were aged 21, 21, and 17 seconds respectively (the lower limit I have set is 10 seconds)

All 3 comments claimed that they were Firefox 57

2 out of 3 comments used HTTP 1.0 (even though they claimed to be Firefox 57, and despite my server offering HTTP/1.1 and HTTP/2.0)

All 3 comments utilised HTTPS

The IP Addresses that the comments came form were in Ukraine, Russia, and Canada (hey?) respectively

All 3 appear to be phishing scams, with a link leading to a likely malicious website

The 2 using HTTP/1.0 also asked my server to close the connection after sending a response

The last comment had some really weird capitalisation. After consulting someone experienced on the subject, I learnt that the writer likely natively spoke an eastern language, such as Chinese

This was most interesting. From this, I can conclude:

The last comment was likely submitted by a Chinese operator - even though the source IP address is located in Ukraine

All three are spoofing their user agent string.

Firefox 57 uses HTTP/2.0 by default if you're really in a browser, and the spam comments utilised HTTP/1.0 and HTTP/1.1

Curiously, all of this took place over HTTPS. I'd be really curious to log which cipher was used for the connection here.

In light of this, if I knew more about HTTP client libraries, I could probably identify what software was really used to submit the spam comments (and possibly even what operating system it was running on). If you know, please comment below!

To combat this development, I thought of a few options. Firstly, raising the minimum comment age, whilst effective, may disrupt the user experience, which I don't want to do. Plus, the bot owners could just increase the delay even more. To that end, I decided not to do this.

Secondly, with the amount of data I've collected, I could probably write an AI that takes the environment in and spits out a 'spaminess' score, much like SpamAssassin and rspamd do for email. Perhaps a multi-weighted system would work, with a series of tests that add or take away from the final score? I might investigate upgrading my spam detection system to do this in the future, as it would not only block spam more effectively, but provide a more distilled overview of the characteristics of each comment submission than I have currently.

Lastly, I could block HTTP/1.0 requests. While not perfect (1 out of 3 requests used HTTP/1.1), it would still catch some more bots out without disrupting user experience - as normal browsers (include text-based ones IIRC) use HTTP/1.1 or above. HTTP/1.1 has been around since 1991 (27 years!), so if you're not using it by now - upgrade! For now, this is the best option I can see.

From today, if you try to submit a comment and get a HTTP 505 HTTP Version Not Supported error and see a message saying something like this:

You sent your request via HTTP/1.0, but this is not supported for submitting comments due to high volume of spam. Please retry with HTTP/1.1 or higher.

...then you'll have to upgrade and / or reconfigure your web browser. Please let me know (my email address is on the homepage) if this causes any issues for anyone, and I'll help you out.

Found this interesting? Know more about this? Got a better solution? Comment below!

Comments

Leave a comment

Mess with this field at your own risk. It is here to catch spambots, and so altering it could cause anything to happen!

Comment:

?

Commenting FAQ

What are the commenting rules?

Never use your real name.

Do not give away any personal informtion.

Criticism is welcome, but it must be constructive.

No abuse or inappropriate content of any kind will be tolerated.

Do not spam.

Any comments that are seen to not follow these rules will probably be deleted. It may also result in a ban for the offending user such that they are unable to view the site for an arbitrary length of time.

These rules may also be amended without notice, so please check them often.

How can I format my comments?
Basic markdown can be used to format your comments:

Type this

To get this

Notes

*bold text*

bold text

-

_italics text_

italics text

-

~~deleted text~~

deleted

-

`code text`

code text

Inserts some monospaced code. It is preferred that large blocks of code are linked to using a service such as Pastebin, Github Gists or Ideone.

Inserts a hyperlink. Please use responsibly. [rel=nofollow] is in use and spam will be deleted.

---

Inserts a horizontal line. The previous line must be blank.

All HTML will be escaped with HTML entities such that code in <code> blocks will be visible, and for security reasons.
The maximum number of characters allowed is 2000.
What is my email address used for?

Your email address, should you choose to provide it, is used to send you notifications of replies to your comment(s). It is used to display your Gravatar.

Your email address will be kept securely on the server, and nobody else will be given your email address. You will not receive any spam either.

If you do not wish to recieve emails anymore, please contact me (you can find my email address at the bottom of every page) and I will prevent emails from being sent to your email address. I will eventually write an automated system to handle this, but for now I will deal with requests manually. I should get back to you within 48 hours. Note that your (gr)avatar will also disappear and be replaced by an identicon.

Built by Starbeamrainbowlabs. Feedback can be sent to feedback at starbeamrainbowlabs dot com. Comments can be made on all blog posts. Some icons from the awesome Open Iconic icon set.