A couple of months ago I decided to put my search engine knowledge to the test and started answering questions in the Google Webmasters Help forum. In that time, I’ve now answered 1000 different questions from people of all different skill levels from around the world in an attempt to help them get their website issues sorted.

To people that frequent forums online, 1000 posts/comments might not seem like very many; however, when they are narrowly focused on helping people with a specific topic and aren’t just casual banter/discussion about your favourite football team or motorsport, I think it is actually a pretty neat achievement.

When web based email services like Hotmail were still quite new, the barrage of spam that they received was a source of constant complaint for their users. As the internet and spam filtering techniques have evolved, I would argue that spam has become less of a problem in email as the classifiers have become more sophisticated.

In the last couple of months, either the Windows Live Hotmail spam filtering algorithms have decided to take a nap or the spammers have found a really clever way of slipping through undetected with what a human would consider as easily identifiable spam.

How Do Email Spam Filters Work?

Early versions of email spam filters were very primitive and the mere presence of a word in an email was enough to push it into the spam queue. Fortunately, those simple measures have been replaced by the more robust Bayesian spam filtering, which is supported by a lot of customisation.

Bayesian spam filtering decomposes an email into tokens, normally words but sometimes other markers, and then compares how often each token in that email appears in known samples of spam and legitimate email to determine if it is spam or not. Using that sort of strategy allows a user to use a trigger word in their email and not have it marked as spam, because the other content in the email doesn’t push the classification of the email high enough.
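To make the idea a little more concrete, here is a minimal sketch of that kind of scoring in Python. It is a simplified naive Bayes classifier and not how Hotmail or any real filter is implemented; the sample messages, the function names and the add-one smoothing are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count how many messages of one class contain each token."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine the evidence from every token into one spam probability."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from the sample sizes
    for token in set(tokenize(text)):
        # Add-one smoothing (an assumption) so unseen tokens never score zero.
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))


# Made-up training samples, purely for illustration.
SPAM = ["free pills buy now", "buy cheap watches now", "free pills offer"]
HAM = ["meeting notes for monday", "lunch on monday", "free lunch at the meeting"]

spam_counts = train(SPAM)
ham_counts = train(HAM)

for message in ("free lunch on monday", "buy cheap pills now"):
    score = spam_probability(message, spam_counts, ham_counts, len(SPAM), len(HAM))
    print(f"{message!r}: {score:.2f}")
```

In this toy sample the word free leans towards spam, yet the first message still scores well below 0.5 because the rest of its tokens are far more common in the legitimate sample, which is the behaviour described above.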

Manual Email Spam Classification

Below I’ll break down the first highlighted email in the inset image above into its individual components and show you why it is easy for a human to identify as spam but slightly more complicated for a spam filter.

Sender Name

Harry Long

The surname might be a play on words to do with the size of the sender’s penis or it could just be a randomly chosen surname. If you scan across the other senders’ names in the above image, you’ll notice they are all quite straightforward, like Walter Carlson or Paula Santos. What you aren’t seeing are very obviously spam sender names like Free Porn or Barbara Big Boobs, so there has definitely been some learning taking place.

Sender Email

harry-long7125@hotmail.com

Nothing unusual about the sender’s email address; it is commonplace these days for Hotmail email addresses to contain numbers as they have so many users. Checking across the other spam emails, all of the sender email addresses seem quite reasonable, have some amount of correlation to the sender’s name and none of them contain clearly identifiable spam trigger words.

Does the Hotmail spam filtering treat email from other Hotmail email addresses as less likely to be spam than an email originating from outside of their service? Could they be placing too much emphasis on the security of their signup process to weed out spammers? I know at some point Gmail had their signup process brute forced and the spammers were able to systematically overcome the CAPTCHA.

Recipient Email

amakye@hotmail.de

To an English speaker, the recipient’s email address might look like more gibberish; however, you’d be wrong. Amakye is actually a name – for a real world example, consider Amakye Dede, one of Ghana’s premier musicians.

As a trait of high quality messages, do people that forward an email onto someone else regularly forward it onto multiple recipients? Given that all of the spam emails seen in the image above have a single additional recipient, I believe that must be quite common.

Following on from the possibility of Hotmail spam filtering providing brownie points for the sender coming from a Hotmail email address, it surely isn’t a coincidence that each of the emails above also has another Hotmail user as the additional recipient.

Subject

FW: MikaPMizunoMTakesGOneRHardACockFAfterTAnotherZInBThisLGangbang

Working from left to right, you can see the message has supposedly been forwarded onto me, signified by the classic FW: prefix for the subject. Scanning through the other highlighted emails, they all contain the same FW: prefix; maybe a forwarded email gains brownie points with the spam classifiers as being less likely to be spam.

Next up, a quick glance at the actual subject itself and it looks like gibberish until your eye catches actual words in it and then rescans the subject seeking out other real words amongst the nonsense. Suddenly you’re left with a subject more like:

FW: Mika Mizuno Takes One Hard Cock After Another In This Gangbang

I can only assume that the word cock isn’t being picked up because it doesn’t have a space or recognised word separator on either side of it. I also think that the different capital letter used as a word separator must have something to do with it as well – as an example, if the subject used a full stop, hyphen or pipe character it’d be crystal clear.
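As a rough illustration of how little it takes to undo that particular trick, the sketch below treats a lone capital letter wedged between a lower-case letter and another capital as a separator and swaps it for a space. The function name and the example subject are made up for illustration, and I have no idea whether any real spam filter normalises subjects this way.

```python
import re


def strip_capital_separators(subject):
    """Replace a lone capital letter wedged between two words with a space."""
    # A separator here is an upper-case letter that follows a lower-case
    # letter and is itself followed by the capital that starts the next word.
    return re.sub(r"(?<=[a-z])[A-Z](?=[A-Z])", " ", subject)


# Hypothetical subject built with the same trick as the spam above.
example = "BuyPCheapMWatchesGOnlineRNow"
print(strip_capital_separators(example))
```

Once the separators are gone, the subject splits into ordinary words that a token based filter can score. The obvious downside is false positives on legitimate text that mixes case in the same way, for example myUSB would lose its U.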

Message

VCery SWexy YounUg whiCte ChicVk wiGth a NiQce AsRs BoQoty gPets F.Lu.c.ked bLy
http://younghotasia55198146n.com
by now."Corey clicked at him in disgust. The others of their pod had long gone

The message content itself is less obfuscated than the subject and far easier to read in a single glance. I think the important thing about the message content is that no trigger words such as ass, arse, booty, young or fucked are present and spelled correctly.

As for the domain name, the spammers are no longer using domain names with words that’d be clearly identified as spam. In this particular instance, I am surprised that the spam classifier isn’t smart enough to identify that the domain itself isn’t in a common format; when was the last time that you visited a domain that had 8 numbers in it?
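A check along those lines is simple to sketch. The snippet below counts the digits in the host name of a link and flags it when there are more than a chosen number; the function names and the threshold of 4 are arbitrary assumptions on my part, not anything a real filter is known to use.

```python
from urllib.parse import urlparse


def digit_count_in_host(url):
    """Count the digit characters in the host name of a URL."""
    host = urlparse(url).hostname or ""
    return sum(ch.isdigit() for ch in host)


def looks_suspicious(url, max_digits=4):
    """Flag a URL whose host name contains more digits than max_digits."""
    # max_digits is an arbitrary threshold chosen for this sketch.
    return digit_count_in_host(url) > max_digits


spam_link = "http://younghotasia55198146n.com"
print(digit_count_in_host(spam_link), looks_suspicious(spam_link))
```

On its own this would be a weak signal, since plenty of legitimate host names contain digits, so at best it would be one more input into the overall classification and not a verdict by itself.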

I suspect that the last sentence in the message is nothing but a token sentence so the entire email doesn’t look completely bogus. I find it interesting that they’ve mixed in a few words which would seldom be seen together such as clicked, disgust and pod – all of which would typically never be seen in a spam email.

As you can see from the above, to a human it is just so simple to spot spam; however, it just isn’t that simple when you need to write software to identify spam while making sure not to falsely identify real email.

Every day that these spam email messages keep landing in my inbox, I keep group selecting them and reporting them to Hotmail. I wait patiently hoping that someone will answer my calls to try and tackle whatever issues their spam classifiers are having identifying those spam emails.

Back in 2007, Google released the first incarnation of the Google Profile. Back then the functionality was quite limited but it served a clear purpose – Google were looking to consolidate down all of the profile data that users had entered into the various Google services into a single location.

During 2009, Google Profiles received an upgrade which allowed users to choose a Google Profile vanity URL for their profile in lieu of a number which provided no context. In addition, Google stated that Google Profiles would begin showing up at the bottom of search results for searches they identified as a name.

Fast forward to 2011 and Google are at it again with another user interface overhaul in March. In addition, Google Profiles no longer show up only at the bottom of a standard web search results page; they are now displayed prominently using a Google OneBox if a user is logged in to their Google Account.

The following two images show what I see currently when searching for my name, logged in first and subsequently logged out of my Google Account. As you can see, when logged in I’m presented with my Google Profile first, which until today would have displayed my personal blog.

Google search for a name showing a Google Profile OneBox when user is signed in

Google Buzz is a social media product from Google that, as the sticker says, goes beyond status updates and allows you to share updates, photos, videos and more. It is quite an amazing tool: you can share publicly or privately through Buzz, it is integrated alongside Gmail and you can connect it to a lot of other social websites to pull their respective streams of data into your Buzz profile.

I’ve never really swallowed the Kool-Aid though, and for a long time I couldn’t work out why I wasn’t onboard, since I’m usually one of the first people to support various new Google products. Then it dawned on me – it feels like it is a closed product even though it isn’t.

The whole idea of Buzz, as I see it at least, is to share and connect with people from various different social media platforms in one place. Once the initial connection is made, your group of friends and others can then comment and discuss anything you like about any of the items flowing through Buzz – not that dissimilar to Facebook.

My issue with it in general is that for some reason, it feels like a closed product and that I’m writing or publishing into a vault that others can’t easily get access to. If I’m going to spend time writing something, then I want it to be in a place where others can easily find it – like a blog. While information that you publish through Google Buzz can be and is crawled by the search engines, it isn’t easily discovered by other people searching on the internet; so my potential to help others who might have had the same problem seems impaired from the outset.

Google spam fighting super star Matt Cutts uses Google Buzz to post things that he thinks don’t warrant a ‘full blog post’. If you read through a lot of items on Matt’s site, you’ll quickly see that he typically invests a lot of effort into his writing. My thinking on that front is: why not still publish whatever he was going to push through Google Buzz on his site and use a different post format for it – an aside, for instance? That way, he and his readers don’t have an expectation that it’ll be a hugely detailed or thought provoking piece but at least he is keeping it in his control.

I have the same issue with Facebook, which is one of the reasons that I don’t spend any serious amount of time on it – especially when it comes to content development or promotion. I syndicate my blog into Facebook, with the intention that if people want to discuss something about the item, they’ll leave a comment on my blog in lieu of within Facebook. I find Facebook even worse, in that within a very short space of time, due to the number of connections most people have, any information that someone publishes flashes by and is lost, never to be seen again. Facebook is a very point in time or near point in time product and from an editorial standpoint, that doesn’t fit that well with me personally.

How do you see a product like Google Buzz fitting into the digital ecosystem? I know it is a great product but at this stage, I can’t bring myself to utilise it for writing – even when it might be mostly throwaway things.

In the last few days, a homeless man in Ohio named Ted Williams has become a viral internet sensation because he has a golden radio voice. Ted, a panhandler, was standing at an intersection when someone took a short video of him and uploaded it to the internet:

The simple handwritten cardboard sign Ted was holding reads:

I have a God given gift of voice. I’m an ex-radio announcer who has fallen on hard time. Please! any help will be greatfully appreciated :) Thank you and God bless you, happy holidays.

Since the video was taken yesterday, God has clearly shined a light on Ted as his fortunes are turning around. While being interviewed on local radio station WNCI, Ted said he has been offered a full time job with the Cleveland Cavaliers and a house! If a source of income wasn’t the most amazing gift for a homeless person, a house to live in must seem like all his Christmases have come at once.

Now onto the dirty, filthy, grubby part of the story and it relates to unscrupulous individuals capitalising on the viral nature of the Ted Williams sensation and trying to cash in for some quick, easy dollars.

YouTube provides related or suggested videos to the right of the video you’re watching on their site. When watching the embedded video above on YouTube, the image to the right is what I’m seeing at the moment. Highlighted are five videos belonging to four different YouTube accounts, each beginning with ‘usnews<number>’ with a similar but different copy of the video.

If you click the related video, the user has essentially made a still video with an overlay stating that they can’t show you the video and you’ll need to click the link in the video description to watch the video. As soon as the overlay popped up and the video wasn’t available, it was clear it was just a scammer trying to make some money.

Digging a little deeper, you can see that each of the ‘usnews<number>’ accounts has uploaded the video between six and a dozen times. Each video has a different name, a slightly different length and links to a page on the same domain where the user can watch the video with ads plastered all over it.

They are employing fairly standard video search engine optimisation techniques, by providing a good video name, categories and description. By uploading dozens of copies of the same video but changing it each time, they are maximising their opportunity of having one of their videos rank when a YouTube user goes looking for Ted Williams. At the time of writing this, usnews57 has 10 different versions of the fake video which have been viewed just under 2500 times.

It disappoints me no end that people will go to these lengths to make a dollar. I know it is easy and no one is necessarily being hurt by their actions, but the fact that they would essentially take advantage of a homeless man and his amazing good fortune, even indirectly, is sickening.

Google Reader is a fantastic web based RSS reader and I use it to read a lot of my favourite sites and also as a way to keep abreast of what is happening within an industry, such as travel and hospitality.

One of the usability issues that most web designers face these days is with small screens, such as an iPhone, or large screens, such as 24″ widescreen LCD computer monitors. The obvious reason is that with small screens, a design has very limited screen real estate to work with and must select what information to display very carefully. With large screens, there is an abundance of screen real estate to work with, which under certain conditions can actually lead to usability issues.

Case in point, when you use Google Reader at a more moderate resolution such as 1280×1024, the interface is very usable and everything works as expected. However, if you happen to run a widescreen monitor at 1920×1200, the interface is considerably less usable as the width of an RSS item suddenly increases by roughly 600px. When I use Google Reader maximised, I generally find it hard to scan across to the right hand side to click the (>>) icon to open the actual web page that corresponds to the RSS item.

In June 2008, Google launched Gmail Labs – a new style of sub-product which let users suggest ideas and Google engineers contribute small pieces of functionality to Gmail that users could optionally enable or disable at their leisure. Since June 2008, different Gmail Lab features have come and gone, some have been promoted to the main product by default and others haven’t had the uptake and have subsequently been removed.

One particular Gmail Labs feature that I absolutely love is called Move Icon Column. By default, Gmail provides small icons beside certain items in your inbox on the right hand side of the item to provide a visual clue of its contents, such as chat, calendar, Google Buzz or attachment. Just as I highlighted above that ultra wide screen resolutions don’t provide the best usability in Google Reader, the Gmail interface suffered the same issue. Fortunately, Gmail has the Labs functionality and with a few clicks of a mouse button, you can move the icon column from the right to the left and suddenly it doesn’t matter if you’re using a more moderate screen resolution or ultra wide screen.

I’m subscribed to dozens and dozens of email subscription lists around the world, ranging in topics as far apart as high end sports cars to online marketing and everything in between.

It has been said countless times before that anyone building an email marketing list needs to treat their email subscribers with respect. A lot of businesses choose to implement double opt-in systems to guarantee that a person really wants to receive semi-regular email marketing correspondence and some even go as far as re-opting them in periodically as well. Double opt-in isn’t a silver bullet that authorises a sender to blast emails out, merely a confirmation that they’d like to receive a respectful amount of email from a company.

Historically speaking, I don’t tend to unsubscribe from email lists unless something goes wrong or I have a change of tack for a while. However, in more recent times I’ve become far less tolerant of mediocre or only ‘okay’ quality content or senders who don’t respect my inbox. Now a sender need only step over the line for a short period of time and I’ll start hovering over the unsubscribe link.

In the case of Australia Fair Shopping Centre on the Gold Coast, I provided them my email address for a competition they were running. It wasn’t long after I entered the competition that I began receiving emails from the shopping centre about all manner of things, all well structured and on topic. Unfortunately, their frequency has increased to a point now where I just can’t be bothered to open them to find out what businesses have sales or promotions on anymore & would rather just unsubscribe to make the email stream go away. Of course for Australia Fair Shopping Centre and their businesses, this is the worst possible outcome for them.

I can only hope that someone in their marketing team might stumble onto this short post. If they do, my recommendation is to rethink the email schedule, maybe change the email format so you can send fewer emails per week but still get the same content in or allow your subscribers to choose what sort of content to receive to reduce the email footprint.

A number of the standards that web development relies on today were forged through the World Wide Web Consortium, or W3C. Over the years they’ve published countless documents on their web site outlining in great detail the various standards, such as what HTML tags are allowed to be nested within a <table> element in XHTML 1.0 Strict.

At some point they published a great document stating that cool URIs don’t change. The general thrust of the document is that once a document is published on the internet, that URI should be permanently available, as you never know who might link to it or consume it. As every user of the internet can attest to, there is nothing more frustrating than following a link from one web site to another and being greeted with the infamous 404 Page Not Found error.

While randomly clicking some links on my blog today, I noticed that I had two links in the footer to the HTML and CSS validation services. I haven’t clicked on those links for a long time, but for some reason today I did and was greeted with a 404 error. It would appear that over the course of time, W3C have very subtly updated the URI that the CSS validation service exists at:

Old: http://jigsaw.w3.org/css-validator/referer

New: http://jigsaw.w3.org/css-validator/check/referer

I figure maybe someone at W3C will see this pop up, notice that one of their older, heavily referenced URIs no longer works properly and put in the appropriate redirect.
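For what it’s worth, the usual fix is for the server to answer requests for the old address with a 301 Moved Permanently response whose Location header points at the new one. The sketch below shows only the lookup behind that; the table and function name are hypothetical and I have no knowledge of how the W3C servers are actually configured.

```python
from urllib.parse import urlsplit

# Old path -> new path, taken from the two URIs above.
MOVED_PERMANENTLY = {
    "/css-validator/referer": "/css-validator/check/referer",
}


def redirect_for(url):
    """Return the new URL for a moved address, or None if it has not moved."""
    parts = urlsplit(url)
    new_path = MOVED_PERMANENTLY.get(parts.path)
    if new_path is None:
        return None
    # Keep the scheme, host and query string; only the path changes.
    return parts._replace(path=new_path).geturl()


old = "http://jigsaw.w3.org/css-validator/referer"
print(redirect_for(old))  # value a server would send in the Location header
```

Keeping the old addresses in a small table like this means every existing link keeps working, which is the whole point of the cool URIs document.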

In the last week or so there have been a number of publishing errors on Search Engine Land, which I consider to be major mistakes that should have been caught by the authors, writers, editors and good quality work flow.

I would have expected that the original authors of those two respective articles would have previewed their work before pushing the publish button. Of course, it is quite possible that the people that wrote the articles don’t have accounts to log in to Search Engine Land, in which case I would have expected them to check their work once one of the editors had published it and provide feedback if necessary.

It appears that hasn’t happened and the work flow that Search Engine Land have implemented isn’t solid enough to catch even the most glaring of oversights. It is a shame really, because they produce a fantastic web site with valuable content throughout, but the simplest of things like the above tarnish their work in my eyes – it kind of suggests that they don’t care about it as much as I would have hoped that they do.