Friday, July 1, 2011

How to get gmail.com banned - not that I did this

When I started Mailinator, a LOT of people told me it wouldn't work because websites would ban it right away. Ban it with reckless abandon. Ban it like the new thing on the internet was to just sit around and ban Mailinator all darn day long.

As it turns out, that didn't happen. Sure, some sites do ban Mailinator and some are even really (really) excited about the idea, but in the grand scheme of things, it's not really very many. Thousands of people use Mailinator every day, so clearly, it's a useful tool that many sites accept.

Back in the day however, I sadly fell prey to the words of doom that I was being fed. I mean, holy mackerel - what if sites DO ban it? What then?

So I drew up a plan. A plan, that at this time I can say I may not be fully proud of. A plan that involved guile, wit, a few domain names, and some rate-limiting (thread-safe) data structures.

I write this now because, well, for the most part the war is over and Gotham has grown past needing Batman anymore. Mailinator is not really the rogue tool it once was. Heck, hotmail supports disposable email now. It's mainstream.

Typically there are two reasons people want to ban Mailinator. A few years ago, people really had some sort of notion that your email somehow equated to your identity. Given the radically insecure setup of email in general, that was really a ridiculous technical assumption. Nonetheless it was pervasive.

Secondly, people banned Mailinator for fear of people abusing their website. Now keep in mind, anything you can do with Mailinator, you can also do with YahooMail or Hotmail. It's just that Mailinator lets you do it faster, but Yahoo is plenty happy to let you sign up for 100 email accounts.

I get occasional emails from people asking me to have Mailinator stop accepting email from their site. Usually for the reason of stopping abuse. If they're nice and it makes sense, I almost always do it. But in my experience, usually when the existence of Mailinator is pinpointed as a cause of abuse, it is in truth merely an avenue that is already inherent to the internet or your website. Even shutting Mailinator down wouldn't solve the problem. The bad-guys just go somewhere else and keep on abusing.

Any sort of abuse is, needless to say, no fun for anyone. Mailinator has specific code built-in to detect scripts and stop them.

In truth, Mailinator's system for detecting and shutting-down scripts and abuse really only serves one purpose. It's like that silly metal bar people put on the steering wheels of their cars. Let's be real, if a thief really wants your car, some dinky metal bar on the steering wheel isn't going to do diddly to stop him.

Same with Mailinator's anti-abuse code - it won't stop a determined person - but it does make it more of a pain than simply using something else.

1) Solved hacking and abuse on the internet? --> Not even maybe
2) Solved a little of the hacking and abuse on the internet possibly for me? ---> DING!

Ok, back to the story - as I said, in the beginning, the idea of wide-spread Mailinator banning scared me a lot. So what did I do? I bought some additional domains for Mailinator.

To this day, you can email bob@thisisnotmyrealemail.com and it will end up at mailinator (in the bob inbox) just like bob@mailinator.com.

Cool. Alternate domains. Problem solved.

Wait a second. How exactly do I tell the world about the alternate domains without telling the people that want to ban them all?

Every few weeks I get an email like:

Hi! Love your service. Can you send me the exhaustive, comprehensive, and complete list of alternate domains so I can pick a nice one that suits my individual personal style? kThxBai

At first, I was like "Neat! People love Mailinator and want to...heeeyy.. waaiiit a second".

If I give them the whole list, then they will, um, have the whole list. And then they can ban the whole list.

Ok. I know. I'll list one random alternate domain on the homepage every time you visit. No one will have the whole list. Just one here, one there. Perfect!

There, problem solved. Again. Well, sort of.

Soon after I put up this "one random alternate domain per homepage load" system - the scrapers started. Every now and then I'd notice several hundred homepage loads from the same IP in a very short period of time.

They were scripts; scripts that were loading the homepage over and over and scraping out the random alternate domain that was shown. Sneaky. By doing this they could eventually formulate the entire list of alternate domains.
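Spotting that pattern doesn't take much. As a minimal sketch (this is not Mailinator's actual code; the class name, window, and threshold are all made up for illustration), a thread-safe sliding-window counter per IP might look something like this:

```python
import threading
import time
from collections import defaultdict, deque


class HitTracker:
    """Thread-safe sliding-window counter of page loads per client IP.

    Hypothetical illustration only; window_seconds and max_hits are assumed values.
    """

    def __init__(self, window_seconds=60.0, max_hits=100):
        self.window_seconds = window_seconds
        self.max_hits = max_hits
        self._hits = defaultdict(deque)  # ip -> timestamps of recent loads
        self._lock = threading.Lock()

    def record(self, ip, now=None):
        """Record one page load; return True if this IP now looks like a script."""
        now = time.monotonic() if now is None else now
        with self._lock:
            hits = self._hits[ip]
            hits.append(now)
            # Drop loads that have fallen out of the window.
            while hits and now - hits[0] > self.window_seconds:
                hits.popleft()
            return len(hits) > self.max_hits
```

A real deployment would also need to evict idle IPs so the table doesn't grow forever; that part is left out here.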

Drat. Now what? For a while, nothing. I just let them go. A few months later however, I got an email from a Russian guy (sorry Russian guy, I don't remember your name).

You are dumb. Your homepage is easy to scrape and doesn't change so its easy to scrape your alternate domain. You are dumb.

He was right. Well, I'm not sure about the dumb part, but my homepage was easy to scrape. Someone could probably write a script to scrape it in short order. Probably just took a few minutes.

Could I make it harder to scrape? Well, I could, but it wouldn't really slow anyone down much.

It was then however, I had a flash. An idea of simply epic proportions. A thought so crazy - that dad-burn-it, it just might work.

Let's not make the page scraping harder - let's make it EASIER.

I removed the bit of code that displayed the alternate domain and put it in its own (teensy) webpage. That "webpage" had absolutely nothing in it, except the text for the randomly chosen alternate domain itself.

Then, I embedded my new tiny webpage into the homepage (so it showed as before). Basically, to the viewer of the homepage - nothing was different. You saw the homepage and a randomly generated alternate domain, just where it was.

But to the folks that had been scraping my site, things looked plenty different. In fact, I probably broke all their scrapers (Sorry nice people trying to get all my alternate domains just to ban them! (ok, not really)).

Now here is a finer point of semantics. If you go to the Mailinator homepage, there is some text that says "Here is an alternate domain" followed by, well, a randomly chosen alternate domain.

However, now that I split off that tiny little webpage with JUST the alternate domain in it - you could go there too by typing in the url directly. And you'd see nothing BUT the alternate domain. No surrounding text. No text saying "this is an alternate domain". That little page showed a domain, but made no claim about what it was displaying.

For your browsing pleasure, here's the only direct link to that page that I know of: Go ahead, reload the page a few times. (You can see this also on the Mailinator homepage on the lower left).

After the script guys got over the minor annoyance of their scripts breaking because of my new setup, I'm sure there were office parties across the nation. Mailinator! Now even easier to scrape!

Now for the record, the rest of this post is hypothetical. An unimplemented idea if you will. Who knows - I'll bet nothing you read here on out ever happened. Just random thoughts. Musings. One big theory. Consider it random daydreams of a guy who runs a fun email service.

Remember all that script-detecting code from the anti-abuse system? Well, what if I put that in here too I thought. Let's "detect" when a script is hitting our weensy alternate-domain page.

And, what if we also detected when the little web page is being viewed but not "in" the homepage - but by itself (just like the link above). And what if after about 30 page hits from the same script (or so), we stop displaying actual alternate domains and start sprinkling in some other things. Hmm... but what other things?

I know - how about "gmail.com". Or, um "hotmail.com". Or maybe, "yahoo.com".

What, in our completely and totally hypothetical situation, would that do?

Well, let's see. There are these folks out there running scripts against Mailinator collecting all my alternate domains. Those scripts probably put results in a database or something that connects to their website. When one of their users tries to sign up on their site using one of my alternate domains, it's in their database as a banned site and it's immediately rejected.

Now imagine the wacky fun if somehow, some way, (totally theoretically speaking) some silly person snuck "gmail.com" in that list. I'd guess banning your users trying to sign up with "gmail.com" addresses is probably not what you want.

And, hypothetically speaking, if you had code that would sneak these non-alternate-domains into the page they weren't supposed to be accessing anyway, when would be the best time to set it into action?

Well, those scripts ran at many different times, but just after midnight seemed like a popular time-slot.

If such code existed, making it active Sunday morning from Midnight to 2am seems nice. I mean heck, if my website stopped accepting signups from "gmail.com" on some Sunday morning, I'm sure I'd be downright chipper to hop into the office and find out why.
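Purely to illustrate the daydream, here is a rough sketch of what such code could look like. To be clear, every name in it is invented for this example, and how the page would know it is "embedded" in the homepage (a Referer check, a query parameter, whatever) is an assumption I'm waving away with a boolean:

```python
import random
from collections import Counter
from datetime import datetime

REAL_DOMAINS = ["thisisnotmyrealemail.com"]  # the one alternate domain named in this post
DECOYS = ["gmail.com", "hotmail.com", "yahoo.com"]
HIT_THRESHOLD = 30  # "about 30 page hits (or so)"

_direct_hits = Counter()  # ip -> standalone loads of the tiny page so far


def in_decoy_window(now):
    """Sunday from midnight to 2am (datetime.weekday(): Monday=0 ... Sunday=6)."""
    return now.weekday() == 6 and now.hour < 2


def pick_domain(ip, embedded, now, rng=random):
    """Choose the text for the tiny alternate-domain page (hypothetical logic)."""
    if embedded:
        # Normal homepage view: always a real alternate domain.
        return rng.choice(REAL_DOMAINS)
    _direct_hits[ip] += 1
    if _direct_hits[ip] > HIT_THRESHOLD and in_decoy_window(now):
        # Heavy standalone use during the window: sprinkle in decoys.
        return rng.choice(REAL_DOMAINS + DECOYS)
    return rng.choice(REAL_DOMAINS)
```

Calling `pick_domain("203.0.113.7", embedded=False, now=datetime.now())` over and over from one address would only start mixing in decoys after the 30th hit, and only inside the Sunday window.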

Boy. If all that stuff happened - I wonder what kind of email conversations I'd have on that Sunday afternoon? I bet they'd be like:

Your alternate domain list displayed 'gmail.com'!

Hi Fred, no it doesn't. Just reloaded the homepage 10 times, nothing like that. all the best.

or I bet another would be like:

Yahoo.com? What is this some kind of joke?

Sorry, did you mean to email this to Carol Bartz? Not sure what you're talking about.

Phew. Well, that's surely a fun thought experiment. As you can see from the link above however, it surely doesn't do anything like that. Honestly these days, most of the scrapers are gone. I think simply that the internet evolved and more of them simply lost interest in the fight.

Every now and then I'm still asked what I think about banning Mailinator. I've mellowed a lot since the early days and I pretty much always give the same answer. If you think banning Mailinator is going to solve your problem, go ahead. In my experience, it won't. And by asking I am guessing that you are making some assumptions in your site that will surface as issues in other ways.

And of course, script writers, you now have the direct link to the alternate domain page above. Scrape away. But keep in mind, the best way not to trigger any Mailinator abuse systems is to not do anything "too fast". Those script detectors are pretty fickle little beasts. It's not a bad idea to try and stay on their good side.

42 comments:

You have script detectors. You could make them help in the evil - if a user is not detected as a scraper script, only give them real answers. If you've detected a script, include the fakes.

This massively reduces the risk of getting a message from someone not trying to abuse the system in some way - and means that if you play your dumb responses right, you're likely to set the scraper script authors puzzling over what the bug could possibly be that added gmail.com to their DB.

Well, you messed with some people, and likely with poor bob@gmail.com who didn't deserve it. But you could have solved the core problem in a more realistic way. You say the problem is that scrapers could collect the entire set of alternate domains in minutes. So the solution seems easy. Link the alternately displayed domain to a clock, and only change it every 67 minutes or so. Visitors will still see an alternate domain when they visit, but rapid access will not get them anywhere.
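As a minimal sketch of that clock idea (the domain names are placeholders and the function name is made up; only the 67-minute interval comes from the comment above), the displayed domain could be derived from the current time slot rather than chosen per request:

```python
import time

# Placeholder names; the real list of alternate domains is not public.
ALT_DOMAINS = ["alt-one.example", "alt-two.example", "alt-three.example"]
ROTATE_SECONDS = 67 * 60


def current_alt_domain(now=None):
    """Return the same domain to every visitor within one 67-minute slot."""
    now = time.time() if now is None else now
    slot = int(now // ROTATE_SECONDS)
    # Assumption: simply cycle through the list in order, one per slot.
    return ALT_DOMAINS[slot % len(ALT_DOMAINS)]
```

With this, reloading the page a thousand times in a minute yields one domain, though a patient scraper polling once an hour would still collect the list eventually.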

You don't need to detect. Just base your random number on the IP of the client. This way, a script sees a unique result, whereas your user base is averaged over all. Only a botnet or large network can still access them all with effort.

You could divide your alternative domains into buckets, hash the originator's IP address or netblock to a bucket, and only give out the alternative domains that fall in that bucket.
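The two suggestions above amount to roughly the same thing. A small sketch, assuming IPv4 clients grouped by /24 netblock and a SHA-256 hash (both choices are assumptions for illustration, as are the function names):

```python
import hashlib
import ipaddress


def bucket_for(ip, num_buckets, prefix_len=24):
    """Map a client IP to a bucket index via its netblock (default /24)."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    digest = hashlib.sha256(str(network).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def domains_for(ip, buckets):
    """Return the only subset of alternate domains this client ever sees."""
    return buckets[bucket_for(ip, len(buckets))]
```

Every address in the same /24 lands in the same bucket, so a script hammering the page from one network only ever learns one slice of the list.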

On the other hand, if you start giving me false answers, the truthfulness can still be easily checked by examining these domains' MX records. It's a little more work, but not too much.

In fact, I don't even need to scrape your page. Whenever someone signs up, I just need to check the domain's MX record to see if it points to one I want to ban. You will need multiple IP addresses as well, and hopefully they are not in the same netblock.

Sorry, but scraping the site for domains seems rather silly (of your adversaries). Just a few random checks and all domains I found point to the same MX. That's a 168 msec delay.
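For the curious, the MX check these comments describe could be sketched as below. The DNS lookup itself is deliberately left as a callable you pass in (`lookup_mx` is a hypothetical stand-in for whatever resolver library a site uses), and the host names are placeholders:

```python
def normalize(host):
    """Lowercase a hostname and drop any trailing dot."""
    return host.lower().rstrip(".")


def uses_banned_mx(domain, lookup_mx, banned_mx_hosts):
    """Return True if any MX host for `domain` is on the banned list.

    `lookup_mx` is an assumed callable: domain -> iterable of MX hostnames.
    """
    banned = {normalize(h) for h in banned_mx_hosts}
    return any(normalize(h) in banned for h in lookup_mx(domain))


# Usage with a fake lookup table standing in for real DNS:
fake_dns = {
    "alt-one.example": ["MX.disposable.example."],
    "honest.example": ["mail.honest.example"],
}
lookup = lambda d: fake_dns.get(d, [])
print(uses_banned_mx("alt-one.example", lookup, ["mx.disposable.example"]))
```

This is also why decoys like gmail.com would not fool anyone who bothers to check: their MX records point somewhere else entirely.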

Main question is, why would you waste any further time on this, apart from it being a fun pet project. For over 10 years I've used "mailbox+[whatever]@domain" with the related postfix configuration to handle random emails for sign-ups. If you really want to be clever on this, you need a few more tricks up your sleeve.

The best way to discourage them is to force them to spend lots of human resources on getting your domains. Have you considered using one of these visual captcha systems that are so popular nowadays? One per new domain request... BTW your idea is very interesting, at least if they read this post, they might be restless... When will Paul force us to have a look at our database? :-D

Anon, have you tried installing a titanium lock on a cardboard box? Steering wheels are designed to collapse in an accident, and a hacksaw will go through one in a few seconds. One good slice can result in the bar falling off the steering wheel. The strength of a lock is irrelevant if it's put on something flimsy.

I always get a kick out of these sneaky, geeky tech solutions to people trying to game the system. Keep up the good work!

Hi! I think you're absolutely *BRILLIANT*! I love this post. Great ideas and I learned a lot about how websites provoke new issues or aggravate old issues by trying to solve problems by banning. For me this is a real eye opener and I'm grateful for it.

How about this? Instead of serving up a site 1 domain at a time, serve up a javascript page that contains the 'whole list' of domains, and then picks a 'random' one. Only, the thing is, each time this page is served up, only one of the domains in the javascript list is real: the one the browser will display 'randomly'. The rest of the domains in the list are just garbage. Someone will take one look at your page, think it's 'that easy' and will have a bogus list.

More ideas for you:

1. Partition the set of domain names. No single IP will ever get more than one subset of names.

E.g. Evens and odds. The set is divided in half, even IPs get one set, odd IP's get the other. Use the 3rd octet to decide. Or even the second. Not many people have access to multiple class B domains.

2. Make a non uniform distribution random number generator.

40% of the name pool used by 80% of the queries
20% of the name pool used by 16% of the queries
10% of the name pool used by 3% of the queries
7% of the name pool used by .4% of the queries.
3% of the name pool used by .1% of the queries

You get the drift. So it takes many thousands of queries to be convincing that you have got them all.

remainder get well known domains.

3. A second query from the same IP address gets a delay of 10 seconds. A third query has a 20 second delay. Each query within a certain period of time doubles the delay.

4. While the domain names are in clear text, they are surrounded by a graphic that says this one is the only true address the rest are false. The graphic itself is text that is easily read, but is obfuscated by the usual graphic debris. The false domains are ones that are real, but not connected to you.

E.g.

Don't use this one -> ATT.com
This one isn't a good one -> Rogers.ca
Maybe not a great idea -> gmail.com
Try this one -> SpiffyMail.com
Not really -> SperryRand.com
You have to be kidding -> IBM.com
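Ideas 2 and 3 from the list above are easy enough to sketch. This is only an illustration under assumptions: the function names are made up, the tier shares are passed in by the caller, and "doubles the delay" is read as 10s, 20s, 40s and so on.

```python
import random


def pick_tiered(tiers, rng=random):
    """Idea 2: pick a tier by its share of queries, then a domain uniformly.

    `tiers` is a list of (domains, query_share) pairs; shares need not sum to 100.
    """
    pools = [domains for domains, _ in tiers]
    shares = [share for _, share in tiers]
    pool = rng.choices(pools, weights=shares, k=1)[0]
    return rng.choice(pool)


def delay_seconds(queries_in_period, base=10.0):
    """Idea 3: no delay for the first query, 10s for the second, then doubling."""
    if queries_in_period <= 1:
        return 0.0
    return base * 2 ** (queries_in_period - 2)
```

With shares like 80, 16, 3, 0.4 and 0.1, the rarest tier shows up about once per thousand queries, and with the doubling delay the tenth query from one IP is already waiting over 40 minutes.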

Those were of course purely hypothetical examples. In fact, time was when a lot of sites specifically blocked Yahoo and Hotmail, on the grounds that "anybody" could sign up for them. So, naturally, nobody would be so sinister as to include things like comcast.net or rr.com or aol.com or verizon.net or earthlink.net on a purely hypothetical "fake alternate domains" list. Such a thing is unthinkable.

Similarly, it seems clear that the "alternate domain page" would never display any such thing as .co.uk or .oh.us or even .com -- especially not if some of the real alternate domains were always prefixed with one-off randomly-generated subdomains.

That's sure an evil way to get gmail banned. I bet that's also why a lot of other email providers now have security options that more or less force you into using your mobile phone to make accounts.

The suggestions about ip hashing don't wash, because these guys are likely using proxies anyway.

What I'd like to see is an ability to automatically add a domain to mailinator's alt domain pool. I point the mx records, fill out a form, mailinator checks the mx records and adds it to the pool. Combine that with a dozen extra ips on different /24s -> profit ;)