So say you had a site like StackExchange. Now imagine you created a machine-writable API capable of posting questions and comments and doing everything your browser can do.

Of course, about 5 minutes after launching, you'd have so much spam that you would probably have run out of room on your database servers and been permanently blacklisted by every search engine (except maybe Ask :) ).

So how can you avoid that spam while still allowing a machine to actually use your API (and/or have automation involved)? Is it possible in this day and age? I'd expect there is a human moderator at the end of it, but how do you cut the bulk of the spam down to a volume that human moderators can handle?

Hint: All websites have a pseudo-API by way of GET and POST requests between the browser and web server, which can be scripted and therefore would be accessible by the bots you fear.
– Izkata, Mar 31 '13 at 4:03

@Izkata Well, yes, but that's why you put measures in place. For an "actual" API, a honeypot isn't really going to work. CAPTCHAs might, but then the API isn't machine-"usable".
– Earlz, Mar 31 '13 at 4:53

2 Answers

There are a number of approaches. First and foremost is requiring (human) signup and an API key; this lets you require that requests be signed, which in turn enables the next step: throttling. Ever notice that Twitter limits the number of calls you can make per hour? That's the most basic form of throttling. You can also use rate-based throttling, which returns an error if you're calling a service too quickly. If someone persists in calling the service after being throttled, they can be banned for an increasing amount of time: 1 minute for the first offense, 2 minutes for the second, 4 for the third, and so on, a form of exponential back-off. Audit trails mean that if someone is ultimately determined to be a spammer, you can nuke all of their "contributions" at once.
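For illustration, here is a minimal Python sketch of that pipeline (signed requests, a per-key call limit, and bans that double on each offense). Everything in it is hypothetical: the `ApiGate` class, `sign_request`, the HMAC-SHA256-over-the-body signing scheme, the sliding-window limit, and the default numbers are assumptions chosen to mirror the answer, not any real service's API.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque


def sign_request(secret: bytes, body: bytes) -> str:
    """Hypothetical scheme: HMAC-SHA256 signature a client attaches to each request."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class ApiGate:
    """Hypothetical gate: signed requests, per-key throttling, exponential bans."""

    def __init__(self, secrets, max_calls=100, window=3600.0, base_ban=60.0):
        self.secrets = secrets              # api_key -> secret issued at (human) signup
        self.max_calls = max_calls          # calls allowed per sliding window
        self.window = window                # window length in seconds
        self.base_ban = base_ban            # length of the first ban in seconds
        self.calls = defaultdict(deque)     # api_key -> timestamps of accepted calls
        self.warned = defaultdict(bool)     # api_key -> already told it is throttled
        self.offenses = defaultdict(int)    # api_key -> number of bans so far
        self.banned_until = {}              # api_key -> when the current ban ends

    def check(self, api_key, body, signature, now=None):
        """Return 'ok', 'bad-signature', 'throttled' or 'banned'."""
        now = time.time() if now is None else now
        secret = self.secrets.get(api_key)
        if secret is None or not hmac.compare_digest(
                sign_request(secret, body), signature):
            return "bad-signature"
        if self.banned_until.get(api_key, 0.0) > now:
            return "banned"

        recent = self.calls[api_key]
        while recent and recent[0] <= now - self.window:
            recent.popleft()                # forget calls outside the window

        if len(recent) >= self.max_calls:
            if not self.warned[api_key]:
                self.warned[api_key] = True
                return "throttled"          # first over-limit call: just an error
            # Persisting after the error: ban for base_ban * 2**offenses seconds.
            ban = self.base_ban * 2 ** self.offenses[api_key]
            self.banned_until[api_key] = now + ban
            self.offenses[api_key] += 1
            self.warned[api_key] = False
            return "banned"

        self.warned[api_key] = False
        recent.append(now)
        return "ok"
```

With `max_calls=2` and `base_ban=60`, a third call inside the window gets `"throttled"`, a fourth gets a 60-second ban, and the next offense earns 120 seconds. Because every accepted call is tied to a signed API key, the same key is also a natural handle for the audit trail when you later need to remove a spammer's posts.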

I ended up writing my own "proposal" for this, but I'm not sure how good a solution it would actually be.

Anyway, take the proof-of-work concept behind Bitcoin and apply it to an API. Pick a "difficulty" and a user-nonce value, and require the client to come back with a hash that includes the user-nonce and meets the difficulty condition. You can then tune this so that creating API keys is "difficult" and posting comments and such with your pre-made API key is "easier".
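A rough, hashcash-style sketch of that idea in Python follows. It assumes a single SHA-256 over `"<nonce>:<counter>"` and a "leading zero bits" difficulty condition; that is a simplification, not Bitcoin's actual rule. The function names and the `DIFFICULTY` values are hypothetical.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()   # zeros above the highest set bit
        break
    return bits


def issue_challenge() -> str:
    """Server side: a fresh random user-nonce for this registration or post."""
    return secrets.token_hex(16)


def solve(nonce: str, difficulty: int) -> int:
    """Client side: brute-force a counter until the hash meets the difficulty."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{nonce}:{counter}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return counter
        counter += 1


def verify(nonce: str, counter: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(f"{nonce}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical costs: expensive to mint an API key, cheap to post with one.
DIFFICULTY = {"create_api_key": 20, "post_comment": 8}
```

The asymmetry is the point: the client needs on average about `2**difficulty` hashes, while the server verifies with one. For example, `solve(nonce, DIFFICULTY["post_comment"])` is quick, whereas `"create_api_key"` costs roughly 4096 times as much work. In a real deployment the server would also need to remember which nonces it has issued and reject any that are reused.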

It's definitely not perfect, but it appears to be a way of allowing purely machine-driven registration and "writing" through your API. It also doubles as an accessible approach: no visual or audio CAPTCHA is required.