AntiLeech Splog Stopper: Fighting Back Against Content Thieves

I often have these kinds of thoughts: “What if smokers had to ask for a smoking section in a restaurant, assuming all restaurants catered first to non-smokers?” “What if everyone thought first of asking permission before borrowing and taking what wasn’t theirs?” “What if the people were able to vote on whether or not they really wanted their country invaded, or just their leaders replaced?”

These thoughts all boil down to responsibility. I think the weight of the responsibility should be on the abuser. If you are thinking of doing wrong, I think bells and sirens should go off inside your head for a long time before you can act.

Unfortunately, the responsibility to protect ourselves against evil and idiots lies with us, not with the abuser. Among the new tools available to us bloggers, the responsible ones, is Owen Winkler’s AntiLeech WordPress Plugin.

AntiLeech does not simply lock the splogger bots out. No, it does better than that. It produces a fake set of content especially for them that includes links back to your site (and mine, too, ok?) and sends it only to them. When they steal this content, it appears online just like normal, except now you’ve turned the tables on them. You’re actually using the sploggers to promote your own site.

AntiLeech can detect a splogger bot using its User-Agent string (an identifier that some bots send when they are collecting data), or by IP address. You can enter a User-Agent or an IP address into the Options panel of your WordPress blog. When a visitor whose User-Agent or IP address matches an entry you have checked on the options page visits your site, they will see only the generated content. They will see it in your page layout and in your feeds. Anywhere you’re normally outputting content, that’s where the fake content will appear to them.

Regular users whose browsers do not match these strings will see your normal content. RSS aggregators should be able to display your content normally, too.
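
To make the matching idea concrete, here is a minimal sketch in Python. It is not AntiLeech’s actual code (the plugin itself runs inside WordPress); the `BlockList` class and the `is_leecher` and `content_for` functions are hypothetical names invented for illustration. It assumes a blocked entry is either a fragment of a User-Agent string or the leading part of an IP address.

```python
from dataclasses import dataclass, field


@dataclass
class BlockList:
    # Hypothetical container: the post does not describe how AntiLeech stores its lists.
    user_agent_fragments: list[str] = field(default_factory=list)
    ip_prefixes: list[str] = field(default_factory=list)


def is_leecher(user_agent: str, ip: str, blocked: BlockList) -> bool:
    """Return True if the visitor matches a blocked User-Agent fragment or IP prefix."""
    ua = user_agent.lower()
    for fragment in blocked.user_agent_fragments:
        # Skip empty entries, which would otherwise match every visitor.
        if fragment and fragment.lower() in ua:
            return True
    for prefix in blocked.ip_prefixes:
        prefix = prefix.rstrip(".")
        # Match whole dotted groups only, so "203.0.1" does not match "203.0.113.5".
        if prefix and (ip == prefix or ip.startswith(prefix + ".")):
            return True
    return False


def content_for(user_agent: str, ip: str, blocked: BlockList, real: str, fake: str) -> str:
    """Serve the fake content to matched visitors and the real content to everyone else."""
    return fake if is_leecher(user_agent, ip, blocked) else real
```

With `BlockList(["bitacle"], ["203.0.113"])`, a visitor announcing itself as a Bitacle bot, or any visitor from an address beginning with 203.0.113, would get the fake text, while an ordinary browser from elsewhere would get the real post.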

When a splog (spam blog) grabs the feeds from your blog and uses them as its own content, it is called scraping. Different blogs have different copyright policies. Use of full content feeds on sites with advertising or considered “commercial”, even with links back to the original site, is often a violation of the most common blog copyright policies. Putting a stop to these content thieves can be difficult, as seen recently with the Bitacle Battle.

Then Owen Winkler, WordPress hero, straps on his splog fighting coding tools and steps forward to help us fight back against the splog content thieves. He understood that everything and anything visiting your site leaves behind a footprint. The key is finding their footprint and identifying it as a splog and then stopping them from getting their foot in the door in the future.

The AntiLeech WordPress Plugin sends a small “AntiLeech” graphic in your feed’s output. The graphic helps AntiLeech collect User-Agent information that you might want to block. The Plugin’s Administration Panel lists the page on which it first saw the graphic for each User-Agent, and provides information to help you decide whether or not to block that User-Agent. From the Admin Panel, you can choose whether or not to block those sites’ access.
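
Here is a rough Python sketch of how a tracking graphic like this could work. It is an illustration under stated assumptions, not the plugin’s real code: it assumes the image address carries the User-Agent that fetched the feed, and that the page displaying the image can be read from the Referer header when a browser requests it. The function names, the `antileech.png` file name, and the `ua` query parameter are all hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlparse


def tracking_image_url(blog_url: str, feed_user_agent: str) -> str:
    """Build an image URL that encodes the User-Agent that requested the feed."""
    token = base64.urlsafe_b64encode(feed_user_agent.encode("utf-8")).decode("ascii")
    # "antileech.png" and the "ua" parameter are made-up names for this sketch.
    return f"{blog_url.rstrip('/')}/antileech.png?{urlencode({'ua': token})}"


def record_image_hit(image_url: str, referer: str, observed: dict[str, str]) -> None:
    """When the image is requested, note the first page each User-Agent's copy appeared on."""
    token = parse_qs(urlparse(image_url).query)["ua"][0]
    user_agent = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    # setdefault keeps the first page seen and ignores later sightings.
    observed.setdefault(user_agent, referer)
```

The `observed` dictionary plays the role of the list of suspects: each entry pairs a User-Agent with the first page where its copy of your feed turned up, and you still decide which ones to block.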

AntiLeech will add information to your robots.txt file, a file in the root directory of your site that contains instructions for web crawlers and web bots, the computer programs that visit your site and collect information and data. Instructions denying access to these splog abusers and scrapers are added to the robots.txt file, putting a stop to their visit before they get more than a toe in the door.
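
For the curious, this small Python sketch shows what such robots.txt instructions look like. The `robots_rules` function is a hypothetical helper, and the exact lines AntiLeech writes are not shown in this post; the `User-agent` and `Disallow` directives themselves are standard robots.txt syntax.

```python
def robots_rules(user_agent_names: list[str]) -> str:
    """Return robots.txt text that asks each named bot to stay out of the whole site."""
    blocks = []
    for name in user_agent_names:
        # "Disallow: /" covers every path on the site for the named User-agent.
        blocks.append(f"User-agent: {name}\nDisallow: /\n")
    # A blank line separates the rule group for each bot.
    return "\n".join(blocks)


print(robots_rules(["bitacle"]))
```

Keep in mind that robots.txt is advisory. It only deters bots that choose to honor it, which is why the fake content remains the stronger weapon.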

Once you have activated the AntiLeech Plugin, you will find its panel under Options > AntiLeech. You have several options you can control from there.

Observed User Agents is the area that will help you detect who may be stealing your content through your blog’s feed. Once you have determined which splogs are stealing your feed content, you can enter an identifying name in this section. For example, if bitacle.org is stealing your feed content, add “bitacle” in the form to block it. Any access with an identifying footprint with bitacle in it is considered evil and AntiLeech will kick into action, delivering fake, truncated, and other “unhelpful” information to the scraping site.

Under IP Addresses, you can enter part or all of an IP address to identify abusers. Many serious sploggers will play games with hiding and changing their IP address, but not all. Using IP address and User Agent name together gives stronger protection. If you add the IP address to the list, they will also get the “faux” or fake information when they access your feed.

The Output Control section is probably the most fascinating as it gives you an insight into what AntiLeech really does. You can control the various options on how AntiLeech will respond to the User Agents and IP addresses you’ve targeted. They are:

Do not insert the AntiLeech image for detecting leechers into feed output.

Do not link to my blog inside the generated posts.

Do not publish the correct link (in the tag) in my blog’s RSS.

Do not attempt to remove AdSense iframes with javascript on remote pages that display feed output.

Those are some serious options. I especially like the last one. Stopping income generated from your stolen content is brilliant.

The last option in the Output Control section is setting what you want displayed in the feed information sent to the scraper. It can be the generated content, truncated content, or custom text that you write. You can say anything you want, but a good start would be: “You may be reading stolen content. Please visit the author’s site to read the original, copyrighted material, and find even more great related content.”
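
As a sketch of how those three choices might play out, here is a small Python example. Everything in it is an assumption made for illustration: the `OutputMode` names, the `leecher_output` function, the 30-word cutoff, and the placeholder filler are not taken from the plugin. The `link_back` flag stands in for the “Do not link to my blog inside the generated posts” checkbox.

```python
from enum import Enum


class OutputMode(Enum):
    GENERATED = "generated"
    TRUNCATED = "truncated"
    CUSTOM = "custom"


def leecher_output(mode: OutputMode, post_text: str, post_url: str,
                   custom_text: str = "", link_back: bool = True,
                   max_words: int = 30) -> str:
    """Build the text a blocked scraper receives instead of the real post."""
    if mode is OutputMode.CUSTOM:
        body = custom_text
    elif mode is OutputMode.TRUNCATED:
        words = post_text.split()
        body = " ".join(words[:max_words])
        if len(words) > max_words:
            body += " ..."
    else:
        # Stand-in only: how AntiLeech generates its fake text is not described here.
        body = "Placeholder filler text standing in for generated content."
    if link_back:
        body += f" Read the original at {post_url}"
    return body
```

Calling it with `OutputMode.CUSTOM` and the warning message above would hand the scraper your warning followed by a link to the real post.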

The last option to set in AntiLeech is the option to control your FeedBurner Redirects. While many think that giving control of your blog’s feeds to FeedBurner will protect you from scrapers and copyright theft, it doesn’t. FeedBurner helps, but it doesn’t always stop splogs from using your feeds.

For those using WordPress.com blogs, WordPress.com has been working overtime to put a stop to some big scrapers such as Bitacle.

Only when we, the responsible ones, are well armed and fight back may we see an end, or at least a decline, of the evil doers. I just wish they would bear more of the responsibility for looking out for other people’s interests and not just their own. Don’t you?

32 Comments

I’m a big fan of Owen’s plugin AntiLeech. I’ve been thinking of various ways to use it other than the simple ‘Warning, you are reading stolen content’. I like your bells and sirens type message in the post above, and when my content gets stolen bells and sirens do go off, I get angry, and I want to get even with the splog. Why not serve up fake content that gets the splog in trouble with their host and advertisers?

You could write up some text to sell guns, drugs, and ask people to click on ads. Then, when this dirty content shows up on the splog, report them to Google / Yahoo / their host. Game over for the splog. I’ve explored this idea more fully in “Fight Dirty by Entrapping Splogs Using AntiLeech.”

This is a dirty fight. It’s time we nice, rule- and law-abiding bloggers got a little mean, and learned to throw a little sand, hit low, and especially, roll with the punches. We need to stop being such easy marks.

I’ve been watching discussion around this plugin, but I’ve been hesitant to use it since I don’t really understand the way it works. It sounds like it has the potential to prevent my regular readers from receiving my true feed, so I don’t want to use anything that will prove problematic on that front. How does AntiLeech handle false positives?

In my case it works based on the IP address, since bitacle seems to use a common user agent string (the plugin lists all user agents that have accessed the feed).
Here’s my latest post (and the first to have been successfully antileeched), or at least the generated faux content: http://de.bitacle.org/v/249zkosalfyc0/herbst.html?usrmode=1

Jim: Thanks. No wonder my spell check didn’t catch that. Scrapping is for scrapbook lovers, and scraping is what gets the gunk out from fingernails. ;-) Thanks for the catch.

Heliologue: It works as long as you input the information bitacle is using to scrape your blog’s feed. They may change their technique, since they seem to be determined, so you might have to stay on top of this. The Plugin reports on suspected scrapers (and is that scrappers or scrapers? ;-) ) and there is information around the web now on the IP addresses they are using.

In other words, like all responsible efforts to stop evil, you have to do the work to keep up with the user agents and IP addresses used by bitacle and other evil sploggers. So the plugin works only as much as you put work into it. But it makes the job so much easier.

Although this plugin is indeed brilliant, it has one fatal flaw (at least, for some users): it doesn’t help if you use services like FeedBurner. I’ve been debating for the last couple of weeks whether or not to force my visitors to switch feeds so that I can take advantage of plugins like this. I’d hate to start back at 0 RSS subscribers.

Anyway, how can this plugin determine false positives when it is up to you to determine if the splogger is splogging your site (stealing your content)? It returns a list of potential user agents that it has detected are potentially scraping content from your site. You check to see if they are, and if they are, you put a check next to their user agent name and/or IP address. YOU CHOOSE, you decide, and you pick who is playing nice.

If you are allowing your feeds to be picked up and used by other sites, as syndication or otherwise, then you DO NOT want to include them in your leech list. You have total control. No chance of false positives when you are in charge of deciding who stays, who plays, and who goes.

Thanks Lorelle. I knew that the plugin doesn’t automatically block things for you. What I’m wondering about is how the plugin determines that something may be scraping your content. How it comes up with the list of “suspects” for you to research.

The first thing to know about AntiLeech is that you have full control over what it blocks, and it won’t block anything without your telling it to. AntiLeech uses the following process to create a list of potential user-agents to block:

A small image is embedded in your outgoing feed. When a browser encounters that image on a splogger’s site, it requests the image from your site’s server. Because of this, AntiLeech knows that your image (and your content) is appearing on some page that you don’t control. The URL for this image is specifically created for the user-agent that requests it. AntiLeech uses this URL information to build its collection of potential user-agents and IP addresses to block.

Will it report false positives? Well, no, because it reports any suspicious user-agents that meet the criteria I’ve described. You control which user-agents to actually block, so AntiLeech is never going to block someone unless you’ve told it to. Unlike email spam blockers, legitimate browsers are extremely unlikely to announce themselves using a string that you’re blocking, like “Bitacle”.

So unless your visitors’ browsers identify themselves as “Bitacle”, they’re going to get the right content. Since most popular browsers don’t even allow you to change the user-agent without installing extra parts or setting odd settings, this simply isn’t going to happen.

IP addresses are a little different in that you want to be sure that the IP you’re going to block is actually always the person that you know to be scraping your site. For some people, their ISP provides a different IP address every time they connect. If a splogger uses their home IP to scrape, then it’s possible it’ll change every time they log on. Can you block these IPs? Sure, but it’s likely that someone else who uses the splogger’s IP will later be unable to access your site. Bitacle, for example, seems to scrape from a system connected via an ISP in Spain.

To be clear about FeedBurner, the options in the plugin only help you redirect your feeds to them instead of using the Ordered List plugin, which has a couple of issues. Currently, I’m trying to gauge the advantage of using FeedBurner (yes, I’m a paid subscriber), since they don’t seem to provide any method of managing the splog problem with the feeds that they re-publish. The only thing I see that is really useful is the feed statistics, which I’m about to get from somewhere else. As soon as I figure out how to cleanly bring my feeds back in-house away from FeedBurner, I’m going to.

The AntiLeech plugin really just simplifies the process that you’d otherwise accomplish using complex .htaccess rules. Really, if you’re comfortable with the .htaccess rules, and you don’t want the additional features like providing unique content to the sploggers, I recommend using the .htaccess rules instead of this plugin because they’re much more efficient (as would be any function that takes place at the server level, not that the plugin is not efficient).

I currently block things via .htaccess, but from your response I see that the added-value of your plugin (besides revenge) is that it yells back at me reporting that the image is somewhere. So even if Bitacle or somebody previously blocked changed their identification, AntiLeech will find them again.

What a bummer that this can’t coexist with FeedBurner. Please keep us updated on your findings.

So… if you’re still checking these comments, I have one last question: Sounds like your plugin won’t mess with the .htaccess file. Right? It doesn’t require opening .htaccess for the server to edit, or does it?

Hmm. Here’s a thought… Sorry:
So, if Feedburner removes the image, hurting AntiLeech’s ability to find sploggers… The Digital Fingerprint (the other new plugin) will still go, and help you identify sploggers.

Does AntiLeech allow users to manually enter offending IPs or user agents? I’m sorry.. I probably should just download the thing and try it myself. But if you could answer this question…

Maria: I think your other questions were answered, but specifically, no, AntiLeech does not modify your .htaccess file at all. It takes control via WordPress’ existing rewrite rules. The downside is that it will only protect WordPress. So if you have some other way to manage content on your site, you’ll need to protect it some other way.

Sounds great, Owen.
I’ll definitely give AntiLeech a try. Regardless of whether it’ll work for me or not (and I hope and think it will), I wholeheartedly thank you for joining the team of copyright superheroes. We need more!!!

It’s not about having something “worth” stealing. A huge number of WordPress.com blogs were grabbed by bitacle. A lot of them had little that was original or “worth” stealing, comparatively, in my very humble opinion. Stuff is stolen all the time that you or I may think has little value, but it all has value to the owner.

Those who care about what they write and who may be abusing it tend to find more abuse than the casual blogger who doesn’t investigate the abuse of their site. So coming up with numbers means taking care and concern, self-interest, willingness to investigate, and professionalism into account. The more you do this as your business, the more likely you are to pay attention to these things and not call copyright infringement “flattery”.

Some sploggers are very particular, grabbing only specific keyword related content. I’ve found my main site, Taking Your Camera on the Road, listed with many sploggers because I had the keywords in my article, not because my article had anything to do with their advertising product. Over four years ago, one was selling aquariums, fish ponds, and related equipment and grabbed my article on photographing fish and sea life through the glass of an aquarium. A little related, but not at all.

Anyone’s stuff can be stolen. It can just take work to track them down. Through WordPress Plugins like AntiLeech and Digital Fingerprint, it’s getting easier to track the thieves.

This has been an eye opener for me. I never thought that my content could be stolen! But after reading your post, I realized that “Hell! When people can steal almost your life in today’s world, they can steal your blog’s content!”

Thanks Lorelle! You have been an eye opener for me. I’ll download the Plugin right now and get this over with.

I’ll let Owen know that he’s having problems with his site. I don’t know if this is the latest version, but you can download the Plugin script from antileech.php, copy and paste it into a text file, save it, and upload it to your Plugins directory.

If you are using WordPress.com, you cannot use WordPress Plugins. If you have a self-hosted version of WordPress, it should work. Contact the Plugin author for information on the status of the Plugin for the latest version of WordPress.

Hi there. My blog is being scraped and I don’t know what to do! I installed the plugin you suggested, AntiLeech, and it worked for all of 2 days! So now I have no idea what to do. I have tried reporting these splog sites to Google. I’m a fairly new blogger trying to build an audience and online reputation. I spend hours and money to buy photos and to write my own content, and to see it being reproduced word for word is so discouraging. I tried contacting the plugin’s owner but he does not provide support.
Thanks!
