PDF Search Engines and Copyright Infringement: How to Stop Them

There's a not-so-new copyright infringement threat that seems to be growing more prevalent these days: so-called PDF search engines. In the last two weeks alone I was notified that three of these sites were publishing my content without a license or permission.

What's happening? Well, these sites purport to help you find .pdf files. First of all, that often means they're hotlinking files (linking directly to the download rather than to the download page on the publisher's site, which is often against the terms of use). But the bigger problem is that some of these sites make your entire .pdf document available for viewing on their own site.

That means any e-book you give away for free, or any file you host without publicly sharing the download link (such as subscriber-only files), is not only being indexed by these so-called search engines, but is also being republished without your knowledge and usually without your consent.

I've already walked you through how to deal with content thieves. In that article I explained that you should go after their traffic and ad income before trying to have the content pulled offline, and how to go about it. Today I want to talk about how you can discover this kind of infringement and how you can attempt to stop at least some of it.

How to Identify .pdf Content Thieves

I don't care for most plagiarism detectors, considering they're often used by plagiarists to edit stolen content rather than to defend against stolen work (hence requests for "Copyscape-passed" content from some buyers on the Web). Instead, my tool of choice is Google Alerts.

Set up a Google Alert (or Yahoo! Alert if you prefer) for your name. Set up others for major niche keywords. Most .pdf content thieves don't edit anything, so if your name is in your .pdf file, as author names usually are, you'll be alerted when your files appear on other sites, with or without your permission. Then you can click the links, visit the pages, and see whether they're summarizing the .pdf and linking to it or republishing the content in full.

I plan on taking a more proactive approach in coming weeks as well: searching for the actual file content. The idea is to search for a specific phrase unique to each file. This can work if you only have a few files. If you have more than a half dozen free .pdf files available, it could prove time-consuming.
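If you do have more than a handful of files, a small script can take some of the tedium out of the phrase searches. What follows is only a rough sketch in Python, not a finished tool: the file names, the phrases, and the example.com domain are all made up for illustration, and it simply builds Google search links that use quotation marks for an exact-phrase match and the -site: operator to leave your own site out of the results.

```python
from urllib.parse import urlencode

# Hypothetical file names and phrases, for illustration only.
# Pick one sentence from each file that is unlikely to appear anywhere else.
UNIQUE_PHRASES = {
    "freelance-rates-guide.pdf": "pricing a project by the hour punishes fast writers",
    "query-letter-checklist.pdf": "never pitch an editor before reading three back issues",
}


def build_search_url(phrase, own_domain):
    """Return a Google search URL for an exact phrase, excluding your own site."""
    # Quotation marks ask for an exact match; -site: drops your own pages.
    query = f'"{phrase}" -site:{own_domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})


def build_all(phrases, own_domain):
    """Map each file name to the search URL for its unique phrase."""
    return {name: build_search_url(text, own_domain) for name, text in phrases.items()}


if __name__ == "__main__":
    # example.com is a placeholder for your own domain.
    for name, url in build_all(UNIQUE_PHRASES, "example.com").items():
        print(name)
        print("  " + url)
```

Run it, then open each printed link in your browser. Any result that isn't your own site is worth a look to see whether it's a summary with a link or a full copy of the file.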

How to Protect Your .pdf Downloads

Your .pdf files can be important to your business. You spend a lot of time creating unique resources to bring in traffic and backlinks. So how can you stop others from republishing those files and getting in the way of your own business goals for them? Here are a few suggestions:

Make sure the usage and distribution / publication terms are laid out in the documents themselves. Then no one has any excuse for saying they didn't know what they were and weren't allowed to do with them (if they choose not to look, that's their own damn fault).

Put a terms of use section on your website as well. This is one I plan to do on a few sites of my own in coming weeks.

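Keep your download files in a separate folder on your server and use your robots.txt file to tell crawlers not to index documents within that folder. Keep in mind that robots.txt is a request, not a lock: well-behaved crawlers honor it, but a site that chooses to ignore it can still fetch the files.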

Place the files in a password-protected directory if they're for members or subscribers only and if you don't mind the extra hassle for those members or subscribers.

Use a .pdf download service if possible, where every recipient gets a unique download link.
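For the robots.txt suggestion above, here is a rough idea of what the rule might look like and a quick way to check it before you upload it. This is a sketch under assumptions: the /downloads/ folder name and the example.com addresses are placeholders, and the check uses Python's built-in robots.txt parser, which only tells you how a rule-following crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder name; use whatever folder holds your download files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if a rule-following crawler would be told not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder for your own domain.
    print(is_blocked(ROBOTS_TXT, "https://example.com/downloads/ebook.pdf"))
    print(is_blocked(ROBOTS_TXT, "https://example.com/blog/post.html"))
```

The first check should come back blocked and the second should not, which means the rule covers the download folder without hiding the rest of your site. A scraper that ignores robots.txt won't be stopped by this, so treat it as one layer alongside the password-protected directory or unique download links.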

Have you also seen an increase in stolen .pdf files lately? How are you dealing with it? Leave a comment to share your stories and tips below.