Scribd Puts User Docs Behind A Paywall Without Them Realizing It

from the totally-not-cool dept

Last year, I wrote about some issues I had with the way Scribd tried to avoid liability by suggesting that public domain documents couldn't be hosted on the site or that fair use was not allowed. To the company's credit, it responded quickly and fixed the situation, but soon after that I switched to (mostly) using Docstoc to host documents. Docstoc has its own problems as well, but for the most part has worked well for me. Even so, in my experience Scribd remains quite popular among folks -- especially for uploading and hosting legal documents. Apparently, the company recently made some quiet changes and it's seriously pissed off law professor Eric Goldman, who has relied on the site for quite some time.

The key problem? Without clear notification, it took "older" (and older is left undefined) documents and put them behind a paywall. As Goldman notes, the whole reason he used Scribd was to make the documents available, and it was quite a shock to suddenly find them behind a paywall:

Scribd's paywall stunt instantly put Scribd on my shitlist because it vitiates the reason I chose to use Scribd in the first place. I don't know that they ever promised me perpetual free access to the documents I post, but their value proposition always has been open access to the documents--freely shared with everyone and indexed in the search engines. The paywall destroys that value proposition. They've taken the documents that I wanted to freely share with the public (many of them public documents like court rulings and filings) and made them inaccessible. If my readers can't freely get the documents I wanted to share with them, then what's the point of using Scribd in the first place???

I also feel like Scribd used me. With their implicit promise of open access, they got me to share a lot of high-interest documents and generate lots of link love, then they flipped the default (from free to paywall) as part of a cash grab. I could check out of Scribd, but then I would break a lot of links and it would take a lot of time. So now I feel trapped. It's a terrible feeling.

Goldman is looking at other options, including Docstoc and Rapidshare. Another one worth checking out could be Slideshare, or even potentially Google Docs. However, all this has me thinking again about the wisdom of relying on third parties for such things (even though I do it myself). I do like the ability to display PDF documents, such as legal filings, embedded within a post, but I'm wondering if there are any simple solutions for setting up that sort of thing on your own server. Anyone know of any?

Reader Comments

Scribd - Seems Obvious

There are a lot of seemingly smart people doing dumb things. Do people really believe that they own their own information when they willingly and freely give it to somebody else that they have no control over? Does this really make sense to anybody?

A lot of room in that space

I'm not sure why the Wordpress model isn't more widely used. You essentially have three options - free and limited hosted at wordpress.com, paid and supported hosted at wordpress.com, or free and whatever you want hosted yourself.

Document hosting or nearly any sort of web application could function the same way. With the cost of cloud storage dropping daily, it seems like someone should be able to make this model work for tons of useful things, like embeddable PDF hosting.

Know of any?

If you don't mind using Flash, then one option is to go with FlexPaper. It's exactly what you want, but though it claims to be "GPL v3," it's really not (you have to display their logo even on modified versions, and you can't use it for free on a commercial site). Might be worth the $70, though.

There's also SWFTools, which includes PDF2SWF, and is completely open source. However, this generates a distinct .swf file for each PDF, so I don't know if it's the right solution.
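
If the one-.swf-per-PDF part is the worry, it's easy enough to script. Here's a rough sketch in Python, assuming the pdf2swf binary from SWFTools is installed and on your PATH. The function names and the folder layout are made up for illustration, and I haven't tried this against any particular SWFTools version:

```python
from pathlib import Path
import shutil
import subprocess


def pdf2swf_command(pdf_path: Path) -> list[str]:
    """Build the pdf2swf call that writes a .swf next to the given PDF."""
    # "pdf2swf input.pdf -o output.swf" is the basic SWFTools invocation.
    return ["pdf2swf", str(pdf_path), "-o", str(pdf_path.with_suffix(".swf"))]


def convert_all(folder: Path) -> None:
    """Convert every PDF in a folder (hypothetical layout: all PDFs in one directory)."""
    if shutil.which("pdf2swf") is None:
        raise RuntimeError("pdf2swf not found on PATH; install SWFTools first")
    for pdf in sorted(folder.glob("*.pdf")):
        # check=True stops the batch if any single conversion fails.
        subprocess.run(pdf2swf_command(pdf), check=True)
```

You'd still need a Flash viewer on the page to show the resulting files, so this only covers the conversion step.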

If you don't want Flash and your site uses PHP, you might be able to hack something together using Samuraj Data's online converter and embedding the HTML in an iframe.

Re: Re: Karl

My issue with FlexPaper isn't with the product, which actually looks very good (and worth the $70 that Mike would have to pay).

The issue is that it's supposedly GPL, even though it's not. If you look at the FSF's Categories of free and nonfree software page, it would actually be what used to be called "semifree software," and is now just called "proprietary software."

Re: Re: Re: Re: Karl

Yes, but unlike FlexPaper, Flowplayer allows commercial use, which is a requirement of the GPL. From their FAQ:

I'd like to license my code under the GPL, but I'd also like to make it clear that it can't be used for military and/or commercial uses. Can I do this?

No, because those two goals contradict each other. The GNU GPL is designed specifically to prevent the addition of further restrictions. GPLv3 allows a very limited set of them, in section 7, but any other added restriction can be removed by the user.

(Emphasis mine.)

But I guess you're right about the requirement that the logo stay in place. You learn something new every day, I guess.

We're probably just picking nits at this point. FlexPaper seems like a good program, so even if it was proprietary, it would be worth using IMHO.

You can use it for commercial use but you must then release it under the GPL-V3. If you want a different license that allows you to use it for commercial use and keep what you made a secret then you must buy that different license.

Same thing if you want a license that allows you to bundle it with proprietary software.

"This is the appropriate license to use if you intend to bundle or ship FlexPaper as part of a product."

It's released under the GPL-V3, and you can do whatever you want with it provided you maintain the license, because the license requires that you do so. If you want a different license, one that allows you to do something without maintaining the GPL-V3 license, then you must pay.

Re: Re: Re: Re: Karl

The logo can always be freely removed and the result redistributed under the GPL license (but then you must give it a different name so that people know it's a mod). I see no good reason to do that, but it can be done with a copy under that license.

embed

"Can Authors and Publishers distribute their works under the settlement for free, under a Creative Commons license or otherwise?
Yes. Rightsholders are free to set any price for their work including the ability to distribute their work free of charge. If you are interested in distributing your work for free, including under a Creative Commons license, then you should claim your Book on the Claim Form and, on the “Manage Your Books” page, fill in the box asking you to specify your sale price for the book at “zero.” In the future, the Claim Form will also provide an option for you to offer your Book under a Creative Commons license, and you should check the Claim Form periodically for that option to appear. The Registry will inform Google of your request, and Google will include information on its web site so that end users are aware of the licensing terms chosen by you. Rightsholders are also free to authorize Google directly to distribute their book through a Creative Commons license."

OBJECT tag?

I do like the ability to display PDF documents, such as legal filings, embedded within a post, but I'm wondering if there are any simple solutions for setting up that sort of thing on your own server. Anyone know of any?

Does anything prevent you from storing the files on your own server and using an OBJECT tag in your posts?

Re: OBJECT tag?

Does anything prevent you from storing the files on your own server and using an OBJECT tag in your posts?

The fact that users must have the Acrobat plugin installed. Naturally, this causes browser incompatibility issues. See the "Compatibility" section of the PDFObject guide.

Incidentally, PDFObject seems like it would be useful if you want to go this route, as it gets around most browser limitations using JavaScript.

But I should note that I have Acrobat installed, and I can't view the PDF in my browser (Chrome), even using PDFObject.

There's also one other, possibly major, drawback: No search engine will index anything in an OBJECT tag. Of course, that applies to Flash as well. If that's a worry, you'd have to convert the PDF into HTML before displaying it.
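
For anyone who wants to try the plain OBJECT route anyway, here's a minimal sketch in Python that builds the markup. The function name, the example path and the default sizes are all made up for illustration. The one real trick is the ordinary link inside the tag: browsers that can't render the PDF inline show that fallback content instead, so readers can still download the file.

```python
from html import escape


def pdf_object_embed(pdf_url: str, title: str,
                     width: str = "100%", height: str = "600") -> str:
    """Return an OBJECT tag for a self-hosted PDF with a download-link fallback."""
    # Escape everything that ends up in the markup so odd characters
    # in a URL or title cannot break the tag.
    url = escape(pdf_url, quote=True)
    label = escape(title)
    w = escape(width, quote=True)
    h = escape(height, quote=True)
    return (
        f'<object data="{url}" type="application/pdf" width="{w}" height="{h}">\n'
        f'  <p><a href="{url}">Download {label} (PDF)</a></p>\n'
        f'</object>'
    )


# Hypothetical file path on your own server.
print(pdf_object_embed("/files/ruling.pdf", "Court ruling"))
```

Paste the output into the post body. It doesn't get the PDF's contents indexed, but the fallback link is plain HTML that a crawler can at least follow.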

Re: Re: OBJECT tag?

Lots of people disable Adobe Acrobat from automatically opening documents in the page because of all the vulnerabilities and critical bugs it has.

Plus, it loads very slowly and would annoy users if you embed 10 PDF files on a single page.

Flash is more reasonable as I can just use the Flashblock extension for Firefox to block all flash on the page and, if I'm interested in seeing the PDF file, I can just click on the flash icon for that object and unblock it without reloading the page, and I can then see it loading in the Flash object on the page.

Scribd Alternative

Use http://www.notelog.com/ if you're looking to post and share your docs. If you're used to using Scribd, this is the best alternative because you're technically still using Scribd on this site. The site is academic-based, but anyone can create an account outside of academics by creating an expert account. It's absolutely free...