creativecommons

August 05, 2006

You are really on to something. There is something big going on here and a lot of these talks are about trying to figure it out. At the beginning of the talk I'll talk about things I know about and at the end talk about things I don't know -- how is that for a VC pitch?

I'll talk about open source, open content and the rise of the technical non-profits. Universal Access to All Knowledge is a big goal, and if you accomplish it, what are you going to do? Move to Florida?

Wikipedia is the 15th most popular site on the web. This is because of the enlightenment goal. The goal isn't a technical one, but a structural one. Despite a centuries-old balance...1976 radical expansion of US Copyright Regulation. Treating IP as property is perhaps the worst idea since the Domino Theory. Information is knowledge, not property. Valenti's crowning achievement radicalized copyright regulation. Most people talk about the 130 year protection, but the real problem is the vast scope and its repercussions.

The first casualty was software. The response was Open Source licenses. MIT's sale to Symbolics, which forked development, and RMS' experience led to open source. This is Brewster's revisionist history, but it may be where it came from.

The second casualty was Music and Video. The response was Creative Commons licenses. Another response was organizations to facilitate community effort. We lost the help of institutions like MIT so we built new ones. The Free Software Foundation. DejaNews was a for-profit, sold to Google, dissipated. IMDB, a six-guy community project, was bought by Amazon. CDDB became Gracenote, Inc. WAIS Inc was sold to AOL. FTP Software sold to NetManage. Cygnus sold to Red Hat. All were commercial companies built upon community effort, and they didn't last long. FSF is still around.

The response is the rise of the technical non-profit. The Apache Software Foundation has no full-time employees, but is incorporated to last a while. OSAF has gotten money not only from Mitch but from foundations. The Mozilla Foundation is a great success with Firefox and the Google toolbar (money); they spun off a for-profit company. It is an interesting ecology to watch and try to understand what it means. Linux. The Internet Archive is based on the open access model -- can we get paid for the administration we do, so everything we do can be openly accessible? The Wikimedia Foundation you know about. The rise of the technical non-profit is an interesting addition to the ecology; we went wrong with the over-corporatization post WWII. EFF, Public Knowledge and the Open Content Alliance exist to enforce rights and serve us. We massively screwed up our law structure and the general approach of treating knowledge as property.

Open Hardware. Petabox, a cheap machine that is open sourced. The $100 Laptop Program has interest on the order of 5-10 million. What would happen if the next major laptop company is a non-profit? It is because they are non-profit that they are trusted and based on open work.

The structure is now in place to proceed towards Universal Access to All Knowledge. We have institutions dedicated towards these goals, but how are we doing towards it?

In Text, getting the 26-28 million books in the Library of Congress. 1 megabyte for a book, 26 terabytes, $60k cost for the entire library on a Linux machine. But I actually like books, the printed page. Created the mobile bookmobile, which has printed a million books. The cost is a penny a page; a buck a book means you can give books away. Our first debut of this was at the Supreme Court when they were arguing to extend copyright another 20 years, but we lost that one. Eric Eldred has one, two in India, one in Egypt, one in Uganda... this gets closer to universal access, but what we realized is we need to scan more books. One way to do this is to send them somewhere else. The Million Books project sends them to India, but we had to buy 100k books to send to them, and not many others wanted to send books to them. So the Indians were scanning their own books, which may be the right thing for them to do. Put the scanners next to the books. Sending to India to scan is $10 per book; in the US it is $30. The automatic scanners are not effective, so we made our own scanner and can do it at 10 cents a page. Scanning 400 books a day. $750 million to digitize the Library of Congress. About a year and a half of the LoC budget.
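The numbers above can be roughly checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the book count, megabyte per book and 10 cents a page come from the talk, but `PAGES_PER_BOOK` is my own assumption (the talk doesn't give an average), picked because it lands near the $750 million figure.

```python
# Back-of-the-envelope check of the figures quoted in the talk.
BOOKS = 26_000_000            # low end of the 26-28 million estimate
MB_PER_BOOK = 1               # text only, as stated in the talk
SCAN_COST_PER_PAGE = 0.10     # dollars per page, with the Archive's own scanner
PAGES_PER_BOOK = 300          # ASSUMPTION: average length, not given in the talk


def storage_terabytes(books: int, mb_per_book: float) -> float:
    """Total text storage in terabytes, taking 1 TB = 1,000,000 MB."""
    return books * mb_per_book / 1_000_000


def scanning_cost_dollars(books: int, pages_per_book: int, cost_per_page: float) -> float:
    """Total cost of scanning every page of every book."""
    return books * pages_per_book * cost_per_page


if __name__ == "__main__":
    tb = storage_terabytes(BOOKS, MB_PER_BOOK)
    cost = scanning_cost_dollars(BOOKS, PAGES_PER_BOOK, SCAN_COST_PER_PAGE)
    print(f"storage: {tb:.0f} TB")
    print(f"scanning: ${cost / 1e6:.0f} million")
```

With a 300-page average the scanning total comes out in the same ballpark as the $750 million quoted; a different assumed page count moves it proportionally.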

Books are within our grasp technologically. There are issues about whether it will be done by non-profits or projects like Google Books. We have an orphaned works problem. The way you ask a question in the US is through a lawsuit, Kahle and Eldred. But if you get to frame a problem (orphaned works) you have already won. Who would forget the orphans? Give the orphans a home!

Next is in-print works. Amazon is working the other way, from print to out of print. We have found with the Open Content Alliance something that works. Even Microsoft is giving us money.

In Audio, if you take all the published works, there are 2-3 million musical works. A fairly litigated area. Some precedent that ripping them and putting them online might not be okay. A lot of musicians are just looking to be put on the internet. The Grateful Dead allowed people to trade music. The key was, as long as no one was making any money. This allowed people to feel good about it. Legitimate bootlegging copies by other bands. So we went to this community and said: "would you like unlimited storage and bandwidth for free." They said, "we don't believe you." And they didn't like lossy compression. We said try us. Got lots of Okays. 2k bands, 30k recordings, everything the Grateful Dead played. Many versions of each concert, as there are debates over microphone types. If you give something for free, not only is it not taxed, but you get a tax rebate. Getting Slashdotted is a nightmare; your ISP bills could make you sell your guitar or house. Europe has a different copyright scheme for performances (50 years), so we are working with the Dutch government to make old stuff free.

In Moving Images, 100,000-200,000 films. Not much, which makes putting them online conceivable. We want to do this with DVD quality, but we are finding lots of archival films that never had distribution. Have 30k films on the Archive, dwarfed by YouTube, which is cool. Discovering genres like Lego Movies. Lots of these things end up in closets. Putting them online is $15 per video hour. We will host it if it generally belongs in a library and it is okay to share it.

Television: we have a big Tivo, which has captured a Petabyte so far of 20 channels over a couple of years. We made one week available, the week of 9/11, which we put online a month after. We are now understanding in the US that the news comes with a point of view. Chomsky used to say you should read 7 newspapers a day; recently this might make sense. Getting multiple points of view.

Television is technologically possible, there are some rights issues, but we could do it all -- all text, music, movies and TV is within our grasp. We got a change in the DMCA, yea! But we need a lot of help.

Web. We are best known for our web collection, about a Petabyte in size. In the history of libraries, they tend to get burned, usually by governments, and then they are sorry for 100 years, but it is too late. The lesson from the Library of Alexandria is don't just have one copy. Give copies away. Our first shot at this was with the Library of Alexandria version 2. If we had six or seven of these around the world I could sleep at night. We are trying to do this through large scale swap agreements.

Here is Wikipedia in the Archive. But most people are using it to look at their own stuff, their old websites. One of the reasons this is working is because we are non-profits.

Books, Music, Video, Software and Web -- it is all possible. Some open questions if it is public or private, for-profit or non-profit. Is Google the only shot we are going to have at scanning Harvard's library? Looks like it.

I'm going to use this opportunity to advertise some projects we need help on.

Non-profit Open Networks like SeattleWireless.net or MIT Roofnet. Telecom company interests are not aligned with an open internet.

Open and transparent Web Search System -- Nutch. Let's build some alternatives and be more creative. Recall, which does time-based search on the whole Archive, was a project done by one woman that indexed more pages than Google; then she went to work for Google, and hopefully she will come back.

Privacy and Anonymity. It is now known that the US Government is monitoring us. Tor.

Defensive Patent License. What if you did a GPL for Patents? The DPL is a license that reflects a public commitment to defense, so our patents are forever defensive. Any organization may freely use these licensed patents while so publicly committed to defense.

An Open Textbook system, started by Wikipedia. The number one request we get for books is textbooks.

Add Attribution to Wikipedia. The Gutenberg guys were nervous about the copyright thing. We should know where the facts from Wikipedia came from. Go read about Transclusion with Ted Nelson, backpointers. Richard Feynman, a physicist, in 1982 was talking about how many layers it would take from Propaedia to Micropaedia to books as sources.

Open Library: annotate the book collection. Why is this book interesting to someone in the modern world? What can we do to re-inject old books into today?

We can pull off Universal Access to All Knowledge. This is where Wiki is heading, one of the great things that humanity will be remembered for, up there with a Man on the Moon in the mythology of humanity.

April 20, 2006

Allow me to further simplify the Buy Side Publishing model. The most efficient part of the content business isn't in how or what they produce, nor how they distribute it, but how they make money. Today the embraced commoditization is in advertising, with standardized metrics such as CPM. But this makes money through directed attention, not directly from content. To that end, with the balance between freedom and profit motive required in a modern business model, you simply:

Apply CPM, and other standardized metrics developed for advertising, to content

Build upon the Creative Commons framework to ensure reuse without DRM under such commercial terms

This fills in the grey area between Commercial and Non-Commercial, or rather, lets you define Commercial use along with terms. Maybe this is an oversimplification, but picture this content universe...

Picture this post with a discoverable watermark that bakes in these two terms, with a CPM of $10 communicated to the clearinghouse each time the invisible .gif is impressed. Say you read it and like it, fair reader and writer, and decide to republish it on your site.

Someone else grabs it from my blogs and remixes it into a commercially minded remix.

Now picture someone finds it on your site, and thinks it would be a perfect complement to a Sell Side Advertising ad that is starting to take hold as a meme.

Suddenly, as a publisher, I make money from all three transactions without the one-off transaction costs that plague old notions of syndication.
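To make the three-transaction scenario concrete, here is a minimal sketch of the bookkeeping a clearinghouse might do. Only the $10 CPM comes from the post; the `Reuse` type, the function names and every impression count are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reuse:
    """One reuse of the post. Impression counts used below are made up."""
    label: str
    impressions: int


def cpm_revenue(impressions: int, cpm_dollars: float) -> float:
    """CPM is the price per 1,000 impressions."""
    return impressions / 1000 * cpm_dollars


def total_revenue(reuses: list[Reuse], cpm_dollars: float) -> float:
    """Sum what the original publisher earns across every reuse."""
    return sum(cpm_revenue(r.impressions, cpm_dollars) for r in reuses)


if __name__ == "__main__":
    CPM = 10.0  # the $10 CPM baked into the watermark in the post
    reuses = [
        Reuse("republished on a reader's site", 2_000),   # hypothetical count
        Reuse("commercially minded remix", 5_000),        # hypothetical count
        Reuse("Sell Side Advertising ad", 10_000),        # hypothetical count
    ]
    for r in reuses:
        print(f"{r.label}: ${cpm_revenue(r.impressions, CPM):.2f}")
    print(f"total: ${total_revenue(reuses, CPM):.2f}")
```

The point of the sketch is that the same per-impression term applies to every reuse, so nothing has to be negotiated per transaction.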

I happen to think this is a model that not only unlocks value, but discovers it.

But Ross, you assume that anyone would pay for content when they can link to it. Not sure that's a valid assumption. What am I missing?

Commercially viable remix use cases.

For example, search and aggregation are limited to fair use cases today. Google scrapes and indexes an entire page, but only presents a link and summary on their own site. What business models could they come up with going beyond fair use? Or take more traditional media and their reliance on newswires as fodder. What if they could efficiently syndicate diverse content sourced online into print? Or from the initial publisher perspective, is there content you want to offer openly for non-commercial reuse, but also not restrict commercial use so long as you get paid?

November 16, 2005

The lawyer in me may be coming out, but I'm fascinated by the Google Print project and I'm going to watch the webcast of the New York Public Library/Wired Magazine debate. It is titled "The Battle over Books: Authors & Publishers Take on the Google Print Library Project", and the cast of characters is interesting, including Chris Anderson, David Drummond, Lawrence Lessig, Allan Adler (AAP), David Ferriero (NYPL), Paul LeClerc (NYPL) and Nick Taylor (NYPL).

If anyone else cares to join me, Ross has agreed to open up the Socialtext conference room so I can watch it on a big screen and pretend I was at the sold-out event. It starts at 4 Pacific...

Read on for links about the debate at the center of copyright and tech, and maybe we'll see you tomorrow.