Measuring the size and state of the commons

At its heart, Creative Commons is a simple idea. It’s the idea that when people share their creativity and knowledge with each other, amazing things can happen.

It’s not a new idea. People have been adapting and building on each other’s work for centuries. Musicians sample beats from each other’s music. Artists create entirely new works from other people’s images. Teachers borrow each other’s activities and lesson plans. Scientists build off of each other’s results to make new discoveries.

We believe that sharing—sometimes it happens in an instant, sometimes it spreads across generations—is how society grows, how culture develops, and how innovation happens. We also believe that copyright can often get in the way, usually without the copyright-holder’s intention. That’s why we created the Creative Commons licenses.

Millions of creators around the world use CC licenses to give others permission to use their work in ways that they wouldn’t otherwise be allowed to. Those millions of users are the proof that Creative Commons works.

But measuring the size of the commons has always been a challenge. There’s no sign-up to use a CC license, and no central repository or catalog of CC-licensed works. So it’s impossible to say precisely how many licensed works there are, how many people are using Creative Commons licenses, where those people are located, or how they’re using them.

With this report, we’re taking a big step toward better measuring the size of the commons. We’re sharing all of the data and methodologies that we used to find these numbers, and making a commitment to hone and update these findings in the months and years to come. We’re also telling the stories of events from 2014 that have affected the size, usability, and relevance of the commons.

The size of the commons

How did we arrive at these numbers? Google provided us with the raw data: a count of all of the websites in its cache that link to Creative Commons license deeds. We used that data to make the estimates in this report. While pages may link to Creative Commons license deeds for reasons other than to license or attribute works under them, we reason that those are vastly outnumbered by pages that indicate a CC license choice without linking to the deed.
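To make the idea of counting deed links concrete, here is a minimal sketch in Python. It is not Google's or Creative Commons' actual method, which the report does not describe at this level; the function and class names (license_code, DeedLinkCounter, count_deed_links) are hypothetical, and the pattern covers only the six current license types plus CC0, not retired variants. It also shares the caveat stated above: a link to a deed does not prove that the page itself is CC-licensed.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Matches deed URLs such as https://creativecommons.org/licenses/by-sa/4.0/
# and https://creativecommons.org/publicdomain/zero/1.0/ (CC0).
# Assumption: only the six current license types and CC0 are of interest.
DEED_RE = re.compile(
    r"^https?://(?:www\.)?creativecommons\.org/"
    r"(?:licenses/(?P<code>by(?:-nc)?(?:-nd|-sa)?)|publicdomain/(?P<zero>zero))/",
    re.IGNORECASE,
)


def license_code(url):
    """Return a label such as 'CC BY-SA' or 'CC0', or None for other URLs."""
    match = DEED_RE.match(url.strip())
    if match is None:
        return None
    if match.group("zero"):
        return "CC0"
    return "CC " + match.group("code").upper()


class DeedLinkCounter(HTMLParser):
    """Tallies links to CC license deeds found in a single HTML page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # License links usually appear as <a href=...> or <link href=...>.
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href")
        code = license_code(href) if href else None
        if code:
            self.counts[code] += 1


def count_deed_links(html):
    """Return a Counter mapping license label to number of deed links."""
    parser = DeedLinkCounter()
    parser.feed(html)
    return parser.counts
```

Calling count_deed_links on a page's HTML returns a tally per license label. Aggregating such tallies across many pages would give link counts, not verified counts of licensed works.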

We’ve supplemented Google’s data with that of several websites that each have over a million CC-licensed works but aren’t reflected in Google’s data (see notes).

We’re also able to track usage data for the CC license badges. We found over 27 million license badges served in a single day. That’s even more striking when you consider that it covers only a subset of all CC-licensed content: sites that hotlink our badges directly.

Even if we had access to unlimited data about how and where CC licenses are used, it would still be very difficult to establish a single number for all Creative Commons–licensed works. Where do we draw the line between one work and another? Every time a piece of content is reused on the internet, should it be counted again?

That said, by drawing these estimates as precisely as we can, we’re creating a baseline that can be useful for comparison over time.

Because Google’s numbers are intentionally conservative, and because licensors mark CC-licensed works in more ways than a search can account for, all of the numbers in this report should be considered lower-bound estimates.

Today, there are over 882 million pieces of CC-licensed (or CC0) content on the web. Roughly 56% of that content is shared under CC tools whose terms allow both adaptations and commercial use (we commonly refer to those as free culture licenses).

In 2010, about 40% of CC-licensed works were under free culture licenses. The increase since then reflects the growing diversity in how and where CC licenses are used; we think it also speaks to a license user base that’s discovered the value and power in sharing work more openly, including national governments and major foundations.

It’s also striking to note that a full 76% of works counted allow adaptations, and 58% allow commercial use.

License breakdown

CC0: 4%

CC BY: 19%

CC BY-SA: 33%

CC BY-ND: 2%

CC BY-NC: 4%

CC BY-NC-SA: 16%

CC BY-NC-ND: 22%
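As a quick consistency check, the aggregate figures quoted earlier (56% free culture, 76% allowing adaptations, 58% allowing commercial use) can be recomputed from this breakdown. The sketch below is illustrative only: the helper names are hypothetical, and it assumes a license's terms can be read off the ND and NC elements of its name.

```python
# Percentages copied from the license breakdown above.
breakdown = {
    "CC0": 4,
    "CC BY": 19,
    "CC BY-SA": 33,
    "CC BY-ND": 2,
    "CC BY-NC": 4,
    "CC BY-NC-SA": 16,
    "CC BY-NC-ND": 22,
}


def terms(name):
    """Split a label like 'CC BY-NC-ND' into its elements: {'BY', 'NC', 'ND'}."""
    return set(name.split()[-1].split("-"))


def allows_adaptations(name):
    # ND (NoDerivatives) is the element that forbids adaptations.
    return "ND" not in terms(name)


def allows_commercial(name):
    # NC (NonCommercial) is the element that forbids commercial use.
    return "NC" not in terms(name)


def share(predicate):
    """Sum the percentages of all licenses that satisfy the predicate."""
    return sum(pct for name, pct in breakdown.items() if predicate(name))


total = share(lambda name: True)
adaptations = share(allows_adaptations)
commercial = share(allows_commercial)
free_culture = share(lambda n: allows_adaptations(n) and allows_commercial(n))

print(total, adaptations, commercial, free_culture)
```

The three tools that pass both tests are CC0, CC BY, and CC BY-SA, which is the group the report calls free culture licenses.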

This is an excerpt from the State of the Commons report from Creative Commons. For the full contents, see the report on the Creative Commons website.

1 Comment

I don't quite agree with your statement that actual CC content "vastly outnumbers" false positives if you only look at links. I have done research on blogs in German, and manual verification revealed that only 70% of the pages selected by detecting links to CC license deeds were actually licensed under CC terms. For example, it can happen that a given image bears such a link, because it is actually from another source, whereas the rest of the page is either copyrighted or cannot be considered to be under CC license.
That said, I found a comparable distribution of licenses, with an impressive amount of NC-ND restrictions.
An article summarizing the search for content and the manual verification step is available here: "For a fistful of blogs: Discovery and comparative benchmarking of republishable German content" https://hal.archives-ouvertes.fr/hal-01083750

The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat.

Opensource.com aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. Red Hat and the Shadowman logo are trademarks of Red Hat, Inc., registered in the United States and other countries.