Creative Commons’ New Search Tool is Now in Beta, Pulls CC Images from Multiple Sources

If you’ve been wearing out Unsplash images on your blog, it’s time to take another look at Creative Commons. The site has just launched the beta of its new multi-source search interface. The current search tool searches only one source at a time and sends the visitor offsite to see the results. CC Search, by contrast, loads results from multiple sources onsite.

The Commons includes approximately 1.1 billion works in a variety of formats: literary works, videos, photos, audio, scientific research, and more. Because an estimated half of these works are images, the prototype for the new search tool focuses on that format.

“Our goal is to cover the whole commons, but we wanted to develop something people could test and react to that would be useful at launch,” Creative Commons CEO Ryan Merkley said. “To build our beta, we settled on a goal to represent one percent of the known Commons, or about 10 million works, and we chose a vertical slice of images only, to fully explore a purpose-built interface that represented one type but many providers.”

In addition to the new search interface, the beta includes social tools that allow users to curate and share their own lists, add tags and favorites, and save searches. One-click attribution is built in, making it easy for users to properly attribute the works.
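One-click attribution works because each result already carries the metadata a credit line needs. Creative Commons' own best-practice guidance recommends crediting a work by Title, Author, Source, and License ("TASL"). Below is a minimal sketch of how such a credit line could be assembled from stored metadata. The function name `build_attribution` and its parameters are hypothetical, and this is not CC Search's actual code.

```python
def build_attribution(title: str, creator: str, source_url: str,
                      license_name: str, license_url: str) -> str:
    """Return a plain-text TASL credit line (Title, Author, Source, License).

    Hypothetical helper for illustration; not the CC Search implementation.
    """
    # Fall back to generic wording when a provider omits a field.
    title_part = f'"{title}"' if title else "Untitled work"
    creator_part = creator if creator else "an unknown creator"
    return (f"{title_part} by {creator_part} ({source_url}) "
            f"is licensed under {license_name} ({license_url}).")


print(build_attribution(
    "Sunset over Amsterdam", "Jane Doe",
    "https://example.org/photos/123",
    "CC BY 2.0", "https://creativecommons.org/licenses/by/2.0/",
))
```

The fallbacks matter for a multi-provider index. Some sources omit a title or creator, and a credit line should still be produced rather than failing.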

As Creative Commons is a small organization and fairly lean on resources, the new search was built by a single contractor over seven months. Software engineer Liza Daly was selected to research and build a proof-of-concept for CC Search, a project which she understood to be “a front door to the universe of openly licensed content.”

“CC Search is meant to make material more discoverable regardless of where it is hosted,” Daly said. “For this reason (and for obvious cost-saving objectives), we decided to host only image metadata (title, creator name, any known tags or descriptions) and link directly to the provider for image display and download. A consequence of this is that CC Search only includes images which are currently available on the web; CC is not collecting or archiving any images itself.”
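To illustrate the metadata-only design Daly describes, here is a minimal sketch of an index record. It stores titles, creators, tags, and descriptions, plus URLs that point back to the provider, but no image bytes. The `ImageRecord` class, its field names, the provider identifiers, and the naive `search` function are all hypothetical illustrations. The real service used Postgres and Elasticsearch rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Metadata for one openly licensed image; the file itself stays with the provider."""
    provider: str            # hypothetical provider identifier, e.g. "flickr"
    foreign_id: str          # the provider's own ID for the work
    title: str
    creator: str
    image_url: str           # linked for display; never downloaded or archived
    landing_url: str         # provider page where the work can be downloaded
    tags: tuple[str, ...] = ()
    description: str = ""


def search(records: list[ImageRecord], term: str) -> list[ImageRecord]:
    """Naive case-insensitive match over the stored metadata fields only."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower()
        or term in r.description.lower()
        or any(term == t.lower() for t in r.tags)
    ]


records = [
    ImageRecord("flickr", "123", "Tulip field", "Jane Doe",
                "https://example.org/123.jpg", "https://example.org/photos/123",
                ("tulips", "netherlands")),
    ImageRecord("met", "456", "Portrait of a Woman", "Unknown",
                "https://example.org/456.jpg", "https://example.org/art/456"),
]

for hit in search(records, "tulips"):
    print(hit.title, "->", hit.landing_url)
```

Because only links are stored, a work that its provider removes simply stops resolving. This matches Daly's point that CC Search includes only images that are currently available on the web.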

Daly built the search feature on AWS cloud infrastructure using Python, Django, Postgres, and Elasticsearch. The beta has estimated hosting costs of $1,400/month. She opted for Python because it was the language she was most familiar with.

“As the prototype evolved, we decided the opportunity for an engaging front door to the Commons lay in curation and personalization,” Daly said. “Because of its dedicated maintenance team and frequent patch management, I chose Django as the web framework.” She chose Elasticsearch over Solr (and other options) primarily because AWS offers Elasticsearch as a hosted service.

“CC Search is not, at this time, a particularly sophisticated search application; image metadata is relatively simple and when dealing with a heterogeneous content set from a diversity of providers, one tends towards a lowest-common-denominator approach — our search can only be as rich as our weakest data source,” Daly said. “There is much to be improved here.”
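Daly's "lowest-common-denominator" point can be made concrete. If search may rely only on fields that every provider actually fills in, then the usable field set is the intersection of what each source supplies. The sketch below computes that intersection from one sample record per provider. The provider names, field names, and the helpers `common_fields` and `normalize` are hypothetical and are not taken from CC Search.

```python
def common_fields(provider_samples: dict[str, dict[str, object]]) -> set[str]:
    """Fields that every provider populates: the only ones search can rely on."""
    field_sets = [
        # Treat empty strings, empty lists, and None as "not supplied".
        {k for k, v in sample.items() if v not in (None, "", [], ())}
        for sample in provider_samples.values()
    ]
    return set.intersection(*field_sets) if field_sets else set()


def normalize(record: dict[str, object], fields: set[str]) -> dict[str, object]:
    """Project a provider record down to the shared field set."""
    return {k: record.get(k) for k in sorted(fields)}


samples = {
    "museum_a": {"title": "Vase", "creator": "Unknown", "tags": [],
                 "description": "Blue glazed vase", "license": "CC0"},
    "photo_site_b": {"title": "Sunset", "creator": "Jane Doe",
                     "tags": ["sky", "sunset"], "description": "",
                     "license": "CC BY"},
}

shared = common_fields(samples)
print(sorted(shared))
print(normalize(samples["photo_site_b"], shared))
```

Here, the museum's descriptions and the photo site's tags both drop out, because neither is supplied by every source. This is one way to read "our search can only be as rich as our weakest data source."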

Daly also described an interesting idea for adding a blockchain-type architecture that would record licensing transactions, sharing, and gratitude in a distributed way. This idea falls outside of the scope of the MVP but may be something the project’s future developers will consider when implementing the final version.

“A long-term goal of this project is to facilitate not only search and discovery, but also reuse and ‘gratitude,'” Daly said. “A frequent complaint about open licenses in general — both for creative works and software code — is that contributing to the commons can be a thankless task. There are always more consumers than contributors, and there’s no open web equivalent to a Facebook ‘like.'”
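The blockchain-type idea rests on an append-only log in which each entry commits to the one before it, so records of reuse or thanks cannot be quietly rewritten. Below is a minimal, single-process sketch of that hash-chaining idea using only the standard library. It is an assumption-laden illustration, not anything the project has built: it omits the distribution and consensus that a real shared ledger would need, and the `Entry`, `append`, and `verify` names and the event payloads are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Entry:
    prev_hash: str
    payload: dict
    digest: str


def make_entry(prev_hash: str, payload: dict) -> Entry:
    # Hash the previous digest together with a canonical JSON payload,
    # so altering any earlier entry invalidates every digest after it.
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return Entry(prev_hash, payload, hashlib.sha256(body.encode("utf-8")).hexdigest())


def append(chain: list[Entry], payload: dict) -> None:
    prev = chain[-1].digest if chain else GENESIS
    chain.append(make_entry(prev, payload))


def verify(chain: list[Entry]) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry.prev_hash != prev or make_entry(prev, entry.payload).digest != entry.digest:
            return False
        prev = entry.digest
    return True


chain: list[Entry] = []
append(chain, {"event": "reuse", "work": "flickr:123", "by": "example-blog"})
append(chain, {"event": "thanks", "work": "flickr:123", "from": "example-blog"})
print(verify(chain))

# Rewriting the first record without recomputing its digest breaks verification.
tampered = [Entry(chain[0].prev_hash, {"event": "reuse", "work": "flickr:999"},
                  chain[0].digest)] + chain[1:]
print(verify(tampered))
```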

Other future improvements that the team will consider based on user feedback include adding more content partners, more tools for customizing lists, allowing users to search from their own curated material, and giving trusted users the ability to push metadata back into the collection. Search filters may also be expanded to allow for searching by color, drilling down into tags, and searching public lists.

Yeah, that’s a fantastic idea, actually, but I wonder what sort of intelligence could be put in place to keep the CC repo from becoming cluttered with the non-useful things that are already cluttering up media libraries everywhere: logos, documents, etc. Maybe something as simple as only grabbing .jpg files (usually photos) of a minimum size and color count?

I’m not sure that the WordPress media library default functionality is set up in a way that would match the type of “simple image searching” they’re talking about. Flickr and other tools mentioned above have much more robust ways of handling those metadata fields and APIs for search tools that are already well tested.

Also, those sites are usually better positioned to host full size images than WordPress sites, where optimizing images to reduce load times is likely to be more helpful for overall performance.

I admire their effort.
1. I’m curious: how do they cover the $1,400 monthly hosting cost?
2. The search could be more intelligent. When I search for “Pokemon GO”, I get many pictures of the Chinese GO board game.

They have been doing fantastic work in the field of non-commercial and fair re-use attribution licenses for, well, forever. Think of it as a craft-your-own open source license specifically geared toward intellectual property and creative work.

If you aren’t familiar with them, you should be, and you should definitely consider supporting them.

Creative Commons is as important to artists and creators as the GPL is to WordPress developers.

As a librarian and photographer who offers WP managed hosting at the school where I work and helps faculty & students learn how to do image research, this is great news!

I already use Flickr’s Creative Commons search, faves, and tags extensively; most of the 12K+ photos on my own account are posted under a CC BY-NC license. Flickr’s search does have some features this may not have, such as government image licenses or narrowing by color. But this will be a great tool to have available as I try to explain image licensing to people. :)

I’m not sure the comments here are noting that this does not use any WordPress sites as a source of images, only existing image databases:

CC Search currently pulls CC-licensed images from Rijksmuseum, Flickr, 500px, the New York Public Library, and the Metropolitan Museum of Art.

I used CC back in the day, but it became a pain to manage, as images would have their copyright changed by the owner (nothing wrong with that, but a PITA to manage and replace), or images would disappear altogether (leaving a nice “image not available” message on your post).

I use a mix of Unsplash and Pixabay (more so the latter), and am more than happy with both the quality of work and the quantity of choice on display.

Just curious, what sort of tools were you using to find the images? If you were hotlinking to images on someone else’s server, that’s more likely to cause the “image not found” problem regardless of license, since a lot of site hosts discourage that sort of thing.

If you were downloading images and putting them into your own media library, that doesn’t erase the risk of a copyright owner changing their mind about CC licenses. But I usually note where I found each image and the date, and I don’t worry about it as much if I was honoring the license correctly when I got it. You have to do that anyway if you’re using images as featured images or social thumbnails, since you can’t get the right crops or thumbnails unless they’re processed through the media library.

Using images from Flickr Commons or other public domain sources like the ones you’ve mentioned also helps; you’re much less likely to have issues that way.

Hi Emily, I was using a Flickr plugin for WordPress (previously Zemanta) that sourced the images from a search in your WP dashboard, and then you could drop the image in (so it was hosted elsewhere).

What I like about the Pixabay option is I can support the artist directly – use their work, and make a PayPal donation if/where applicable. I also find the quality better, as in more choice and not limited to stock-type photographs, which I often find is the case with CC.