Crunchbase meets the Semantic Web

Summary: Technology web site TechCrunch is one of those staples (like ZDNet, of course) to which we all turn for news and analysis on the companies shaping the Web. Their CrunchBase directory provides a wealth of information on the companies and people featured in their stories (and elsewhere, as it's editable by anyone), and they recently took the step of opening up an API to the data.

As he reports on his blog, Benji Nowack has created Semantic Crunchbase, which republishes the CrunchBase content as the 'Linked Data' about which Sir Tim Berners-Lee and others are currently so passionate. Remember,

“Linked Open Data is the Web done as it should be.”

Benji is continuing to add features to his demonstration, and will cover some of them (including the intriguing-sounding 'Pimp my API') in future blog posts.

"Imagine [writes Nowack] you are looking for a job in California at a company that is at a specific funding stage. CrunchBase knows everything about companies, investments, and has structured location data. CrunchBoard on the other hand has job descriptions, but only a single field for City and State, and not the filter options to match our needs."

"This is where Linked Data shines. If we find a way to link from CrunchBoard to CrunchBase, we can use Semantic Web technology to run queries that include both sources. And with SPARQLScript, we can construct and leverage these links. Below is a script that first loads the CrunchBoard feed of current job offers (only the last 15 entries, due to common RSS' limitations/practices, the use of e.g. hAtom could allow more data to be pulled in). In a second step, it uses the company name to establish a pattern join between CrunchBoard and CrunchBase, which then allows us to retrieve the list of matching jobs at stage-A companies with offices in California."

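Nowack's own script is written in SPARQLScript and runs against the live data. To make the idea of the join concrete, here is a rough Python sketch of the same two steps: link job listings to company records by company name, then filter on funding stage and office location. It is not his code. The `Job` and `Company` records, the `normalize` and `jobs_at` helpers, and all of the sample data are invented for illustration, and a real version would query the feed and the Linked Data rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    title: str
    company: str   # free-text company name, as it might appear in a job feed
    location: str  # a single "City, State" field


@dataclass(frozen=True)
class Company:
    name: str
    funding_round: str    # e.g. "a", "b", "seed" (hypothetical encoding)
    office_states: tuple  # states in which the company has offices


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so names from both sources compare equal."""
    return " ".join(name.lower().split())


def jobs_at(jobs, companies, funding_round, state):
    """Join jobs to companies on the normalized company name, then filter."""
    by_name = {normalize(c.name): c for c in companies}
    matches = []
    for job in jobs:
        company = by_name.get(normalize(job.company))
        if company is None:
            continue  # no link between the two sources for this job
        if company.funding_round == funding_round and state in company.office_states:
            matches.append((job.title, company.name))
    return matches


# Invented sample data, for illustration only.
JOBS = [
    Job("Backend Engineer", "Example Widgets", "San Francisco, CA"),
    Job("Designer", "ACME  Labs", "New York, NY"),
    Job("Data Analyst", "Unknown Startup", "Palo Alto, CA"),
]
COMPANIES = [
    Company("Example Widgets", "a", ("CA",)),
    Company("Acme Labs", "b", ("NY", "CA")),
]

RESULT = jobs_at(JOBS, COMPANIES, funding_round="a", state="CA")
print(RESULT)
```

Note that the filter uses the company's structured office data rather than the job's single location field, which is exactly the information the job board alone cannot supply. Matching on a normalized name is also the fragile part: two sources that spell a company differently will silently fail to link.
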
For more information on Benji, listen to a podcast interview he did with my colleague Danny Ayers earlier this year.

Paul has been involved with the web since its earliest days, addressing issues of technology and policy most recently at Talis and previously in a range of public sector positions.
At The Cloud of Data, Paul provides consultancy and analysis services to a wide range of clients concerned with the implications of the Semantic Web and Clo...

Disclosure

Paul Miller was previously employed by UK-based Semantic Web technology company, Talis. Other than this relationship with Talis, Paul does not own stock or have past or current financial interests in other companies discussed in this blog. Paul’s work brings him into direct or indirect relationships with several of the companies that he writes about, or their competitors. Paul is committed to delivering independent reporting and opinion, and does not enter into agreements that would limit his freedom of expression.