Please note this is a DRAFT and may change throughout the day (1 June 2011)

On June 17 I will be joining other researchers at a Patent Data Workshop jointly hosted by the USPTO and NSF at the U.S. Patent & Trademark Office in Alexandria, VA. This workshop, supported by the USPTO Office of Chief Economist and the Science of Science and Innovation Policy Program (SciSIP) at the NSF, will bring researchers together to share their ideas on how to facilitate the more efficient use of patent and trademark data, and ultimately to improve both the quantity and caliber of innovation policy scholarship.

The stated goals of this workshop include:

Creating an information exchange infrastructure for both the production and informed evaluation of transparent, high-quality research into innovation;

Creating a distinct community with well-developed research norms and cumulative influence; and

Championing the development of a platform to support a robust body of empirical research into the economic and social consequences of innovation.

Each participant planning to attend this workshop has been asked to prepare a blog post that outlines (a) our understanding of the most significant theoretical or empirical challenges in this space, and/or (b) where the frontier of knowledge is, what innovative things are being done at the frontier — or within reach of being done to solve the set of problems — and where targeted funding could yield the highest payoffs in getting to solutions. The purpose of this post is to offer some of my thoughts based on progress made by linked open government data initiatives in the US and around the world.

Background: The Tetherless World and Linked Open Government Data
Since early 2010 the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute has collaborated with the White House Data.gov team to make thousands of open government datasets more accessible for consumption by web-based applications and services, including mashups leveraging Semantic Web technologies. TWC has created an infrastructure, embodied by the TWC LOGD Portal, for automatically converting to RDF and enhancing government data published in tabular (e.g. CSV) format; publishing these converted datasets as downloadable “dump files” and through SPARQL endpoints; demonstrating highly effective methodologies for using such linked open government data assets as the basis for the agile creation of lightweight, powerful visualizations and other mashups. In addition to providing a searchable interface to thousands of converted Data.gov datasets, the TWC LOGD Portal publishes a growing set of demos and tutorials for use by the LOGD community.
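To make the tabular-to-RDF step concrete, here is a minimal sketch in Python. It is not the TWC LOGD converter: the base namespace, the row/column URI pattern and the sample rows are all assumptions made up for illustration. Each CSV row becomes one subject, and each non-empty cell becomes one N-Triples statement.

```python
import csv
import io

# Hypothetical base namespace; not the URI scheme the TWC LOGD Portal actually uses.
BASE = "http://example.org/dataset/"


def escape_literal(value: str) -> str:
    """Escape a string so it can sit inside a double-quoted N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, dataset: str) -> list[str]:
    """Emit one triple per non-empty cell, using the header row for predicates."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset}/row/{row_number}>"
        for column, cell in row.items():
            # Skip overflow cells (column is None) and empty or missing values.
            if column is None or not cell:
                continue
            name = column.strip().replace(" ", "_")
            predicate = f"<{BASE}{dataset}/column/{name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(cell)}" .')
    return triples


# Made-up sample data, for illustration only.
SAMPLE = "state,magnitude\nAlaska,5.1\nCalifornia,\n"

for line in csv_to_ntriples(SAMPLE, "earthquakes"):
    print(line)
```

A real conversion would also assign datatypes and link cell values to shared URIs; this sketch only shows the basic row-to-triples shape.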

The Data.gov/TWC LOGD partnership and similar international LOGD efforts, especially the UK’s Data.gov.uk initiative, have demonstrated the value and potential for innovation achieved by exposing government data using linked data principles. Indeed, the effective application of the linked data approach to a multitude of data sharing and integration challenges in commerce, industry and eScience has shown its promise as a basis for a more efficient, agile research information exchange infrastructure.

Recommendation: Create a “DBPedia” for Patent Data
The Linked Open Data Cloud diagram famously illustrates the growing number of providers of linked open data around the world. Careful examination of the LOD Cloud shows that most sources are sparsely linked, and a very few, most notably DBPedia.org, are extremely heavily linked. The reason is that the Web of Data has increasingly adopted DBPedia as a reliable source or hub for canonical entity URIs. This means that as providers put their datasets online, they enhance their datasets by providing sameAs links to DBPedia URIs for named entities within these datasets. This enables their datasets to be easily linked to other datasets and increases their utility and value as the basis for visualizations and linked data mashups.
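As a rough illustration of what “providing sameAs links” looks like in practice, here is a small Python sketch that emits owl:sameAs statements as N-Triples. The local namespace and the entity identifier are hypothetical. The mapping from a name to a DBPedia URI is deliberately naive (spaces become underscores); real entity matching needs disambiguation.

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
# Hypothetical namespace for a provider's own entity URIs.
LOCAL_BASE = "http://example.org/patents/entity/"


def dbpedia_uri(title: str) -> str:
    """Naive mapping: treat the title as a Wikipedia-style article title."""
    return DBPEDIA_RESOURCE + quote(title.strip().replace(" ", "_"), safe="_")


def same_as_triples(entities: dict[str, str]) -> list[str]:
    """Map each local identifier to an owl:sameAs link to its DBPedia URI."""
    return [
        f"<{LOCAL_BASE}{quote(local_id)}> <{OWL_SAME_AS}> <{dbpedia_uri(title)}> ."
        for local_id, title in sorted(entities.items())
    ]


# "org-42" is a made-up local identifier.
for triple in same_as_triples({"org-42": "Rensselaer Polytechnic Institute"}):
    print(triple)
```

Once such links exist, any consumer that already knows the DBPedia URI can join this dataset with others that link to the same URI.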

Providers embrace DBPedia’s URI conventions as “canonical” in order to make their datasets more easily adopted. Our objective with patent and trademark reference data and research information in general must be to break down barriers to its widespread use, recognizing that we may have no idea how it may be used. Linked data principles and the Web of Data emerging from them have re-written what it means to make data integration easy. Whereas even a few short years ago it was useful to simply provide a searchable patent database through a proprietary UI, next-generation innovation infrastructures will be based on globally interlinked graphs driven by concept and descriptive metadata extracted from patent records, research publications, business publications and indeed data from social networks. Scholars of innovation will traverse these graphs and mash them with other graphs in ways we cannot anticipate, and thus make serendipitous discoveries about the process of innovation we cannot predict today.

My DBPedia reference comes from the idea of identifying concepts and specific manifestations of innovation in the patent corpus. Consider an arbitrary patent disclosure; it can be represented as a graph of concepts and related manifestations. The infrastructure I’m proposing will enable the interlinking of URI-named concepts, not only with other patent records but also with scientific literature, the financial and news media, social networks, etc. From a research standpoint, this will enable the study of the emergence, spread and influence of innovation in many dimensions.

Conclusions
The USPTO has already made great strides in improving access to and understanding of patent and trademark data; an excellent example is the Data Visualization Center and specific data visualization tools such as the Patent Dashboard, which provides graphic summaries of USPTO activities. These are “canned apps,” however; the next generation of open government will require finer-grained access to this data, presented as enhanced linked data and using open licensing principles. As USPTO datasets are presented in this way, researchers will be able to interlink this data with datasets from other sources, resulting in a more effective study of the causes of innovation and indeed the outcomes of government programs intended to stimulate innovation.

To this end, I am challenging our students — and indeed everyone within earshot — to participate in what I’ve dubbed the TWC LOGD Million Dataset Challenge: I’m challenging you to help us create a master catalog of more than 1 million open government datasets from around the world! In return, we’ll make the catalog publicly available through our TWC LOGD Portal, as RDF dumps and via a SPARQL endpoint.

To get this thing started and to make it as easy as possible, I’ve created a Google Form-based interface. Follow the link, add metadata, move on…

I’ve structured the form to accept both catalog and individual dataset entries. Just choose the right options in the form…

There comes a time in life, when you walk away from all the drama and people who create it. You surround yourself with people who make you laugh, forget the bad, and focus on the good. So, love the people who treat you right. Pray for the ones who don’t. Life is too short to be anything but happy. Falling down is part of LIFE…Getting back up is LIVING………Re-post if you agree; I just did

I’ve been trying to trace the origins of this meme using Google and focusing on the quote, Falling down is part of LIFE…Getting back up is LIVING; it seems to have been active on the Web for about a year, especially in the so-called “mommy blogs” and on Facebook.

Update:

My journey has taken me to KnowYourMeme, a site dedicated to tracking trends in Internet culture.

The Fall 2010 semester marked the beginning of the Tetherless World Constellation’s undergraduate research program at Rensselaer Polytechnic Institute (RPI). Although TWC has enjoyed significant contributions from RPI undergrads since its inception, this term we stepped up our game by more “formally” incorporating a group of undergrads into TWC’s research programs, establishing regular meetings for the group, and, with input from the students, outfitting their own space in RPI’s Winslow Building.

Patrick West, my fellow TWC undergrad research coordinator, and I asked the students to blog about their work throughout the semester; with the end of term, we asked them to post summary descriptions of their work and their thoughts about the fledgling TWC undergrad research program itself. We’ve provided short summaries and links to those blogs below…

Cameron Helm began the term coming up to speed on SPARQL and RDF, experimented with several of the public TWC endpoints, and then worked with Philip on basic visualizations. He then slashed his way through the tutorials on TWC’s LOGD Portal, eventually creating impressive visualizations such as this earthquake map. Cameron is very interested in the subject of data visualization and looks to do more work in this area in the future.

After a short TWC learning period, Dan Souza began helping doctoral candidate Evan Patton create an Android version of the Mobile Wine Agent application, with all the amazing visualization and data integration required, including Twitter and Facebook integration. Mid-semester Dan also responded to the call to help with the “crash” development of the Android/iPhone TalkTracker app, in time for ISWC 2010 in early November. Dan continues to work with Evan and others for early 2011 releases of Android, iPhone/iPod Touch and iPad versions of the Mobile Wine Agent.

David Molik reports that he learned web coding skills, ontology creation, and server installation and administration. David contributed to the development and operation of a test site for the new, semantic web savvy website for the Biological and Chemical Oceanography Data Management Office (BCO-DMO) of the Woods Hole Oceanographic Institution.

Jay Chamberlin spent much of his time working on the OPeNDAP Project, an open source server to distribute scientific data that is stored in various formats. His involvement included everything from learning his way around the OPeNDAP server, to working with infrastructure such as TWC’s LDAP services, to helping migrate documentation from the previous Wiki to the new Drupal site, to actually implementing required changes to the OPeNDAP code base.

Philip Ng worked on a wide variety of projects this fall, starting with basic visualizations, helping with ISWC applications, and including iPad development for the Mobile Wine Agent. Philip’s blog is fascinating to read as he works his way through the challenges of creating applications, including his multi-part series on implementing the social media features.

Alexei Bulazel began working with Dominic DiFranzo on a health-related mashup using Data.gov datasets and is now working on a research paper with David on “human flesh search engine” techniques, a topic that top thinkers including Tetherless World Senior Constellation Professor Jim Hendler have explored in recent talks. Note: For more background on this phenomenon, see e.g. China’s Cyberposse, NY Times (03 Mar 2010)

Many of these students will be continuing on with these or other projects at TWC in 2011; we also expect several new students to be joining the group. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions this fall, and looks forward to being amazed by their continued great work in the coming year!

Since Summer 2010 I’ve had the privilege of working as a research engineer at the Tetherless World Constellation (TWC) at RPI, primarily helping the team in the execution of various projects related to their association with the Obama Administration’s Data.gov initiative. One of those projects is an applet for the Elsevier SciVerse Hub portal. The following is from the description page for our application.

Any user with the ability to search SciVerse Hub can use the US Government Dataset Search application. The application and the government data it exposes are made available free of charge. The US Government Dataset Search application is targeted at both SciVerse end users (researchers) and application developers interested in applying government datasets to their applications. Researchers utilizing SciVerse Hub are able to discover and access contextually relevant data from the US Government. Developers may utilize SciVerse Hub to identify RDF-converted data sets based on the US Government data and access this data in their applications through SPARQL endpoints or retrieve the datasets themselves.

How the US Government Dataset Search application works: For each SciVerse query the user makes, a keyword search across all current Data.gov datasets is made via a SPARQL endpoint at the TWC LOGD portal. A summary of these results is presented on the Hub search results page. Detailed results are presented in tabular form in the ‘Canvas’ (larger) view by clicking on any link. On the canvas view links are provided directly to the Data.gov dataset description pages as well as RDF-converted versions of these datasets at the TWC LOGD portal. Note that faceted search is not available with the application and only the original query in Hub will be submitted.
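For readers curious what a keyword search via a SPARQL endpoint might look like, here is a hedged Python sketch that only builds the query and the request URL; it does not contact any server. The endpoint path, the use of dcterms:title for dataset titles and the function names are assumptions for illustration, not the application's actual query.

```python
from urllib.parse import urlencode

# Hypothetical endpoint path; only the portal address http://logd.tw.rpi.edu is given above.
ENDPOINT = "http://logd.tw.rpi.edu/sparql"


def escape_sparql_string(text: str) -> str:
    """Escape characters that would break out of a double-quoted SPARQL string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def build_keyword_query(keyword: str, limit: int = 10) -> str:
    """Build a case-insensitive title search (assumes titles use dcterms:title)."""
    needle = escape_sparql_string(keyword.lower())
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset dcterms:title ?title .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?title)), "{needle}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(keyword: str) -> str:
    """SPARQL protocol GET request: the query travels in the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": build_keyword_query(keyword)})


print(build_keyword_query("patent"))
print(build_request_url("patent"))
```

Escaping the keyword matters because it comes straight from the user's search box and is pasted into the query text.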

This application is optimized for Firefox, Chrome and Internet Explorer 8.

For more information about creating mashups using Data.gov datasets, please check out RPI’s Linking Open Government Data (LOGD) Portal at http://logd.tw.rpi.edu

About the TWC Linking Open Government Data project: The TWC LOGD team investigates opening and linking government data using Semantic Web technologies. TWC LOGD actively develops tools for the large-scale translation of government-related datasets into RDF, linking them into the ‘Web of Data’ and providing demos and tutorials on various means for consuming linked government data, including creating mashups, applications and data visualizations. The TWC LOGD Portal was awarded second place (open division) at the 2010 Semantic Web Challenge, held during the 2010 International Semantic Web Conference (ISWC 2010).

About the Tetherless World Constellation at RPI: The Tetherless World Constellation addresses the emerging area of Web Science, focusing on the World Wide Web and its future use. Faculty in the constellation lead explorations into the principles that underlie the Web; enhance the Web’s reach beyond the desktop and laptop computer; and develop new technologies and languages that expand the capabilities of the Web. TWC researchers use powerful scientific and mathematical techniques from many disciplines to explore the modeling of the Web from network- and information-centric views. TWC’s objectives include making the next generation web natural to use while being responsive to the growing variety of policy and social needs, whether in the area of privacy, intellectual property, general compliance, or provenance. The Tetherless World Constellation is designing new techniques to explore social, scientific, and legal impacts of the evolving technologies deployed on the Web.

UPDATE: I’m currently developing an iGoogle Gadget version of the SciVerse app, based on the same core queries. A screen shot of the “profile” view of that app appears below. In addition to enabling me to monitor the health of our systems from my desktop, it also enables me to test out possible features for the SciVerse app itself.

Professors and students in a nearby research group have been brainstorming a syllabus for a new, low-level computer science course. Normally I only “lurk” in such discussions, but this time I couldn’t hold my tongue. The following is my contribution, from my perspective as one who has interacted with “computer scientists” as a fellow team member, project leader, hiring manager, business partner and even corporate recruiter (interviewing mostly for other hiring managers).

This version has been edited slightly to make it better suited for a blog…

As an “old guy” who has interviewed his share of CS, CE and EE’s over the years (and hired and/or managed more than a few of them), here are my thoughts from an “outcomes” perspective…

It’s really exciting to work with a developer who groks the concepts to such a degree that specific languages and language boundaries simply don’t matter. Seeing a prototype done in Erlang because it was perfectly suited is SO much better than listening to whining over how it is hard to do it in Java or C# or Visual Basic N. Such developers are usually curious about everything; the dude that coded a prototype NoSQL-style data store for our team in Erlang had been playing with it for a few months, “just because…”

Methodical problem solving matters. Which some would equate to Engineering(tm). But really it’s about gaining a ton of experience attacking problems. The number one thing I’ve looked for over the years is actual experience — through project work, interesting course projects, and esp. internships — in completing cool projects. And please, don’t wait to be assigned; always look for problems, and just do them.

Join the software ecosystem. The most impressive developers I’ve met over the years — some are currently undergrads at the Tetherless World Constellation at RPI — understand how to contribute to software ecosystem(s); usually this is through the open source community. They understand the tools, they understand how to engage with other developers, they understand how to analyze and improve other people’s code.

Here’s one way to think about it: if you aspire to be a professional musician (or artist), chances are you’ve participated in the “music ecosystem” in a wide variety of ways for many years, even before entering college. The best developers I’ve met — and those “computer scientists” who are developers at heart — have done the same (one guy I know built his first Linux kernel when he was in middle school).

Understand systems end-to-end. Now we’re back to the topic at hand 😉 The best contributors over the years have been those who had hands-on experience with absolutely every aspect of the “system.” This doesn’t mean going From Relays to Twitter in 10 Weeks, but it does mean understanding the relationships between all system elements.

I doubt very much that this is a problem for anyone on this list, because the very nature of PKI work requires one to have just this sort of broad and deep knowledge; plus, your professor and I have had a few conversations about this over the years…BTW, my daughter’s now at Southampton working on her Ph.D in numerical relativity and writing code on a supercomputer cluster 😉

There’s an increasing variety of data available as Linked Data coming from a range of different sources. I’m wondering what indicators we might use to judge the “quality” of a dataset…Clearly quality is a subjective thing, but I’d be interested to know what factors people might use to indicate whether a dataset was trustworthy, well modelled, sustainable, etc.

For starters, I think we can all agree at the highest level that the measure of data quality is subjective and that “beauty is in the eye of the beholder”: the quality of a dataset is measured by its fitness for use in specific applications. This question of determining and disseminating “fitness” scores is the rub!

In his answer to Leigh’s question, Tim Finin proposes adopting a PageRank-like mechanism, “LODrank,” based on measured usage:

We could define LODrank as a PageRank-like measure that was a function of the number of links to/from other LOD datasets weighted by their LODrank. Alternatively, it might [be] divided by the number of linkable instances in the collection, so that large datasets did not have an advantage…

This approach scores data quality based on observed fitness as evidenced by discovered use and has the advantage of automation.
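To show how such a score could be automated, here is a minimal Python sketch of one possible reading of LODrank. It is my interpretation, not Tim’s specification: it counts inbound links only, borrows the usual PageRank damping factor of 0.85, and uses made-up dataset names and link counts. The second function sketches his size-normalized alternative.

```python
def lodrank(links: dict[str, set[str]], damping: float = 0.85,
            iterations: int = 50) -> dict[str, float]:
    """PageRank-style power iteration over a dataset-to-dataset link graph.

    links maps each dataset to the set of datasets it links to.
    """
    datasets = set(links)
    for targets in links.values():
        datasets |= targets
    n = len(datasets)
    rank = {d: 1.0 / n for d in datasets}
    for _ in range(iterations):
        # Datasets with no outgoing links spread their rank evenly over everyone.
        dangling = sum(rank[d] for d in datasets if not links.get(d))
        new_rank = {d: (1.0 - damping) / n + damping * dangling / n
                    for d in datasets}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def size_normalized(rank: dict[str, float],
                    instance_counts: dict[str, int]) -> dict[str, float]:
    """Divide each score by the dataset's number of linkable instances."""
    return {d: rank[d] / instance_counts[d] for d in rank}


# Hypothetical link graph: three datasets point at a hub.
EXAMPLE_LINKS = {
    "patents": {"dbpedia"},
    "pubs": {"dbpedia"},
    "news": {"dbpedia", "pubs"},
    "dbpedia": set(),
}

scores = lodrank(EXAMPLE_LINKS)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy graph the hub ends up with the highest score, which mirrors the heavily linked position DBPedia holds in the LOD Cloud.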

My replies went in a different direction, focusing instead on the subjective nature of data quality and the need to aggregate consumer-space rankings of datasets across a set of dimensions. In his 2005 white paper Principles of Data Quality [1] Arthur D. Chapman writes,

Data quality is multidimensional, and involves data management, modelling and analysis, quality control and assurance, storage and presentation. As independently stated by Chrisman [2] and Strong et al. [3], data quality is related to use and cannot be assessed independently of the user. In a database, the data have no actual quality or value [4]; they only have potential value that is realized only when someone uses the data to do something useful. Information quality relates to its ability to satisfy its customers and to meet customers’ needs [5]

Chapman goes on to enumerate a set of factors that contribute to fitness-for-use, citing Redman [6]:

Accessibility

Accuracy

Timeliness

Completeness

Consistency with other sources

Relevance

Comprehensiveness

Providing a proper level of detail

Easy to “read”

Easy to “interpret”

Each of these factors is fundamentally subjective, even if mechanisms exist within particular domains to take their measure “objectively.” Indeed, in some domains such ratings might only be done by humans, either through voting mechanisms or by individual reviewers.

I believe the greater linked data community needs to develop vocabulary terms for expressing metrics for data quality — consider the ten points above — and then within individual communities develop agreed-upon means to determine those values. Arguably this is a “Dublin Core” approach to the problem, in the sense that terms like completeness or consistency would be reused across domains with inherently different domain-specific meanings, but such reuse would facilitate consumers from other communities choosing datasets from outside their expertise. A non-physicist might then say, “The physics community says this dataset is accurate, by their measures.”
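Here is a rough Python sketch of what shared quality terms with community-specific values could look like. The term names are my own shorthand for the ten factors listed above, and the 0-to-1 scale, the class and the function are assumptions for illustration, not an existing vocabulary.

```python
from dataclasses import dataclass, field
from statistics import mean

# Shorthand labels for the ten fitness-for-use factors; the names are assumptions.
DIMENSIONS = (
    "accessibility", "accuracy", "timeliness", "completeness", "consistency",
    "relevance", "comprehensiveness", "level_of_detail", "readability",
    "interpretability",
)


@dataclass
class QualityAssessment:
    """One community's rating of one dataset, on an assumed 0.0 to 1.0 scale."""
    dataset: str
    community: str
    scores: dict[str, float] = field(default_factory=dict)

    def rate(self, dimension: str, score: float) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        self.scores[dimension] = score


def summarize(assessments: list[QualityAssessment],
              dimension: str) -> dict[str, float]:
    """Average one dimension per community, keeping communities separate."""
    by_community: dict[str, list[float]] = {}
    for assessment in assessments:
        if dimension in assessment.scores:
            by_community.setdefault(assessment.community, []).append(
                assessment.scores[dimension])
    return {community: mean(values)
            for community, values in by_community.items()}


# Made-up ratings for a made-up dataset.
first = QualityAssessment("dataset-a", "physics")
first.rate("accuracy", 1.0)
second = QualityAssessment("dataset-a", "physics")
second.rate("accuracy", 0.5)
print(summarize([first, second], "accuracy"))
```

Keeping the averages per community, rather than pooling them, is what lets an outsider say “accurate by the physics community’s measures.”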

Some of these factors are even more deeply subjective and must be evaluated dynamically, based on the consumer’s immediate context. An example of this is relevance, which could be interpreted as equivalent to a recommendation.

A recent set of articles in the New York Times and elsewhere, including the Kurzweil book, prompted a friend to ask me for my thoughts on the Singularity Movement. Here is an excerpt of the email I wrote:

Regarding the Singularity Movement, I think economic arguments such as that presented by Robin Hanson in IEEE Spectrum (2008) carry more weight than the gushing futurist predictions from the likes of Ray Kurzweil. In the Spectrum article Hanson cites two previous singularities — the agricultural and industrial revolutions — and suggests that a revolution in machine intelligence is leading to a third that will take shape over the next half-century.

I tend to take most of what futurists say with a grain of salt, because they rely on a belief/assumption/confidence that the introduction of disruptive technologies into a society yields predictable results — for good or bad — which never happens. The combination of factors including technologies being human constructions, the fact that we as humans never make completely rational decisions, and the fact that all of this takes place within a fundamentally chaotic, only approximately predictable context, means that we simply cannot know what will happen in the future!

Here’s what I know: We humans are wired to build and use tools and, to the extent possible, adapt to the environments we build, or die trying. Google, while amazing, is still a tool; an engineered system that (given enough time) I can explain to you. Ironically enough, the reason Google works so well is because it’s actually based on simpler, but more fundamental principles than the systems which preceded it, closer to how naturally-occurring networks emerge and function. But the way Google has been adopted and applied in the “ecosystem,” while making sense in hindsight, could not have been predicted.

I’m currently reading Jonah Lehrer’s How We Decide, a wonderful exploration of the biochemistry of how we make decisions. Any such discussion naturally must touch on how various imbalances (e.g. dopamine, etc) affect that process, and how well-intentioned efforts by doctors to counteract certain imbalances lead to very unexpected and usually undesired results.

Lehrer’s book makes it profoundly clear that we never know for certain what will happen when we diddle with the decision-making processes in our brain, whether it involves extending the lower levels of the nervous system (the sensory level) or the higher level processes. Researchers do know that we seem to adapt well to lower-level, e.g. neural prosthetics, but each higher-level process involves a synaptic algorithm that we don’t completely understand — mostly because our brain is a distributed system, not a single “algorithm,” whose “result” is emergent.

That ultimately is my point: our brains are distributed systems that exhibit adaptive and unpredictable behaviors, and we can’t begin to understand what will happen when we explore higher-level prosthetics based on “intelligent machines.” Something will happen, but there is no reason to believe it will lead to either a Utopian or Dystopian existence any more than the agricultural or industrial revolutions resulted in one or the other. Indeed, the introduction of those practices to certain natural and economic ecosystems led to both regional successes and catastrophes.

On Intelligence, the companion site to Jeff Hawkins’s provocative book by the same name. The book introduces the concept of Hierarchical Temporal Memory (HTM) based on a layered hierarchical model of how the neocortex functions.

Recently the King Arthur Flour Company, a global provider of quality baking supplies based in my home town of Norwich, Vermont, proposed an expansion that would include a sewer extension. This issue is being debated locally, and I thought it would provide good fodder for my blog…John Erickson

Since Jill and I moved to Norwich some 18 years ago, I’ve been troubled by what seems like a lack of support for sustainable economic development within our town. I’m proud that Norwich has a high-quality global company “like” King Arthur based here, a company that is employee-owned, successful and growing. At the same time I’m embarrassed that Norwich isn’t doing more to sustain the economic well-being of the Upper Valley.

15 years ago this month partners and I began the process of launching a company called NetRights. Loving Norwich and Vermont, I had a vision of starting a sustainable high-tech company that would be based here and would create local jobs. The inevitable question of where to base our company arose; being the Vermonter in the mix and drinking from the KoolAid of iconic successes like Green Mountain Gringo, I argued for us to set up offices in Norwich, Wilder or WRJ. My co-founders thought this was ludicrous; not only did they envision the (obvious to them) negative tax implications, but they also perceived no end of difficulty with infrastructure, etc. Since they had been successful with a previous Lebanon-based software startup, I went along for the ride and we set up shop in downtown Lebanon.

But I wouldn’t give up that easily. At one point Vermont eTV (remember them?) had a call-in with Gov Dean’s youthful, energetic director of economic development. Vermont had recently provided incentives for ETI’s expansion, and my direct question to “Slick” was: what can Vermont do to keep companies like ours in Vermont? Or were my co-founders right that there weren’t any incentives to lure us to Vermont? His answer: regrettably, yes, my co-founders were right. If we needed money for bricks-n-mortar expansion to grow a widget-building business, yes, but since we were “knowledge-based,” nothing. Frankly, I was shocked, since this was during the same period that Gov Dean (who I’m a fan of!) was roaming the state advocating green high-tech businesses in cabins on mountaintops…

I’ve bored you with this ancient history in order to provide some context as to why I believe the citizens of Norwich should greet initiatives such as King Arthur’s with the question, what can we as neighbors do to help? Their opening proposal may or may not be ideal — I’m not saying “Roll over, little Norwich!” — but I do believe it is our responsibility to do what we can to foster economic development in this town, and this includes hearing their plans with an open mind.

I’m tired of Norwich not merely depending on, but assuming that other towns in the region will feed our hungry, host our homeless, pay our salaries, sell us our auto parts. Instead, we should be asking how we can help those among us with the initiative to bring it on home to Norwich…

Disclaimer: I am not affiliated with King Arthur Flour, but I do confess to loving their products and have been known to roam their jobs portal from time to time…

Chris Anderson’s newest book FREE: The Future of a Radical Price received some attention this summer, but I’ve actually been meditating on principles he laid out three years ago in his blog post, Scaling up is good. Scaling down is even better. In that post he marveled at Google et al.’s ability to scale down, to run themselves efficiently enough to serve users who generate no revenue at all. Anderson’s principles are guidance on approaches to conducting business such that even if only a tiny percentage of one’s visitors “convert” into paying customers, by ensuring this small percentage is of a very large number one can still achieve big-time profitability.
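The arithmetic behind “scaling down” is simple enough to sketch in a few lines of Python. All of the figures below are invented for illustration and do not come from Anderson’s post. The point is that the break-even conversion rate is just the cost of serving one visitor divided by the revenue from one paying customer, so cutting the per-visitor cost is what makes a tiny conversion rate viable.

```python
def monthly_profit(visitors: int, conversion_rate: float,
                   revenue_per_customer: float, cost_per_visitor: float) -> float:
    """Revenue from the visitors who convert, minus the cost of serving everyone."""
    revenue = visitors * conversion_rate * revenue_per_customer
    cost = visitors * cost_per_visitor
    return revenue - cost


def break_even_conversion_rate(revenue_per_customer: float,
                               cost_per_visitor: float) -> float:
    """Smallest conversion rate at which serving non-paying visitors pays off."""
    return cost_per_visitor / revenue_per_customer


# Hypothetical numbers: one million visitors, 1% convert, $20 per paying
# customer, one cent to serve each visitor.
print(monthly_profit(1_000_000, 0.01, 20.0, 0.01))
print(break_even_conversion_rate(20.0, 0.01))
```

Note that the visitor count drops out of the break-even rate: volume multiplies the profit once the rate is cleared, but only a low per-visitor cost lets the rate be cleared at all.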

My goal with this post is to consider how these ideas might be applied to the domain of Linked Data, and specifically how they pertain to the provision of unique data that adds real value to the greater “Web of Data.”

In his blog Anderson gives us four keys to scaling down: Self-service, “Freemium” services, No-frills products and Crowdsourcing…

1. Self-service: give customers all the tools they need to manage their own accounts. It’s cheap, convenient, and they’ll thank you for it. Control is power, and the person who wants the work done is the one most motivated in seeing that it’s done properly.

“Self-service” applies to linked data services in oh-so-many ways! Self-service in this case is not as much about support (see “Crowdsourcing,” below) as it is about eliminating any and all intervention customers might need to customize or specialize how services perform for them. In principle, the goal should be to provide users with a flexible API and let them figure it out, with the support of their peers. Ensure that everything is doable from their side, and step out of the way.

Note #1 (29 Mar 2010): A great recent example of this is the OpenVocab Project, launched by Ian Davis of Talis. OpenVocab “enables anyone to participate in the creation of a open and shared RDF vocabulary. The project uses wiki principles to allow properties and classes to be created in the vocabulary.”

The (negative) corollary is this: if an organization must “baby sit” its customers by providing specialized services that require maintenance, then they own it and must eat the cost. If instead they allow specializations to be a user-side function, their users own it. But the users won’t be alone; they’ll have the support of their community!

Note #3 (29 Mar 2010): Derek Gordon just pushed out a great piece, The Era Of APIs, that argues “APIs are at work reshaping the ways in which we understand search today, and will challenge our profession to stretch, grow and change significantly in the coming years.”

3. No-frills products: Some may come for the low cost, others for the simplicity. But increasingly consumers are sophisticated enough to know that they don’t need, or want to pay for premium brands and unnecessary features. It’s classic market segmentation, with most of the growth coming at the bottom.

In the linked data world, achieving “no frills” would seem easy because by definition it is only about the data! For linked data a “frill” is just added complexity that serves no purpose or detracts from the utility of the service. Avoid any temptation to gratuitously “add value” on behalf of customers, such as merging your core graph with others in an attempt to “make it easy” for them. Providers should also avoid “pruning” graphs, except in the case of automated filtering in order to differentiate between Freemium and Premium services.

Note #4 (29 Mar 2010): Providers should weigh this very carefully. It might well be that a “merged” graph truly is a value-added service to users, for which they are willing to pay a premium. My point is simply to avoid the gratuitous and respond to customer needs!

4. Crowdsourcing: From Amazon reviews to eBay listings, letting the customers do the work of building the service is the best way to expand a company far beyond what employees could do on their own.

By now it is not only obvious, but imperative that providers foster the development of communities within and around their services. Usually communities are about evangelism, and this is certainly true for linked data providers, but increasingly service providers realize well-groomed communities can radically reduce their service costs.

Linked data providers should commit themselves to a minimum of direct support and invest in fostering an active community around their service. Every provider should have a means for members of their community to support each other. Every provider should leverage this community to demonstrate to potential adopters the richness of the support and the inherent value of their dataset.

Finally: In a thought-provoking post Linked Data and the Enterprise: A Two-way Street Paul Miller reminds the skeptical enterprise community that they, not merely their user community, will ultimately benefit from the widespread use of their data, and when developing their linked data strategy they should consider how they can “enhance” the value of the Web of Data, for paying and non-paying users alike:

…[A] viable business model for the data-curating Enterprise might be to expose timely and accurate enrichments to the Linked Data ecosystem; enrichments that customers might pay a premium to access more quickly or in more convenient forms than are available for free…