Data vs Dual Licensing: Which Will Make More Money?

“By 2012, at least 70% of the revenue from commercial OSS will come from vendor-centric projects with dual-license business models.

80% probability. This is may true today but the lack of revenue among broader market OSS products compared to Linux isn’t large enough yet to make this one a done deal. What is clear is that the overwhelming majority of ‘commercial oss’ efforts are based on a dual license model – vendor prefer the ‘open core’ moniker because it sounds more OSS friendly but its essentially the same thing.”
– Mark Driver, Gartner, Open Source Predictions for 2010

This prediction from Gartner’s Mark Driver confused me, I’ll admit, when I first read it. Baffled me, actually. Looking at the market, it seemed clear to me that the practice of dual licensing was, if anything, in decline. I couldn’t see how we could look at the same market and come to such different conclusions. My view was similar to that of Brian Aker, most recently of MySQL/Sun (and, notably, to that of Cloudera’s Mike Olson):

When MySQL pushed dual licensing, investors looked for this hook in every business model. I remember standing outside of a conference room in SF a couple of years ago and talking to one of the Mozilla Foundation people. Their question to me was “Is the nonsense over dual licensing being the future over yet?”. The fact is, there are few, and growing fewer, opportunities to make money on dual licensing. Dual licensing is one of the areas where open source can often commoditize other open source right out of the market. The dearth of companies following in MySQL’s dual licensing footsteps to riches, belabors the point of how niche this solution was.

Even for MySQL, long the standard bearer for the approach, the logistics of dual licensing were, and increasingly are, problematic:

For smaller firms, the primary limitation [of dual licensing] is the development. Unlike non-dual licensed projects which need only concern themselves with the quality and provenance of code contributions from external parties, dual-license vendors need also consider the question of copyright ownership. Because dual licensing depends on ownership of copyright for the entirety of the asset in question, third parties must either assign or be willing to jointly hold the copyright for any potential contributions. Early in a project’s lifecycle, this is a minor concern because the project owner likely employs most of those qualified to improve it. As a project matures and becomes more popular, however, this is a more pressing issue. First, because it acts to inhibit community participation (see slide 18 of this deck produced by Monty), but second – and more problematically – it means that third parties can, in practical terms, offer a more complete product.

Jeremy Zawodny made reference to the practical implications of the dual license in a post from December of last year entitled “The New MySQL Landscape.” In it, he made the assertion that “You can get a ‘better’ MySQL than the one Sun/MySQL gives you today. For free.” This is the cost of the dual licensing model: in return for the right to exclusively relicense the code, you forfeit a.) the right to amortize your development costs across a wide body of contributors, and b.) the right to uniformly integrate the best patches/fixes/etc that are made available under the original license because you cannot always acquire the copyright.

This doesn’t mean that dual licensing is a uniformly bad strategy, but it does imply that it has costs, and that those costs escalate over time. This situation is the inevitable result of the dual license model over time as applied to a successful project. For those looking for perspective from a MySQL and Drizzle developer, I’d recommend reading Brian Aker’s piece here.

Even setting aside the disincentives to pursuing a dual licensing strategy, the basic math of the 70% argument didn’t work for me. Even at MySQL, remember, a fraction of the revenue is derived from the issuance of dual licenses. And even if we assumed, for the sake of argument, that the entire revenue stream was the product of dual licensing, that still wouldn’t be enough to meet the 70% projection. Not nearly so.

Driver acknowledges the problem when he says “the lack of revenue among broader market OSS products compared to Linux isn’t large enough yet.” Linux, it seems clear, is the largest single open source commercial ecosystem, and due to the lack of centralized copyright ownership, it cannot be dual licensed by anyone. What Driver is saying, in other words, is that the open source commercial ecosystem has to be big enough that Linux doesn’t comprise more than 30% of it.

Consider the following back of the envelope calculations. Red Hat’s revenues in the year ending 2009 were $652 million and change. We know that, for copyright and licensing reasons, none of that money can derive from dual licensing revenues. If we assumed, counterfactually, that Red Hat represented all of the non-dual license revenue of the market – the leftover 30% – my math says that the total revenue picture would be around $2.17B. Meaning that we need a little more than two Red Hats’ worth of additional revenue to emerge from dual licensing vendors like MySQL.
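For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation. The $652 million figure and the 70/30 split come from the text above; the function and variable names are my own, and the counterfactual assumption (Red Hat as the entire non-dual-license 30%) is carried over unchanged.

```python
# Back-of-the-envelope check of the 70/30 split discussed above.
# Assumption (counterfactual, as in the text): Red Hat accounts for
# ALL non-dual-license revenue, i.e. the remaining 30% of the market.

RED_HAT_REVENUE_M = 652.0  # millions of USD, year ending 2009 (from the text)
NON_DUAL_SHARE = 0.30      # what is left if dual licensing is 70%


def implied_market(non_dual_revenue_m, non_dual_share):
    """Return (total market, dual-license revenue) in millions of USD."""
    total = non_dual_revenue_m / non_dual_share
    dual = total - non_dual_revenue_m
    return total, dual


total_m, dual_m = implied_market(RED_HAT_REVENUE_M, NON_DUAL_SHARE)
red_hat_multiples = dual_m / RED_HAT_REVENUE_M

print(f"Implied total market: ${total_m / 1000:.2f}B")
print(f"Required dual-license revenue: ${dual_m / 1000:.2f}B")
print(f"That is {red_hat_multiples:.2f} Red Hats' worth")
```

Note that the multiple does not depend on Red Hat's actual revenue: at a 70/30 split it is always 0.7 / 0.3, or a bit more than two.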

Part of the problem is, I believe, semantics. Driver seems to be conflating what is sometimes referred to as “open core” licensing with dual licensing. Personally, I believe they are distinct. The former tends to refer to varying combinations of open source and proprietary codebases, while the latter is more generally used in conjunction with copyright mechanisms as they apply to a single open source codebase. This view is supported by my analyst colleagues over at the 451 Group.

Were we to grant Driver the more expansive definition of dual licensing, however, I still think that figure is wrong. Based on the conversations we’re having with vendors in the space, it seems more likely that revenue growth and expansion will come not from quote unquote dual licensing, but from intelligence derived from gathered data and telemetry.

Judging by the almost universally poor conversion metrics – that is, the number of users of a given open source tool that are converted to paying customers – it seems reasonable to assert that there are ongoing and systemic issues in the commercialization of open source software. Hence the proliferation of alternative revenue models such as dual licensing, open core and even SaaS. It is far from clear, however, that these models satisfactorily align customer and vendor interests such that conversion percentages will rise to levels where they are competitive with proprietary software.

At the end of the day, open source customers are generally paying for one or more of a.) break/fix/integration/support/etc services that they hope not to need, b.) withheld features that they need to pay to gain access to, or c.) the right to not observe the terms and conditions of the original license. The relative distribution of revenue within this set is skewed by the size and scope of the Linux community towards A, with B being the raison d’etre for open core and C the same for dual licensing.

But what if open source vendors could leverage their primary strength – distribution – more effectively as a direct revenue stream? I’ve been predicting for three years or so that they would do just that, via data aggregation and analytics. The alignment of customer and vendor goals is better in this model than in virtually any other. The simplest example of this model outside of open source is Google, which provides users with search at no cost, receiving in return massive volumes of data which it monetizes both directly (contextual ad placement) and indirectly (algorithmic improvement, machine learning, intelligence for product planning strategy, etc). Why couldn’t software vendors employ a similar model, trading free software for user-generated telemetry data? The answer is, they can. SpiceWorks, for one, is doing just that now, quite successfully, albeit not with open source software.

The strength of open source is in its ubiquity, and the volume it commands ensures that the telemetry returned would have substantial – potentially immense, depending on the project – value. Importantly, however, the value lies in the aggregation. A single user’s telemetry is likely to be relatively uninteresting. A hundred users’ telemetry, more interesting. A thousand users’, that much more so, and so on. Users, therefore, wouldn’t be surrendering anything of material value to a would-be vendor in the transaction. Better, analysis of the aggregate could have enormous value to customers. How is my infrastructure performing relative to similar environments? What are the types of conditions that indicate a potential problem? What differentiates my architecture from the Top 10 best performing? These are answerable questions…if you have a big enough dataset. Most customers would not have that; an open source software provider aggregating and analyzing their combined telemetry would.
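To make the first of those questions concrete, here is a rough sketch of how a vendor might answer “how is my infrastructure performing relative to similar environments?” from pooled telemetry. Everything in it is an assumption for illustration: the metric (query latency), the function name, and the numbers, which are invented rather than drawn from any real deployment.

```python
from bisect import bisect_left


def percentile_rank(peer_values, own_value):
    """Percentage of peer values strictly below own_value."""
    ordered = sorted(peer_values)
    if not ordered:
        raise ValueError("need at least one peer value")
    return 100.0 * bisect_left(ordered, own_value) / len(ordered)


# Hypothetical, made-up median query latencies (ms) reported by
# anonymized peer installations of the same open source project.
peer_latencies_ms = [12.0, 15.5, 9.8, 22.1, 18.4, 11.2, 30.0, 14.7]

# Hypothetical latency for the customer asking the question.
my_latency_ms = 16.0

rank = percentile_rank(peer_latencies_ms, my_latency_ms)
print(f"{rank:.1f}% of peers report lower latency than you do")
```

With one customer's data the comparison is meaningless; it only becomes useful as the pool of peers grows, which is the point about aggregation made above.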

Privacy and trust will certainly be concerns, but if the right data is offered as an incentive and the appropriate anonymization assured, those can be addressed for most customers. Those that remain concerned should have the ability to opt out, understanding that they will in turn have no access to the resulting analytics, and might therefore be at a disadvantage relative to competitors who are using the intelligence.

This direction seems nothing less than inevitable to me, and so it is no surprise that we’re beginning to see (and help) a variety of open source vendors move in this direction. Free and open source data has a bright future regardless of the revenue model, but as we see successful projects better leverage their traction via analytics, the result should be a win for ecosystems and customers alike.

Whether or not you believe, as I do, that the money is ultimately going to come from data more than code, it seems clear to me that it is not going to come from what is commonly considered to be dual licensing. Because while it is not true that I am an enemy of that particular approach, I do believe it’s in decline. Not least because it’s poorly aligned with customers’ needs.

4 comments

MySQL’s dual licensing had structural flaws, but the key problem has been the sales people’s abuse.
Also, the enterprise subscriptions are separate from the dual licensing, and again that has structural flaws, but conceptually it is a good service model.

I dislike the basic topic, for two reasons:
1) data vs dual… there’s more choice than that.
2) which makes more money is irrelevant IMHO. I’d ask which has the most sustainable model over time and provides value for clients not just vendor.
