IoT Needs Open Discovery Scheme

The Internet of Things will require an open-source self-classification scheme if it is to scale to its full potential of trillions of nodes.

A great many people fail to understand the fundamental changes that will come with the Internet of Things. Instead, they expect traditional networking protocols such as TCP/IP and host-server architectures to carry on. After all, doesn't IPv6 make possible trillions of addresses? And isn't that the only problem that was left to solve in the Internet of Things?

That might be true enough if the only devices on the IoT were the PCs and smartphones familiar to us today. But that's not the future. Instead, there will be myriad purpose-built machines, sensors, appliances, and actuators pumping out countless streams of data. This rise of the machines creates the need for an open-source solution in three ways.

First, most of these relatively simple devices will be cheap -- less than a few dollars per device. They simply can't absorb the costs of traditional networking.

Second, they will number in the trillions. No current manmade scheme can digest data from that many devices.

Third, their manufacture and sale will be completely decentralized. Millions of manufacturers sprinkled throughout the world will lack the manpower and expertise to interact with centralized databases or standards bodies, so there is no hope of distributing media access control IDs to these makers.

The answer to gaining useful information from this impending fire hose of IoT data is simplicity itself. It requires the use of publish/subscribe architectures in IoT servers. This is key because the most interesting data streams may be completely unknown to the organizations that need them. By searching for published streams of a particular type, streams from a defined area, or streams that vary in an interesting way, servers may build up information neighborhoods to extract and integrate knowledge from the IoT.
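The idea above can be sketched in a few lines. This is a minimal, hypothetical publish/subscribe model (the broker, field names, and predicates are illustrative assumptions, not a defined protocol): devices publish typed readings to a broker, and a server subscribes by a predicate over stream metadata to build its "information neighborhood."

```python
# Minimal publish/subscribe sketch: servers subscribe with a
# predicate over stream metadata, so interesting streams can be
# discovered rather than known in advance.

class Broker:
    def __init__(self):
        self.subscribers = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Register interest in any stream whose metadata matches predicate."""
        self.subscribers.append((predicate, callback))

    def publish(self, metadata, value):
        """Fan a reading out to every matching subscriber."""
        for predicate, callback in self.subscribers:
            if predicate(metadata):
                callback(metadata, value)

broker = Broker()
readings = []

# A server builds an information neighborhood: all temperature
# streams from a defined area, regardless of who made the devices.
broker.subscribe(
    lambda m: m["type"] == "temperature" and m["area"] == "zone-4",
    lambda m, v: readings.append((m["device"], v)),
)

broker.publish({"type": "temperature", "area": "zone-4", "device": "t-17"}, 21.5)
broker.publish({"type": "humidity", "area": "zone-4", "device": "h-02"}, 40.0)

print(readings)  # only the matching temperature stream is captured
```

The point of the sketch is that the subscriber never names a specific device; it names a *kind* of stream, which is what makes previously unknown devices discoverable.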

To discover which streams are of interest, we need to turn to nature, the one system that has already solved problems on a scale this massive. The fundamental concept in nature's systems is self-classification. By identifying themselves as a particular type of sensor, appliance, or actuator, simple IoT devices blindly bleating out values and readings can provide the information needed to incorporate them into a higher-level picture of the world through the subscriptions of servers.
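A self-classifying announcement might look like the sketch below. The field names and class strings here are assumptions for illustration only, not a standard: each device blindly broadcasts its declared class alongside every reading, so a listener can bucket streams it has never seen before.

```python
# Sketch of self-classification: a device tags every reading with
# its self-declared class, and a listener sorts incoming messages
# by that class without any prior knowledge of the device.

import json

def make_announcement(device_class, reading):
    """Pack a self-classifying message as compact JSON."""
    return json.dumps({"class": device_class, "value": reading})

def classify(raw, buckets):
    """Sort an incoming message into a bucket by its declared class."""
    msg = json.loads(raw)
    buckets.setdefault(msg["class"], []).append(msg["value"])

buckets = {}
classify(make_announcement("sensor/temperature", 21.5), buckets)
classify(make_announcement("actuator/valve", "open"), buckets)
classify(make_announcement("sensor/temperature", 22.1), buckets)

print(buckets)
# Readings are grouped purely by the class each device declared.
```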

Some of these data streams might be known structures, such as those of an OEM that define the signatures of its own devices. But many more -- and perhaps the more interesting ones -- will be built up from previously unknown devices as servers subscribe to pertinent data streams.

Everyone and no one will manage this self-classification scheme. By making these self-classifying data structures open-source, they may readily be adopted by anyone who wishes to participate in the IoT. As I outlined in my book, this open-source taxonomy is combined with the capacity of private extensions to permit some parts of the data stream to be available to all, while other parts are restricted to those with the correct key.
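The split between open taxonomy and private extensions could be structured as in this sketch. Everything here is illustrative: the field names are assumptions, and the XOR "cipher" is a stand-in for real cryptography, used only to show that some fields are readable by all while one is opaque to anyone without the key.

```python
# Sketch of an open taxonomy record with a key-restricted private
# extension: classification fields are in the clear, while the
# extension is opaque to all but key holders.

import json
from base64 import b64encode, b64decode

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher for illustration -- NOT real cryptography."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def make_record(device_class, value, private_fields, key):
    """Public taxonomy fields in the clear; private extension under a key."""
    blob = xor_cipher(json.dumps(private_fields).encode(), key)
    return {
        "class": device_class,            # open taxonomy: readable by anyone
        "value": value,
        "ext": b64encode(blob).decode(),  # restricted to holders of the key
    }

def read_extension(record, key):
    """Recover the private extension; requires the correct key."""
    return json.loads(xor_cipher(b64decode(record["ext"]), key).decode())

key = b"shared-secret"
rec = make_record("sensor/temperature", 21.5, {"calibration": 0.02}, key)

print(rec["class"], rec["value"])  # anyone can read the public parts
print(read_extension(rec, key))    # only key holders recover the extension
```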

Self-classification is the fundamental requirement for publish/subscribe architectures to work, and publish/subscribe will be the only way to make sense out of the emerging Internet of Things. But that self-classification must be open-source if it is to scale to the trillions of devices and millions of makers that are in our very near future.

-- Francis DaCosta is an engineer, architect, and author focusing on scalable architectures for the emerging Internet of Things.

@Luddy0: More to come in the next blog. Some of it is described in my book, specifically the importance of a self-classifying "chirp" protocol. Open-source communities will maintain and manage the chirp taxonomy, but in an organic fashion, as in nature; see the presentation made at the IoT World event and on my LinkedIn page (cannot post URL). It will be covered in the next blog entry, and I look forward to your comments then. Thanks for posting your comments. Francis

I'd love to be optimistic about this, but I just don't see how it works from the point of view of the consumers of this data. How can I operate on a stream of data whose format and meaning I learn dynamically, by discovery?

It reminds me a little of the premise of XML: by adopting a 'universal' format whose syntax is self-describing, we can share information from all kinds of sources easily. But has this promise really materialized? Are there systems that process XML sources whose meaning (rather than syntax) isn't independently standardized? I don't see it happening, and I don't understand how this case can be any different. The syntax side of XML works well enough, but the dynamic semantic side has gone nowhere.

If I create a device that does something new, takes some kind of reading from its environment and then announces the availability of this data, how can I write a consumer of that data without having prepared ahead of time to receive exactly that kind of data? I guess I can believe that I might be able to adapt to different formats for the data, but different categories of data altogether? I just don't see it. It sounds too abstract to me. That's especially so considering the volume of data which it is claimed will be available.

This is apart from the question of building an infrastructure that is prepared for several orders of magnitude more devices to begin publishing their presence.

I see the IoT being successful in restricted environments where a changing population of devices with very tightly prescribed capabilities either operates in concert or under the control of one or more servers. This more abstract vision of vague capabilities announcing themselves is hard for me to believe.