Therese Sullivan, editor of BuildingContext.me, and our ControlTrends’ eyes and ears from Silicon Valley, takes us on a data journey that begins at the ethereal origins of Artificial Intelligence (Dartmouth, 1956) and delivers us to metadata and Project Haystack — introducing some of the label makers in between.

One of the most re-watched episodes of the comedy series Seinfeld is ‘The Label Maker,’ in which Elaine’s Christmas gift to a friend is re-gifted to Jerry before the Super Bowl. True, a thing for tagging other things was not a fun or romantic gift in 1995. And perhaps Bryan Cranston’s fictional dentist didn’t get it. But Julia Louis-Dreyfus’s Elaine was way out ahead in her thinking. Label making is important! If you are a data wrangler today, you should appreciate any gift that helps you tag things with metadata labels.

Frank Chen of Silicon Valley venture capital firm Andreessen Horowitz (a16z) presents a timeline in his AI and deep-learning mini-course that happens to plot label-making’s journey from white-elephant gift to tech’s newest cool thing. Released to all interested students in June 2016, it is a fantastic history lesson and primer on what is happening in artificial intelligence (AI) today. He writes:

“One person, in a literal garage, building a self-driving car.” That happened in 2015. Now to put that fact in context, compare this to 2004, when DARPA sponsored the very first driverless car Grand Challenge. Of the 20 entries they received then, the winning entry went 7.2 miles; in 2007, in the Urban Challenge, the winning entries went 60 miles under city-like constraints. Things are clearly progressing rapidly when it comes to machine intelligence. But how did we get here, after not one but multiple “A.I. winters”? What’s the breakthrough? And why is Silicon Valley buzzing about artificial intelligence again?

So you can say to your phone ‘show me pictures of my dog at the beach’ and a speech recognition system turns the audio into text, natural language processing takes the text, works out that this is a photo query and hands it off to your photo app, and your photo app, which has used ML systems to tag your photos with ‘dog’ and ‘beach’, runs a database query and shows you the tagged images. Magic.

Try it without labels (‘unsupervised’ rather than ‘supervised’ learning). Today you would spend hours or weeks in data analysis tools looking for the right criteria to find these, and you’d need people doing that work – sorting and resorting an Excel table with a million rows and a thousand columns, metaphorically speaking.
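The tag-then-query flow Chen describes can be sketched in a few lines. Everything below — the photo names, the tags, the `query` helper — is invented for illustration; a real photo app’s ML tagger and storage layer are far more involved, but the punchline is the same: once labels exist, retrieval is just a filter.

```python
# A minimal sketch of tag-based photo retrieval, assuming an ML model
# has already attached labels to each photo (the supervised case).
photos = {
    "IMG_001.jpg": {"dog", "beach", "sunset"},
    "IMG_002.jpg": {"cat", "sofa"},
    "IMG_003.jpg": {"dog", "park"},
    "IMG_004.jpg": {"dog", "beach"},
}

def query(tags, library):
    """Return photos whose label set contains every requested tag."""
    wanted = set(tags)
    return sorted(name for name, labels in library.items() if wanted <= labels)

# 'show me pictures of my dog at the beach' -> NLP reduces it to two tags:
print(query(["dog", "beach"], photos))  # ['IMG_001.jpg', 'IMG_004.jpg']
```

Without those labels, there is no `wanted <= labels` test to run — you are back to Chen’s metaphorical million-row spreadsheet, hunting for the criteria by hand.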

The eye-catching speech interfaces or image recognition are just the most visible demos of the underlying techniques.

The important part is not that the computer can find them, but that the computer has worked out, itself, how to find them.

Did you catch that? The speech and image recognition technology may be superficial eye-candy compared to the feat of putting together the underlying knowledge graph. In other words, how you classify and label objects is at the core of how well your AI works. Knowledge graphs for the World Wide Web are the domain of semantic web researchers. Three leading professors in the field from the University of Zurich, Rensselaer Polytechnic Institute, and Stanford University collaborated on the September 2016 article, A New Look at the Semantic Web. Here are some key excerpts from this long-form editorial:

Bringing a new kind of semantics to the Web is becoming an important aspect of making Web data smarter and getting it to work for us. Achieving this objective will require research that provides more meaningful services and that relies less on logic-based approaches and more on evidence-based ones.

Crowdsourcing approaches allow us to capture semantics that may be less precise but more reflective of the collective wisdom.

We believe our fellow computer scientists can both benefit from the additional semantics and structure of the data available on the Web and contribute to building and using these structures, creating a virtuous circle.

Frank Chen points out that the latest Google image recognition algorithms can chow down on the entire collection of videos on YouTube. But when they do, they get a graph that skews in favor of cats doing funny things. That doesn’t reflect the real world. The best knowledge graphs, metadata schemas, neural nets (whatever you want to call the undergirding ML labeling technology that does the classifying) are the ones that reflect the collective wisdom and first-hand evidence of those with physical-world experience.

This brings us to Project Haystack, the open-source organization launched in 2011 and devoted to developing a standard mark-up language and tagging schema for devices in commercial buildings. Given the core importance to AI of getting standardized labeling right the first time, it is no surprise that academia and big IT picked up on the Haystack schema when they launched the Brick schema. One way to look at it is that there is more industry, academic, and government energy, focus, and money being invested in label-making than ever before. What a gift! Seinfeld’s Elaine Benes would be a big supporter if she were here today. And even the dentist who became Walter White of Breaking Bad would not under-appreciate it. I hope more of those who hold the evidence and wisdom to contribute get involved. Silo-ing data was the way business was conducted in the last innovation cycle, but it won’t work going forward in the age of AI and machine learning (ML).
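To give a flavor of what Haystack-style tagging buys you, here is a simplified sketch. Real Haystack entities are exchanged as Zinc or JSON and use the standard tag library; the records and the `find` helper below are invented for illustration only.

```python
# Illustrative Haystack-style records: marker tags modeled as Python sets.
# In real Haystack, 'ahu', 'equip', 'point', 'sensor', etc. are standard
# marker tags; the ids and structure here are made up for the sketch.
entities = [
    {"id": "ahu-1", "tags": {"ahu", "equip"}},
    {"id": "dat-1", "tags": {"point", "sensor", "temp", "discharge", "air"},
     "equipRef": "ahu-1"},
    {"id": "dasp-1", "tags": {"point", "sp", "temp", "discharge", "air"},
     "equipRef": "ahu-1"},
]

def find(entities, *markers):
    """Filter entities by marker tags -- the heart of tag-based discovery."""
    wanted = set(markers)
    return [e["id"] for e in entities if wanted <= e["tags"]]

# Every discharge-air temperature sensor, regardless of vendor point naming:
print(find(entities, "discharge", "air", "temp", "sensor"))  # ['dat-1']
```

The point of the schema is exactly this: an analytics app can ask for “discharge air temp sensors” across thousands of buildings without knowing what each controls contractor happened to name the point.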

Another reason to do your part: tomorrow, there may not be chief marketing officers and chief technology officers, but rather chief labelers of marketing things and chief labelers of technology things, etc. The labeling of training data for machine-learning algorithms is about to consume us all — at least everyone who works with computers, mobile phones, and Internet-of-Things devices. So, best to get ahead of the game.

Cade Metz, the keynote speaker at the 2016 Realcomm | IBcon conference, talks about the technology trends he is seeing in Silicon Valley and how these trends will affect the landscape of technology used in Smart Buildings.

This is next-gen analytics! Dr. Igor Mezic, Ecorithm’s Chief Scientific and Technology Advisor & Co-Founder, and John Morris, VP Marketing and Sales, tell ControlTrends what inspired Igor to develop Ecorithm’s technology, why Ecorithm is able to provide faster and more accurate results than other analytics companies, and why Ecorithm’s SaaS is necessary for the entire lifetime of the system.

“To come up with the solution, we had to think at both the micro and macro level and design the software platform to accommodate everything in between. We came up with a very modular design in which the underlying foundation filters through the noise of the massive data sets and recognizes key patterns. On top of that is a layer of domain expertise that includes the physics of how the ‘healthy’ devices and systems are supposed to operate. And resting on top of that is an interface to quickly tailor the spatial and physical connectivity of devices in the virtual database to match the configuration and operation of each physical building. That means exceptionally quick start up and customization, highly detailed insight and root cause analysis, and easy integration of new devices or changes in configuration. Also, this makes the platform readily extendable beyond buildings to other complex systems as well.”
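Purely as illustration, the layered design described in that quote — a noise-filtering foundation, a domain-expertise layer, and a building-specific configuration on top — might be sketched like this. Every function name, threshold, and reading below is hypothetical; Ecorithm’s actual platform is proprietary and far more sophisticated.

```python
# Hypothetical three-layer sketch: filter noise, apply domain rules,
# configure per building. Not Ecorithm's code -- invented for illustration.
from statistics import mean

def smooth(readings, window=3):
    """Foundation layer: damp sensor noise with a simple moving average."""
    return [mean(readings[max(0, i - window + 1): i + 1])
            for i in range(len(readings))]

def check_health(readings, expected, tolerance):
    """Domain layer: flag samples violating the 'healthy device' physics."""
    return [i for i, r in enumerate(readings) if abs(r - expected) > tolerance]

# Configuration layer: map this building's points to expected behavior.
building_config = {"zone-temp-1": {"expected": 72.0, "tolerance": 2.0}}
cfg = building_config["zone-temp-1"]

transient = [71.8, 72.1, 75.0, 72.0, 71.9]   # one noisy spike
sustained = [71.8, 75.5, 75.6, 75.4, 75.5]   # a real, persistent fault

print(check_health(smooth(transient), cfg["expected"], cfg["tolerance"]))  # []
print(check_health(smooth(sustained), cfg["expected"], cfg["tolerance"]))  # [2, 3, 4]
```

Even this toy version shows the division of labor: the foundation layer makes the transient spike disappear, while the sustained deviation survives smoothing and gets flagged by the domain rules, using only the per-building configuration on top.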