Posted:June 30, 2014

New UMBEL Site Marks Shift

Structured Dynamics Moves to Integrate Key Initiatives

Our first release of the UMBEL site occurred in 2007 while UMBEL was still under development. That site used its own homegrown HTML. The release was followed in 2008 by the addition of our own Web services. The Web services were well-received, which caused Structured Dynamics to develop the more general structWSF Web services framework (most recently updated as the OSF Web Services). We subsequently migrated the earlier UMBEL Web services to this more general framework, and also migrated to Drupal as the standard content management and Web site component for OSF.

For most reasons, including all client work to date, our OSF framework (Web services + Drupal 7) has been performant and met client site needs. However, the operation of the UMBEL Web services was often problematic after moving to the Drupal (full OSF) version. Unfortunately, we have seen both performance and stability problems, though calculations over a full 28,000 node graph are a challenge in any environment.

Since the UMBEL structure was an order of magnitude larger than our client work to date, we have frankly adopted a posture of occasional monitoring and reboots to keep the UMBEL Web site up. This posture was not limiting use of UMBEL for general browsing purposes, but was limiting its usefulness as a working API.

Because the cobbler’s son is often the last to get shoes, we have let the UMBEL Web site chill to a degree in the background. But, now, with other imperatives underway and some dedicated time to look directly at performance of larger-scale ontologies, we have looked at these items anew. The report card on our current evaluations is contained in a newly released UMBEL Web site with services, which I summarize and provide context for below. What emerges is an interesting story of discovery and growth.

In essence, we have learned two important things about our prior practice with respect to making UMBEL Web services broadly available. First, for UMBEL, we do not need or want our standard configuration of having a Drupal front-end as the interface into OSF. Access to a knowledge graph does not need — and is ill-served — by having a complicated interface stand atop a large-scale concept model. APIs and Web services are the most important interaction points with the UMBEL knowledge graph, not a user-oriented Web site.

Second, in the various phases of our work, we had come to embrace the idea of ontology-driven applications (what we have termed ODapps). The compelling vision behind such structures is to place the emphasis on knowledge structures and data, rather than more software. Once one begins to unpack that vision, it can also become clear that software programming languages themselves that look toward “code as data” might be one way to be consistent with that vision.

Seeking a Sense of Harmony

For years I have been writing about data integration and interoperability and our company has been devoted to the topic. I have written extensively about the importance of RDF and description logics to how we organize and represent data. We were also some of the first to supplement RDF with a faceted text-search engine (Solr) to provide the most responsive query environment across structured to unstructured data. We have also adopted ontologies and the OWL 2 (plus SKOS) languages as standards to both foster and enable interoperability. We have explored native data structs to understand how wild forms of information can be efficiently pipelined into interoperable RDF and text forms.

All of this points to the ideal of the democratization of the information function in the enterprise. In other words, to the idea that how data structures get organized and represented (the ontology side of things) is something that knowledge workers can do themselves, rather than accepting the bottleneck of IT and programmers.

This is well and good except there is a critical “last mile” between data representation and data usefulness. This “last mile” deals with how actual data gets manipulated and then organized and presented (visualized). Query responses, reports, analysis and maps continue to be the choke points between knowledge workers and their IT support. And one need not frame this entirely from an enterprise perspective: these same challenges exist for the individual researcher or the small organization.

So, while one can focus on data and its organization and representation, until we address this “last mile” problem, we still are not likely addressing the largest source of frustration and lost opportunities in the knowledge function.

The reason that simple data struct forms and tools like spreadsheets continue to be popular is that they are empirically the best tools for the “last mile”. Web forms and services are increasingly showing their strengths in this realm.

Once one steps back and looks at the entire cycle from basic datum to actionable knowledge, it is clear that the question of data model is but one portion of the challenge. The remaining challenge is how (now) accessible information can be placed into context and acted upon. Further, if one premise is the democratization of the information function, then the challenge should also be how to provide productive capabilities for the last mile to the knowledge worker. Productivity is enhanced when there are the fewest channels and distortions between signal (problem) and channel (user chains).

Fred, in his investigation of functional languages, clearly saw that bringing the languages of code (programming) into the language of data (knowledge workers as expressed in our RDF world view) was one means to reduce the number and lossiness of the channels between problem (signal) and solution. A world view premised on the efficient representation and interoperability of data must logically support the idea of a coding (instructional or language) base aligned as well to problems. Moreover, since software guides the actual computer operations, a form of the software that supports the nature of the data should also provide a more performant framework for moving forward. In technical terms, this is known as homoiconicity.

Whether one looks to the intellectual foundations of Charles S Peirce or Claude Shannon (both of whom we do), one can see that the idea of signs and information theory means finding both data representations and code that minimize communication losses and promote the accurate transfer of the message. Lossless data transmission is one contributor to that vision, but so too is a functional representation for how the information is to be processed and transformed that aligns most closely with the information at hand.

Ergo, a better model for data is not enough. A better model of how to manipulate that data (that is, software) is also needed that aligns with the idea of coherence and structure in the underlying information. For our purposes, we have chosen Clojure as the functional language basis for these new UMBEL Web services. Not only is it performant, but it aligns well with the creation of domain-specific languages (DSLs) that also promise to democratize the computing function for the knowledge worker.

Bringing the Pieces Together

Fred and I founded Structured Dynamics a bit more than five years ago. But, we had worked together much earlier on UMBEL and Zitgist. For nearly ten years now, we have episodically emphasized a few different initiatives and passions.

One of those passions has been the structure of data and information. It is this perspective that brought us to RDF and data structs (and our irON efforts) at various times. The idea of structure is a basis for our company name, and represents the belief that structure can be brought to unstructured forms (via tagging, for example). Structure is perhaps the most common notion or concept in my own writings for a decade.

Another need has been the idea of making semantic technologies operational. We have been keen researchers of the tools space and algorithms and such since the beginning. We observed early on that many innovative and open source semantic programs existed, but most were the result of EU grants or academic efforts elsewhere. Thousands of tools existed, but very few had either been evaluated or stress-tested. By bringing together the best of class tools and integrating them, we could begin to provide a useful semantic platform for enterprises. This motivation was the genesis for the Open Semantic Framework, and has been the major source of our client support since SD was founded. We have finally created an enterprise-capable platform and have done much to transfer its technology. But, these concepts are difficult, and much remains to be done before semantic technologies are a standard option for enterprises.

Still, in another vein, our first love and interest has been knowledge bases. We first identified the need for UMBEL years ago when we perceived an organizing vocabulary would become an essential glue on the Web. We pursued and studied Wikipedia and how it is informing knowledge bases. Instance data and how it is represented is a passion for how these knowledge bases (KBs) get leveraged going forward.

As a smaller consulting and development boutique, we have needed to be opportunistic about when and where we devoted efforts to these pieces. So, over the months or years, we have at various times devoted ourselves to data models and ontologies (structure), the Open Semantic Framework (platform), or UMBEL or Wikipedia (KBs, knowledge bases). Depending on funding and priorities, any one of these threads did receive episodic attention and focus. But, truth is, each one of these pieces has been developed in (project-level) isolation to the whole. Such piecemeal development was essential until each component achieved an appropriate degree of maturity.

I could say we could foresee some years back that all of these pieces would eventually reinforce and bolster one another. Though there is a small bit of truth in that statement, the way things have actually unfolded is to show, as experience and sophistication have been gained, that there is a synergy that comes in the interplay of these various pieces. The goodness is that Structured Dynamics’ efforts (and of its predecessors) were building inexorably to the possible cross-fertilization of these efforts.

Once this kind of realization takes place — that data, code and semantics move hand-in-hand — it then becomes logical to look at the entire knowledge ecosystem. For example, it is not surprising that artificial intelligence, now in the informed guise of KB-backed systems, has again come to the fore. It is also not surprising that what software and programming languages we bring to bear also directly interact with these concerns. Just as Hadoop and non-relational database systems have become prominent, we should also investigate what kind of programming languages and constructs may best fit into this brave new information world.

What we have seen from that investigation is that functional languages (with their DSL offspring) somehow fit into the overall equation moving forward. SD has moved from a single-focus endeavor to one explicitly looking at integration and interoperability issues. What we had earlier seen as (largely) independent pieces we now see as fitting into a broader equation of related emphases:

We are seeing artificial intelligence moving in these directions. As a subset of AI, I suspect we will also see the semantic Web moving in the same direction.

We clearly now have the theory, the data, the understanding of semantics, and languages and data representations that can make these democratic interoperabilities become real. This new UMBEL Web site is the first expression of how these pieces can begin to work together into a compelling, accessible whole.

Structured Dynamics announced its new UMBEL Web site and set of Web services. The Web services are now written in Clojure and the Web site framework uses Bootstrap and plain ol' HTML. The site is the first implementation of SD's integration of structure, platform, knowledge bases and functional programming languages.