Erik Wilde's Web and Information Architecture Musings

Tuesday, February 10, 2015

A little while ago (almost three years), Mark Nottingham wrote his excellent article JSON or XML? Just Decide, and it seems it's time to update that a little. It is also a good opportunity to put it in context, and to see how that context has changed on the surface, but not in nature.

XML was already trending downwards, and most people liked JSON better. To me, the main reason for that was always easy to see: XML is a document-oriented language and pretty good at being one, but its metamodel is a bit hard to grasp. JSON, on the other hand, maps very directly to the mental model of what most developers think structured data should look like, and therefore it makes them happy and productive.
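To make the contrast concrete, here is the same made-up record in both languages (the field names are just for illustration). The JSON version maps directly onto the maps, arrays, and scalars developers use anyway, while the XML version already forces metamodel decisions such as attribute vs. element:

    { "name": "Aki", "age": 42, "roles": ["admin", "ops"] }

    <person age="42">
      <name>Aki</name>
      <role>admin</role>
      <role>ops</role>
    </person>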

During the transition phase from XML as the first big language for structured data on the web to JSON, many API designers were unsure what to choose. XML because of its established status, or JSON because it seemed to be trending and a better fit for most API data models?

The worst possible decision (and the one Mark was blogging about) was to avoid a decision and assume that you can do both. Mark nicely describes the problems, and it's worth noticing that the problems do not have anything to do with XML's or JSON's inherent strengths or weaknesses; they simply originate in the metamodel mismatch. Every metamodel has idioms and built-in bias, and when you try to make two masters happy, you end up in one of the bad situations Mark pointed out.
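A small, made-up example of that mismatch: if the canonical model is XML and the JSON view is produced by a generic mapping, the mapping cannot tell a repeatable element with one occurrence from a singular one, so consumers see the shape of their JSON change with the cardinality of the data:

    <roles><role>admin</role></roles>
        becomes  { "role": "admin" }
    <roles><role>admin</role><role>ops</role></roles>
        becomes  { "role": ["admin", "ops"] }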

This gets much more pronounced when you are not just defining one relatively stable API under your control, but something that is supposed to be open and extensible. The reason for this is the same: metamodels have their own ways of defining, allowing, and encouraging extensibility, and these are often even more idiomatic and specific than the metamodels themselves.

In the end, Mark's message was that it is better not to use XML, because it is tricky, and developers try to avoid it anyway and then throw data binding tools at it, resulting in brittle code on their end (but they will still blame you when their code breaks).

But: It seems we're heading down a similar path now with JSON and RDF, with people claiming that you can safely do both at the same time. This time it seems that people think that JSON-LD is the magic that allows them to not decide on their metamodel. This is too much of a burden for JSON-LD, and more than it has been designed for or can live up to.

Simply put, JSON-LD is data binding for people dealing with JSON data who prefer to have an RDF view of that data. It does a good job of allowing people to perform this mapping, in the same way that XML data binding tools isolated developers from the XML they did not want to process themselves.
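As a small illustration of that binding role, a JSON-LD @context maps plain-looking JSON names to RDF terms (the vocabulary URIs here are just examples), so RDF-minded consumers can read the document as a graph, while JSON-minded consumers can ignore the context entirely:

    {
      "@context": {
        "name": "http://schema.org/name",
        "homepage": { "@id": "http://schema.org/url", "@type": "@id" }
      },
      "name": "Erik Wilde",
      "homepage": "http://dret.net/netdret/"
    }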

In the end, if you want to create a robust model, it is impossible to avoid deciding which metamodel you want to build it on. If you try to ignore openness and extensibility hard enough, you may get away with it for a little while. But it will catch up with you eventually, in particular in situations (such as open standards) where people will take your foundation and stretch it to its limits in all possible directions.

Given our experience with how metamodel trends change over time, and with how APIs are used and extended, the clean approach I would recommend is the following:

Start with JSON as your foundation, defining your model in ways that are easy for developers to read, understand, and implement. The upcoming work on GeoJSON will be a good exercise in this. In particular, be clear about openness and extensibility, so that implementations know what to expect now and in the future. This also allows them to drop/fail when those rules are not followed.
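As a sketch of what such rules could look like (the extension member name below is purely hypothetical): the model could state that consumers must tolerate and preserve members they do not understand, so that an extended feature remains usable for plain implementations:

    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [-122.27, 37.87] },
      "properties": { "name": "Berkeley" },
      "ext:confidence": 0.9
    }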

Your extension model must clearly say how extensions are supposed to be exposed to applications, so that implementations can be built appropriately.

If there are developer communities invested in other metamodels, have a completely separate layer for them, where they can build data binding in any way they please. For GeoJSON, that's GeoJSON-LD (which happens to use JSON-LD to map into RDF), and if somebody felt the need to bind GeoJSON to XML, they would be more than welcome to, for example, use JSONiq to map GeoJSON into some XML model.

Have plenty of test cases that explore the limits of your JSON model, and the limits of your extension model. Document clearly for each test case how it is using the extension model, and what an application is supposed to see.

Have generic applications consuming your test cases, including ones that implement the data binding layer(s). Make those clients produce JSON syntax out of what is reported to them through the data binding layer, and compare this JSON to the original test data. Nothing should get dropped.
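Here is a minimal sketch of such a round-trip check in Python, assuming a hypothetical binding object with bind/unbind operations (both names invented for illustration):

    import json

    def roundtrip_ok(test_case_json, binding):
        # Parse the original test case.
        original = json.loads(test_case_json)
        # Map it into the binding layer's view (e.g., an RDF graph)...
        bound = binding.bind(original)
        # ...and map that view back out to plain JSON structures.
        restored = binding.unbind(bound)
        # Nothing should get dropped or reshaped along the way.
        return restored == original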

This probably sounds a bit complicated. But it really isn't all that hard to do, and should you decide to use a clean layering structure such as the one proposed here, then doing this kind of testing is necessary to avoid specification errors anyway.

But, whatever you do, do not try to serve two masters at the same time. Layering is a well-established pattern in software and protocol design, and there are good reasons for this. Trying to make two communities happy in this case will make neither of them very happy, and will very likely result in a product that's neither robust nor evolvable.

Location has quickly moved into the mainstream of the (mobile) Web and it continues to be a strong driver of research activities, addressing a wide range of Web-related topics, including search, retrieval, mining, extraction, analytics, mobility, services, and systems. After the initial boost and consolidation of approaches based on the simple use of geospatial coordinates, we now see an increasing demand for more sophisticated systems, stronger retrieval, mining, and analytics solutions, and more powerful semantics. Location is playing a key role as a context factor for users, but also as the implicit or explicit place of resources and people. It also is an important factor in mobile and geo-social applications and is driving geospatially-aware Web data mining.

Following the successful LocWeb workshops in 2008, 2009, 2010, and 2014, LocWeb 2015 will continue this workshop series at the intersection of location-based services and Web architecture. Its focus is on Web-scale systems and services facilitating location-aware information access. The location topic is understood as a cross-cutting issue equally concerning Web information retrieval, semantics and standards, and Web-scale systems and services.

The workshop is expected to establish a common, integrated venue where the location aspect can be discussed in depth within an interested community. We aim for a highly interactive, collaborative full day workshop with ample room for discussion and demos that will explore and advance the geospatial topic in its various relevant areas. New application areas for Web architecture, such as the Internet of Things (IoT) and the Web of Things (WoT) mean that there will be increasingly rich and large sets of resources for which location is highly relevant. We expect the workshop to further the integration of the geospatial dimension into the Web and promote challenging research questions.

LocWeb solicits submission under the main theme of Web-scale Location-Aware Information Access. Subtopics include geospatial semantics, systems, and standards; large-scale geospatial and geo-social ecosystems; mobility; location in the Web of Things; and mining and searching geospatial data on the Web. The workshop encourages interdisciplinary perspectives and work describing Web-mediated or Web-scale approaches that build on reliable foundations, and that thoroughly understand and embrace the geospatial dimension. We are also interested in work tying into ongoing W3C activities such as Web of Things, Social Web, Data Activity, and Geolocation WG.

Topics of Interest

Location-Aware Information Access

Location-Aware Web-Scale Systems and Services

Location in the Web of Things

Large-scale Geospatial Ecosystems

Standards for Location and Mobility Data

Location Semantics

Modeling Location and Location Interaction

Geo-Social Media and Systems

Location-Based Social Networks

Geospatial Web Search and Mining

Visual Analytics of Geospatial Data on the Web

Location-Based Recommendation

Geo-Crowdsourcing

Mobile Search and Recommendation

Submission Instructions

We solicit full papers of up to 6 pages and short papers of up to 3 pages describing work-in-progress or early results. Authors are invited to submit original, unpublished research that is not being considered for publication in any other forum.

Workshop submissions will be evaluated based on the quality of the work, originality, match to the workshop themes, technical merit, and their potential to inspire interesting discussions. The review process is single blind.

Accepted workshop papers will be published in the WWW companion proceedings and will be available from the ACM Digital Library. Note that this may be regarded as a prior publication by other conferences or journals.

For inclusion in the proceedings, at least one author of an accepted paper has to register for the conference. When submitting the final camera-ready copy, authors will have to indicate which author is registered for that publication.

Presenters are encouraged to bring demos to the workshop to facilitate discussion.

Monday, September 01, 2014

With the iWatch hype reaching all-time highs, it seems clear that the landscape of wearables will change quite dramatically over the next year or so. But as usual, it's much harder to predict how things will change than to simply say that they will.

In order to find out at least for myself, I recently started playing around with more services and devices than usual. I have always been an e-gadget addict, but now I am reaching new heights: on yesterday's long trail run in the Sierra, I realized that I was carrying four GPS devices: a Garmin 910XT GPS watch, my iPhone, my GPS-enabled camera, and my SPOT satellite tracker. That's probably a bit more than needed, but the only ones I actually use are the Garmin (displaying basic data such as moving time and distance) and the camera (and I never enable its GPS, because it takes way too long to get a good GPS fix). The SPOT is for emergencies only, and the iPhone is just for running the Moves app.

How did it get that bad? About a month ago I started using a Basis watch, mostly to start collecting data for our research activities around wearables and personal fitness goals. I stopped wearing it pretty soon because the band caused skin irritation, and the watch turned out not to be waterproof. I also disliked the incredibly dim display. But even during the short time I used the watch, I discovered that what I found most intriguing was something that I never would have considered.

The Basis is pretty smart at figuring out what a person is doing. Not in many different ways, but good enough to distinguish walking, running, and cycling. I really liked how when I was getting around town, I could simply look at the watch and it would tell me that I had been walking or cycling for however-many minutes.

Since the Basis hardware had various issues, I recently switched to the Moves app. It does everything that I liked about the Basis. It does not give me heart rate or the other two nonsense measurements (perspiration and skin temperature) that the Basis records. But more importantly, it has GPS and thus is much better at being a meaningful diary (Moves also teams up with Foursquare, so the app will suggest meaningful place names for many places).

A day in town may look like the picture shown here. Have some coffee in the morning, then stay at the office most of the day, pick up the car from the mechanic in the afternoon, and get some pizza in the evening. Moves has a great UI that makes it as easy as possible to fix some of the things that the app may get wrong, so ending up with such a geo-diary takes almost no effort. But it still means that I have to go to the Moves app and clean things up. Which, at this point, I am willing to do because it's for the greater good of research, but this is where wearables (remember? this is where I got started...) enter the picture.

What a good wearable could add here are two main things that right now are missing from the Moves app:

I am not a big fan of carrying my phone everywhere. If an iWatch/iPhone app combo is done well (more on that later in a more technical post on wearables, WoT, and REST), then all I need to carry is the wearable, and it'll sync as soon as it gets in touch with the mothership again (such as requesting place names from Foursquare, and so on). What's most important here is a wearable app that's very specifically designed to work well when not paired with a phone.

Instant feedback on the wearable a bit like the Basis: Tell me that I have been walking/cycling/driving for some amount of time, and let me fix it if the wearable app guessed wrong (Basis does not allow fixing at all, Moves takes a bit of work to get to the UI place where I can do it).

In the end, what users end up with is a very personal, very contextual way of looking at their history. Of course this then could be used for Big-Data-style sensemaking, but that's a different issue (more about this in a later post). Mostly, it's a great way of building up a diary of places and tracks. And with a convenient feedback unit on your wrist, it's easy to see how it's building up, to intervene with direct feedback/corrections, and to not have to carry around the phone everywhere.

One of the interesting questions is where this data is being kept, and who gets access to it. I'd prefer this to be something that I own and control, which is not the case with Moves. I would be perfectly happy to pay money for this, but I guess, as usual, that will not even be an option, and you end up paying by giving up privacy as the only accepted currency.

Being a major location geek, I am excited about the prospect of a better and more flexible wearable, and the iWatch may turn out to be for the wearable space what the iPhone was for the smartphone space (even though the latter was probably even more trailblazing, by redefining so many aspects of the category). I have been using my GPS for about 5 years now, but only for sport activities. The Strava heatmap shown here is the result of many runs trying to cover all Berkeley Hills roads (blue means less coverage, red means more runs along the same road). Building up a similar personal geography for everyday activity may not just be fun; it also can be useful in terms of better understanding one's life, and figuring out things such as how to better get around town to get things done.

As said initially, I guess it's safe to say that the iWatch will change the landscape of wearables. Personally, my hope is that it focuses on being a good platform, and not so much on getting one single application right. That may be the best and maybe even the only way to avoid the current fate of most wearables, which is that they are novelties that early adopters buy and play around with for a bit, and then discard.

Current wearables get discarded quickly because they are too narrowly focused on doing just one thing, and there is no platform thinking backing them up. If there is one single thing that Apple is great at, it's platform thinking, so the iWatch very well could be another game changer, 7 years after the iPhone changed the smartphone category.

Tuesday, July 01, 2014

At the recent W3C Workshop on the Web of Things in Berlin (workshop report blog post coming soon!), one of the obvious and non-trivial questions of course was: what, exactly, is the Web of Things? And how does it relate to the Internet of Things, which is another term that is used quite a bit (and quite a bit more often than WoT, at least for now)?

To me, this question eventually will be answered by one of my beloved Wittgenstein quotes: The meaning of a word is its use in the language. However, there always is some time until use (and thus meaning) converges, at least in a substantial part of the language community. And it seems like we are not at that point yet. So here is one attempt to say what WoT is and isn't, and I am sure some people will agree, and some will disagree.

We had various discussions during the workshop, both in the forum and in informal conversations. My answer always was a very simple one: the difference between IoT and WoT is the same as the difference between the Internet and the Web. This was the fun part for me, because that used to be my introductory question in my Web Architecture courses, and surprisingly few of our students knew the answer. The answer comes in one sentence:

The Web is a resource-oriented information system that is based on uniform interactions with globally identified resources, and uses the connectivity provided by the Internet as its interaction fabric.

With this definition in mind, WoT can be defined rather easily. It follows the principles of Web Architecture, which in today's Web are mostly embodied in URI and HTTP as the two central pillars of identification and interaction (a.k.a. REST).
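In concrete terms, this means that a thing (or a gateway acting on its behalf) exposes its state as globally identified resources with uniform interactions; a hypothetical sensor reading could look like this on the wire:

    GET /sensors/temperature HTTP/1.1
    Host: things.example.org
    Accept: application/json

    HTTP/1.1 200 OK
    Content-Type: application/json

    { "value": 21.5, "unit": "celsius" }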

This definition also allows us to rather cleanly separate the three main communities that were part of the workshop:

What I call Connectivists, who mostly care about the last mile, i.e. how to connect things such as sensors. These I would put squarely in the IoT camp, and while they are of course essential for WoT to even exist, they are not so much concerned with Web architecture.

What I call Interactionists, who mostly care about WoT according to the definition above, and who don't really care so much whether sensors are directly connected or via gateways, as long as there are well-identified resources with uniform interaction models and self-describing representations.

What I call Modelers, who are mostly concerned with modeling the WoT space. This is what I would call the Semantic Web of Things (SWoT), according to how the Semantic Web community relates to the Web.

It remains to be seen what (if any) activity will happen as a result of the workshop. These were two very interesting days, and sometimes it was challenging to establish a shared vocabulary in discussions, because many of us had and have different perspectives and definitions in mind.

Maybe treating the IoT/WoT separation in the same way as the Internet/Web separation might help, because the latter is well-defined, and thus could help to come up with a simple and well-defined separation. But in the end, it will all go according to Wittgenstein: WoT will mean whatever the majority thinks it means, and maybe over the next one or two years, such a majority will actually form. I don't think we're there yet.

Unsurprisingly, the collective answer to this question was yes, and the panelists provided some examples where and how business strategy was improved by making data-driven decisions.

Again, I felt a little uneasy about how pretty much all discussions and scenarios focused on centralized setups, and thus essentially on Big Data. Few seem to even consider decentralization and service-driven approaches, but maybe that's fair, since the event is called Data Edge and not Service Edge. But then again, given that at least some talks explicitly mentioned Data Science, I would have hoped for more focus on ecosystem thinking, and less focus on building/running data crunching systems.

Stamen focuses on using data to build visualizations, but they don't claim to do Data Science. Instead, they work with Data Science teams, which then generate data for visualization. The goal is to create beautiful, engaging, and accessible projects that delight and inform the public.

Alan is presenting many different visualizations, mostly focusing on geographical data and maps. This definitely makes you think a lot about how many options there are to present and perceive data, which can make a very big difference even if the underlying data, and maybe even the analytics, are the same.

The panel starts by talking about 9/11 and Snowden. Panelists point out that Big Brother couldn't even have dreamed of smartphones. And yet we pay for them.

One decidedly absurd line of argument is that just because in retrospect you can find some data pointing to something you know did happen (in this case, 9/11), you could have predicted this event, had you only had more resources and/or more willingness to violate people's rights. The simplicity and shortsightedness of this argument has bothered me since it was first brought up after 9/11, and yet people can still get away with it. This was a rather political panel, which is probably why absurd claims were made and nobody objected.

More specifically, when it comes to managing one's own data: does sharing have to mean giving away all information and all control over it? Is it possible to imagine a different model? Maybe people can have tethers to all their information, still owning it. Google shouldn't own the data people are generating; they should just have a license to use it, and people can revoke that license. But is it really possible to control information this way, given that much (if not most) data cannot easily be tied to just one owner? One more overly simplistic and rather absurd claim, but again the panelists got away with it.

An interesting story about better understanding your audience and then being able to tweak your data capture. In this case, the story is about the Obama 2012 campaign, and how the online donation campaign was improved by observing errors, understanding how they originated, and then making small improvements (such as help texts on the online form), which can result in a significant reduction of errors.

The company's data scientist team acts as internal consultants, working in areas such as user acquisition, customer retention, monetization, strategy and competitive analysis, and game design.

Data can be used to inform game developers, looking at user feedback to design elements of games. This is possible because of the online nature of games, and cannot be done in the same way in more traditional games on consoles or computers. However, the secret sauce is combining this feedback with the art of game design to make and improve designs.

Justifying decisions you've already made with data is not the best way to do Data Science. In hindsight, you'll always find some explanation to justify or explain a decision. The good way to use Data Science is to design experiments that allow you to make decisions that change the way you do business. The problem is how fast you can go through this cycle.

Success is the combination of Science and Art. Using only mathematics will get you stuck. You need crazy ideas to allow you to move towards more interesting ways to do business.

Data science needs to be cheap, so that experiments can be cheap, and failure of those is cheap. In such an environment, the ratio of success and failure still results in a positive ROI.

I think this was my favorite talk of the conference. What I liked was that it did not put the cart before the horse. Instead of saying that data science and big data are great, so now what can we do with them, it asked what valuable questions you might want to ask, acknowledged that it's not always easy to find and ask those questions, and treated data science and big data simply as tools that then allow you to answer those questions, maybe even questions you could not answer before.

Little to say here other than that I absolutely enjoyed the conversation as the story of how a pretty smart guy moved through an amazing number of interesting places throughout his career. Also, this conversation made me want to join Pivotal.

The goal is to provide evidence that your result is correct. Is it possible to generate evidence of correctness without making things completely reproducible?

OSTP guidelines have affected the thinking about practices and tools when it comes to data management and publishing. It seems like the topic of reproducibility has gone mainstream, and it is even mentioned in non-specialist publications.

Fernando talks about how there are some offerings in the curriculum, but there is not enough training, and given the current structure, these offerings do not meet the demand. More content is needed to match the various education levels, and the various scientific disciplines where data science methods are relevant.

The not very surprising answer to this question is: It depends. Data may allow us to learn interesting things, if we look in the right places. But you need to look, and it does not necessarily help you with explaining what is going on.

Since I have a lot of interest in geospatial data and services, I thoroughly enjoyed this talk about Berkeley's Geospatial Innovation Facility. Like Alan McConchie's talk, this talk illustrated that humans are very spatial creatures, and that a lot of data that we have does have a spatial component to it. Combine these two facts, and it becomes apparent that geospatial data and services are a very large and important space, when it comes to Big Data and/or Data Science.