Monday, December 21, 2009

Last week, I gave a talk to my nephew's geography club. He's in middle school, and it gives me immense pride that he is so fascinated by geography. When I was a bit younger than him, I was thrown out of a geography contest in my class. Apparently it wasn't fair to the other team to have me be able to identify all the U.S. states by shape. I have high hopes that the same thing will happen to my nephew some day.

It was amazing to see this club though. About 25 or 30 students, all eager to see a demo of Google Earth. When I give a talk now, no matter the age group, it's the same result. But these kids instinctively understand the 3D nature of the application, and are curious about subjects like "How do you get the imagery to conform to the terrain when you drape it over?" "Geometry," I said, getting a big smile from the teacher. All of the questions were at least as smart as the questions adults ask me, leading me to think that either these are special kids, or there is hope for America's schools.

I'm pushing 41 this coming February, and I'm old enough to remember when maps were only on paper. There was a presence to maps: holding one in your hand, tracing it with your finger. You could put a map of the city in your pocket, literally carrying the city around with you. The way you folded it was really important, a good test of whether someone was a maps person or not.

I confess the other way I loved maps was as a gamer. Those hexagons on a map, moving chits around to conquer territories. I loved it. As an early teen, my friends and I would read game maps, make up pronunciations for city names, and argue over them. Back when people argued about facts instead of looking them up.

Don't get me wrong, I love Google Maps on my computer, I love the slippy nature of it, and the ability to manipulate it in ways that I couldn't have with a paper map. But working with mobile mapping has brought back some of that old feeling. Sticking the phone with the map open into my pocket. Moving it around with my finger. There's a physicality to it that I miss on my non-mobile computer. Sure, I don't have to learn a special folding technique. But still, I can carry the city around with me. And best yet, when I pull it out, it'll be right there, showing me where I am without me having to figure it out. OK, maybe I'll miss the figuring it out part.

Wednesday, September 30, 2009

In my last post, I mused about the state of GeoWeb standards and wandered off into a discussion mostly on findability and linking between different files. I've been thinking more about the HTML web and how we use it as a model for what we call the GeoWeb.

The HTML web famously starts with Tim Berners-Lee, Robert Cailliau, and their famous 20 tags, 13 of which we still use today. HTML sprang out of SGML, the Standard Generalized Markup Language, but was far simpler, and, well, usable. We all know that the web took off like a rocket, and now the Web is synonymous with the Internet in the eyes of many people. HTML is the vehicle, but as developers know, it is only a part of the web. Granted, it's the part that the other parts all depend on, but here's a partial list of the technologies without which the Web as we know it would be fundamentally different:

1) HTML

2) HTTP

3) JavaScript

4) XML

5) JSON

6) Flash

7) CSS

OK, Flash is controversial, I'll give you that, but you have to agree, the Web would be really different without it. Some of you would say better, but very different. Of course, there are other technologies people could put in there, like ASP, PHP, Python, etc. My point isn't the specific technologies, though I'll talk about JavaScript in a minute, but the fact that any developer could come up with this list. I'd like to compare that to a list an enterprise apps developer might come up with:

1) Java

A much shorter list. Again, you could argue about the content of the list, but the fact is, for many other domains, a developer can learn one language, one piece of technology, or perhaps two, and that's it.

So the web is different. It is hard to imagine a serious web designer now who doesn't know JS and HTML, and probably PHP or Python or some other server-side scripting language. Maybe Flash instead of JS, but you get the idea. I call it the HTML Web because without HTML the rest couldn't work. And it's hard to imagine a serious web site built without these other technologies.

I was thinking about this because I am asked all the time why Google doesn't add JavaScript or some other scripting capability to KML. Of course, we no longer own KML, we gave it to the OGC, so we can't just "add" something to KML, aside from our own language extensions. But the question shows that there is a fundamental desire on the part of developers to have that functionality. Serious GeoWeb developers learn a variety of technologies too. For a basic Google Maps mashup (or Bing, or Yahoo! Maps too), you need at least HTML, CSS, and JavaScript. More complicated mashups will use KML, GeoRSS, or GeoJSON, maybe Flash instead of JS, and... wait. Notice the first of those technologies: HTML. The GeoWeb still fundamentally relies on HTML. Sure, with a GeoWeb browser like Google Earth, you don't need HTML, though you can use it in the description balloons. But most mashups require HTML as the carrier.
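To make that minimal stack concrete, here is a sketch of a bare-bones Maps mashup, roughly as the v2-era JavaScript API worked; YOUR_KEY is a placeholder, and the coordinates and div id are arbitrary:

```html
<!-- Minimal mashup sketch: HTML for structure, CSS for the map's size,
     JavaScript for the map itself. Based on the v2-era Maps API;
     YOUR_KEY is a placeholder, not a real key. -->
<html>
  <head>
    <style>#map { width: 500px; height: 300px; }</style>
    <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=YOUR_KEY"></script>
  </head>
  <body onunload="GUnload()">
    <div id="map"></div>
    <script>
      var map = new GMap2(document.getElementById("map"));
      map.setCenter(new GLatLng(37.422, -122.084), 13);
    </script>
  </body>
</html>
```

Even this trivial page touches all three technologies, which is the point: the entry bar for the GeoWeb is already a multi-technology bar.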

So for the GeoWeb, we've learned the lessons of the HTML web. The question I'm asking, I guess, is: is this the right thing? Is building on the HTML model, with its confusing amalgam of technologies, many of which don't work easily together, really the best model for the GeoWeb? I don't have the definitive answer to that question, but I think it's worth asking. Would it be better if we had a single technology to create maps and distribute them?

OK, I have a preliminary answer, yes it would be better to follow the HTML model, and have an amalgam of technologies than to be locked into a single one.

Tuesday, September 22, 2009

Google is putting on Partnership Exploration Workshops and a hackathon in a couple of weeks. This DC event is going to be great. How do I know? I'm helping to plan it. The focus will be on Geo and Visualization technologies. Anyway, you should come. Here's more info:

Partnership Exploration Workshops: Your Mission & Google Technology

We are happy to invite you to an interactive session with Google product experts and Google.org members, featuring a keynote address by Craig Fugate, FEMA Administrator, and followed by an optional developer hackathon.

There's no doubt that technology, developed both in-house and by 3rd parties, plays a role in helping your organization. With the convergence of GPS and mobile devices, the emergence of crowdsourcing tools, and improving large data set visualization capabilities, new opportunities to guide, manage and empower your field operations are opening up every day.

Join us for a day of interactive workshops where we explore how Google geo & data visualization technologies can further your mission. Craig Fugate, FEMA Administrator, will give a keynote address on technology and disaster preparedness. Google product experts will provide overviews and analysis of recent developments and upcoming innovations for Google Earth, Maps, Map Maker, Fusion Tables and more.

Before the workshops, we will work with you individually to identify projects that might benefit from Google technology. Select attendees will present and lead discussions on their geo & data visualization technology needs. In addition, we will host an optional developer hackathon the day after the workshop to kick-start your integration of Google technologies. We look forward to seeing you there!

Tuesday, July 28, 2009

Andrew Turner recently asked for our thoughts on GeoWeb standards, and I thought I'd put it as a post here instead of cluttering too much of his comment stream.

I've been thinking about the different standards and their place in the world a lot recently. I'm not someone who takes strong stances on anything, and you're not, I hope, going to read this post and think that I'm a KML partisan, and that it's only because I work at Google that I think positive thoughts about it. I prefer instead to explore the problem space.

The problem isn't adoption, clearly. It's findability.

Adoption rate:

There's no question that KML has a phenomenal adoption rate. Michael Jones went over the numbers during his GeoWeb talk, but in case you missed it:

More than 500,000,000 KML/KMZ documents on the Internet

More than 250,000 Internet websites hosting KML/KMZ content

2 billion placemarks accessible on the public Internet

Those are staggering numbers, especially compared to just last year, when Google announced it had indexed tens of millions of KML files on a hundred thousand unique domains. Growth by an order of magnitude in a year is a lot.

GeoRSS has also expanded rapidly. I don't have numbers on it, but I'm sure it's also a very large number.

There are other formats, too, like GeoJSON, that are great, and I really look forward to seeing what happens with them.

Findability

Frankly, I think we can do a much better job. Fundamentally, one of the problems is that geographic data doesn't lend itself well to linkability. Sure, you can link within the data, but few people do. A limited number of KML files link to other KML files. GeoRSS can contain a variety of links, but often not to other geographic data files; rather, to HTML or binary media.

KML has been described as the HTML of geographic data. Whether that's true or not is a matter of some discussion, though I happen to think it is (more on that in another post, I guess, after lots of people tell me I'm full of it). But one of the principal characteristics of HTML is linking, which is weakly implemented in KML. Linking happens in two places: the Atom link element, and the description balloon. Atom links usually refer to HTML media, as in "this is the site credited with authorship." In the description balloon, you're operating in essentially an HTML environment, leading people to author KML with links to HTML rather than to other KML files. When authors do put in links to KML, it's mostly within their site, not to other KML files elsewhere.
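To make those two linking spots concrete, here is a minimal KML fragment (the URLs are placeholders) showing both the Atom link element and an HTML link inside a description balloon:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <!-- The Atom link: in practice it usually credits an HTML site,
         not another KML file -->
    <atom:link href="http://example.com/" />
    <Placemark>
      <name>Linked placemark</name>
      <!-- The balloon is essentially HTML, so links here tend to point
           at HTML pages rather than at other KML files -->
      <description><![CDATA[
        <a href="http://example.com/more.kml">More data</a>
      ]]></description>
      <Point><coordinates>-122.08,37.42,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
```

Notice that even here, the "link" to another KML file lives inside an HTML anchor tag; KML itself has no first-class way to say "this file relates to that file."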

My point isn't to encourage people to link to KMLs created by others, but rather that for findability purposes, on the HTML web we rely primarily on a link structure. The early web was made up of pages that delivered content, and linked to other sites. Whole pages developed early as directories of other sites, and they linked to other directories. Google web search was built on using the number and authority of links of others to rank pages. The "GeoWeb" isn't really a web in the same way. It uses the technologies that built the web, that live on the web, but it itself doesn't constitute a web in a meaningful sense. The vast majority of links to geographic data that I've found are HTML links within full HTML pages, with the next set being programmatic.

Is that the nature of geographic data? Or have we just not found the true linkability of it? I tend to think it's the former. Geographic data is hierarchical, it is ontological, it is content rich, it is combinable. It is linkable through common ontologies. But geographic data doesn't lend itself to easy linking in the same way. It's the nature of structured data: it must relate to a structure. Ontologies are almost the antithesis of linkability outside the domain.

So that suggests that we need to find another mechanism for findability. Deep searches are possible, but generally when you want geographic data, you either want points on a map, "This is where that thing is" which is fairly easy to do, and I think we've done it well. Or, you want a metadata search of some kind, "Give me all the polygons that fall within this bounding box, and where property X is between Y and Z." That no one does well on a global scale, only within limited sets of data. Searching on text is great for web pages, because they are composed primarily of text. But searching for data is a whole other problem not easily solved by our current mechanisms.
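A toy sketch of what that second kind of query looks like in code, using a simple in-memory list of point features (real systems would use a spatial index; the feature structure and names here are illustrative, not any real API):

```python
# Illustrative only: a metadata query over a toy in-memory feature list,
# using points for simplicity where the post talks about polygons.
def bbox_query(features, west, south, east, north, prop, lo, hi):
    """Return features inside the bounding box whose property `prop`
    has a value between `lo` and `hi` inclusive."""
    results = []
    for f in features:
        lon, lat = f["lon"], f["lat"]
        if west <= lon <= east and south <= lat <= north:
            value = f["properties"].get(prop)
            if value is not None and lo <= value <= hi:
                results.append(f)
    return results

features = [
    {"name": "A", "lon": -122.4, "lat": 37.8, "properties": {"population": 800000}},
    {"name": "B", "lon": -122.3, "lat": 37.5, "properties": {"population": 50000}},
    {"name": "C", "lon": -73.9, "lat": 40.7, "properties": {"population": 8000000}},
]

# "All features in this box where population is between 100,000 and 1,000,000"
hits = bbox_query(features, -123, 37, -122, 38, "population", 100000, 1000000)
```

Trivial over three features; the hard part, and the part no one does well globally, is running exactly this query over billions of features across the whole public Internet.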

Some people have written about using Semantic Web technologies to provide the linking; in the comments on his blog post, Andrew notes LinkedGeoData in particular. I've always been of the opinion that the Semantic Web is too complex. One of the joys of HTML is the ease with which you can link pages. The Semantic Web's authoring tools aren't really there yet either, and as yet there's no widely adopted viewer. I'd be happy to be proved wrong. I used to think that standards like RDF, which have languished for so long, would never take off. However, the explosion of Ajax in the last few years has made me less skeptical. I don't know if the Semantic Web is the technology of the future and always will be, or if it will actually take off, but I remain fairly skeptical.

Combinability

Perhaps the true value of XML-based formats comes from their combinability. Whether it's Atom (or RSS) and GML to make GeoRSS, or Atom and KML to produce a Google Data API, or Atom and KML to produce, well, KML containing Atom. This greatly increases their usability, and I sense another post coming on, since this one is getting long. My point is that XML standards provide the only really good way of doing this while retaining proper namespaces. The downside is, of course, the verbosity of XML and the pain of XML Schema.
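As an example of that combinability, a GeoRSS-Simple point rides inside an Atom entry purely via namespaces (the title, URL, and coordinates here are arbitrary):

```xml
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:georss="http://www.georss.org/georss">
  <title>A located entry</title>
  <link href="http://example.com/entry" />
  <updated>2009-07-28T00:00:00Z</updated>
  <!-- The georss namespace lets geographic data ride along
       without colliding with Atom's own element names -->
  <georss:point>45.256 -71.92</georss:point>
</entry>
```

A feed reader that knows nothing about geography can still consume the entry; a geo-aware client picks up the point. That graceful layering is exactly what the namespaces buy you.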

Wrap Up

Don't get me wrong, I think that KML and GeoRSS are great, as are a lot of other formats I haven't mentioned, like GeoJSON and others. Andrew asked also about other interesting topics, like expressiveness and durability, which I haven't gotten to. Ultimately, though, if we can't solve the findability problem, other technologies will come in that do.

Monday, July 6, 2009

I took a vacation immediately after GRUPP, so I didn't write a follow-up post to my Day 1 post. But I did want to just follow-up with a couple of resources that stood out for me beyond what I already wrote:

Open Data Kit: Built on JavaRosa, it's designed specifically for Android, and to take advantage of all the awesomeness that is an Android phone.

I've decided in the coming weeks to play with ODK and blog about it, and try to think about other solutions for mobile offline/online data collection. I haven't done mobile development before, so this will be interesting.

Tuesday, June 23, 2009

I'm at the GCamp@RUPP meeting in Phnom Penh right now, hosted by the Royal University of Phnom Penh. The subtitle is "Exploring Emerging Technologies to Address Emerging Infections."

Wow, data collection is hard. I knew this from spending 10 years designing and implementing systems in non-profits in California, but I had no idea what the challenges were in the developing world. Many of us here from Google, primarily engineers, came here with some naive notions of how we can help. Here are just some of the issues we've been rocked by:

Lack of connectivity: So, it is common knowledge that SMS is used the world over, aside from the US, often more than voice connections. But in some emerging countries, it isn't. Think about this: Khmer, the language of Cambodia, isn't yet represented in Unicode. No Unicode! Apparently, they just didn't have people representing Khmer during the meetings that set up Unicode. Klingon and Tolkien's Elvish are in Unicode, but not Khmer. This has a deep impact on technology: SMS doesn't work in Khmer. Plus, no phones are made with Khmer keyboards, and to even produce an app, like a form, you have to render Khmer as an image file. Fortunately, this is being remedied, but it'll be a long time before phones and computers can support it.

Incentives: What incentive do people have to provide information? User generated content needs to give something back, and not create perverse incentives.

Unavailability of data: Many in the Open Source community want all data to be free. But many governments have incentives not to share. For instance, a disease outbreak can cause a catastrophic drop-off in tourism. So if the government doesn't share the information that there is one, then people are less likely to stay away. See also: Privacy

Thursday, March 12, 2009

I've been thinking about this a lot. I've noticed that my private blogging, which is not available to the world, only to a few friends, has dropped precipitously since I started using Twitter. And this blog, which I thought I would use more, hasn't had a lot of posts either. This is definitely the Twitter effect on me, and I kind of like it. Feeling Entropy has been thinking about this too, likening quick, easy blogging to fast food, as opposed to longer, well-crafted posts.

Here's the difference, for me:

I try to use the blog for long, well thought out posts. I don't really care for posts that are simply a link and "Hey, look at this!". Not that I mind linking to other sites, as long as the post does a nice job explaining why it's important that you examine this particular link. I prefer to use blogs as a discussion, an explanation of a topic.

Twitter, on the other hand, has become my fast food post. The 140-character limit allows me to just do a quick "Hey, look at this!" post. Also, people seem to scan their Twitter feeds more frequently than their blog readers. Personally, I have almost 1000 unread blog posts in Google Reader. I say almost, because I fight to keep it under 1000, as Reader flips a bit and says 1000+. Apparently, I like to know the numbers exactly. But I do scan almost all 143 people I'm following on Twitter daily.

So, where is all this leading us? Are we truly going down the route, as Feeling Entropy says, of a fast food blogging culture? Now that it is easy, we'll produce less quality? I don't know. But strangely, I feel as if my Twitter feed is having more impact than this blog.

Or maybe that's just because the Twitter effect means I write here less.

Saturday, February 14, 2009

For those of you still trying to find love, or who want to get a date for next year, I put together this sample app. Oh, BTW, careful, some of the images might not be safe for work. Sometimes people put risque things in their profiles.

I queried personal ads in Google Base using the Base API. Several different personals sites push their ads into Base to make them more discoverable. Base allows for location-based queries, including bounding box queries.

Such a query returns an Atom feed of personal ads within the bounding box. Don't worry, they don't put addresses in the ads, only City, State/Province, and Postal Code. And Base allows you to geocode directly within the query, returning additional g:latitude and g:longitude elements, to save you the hassle.

KML provides a convenient View Based Refresh. Simply put a viewRefreshMode of onStop in a Link element, and Earth and Maps will send query parameters defining the bounding box of what is visible to a server. So, I put a simple Python script up on an App Engine application, let it parse the bounding box parameters, and generate the queries, returning new KML every time the view pauses for more than a few seconds. Then I created a simple Earth API page, nothing fancy since it was getting late and I was tired, and loaded up the KML.
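The server-side logic can be sketched roughly like this. It assumes KML's default onStop viewFormat, which sends BBOX=[bboxWest],[bboxSouth],[bboxEast],[bboxNorth]; the Base query string in the URL is illustrative only, not the exact Base API syntax:

```python
# Sketch: parse the BBOX parameter that Earth/Maps appends on an onStop
# view refresh, then build a Base API query URL from it. The bq value
# and URL layout are illustrative, not the exact historical syntax.
import urllib.parse

def parse_bbox(bbox_param):
    """BBOX arrives as 'west,south,east,north' per KML's default viewFormat."""
    west, south, east, north = (float(v) for v in bbox_param.split(","))
    return west, south, east, north

def build_query_url(bbox_param):
    west, south, east, north = parse_bbox(bbox_param)
    params = urllib.parse.urlencode({
        "bq": "[item type:personals]",          # hypothetical Base query
        "bbox": f"{west},{south},{east},{north}",  # hypothetical parameter
    })
    return "http://www.google.com/base/feeds/snippets?" + params

# e.g. a view paused over the San Francisco peninsula
url = build_query_url("-122.5,37.2,-121.9,37.9")
```

The real handler would then fetch that feed, walk the g:latitude/g:longitude elements, and emit fresh KML placemarks; that part is straightforward DOM work.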

I could have used one of the Google Data Client Libraries, OK really only the Python one because I was using App Engine. But frankly, it was such a simple query, and I really love raw XML, I decided to stay with direct querying and DOM parsing.

If you want more info on View Based Refresh, look at the KML reference for viewFormat.