Archives

**Keep this in mind:** Anyone familiar with the game of Jeopardy knows that an integral part of getting the answer right is the ability to quickly frame it as a question…

A couple of sources I’ve recently read have tried to make a case against any seamless implementation of a semantic or natural language web. Their argument? People are trained to keyword search. I hesitate to guess at the real root of their beef–are they saying we won’t make the shift from phrases to questions? That’s the bare-bones difference between a keyword (Google-type) search query and a natural language (Hakia, Powerset) search query. The query being the actual thing you type into the search field, the “prompter” intended to stir up the information you’re looking for.

Okay, I know that journalists and pundits are in the biz of creating talk, the more controversy the better, but an argument like this–that we’re already patterned to work in a keyword-inspired environment–just seems lame to me. It also relegates our species back to caveman status. Remember the Jeopardy rules…

Steven Pinker, in his brand-new book, The Stuff of Thought, argues that even as children we instinctively emulate language patterns, from simple to complex, well before we have any formal instruction. Linguistic pattern as a rite of belonging, as part of our human-ness, is elementary–it is fundamental to our collective DNA.

So, NO, we’ll catch on quickly if and when a natural language engine rises to the top. AND if our businesses rest on it, you’d better believe the learning curve will be short-lived. I mean, look at AdWords… when it first launched, only the most maverick web marketers and SEOs took to it–were able to instantly rope and tie it. But now, just a few years later (and, yes, many millions of businesses make their money from it), it’s an absolutely essential component of any savvy business strategy. If people can learn how to build a marketing strategy with AdWords, how to navigate and sell everything on eBay, and how to shop for anything online, then casting a search query in natural language, framed as a question, seems quite… um, natural.

Virtual Barn-Raising Underway–How to feel as if you’re part of something big

The next generation web, whatever catchphrase will finally be attached to it, is hardly a magical divination. There is currently very tedious work going on, especially in deeper recesses where labels like “semantic” and “natural language” are pitched out onto the field.

Private beta versions of well known natural language projects:

Powerset

Textdigger

True Knowledge

have employed legions of invitation-only volunteers champing at the bit to count themselves as contributors to search history. And they are eyeball-deep in seriously tedious work: do the demo search queries deliver? Are there ample utilities to allow users to pursue a thread? Bug after bug after bug is unearthed.

Language is so expansive, so layered with meanings, that the undertakings can seem like nothing more than pipe dreams intended to keep the curious busily chipping away at search query tasks and random discussions, ad nauseam.

Is it possible to corral linguistics, to ultimately fill the search engine brain space with the whole cornucopia of semantic meaning, so nothing, not even the smallest nuance, is left out?

If everyone pitched in… imagine… perhaps the task could be dispatched more promptly, eh? Take Wikipedia–currently the single largest source of cataloged online reference material. And still no end in sight to the wealth of topics yet to be explored or expanded upon.

You can help: why not pitch in to the Wikipedia project? It’s the next best thing to a ’60s “happening” we have right now. Did you realize that anyone can add information to Wikipedia? What do you know that the rest of us don’t? Maybe you won’t be accurate with some bit of data? Don’t worry–editors come along behind to clean up, and other contributors fill in the blanks, so to speak. But I’ll bet everyone has something he or she could contribute. I guarantee you’ll feel as if you’re part of some vast experiment, a cosmic odd job that has much deeper meaning than you realize.

Welcome to the virtual barn-raising, folks. Pick up a hammer and a nail…

Accuracy of new semantic applications may be directly related to semantic literacy.

Checking out AskWiki inspired me to consider how the new generation of semantic applications–AskWiki and Freebase, among others–will rely heavily on user-generated “knowledge.” But what will happen if users don’t completely understand the semantic templates, or fail to commit to comprehensive and concise data contributions?

According to AskWiki–a semantic vehicle with which users may pull information from Wikipedia–Wikipedia’s shortcoming right now is in its template infoboxes. If you are familiar with the Wikipedia layout, then you have no doubt seen the information boxes located along the right-hand side of each Wiki page; they vary quite a bit from one to another. One of AskWiki’s proposals is a standardization of the Wiki infobox. Here’s why:

“AskWiki often grabs data from Infoboxes. There are thousands of infobox types, many of which are misused. For example, it would be great for all ‘actor’ pages to use the ‘Infobox actor’ template, instead of the generic ‘Infobox person’ or ‘Biography’. Also, some cities have a standard ‘Infobox City’, some others have ‘Settlement’, some have an external-page-included infobox. Switching all these items to a standard Infobox would improve the accuracy of AskWiki.” (AskWiki, How it Works)


Wikipedia reports that its infoboxes “may be ‘freeform’ or partially automated based on parameters.” (Wikipedia: Category:Infobox Templates).
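To make the “parameters” point concrete, here is a minimal sketch (my own, not AskWiki’s code) of how an application might pull key/value parameters out of an infobox’s wikitext. Real infoboxes nest templates, links, and multi-line values, so treat this as purely illustrative:

```python
def parse_infobox(wikitext):
    """Pull simple key/value parameters out of an infobox's wikitext.

    A toy, line-based parser: real infoboxes nest templates, links,
    and multi-line values, none of which this handles.
    """
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        # Infobox parameters look like:  | name = Ada Lovelace
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields

sample = """{{Infobox person
| name       = Ada Lovelace
| birth_date = 1815
| occupation = Mathematician
}}"""

print(parse_infobox(sample))
```

You can see why standardization matters: a parser like this only yields useful data if every “person” page agrees on what the keys are called.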

If user-generated data must be standardized and concise, then instructions and tutorials that invite the Everyman to add his or her data must: 1. welcome input, and 2. emphasize conciseness without being overly technical.

Making a natural language search engine for the masses–keep it on the DL.

True Knowledge is an internet search company whose key product is a new natural language search engine–although they don’t say that. Also missing is the word “semantic.” I only mention this because it seems to me that True Knowledge, and Freebase before it, have kept high-level techno terms like these at arm’s length. Maybe it makes them more palatable, maybe less intimidating; maybe everyday folks will be a lot more inclined to add such a search engine to their daily search habits if it looks and smells not too unlike another.

Here’s what True Knowledge proposes to do: deliver “direct” and concise answers to natural language queries. For example, TK’s sample query “is jennifer lopez single?” received a direct “NO”–parsed from data in the engine’s database–plus highlighted results for the natural language query, PLUS ordinary keyword search results in case TK is unable to find the information: “at least you’ll have the results you’d have on any other given day.”
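That layered strategy (direct answer first, keyword results as the safety net) can be sketched like so. The function names and data here are my own invention; True Knowledge’s real internals are not public:

```python
def answer_query(question, knowledge_base, keyword_search):
    """Layered answering: try structured facts, then fall back.

    Hypothetical sketch only -- not True Knowledge's actual design.
    """
    # 1. Try a direct answer parsed from structured facts.
    direct = knowledge_base.get(question.lower().rstrip("?"))
    if direct is not None:
        return {"direct_answer": direct, "fallback": None}
    # 2. Otherwise fall back to ordinary keyword results.
    return {"direct_answer": None, "fallback": keyword_search(question)}

kb = {"is jennifer lopez single": "No"}
print(answer_query("Is Jennifer Lopez single?", kb,
                   lambda q: ["ordinary keyword results for: " + q]))
```

The point of the design is graceful degradation: a miss in the knowledge base never leaves the user with an empty page.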

The engine has just launched in private beta, so it’s anyone’s guess how long it will be before she’s open for business. But if you’d like a preview of the API and the technology behind this newest search engine effort, take a look at the site. There’s a video introduction that gives about a seven-minute run-down of the features and benefits of True Knowledge.

An Alpha database casually inviting everyone to add their special know-how.

Freebase, an intended database of “the world’s knowledge,” powered by MetaWeb, may appear to be many different things depending upon your vantage point.

In its own words: “Freebase is an open database of the world’s information.” The goal: collect and structure the universe of data for an unending variety of uses.

Freebase does not use the word “semantic” anywhere to describe itself or its uses; it’s others on the outside who’ve identified it as inherently semantic. Why? Maybe for the way it unabashedly takes sets of types and invisibly transforms them into the building blocks–schemas–used to construct semantic-based applications.

Freebase dreams of being every developer’s next crush. While it’s potentially sewing together the universe of types, it comes off as a toy for the Everyman: you can add data, I can add data, we all can add data. And if you need a quick lesson on data types, Freebase delivers in pared-down layman’s terms. This may be its greatest asset, its appeal, its friend-next-door quality. Drop by–and drop in a new data set while you’re at it, in your free time.

If you’re wondering what you can do with Freebase, here’s what they say to questioning passersby:

“There are five primary things you can do with Freebase:

1) Browse and read information…..

2) Edit existing data….

3) Upload new data sets….

4) Build applications that use Freebase data….

5) Suggest and create schemas for new data….”

If this doesn’t exactly sell it, then try this: if you know things about a particular location or a fictional world (domains), Freebase invites you to share your information for the eventual use of others. You’d add types to the domains. For example, to the domain “fictional universe” you could add “fictional character” (a type) and then the character itself–Dumbledore, say. Right now over 15,000 fictional characters exist under this type, but there is a world of yet-to-be-created types.
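Freebase queries are written in MQL (Metaweb Query Language), a query-by-example JSON format where an empty value means “fill this in for me.” Here is a rough sketch of what asking about a fictional character might look like; the type path and property name below are illustrative, so check Freebase’s schema browser for the exact names:

```python
import json

# Sketch of an MQL (Metaweb Query Language) read query: you hand
# Freebase a partially filled-in JSON object, and it returns the
# object with the blanks completed.  Names here are illustrative.
query = {
    "type": "/fictional_universe/fictional_character",  # the type
    "name": "Dumbledore",                               # the topic
    "appears_in": [],  # an empty list asks Freebase to fill this in
}

print(json.dumps(query, indent=2))
```

Note the symmetry with how contributors add data: the same type/property vocabulary the Everyman fills in is what developers later query against.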

This charismatic appeal was noted by O’Reilly Radar, too: “users don’t think they are providing metadata — they think they are just providing data.”

Right now there already exists enough data on topics like art and entertainment to produce correct and comprehensive results for “complex queries.”

GoPubMed is a new semantic search engine designed to deliver the ultimate research muscle to the biomedical, medical, and life sciences realm. Researchers, scientists, and general users may gain quicker, more “cross-pollinated” search results for deeply layered data requests.

One of the more intriguing features of GoPubMed is “Hot Research.” This is a nifty trend-spotter: the site’s example offers a side-by-side graphical comparison that maps trends in Alzheimer’s research. There are also tools to pan for statistics on particular study authors, cities, and journals. For instance, find out who is most feverishly carving away at research related to, let’s say, neurolinguistics, which city (or cities) the most published work on the topic is coming from, and plot all of it on a cluster map of the world. Or find out who is collaborating with whom on neurolinguistics–see it visually presented as a cartographic mash-up of scientific collaboration, a partially connected mesh network if ever I saw one.

Search results are presented with highlighted text, and there are quite a few options for targeting more relevant results. A left-hand navigation field displays a “semantic” list of category choices drawn from Gene Ontology (GO) and Medical Subject Headings (MeSH) classes; use it to pinpoint results more rapidly. Search results are also accessorized with button options: link to related Wikipedia articles, “toggle” document views, and even flag a result that does not belong.
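In spirit, that category navigation is just intersecting a result set with ontology annotations. A toy sketch, with invented paper titles and annotations:

```python
# Toy illustration of narrowing results by ontology annotations, in the
# spirit of GoPubMed's GO/MeSH category filters (all data invented).
results = [
    {"title": "Paper A", "mesh": {"Alzheimer Disease", "Neurolinguistics"}},
    {"title": "Paper B", "mesh": {"Alzheimer Disease", "Genetics"}},
    {"title": "Paper C", "mesh": {"Neurolinguistics"}},
]

def filter_by_category(items, category):
    """Keep only results annotated with the chosen MeSH/GO category."""
    return [r for r in items if category in r["mesh"]]

print([r["title"] for r in filter_by_category(results, "Neurolinguistics")])
```

The hard part, of course, is not the filtering but the annotation: someone (or something) has to tag every document with the right ontology terms first.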

GoPubMed is on the leading edge of accessible search tools for life sciences and biomedical work groups. Many corporations already possess robust semantic middleware behind closed doors. GoPubMed attempts to give users a variety of data formats. Might we see these same components slightly renovated for other types of semantic tools? Could it be that we may see the semantic web and natural language search grow more rapidly when the medical and corporate demands (knowledge “needers”) lead the way?

Some search engines vie for fairy dust, others just ante up the goods.

Semantic search engines vie to harness the same fairy dust Google did–once upon a time. But charismatic, enigmatic, and dismissive geeky upstarts that make billions upon billions of dollars earn, of course, as many foes as they do dough. My point is that Google’s limelight is still enviable, and new search engines like the mysterious Powerset and Hakia are in line.

But outside the West Coast search celebrity, there are other semantic web forces to be reckoned with–ones that have actually been chipping away at the natural language lexicon for years.

Cyc, “the world’s largest and most complete general knowledge base and commonsense reasoning engine,” is already a working product for corporate and industrial users. I’ve just happened upon it and am still trying to digest the literature, but it reminds me of a mini IBM WebFountain–without the supercomputing gusto, but a powerful engine nevertheless, one that has already “learned” <…..THIS much……>

“What does Cyc know?” According to founder and developer, Doug Lenat, Cyc is able to negotiate this question: “Which American city would be most vulnerable to an anthrax attack during summer?”

Where can I get my very own Cyc?

Like most open-source systems designed for industrial applications, Cyc is not nearly as consumer-friendly as a Powerset or a Hakia. There is no intuitive, slick little interface. Instead, the Cyc main page smacks of the magic language of software developers, Unix users, and computer wizards. This is middleware land, Middle Earth, Hobbit-ville. Which makes it all the more enigmatic–I want one.