I'm a power hater. I don't hate often, but when I do, I do it with gusto. So I have to say, this pile of vaporware called "The Semantic Web" is really starting to tick me off...

I'm not sure why, but recently it seems to be rearing its ugly head again in the information management industry, and wooing new potential victims (like Yahoo). I think it's trying to ride the coattails of Web 2.0 -- particularly folksonomies and microformats. Nevertheless, I feel the need to expose it as the massive waste of time, energy, and brainpower that it is. People should stay focused on the very solvable problem of context, and thoroughly avoid the pipe dreams about semantics. Keep it simple, and you'll be much happier.

First, let's review what the "Semantic Web" is supposed to be... A semantic web is a system that understands the meaning of web pages, and not merely the words on the page. It's about embedding information in your pages so computers can understand what things are, and how they are related. Such a beast would have tremendous value:

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize." -- Tim Berners-Lee, Director of the W3C, 1999

Gee. A future where human thought is irrelevant. How fun.

First, notice that this quote was from 1999. It's been ten years since Timmy complained that the semantic web was taking too long to materialize. So what has the W3C got to show for its decade of effort? A bunch of bloated XML formats that nobody uses... because we apparently needed more of those. By way of comparison, Timmy released the first web server on August 6, 1991... Within 3 years there were 4 public search engines, a solid web browser, and a million web pages. If there was actually any value in the "Semantic Web," why hasn't it emerged some time in the past 18 years?

I believe the problem is that Timmy is blinded by a vision and he can't let go... I hate to put it this way, but when compared against all other software pioneers, Timmy's kind of a one-trick pony. He invented HTTP and the web server, and he continues to milk that for new awards every year... while never acknowledging the fact that the web's true turning point was when Marc Andreessen invented the Mosaic web browser. I'm positive Timmy's a lot smarter than I am, but he seems stuck in a loop that his ego won't let him get out of.

The past 10,000 years of civilization have taught us the same things over and over: machines cannot replace people, they can only make people more productive by automating the mundane. Once machines become capable of solving the "hard problems," some wacky human goes off and finds even harder problems that machines can't solve alone... which creates demand for humans to solve that next problem themselves, or to build a new kind of machine that can.

Seriously... this is all just basic economics...

Computers can only do what they are told; they never "understand" anything. There will always be a noticeable gap between how a computer works, and how a human thinks. All software programs are based on symbol manipulation, which is a far cry from processing a semantically rich paragraph about the meaning of data. Well... isn't it possible to create a software program that uses symbol manipulation to "understand" semantics? Mathematicians, psychologists, and philosophers say "hell no..."

The Chinese Room thought experiment pretty clearly demonstrates that a symbol manipulation machine can never achieve true "human" intelligence. This is not to imply human brains are the only way to go... merely that if your goal is to mimic a human, you're out of luck. Even worse, Gödel's Incompleteness Theorem proves that all systems of formal logic (mathematics, software, algorithms, etc.) are fundamentally error-prone. A consistent system contains true statements it can never prove, and an inconsistent one will cheerfully "prove" false statements! Clearly, there are fundamental limits to what computers can do, one of which is to understand "meaning".

Therefore, even in theory, a true "semantic web" is impossible...

Well... who the hell cares about philosophical purity, anyway? There are many artificial intelligence experts working on the semantic web, and they rightly observe that the system doesn't have to be equivalent to human intelligence... As long as the system behaves like it has human intelligence, that's good enough. This is pretty much the Turing Test for artificial intelligence. If a human judge interacts with a machine, and the judge believes he is interacting with a real live human, then the machine has passed the test. This is what some call "weak" artificial intelligence.

Essentially, if it walks like a duck, and talks like a duck, then it's a duck...

Fair enough... So, since we can't give birth to true AI, we'll get a jumble of smaller systems that together might behave like a real, live human. Or at least a duck. This means a lot of hardware, a lot of software, a lot of data entry, and a lot of maintenance. Ideally these systems would be little "agents" that search for knowledge on the web, and "learn" on their own... but there will always be a need for human intervention and sanity checks to make sure the "smart agents" are functioning properly.

That raises the question, how much human effort is involved in maintaining a system that behaves like a "weak" semantic web? Is the extra effort worth it when compared to a blend of simpler tools and manual processes?

Unfortunately, we don't have the data to answer this question. Nobody can say, because nobody has gotten even close to building a "weak" semantic web with much breadth... Timmy himself admitted in 2006 that "This simple idea, however, remains largely unrealized." Some people have seen success with highly specialized information management problems that had strict vocabularies. However, I'd wager that they would have equivalent success with simpler tools like a controlled thesaurus, embedded metadata, a search engine, or pretty much any relational database in existence. That ain't rocket science, and each alternative is older than the web itself...
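To make the "simpler tools" claim concrete, here's a sketch of a controlled thesaurus in a stock relational database, using Python's built-in sqlite3 (the schema and vocabulary are invented for illustration -- the point is that term resolution needs no ontology):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE synonyms (term_id INTEGER REFERENCES terms(id),
                           variant TEXT);
""")
db.execute("INSERT INTO terms (label) VALUES ('automobile')")
db.executemany("INSERT INTO synonyms VALUES (1, ?)",
               [("car",), ("auto",), ("motorcar",)])

# Resolve a user's word to the preferred term -- plain SQL, no RDF required.
row = db.execute("""SELECT t.label FROM terms t
                    JOIN synonyms s ON s.term_id = t.id
                    WHERE s.variant = ?""", ("car",)).fetchone()
print(row[0])  # automobile
```

Any search engine or content system that can call out to a database can use a table like this to normalize query terms, and this trick predates the web.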

Now... to get the "weak semantic web" we'll need to scale up from one highly specialized problem to the entire internet... which yields a bewildering series of problems:

Who gets to tag their web pages with metadata about what the page is "about"?

What about SPAM? There's a damn good reason why search engines in the 90s began to ignore the "keywords" meta tag.

Who will maintain the billions of data structures necessary to explain everything on the web?

What about novices? Bad metadata and bad structures dilute the entire system, so each one of those billion formats will require years of negotiation between experts.

Who gets to "kick out" bad metadata pages, to prevent pollution of the semantic web?

What about vandals? I could get you de-ranked and de-listed if you fail to observe all ten billion rules.

Who gets to absorb web pages to extract the knowledge?

What about copyrights? Your "smart agent" could be a "derivative work," so some of the best content may remain hidden.

Who gets to track behavior to validate the semantic model?

What about privacy? If my clicks help you sell to others, I should be compensated.

Will we require people to share analytical data so the semantic web can grow?

What about incentives? Nobody using the web for commerce will share, unless there's a clear profit path.

I'm sorry... but you're fighting basic human nature if you expect all this to happen... my feeling is that for most "real world" problems, a "semantic web" is far from the most practical solution.

So, where does this leave us? We're not hopeless, we're just misguided. We need to come down a little, and be reasonable about what is and is not feasible. I'd prefer if people worked towards the much more reachable goal of context sensitivity. Just make systems that gather a little bit more information about a user's behavior, who they are, what they view, and how they organize it. This is just a blend of identity management, metadata management, context management, and web trend analysis. That ain't rocket science... And don't think for one second that you can replace humans with technology: instead, focus on making tools that allow humans to do their jobs better.

Of course, if the Semantic Web goes away, then I'll need to find something else to power hate. I'm open to suggestions...

I agree completely... the "semantic web" is just a solution in search of a problem. It's like they're saying "once we tag the bejesus out of every web page in existence, robots will emerge that 'understand' humans and will then choose to serve us, and make the internet easier for our customers to use!"

Um... sure... or you could just make your web site easier to navigate... or beef up your customer service department... or write better documentation... or invest in educational marketing programs... or about a bazillion other things that are easier to build than Artificial Intelligence.

"if people have to baby-sit it, it hasn't reach the potential desired in the first place, so someone will call it a failure and scrap it"

That's what I find so disappointing... people implement software solutions expecting technology to solve all their problems.

If you have an automation problem, then technology can improve efficiency tremendously, without much babysitting. But, if you have a communication problem, then the technology will need a system of human incentives and training before people will find value.

"Nobody can say, because nobody has gotten even close to building a "weak" semantic web with much breadth..."

"I'm sorry... but you're fighting basic human nature if you expect all this to happen... my feeling is that for most "real world" problems, a "semantic web" is far from the most practical solution."

Delightful. A happy coincidence (and surely not spite) will guilt/prod you to eat those exact words by Feb 12, 2009 (Darwin's 200th B-Day). Within 10 days we will launch an initiative that has succeeded in doing exactly what the Semantic Web 'intended' to do - only we have it done. And it is dirt easy.

The author emailed me about this, and it was posted to Reddit... but then was taken *down* from Reddit... which is a highly unusual action for them to take. It didn't seem like much: just Twitter-like hashtags for web pages. Which -- frankly -- is probably just fine when left on Twitter.

Auto extraction of metadata can be a time saver, as long as people realize that such systems require extensive setup, and weekly maintenance. Otherwise, you will get bad metadata in your system... bad metadata is like a virus that destroys the value of the entire search collection, even if half of your content has good metadata.

Embedded metadata is pretty cool... and there are multiple ways to do this besides microformats. I like it because the metadata stays with the content as it moves between systems. However, because of SPAM concerns, you can only occasionally trust the metadata to be accurate. Also, because of security concerns, you usually want to remove embedded metadata once it's too far removed from the original system.

In my opinion, these technical problems are mostly solved... what we need now are good best practices for their use.
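To illustrate how mechanical the "solved" part is, here's a minimal sketch using Python's standard-library HTML parser to harvest embedded `<meta>` name/content pairs (the sample page is invented). Note that nothing in the extraction step tells you whether the values are honest -- that's the SPAM problem all over again:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

page = """<html><head>
<meta name="keywords" content="semantic web, metadata">
<meta name="author" content="anonymous">
</head><body>Hello.</body></html>"""

parser = MetaExtractor()
parser.feed(page)
print(parser.meta)  # {'keywords': 'semantic web, metadata', 'author': 'anonymous'}
```

Getting the metadata out is the easy ten lines; deciding whether to trust it is the part no parser can do for you.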

Bex, love the post. I disagree a bit, but only in small details. I think that we will have the semantic web one day. What it needs in order to be created is simple: a smarter search engine.

Of course, such a search engine does not exist. As I told people back in 2001, my kids will sell you that system. Of course, my sons are now 4 and 1, and I'm not shifting my prediction yet. It will come, but we need a few more decades.

In the meantime, there is the question of what it will solve. Simple. It will allow for my final vision of knowledge management: I ask a question and I get the answer derived from all sources, internal/external and structured/unstructured. What goes along with that great search engine is an equally sophisticated input device, one that not only takes what you ask, but detects the nuances in the question.

It'll get here, but hopefully I'll be sitting back drinking beers on a more full-time basis. Until then, I'm with you. People should just start working on the things that we can do in the next few years and let us evolve towards the coolness of the semantic web (as if it'll still be called that in 20 years).

" I think that we will have the semantic web one day. What it needs to be created is simple, a smarter search engine. Of course, such a search engine does not exist. As I told people back in 2001, my kids will sell you that system. Of course, my sons are now 4 and 1, and I'm not shifting my prediction yet. It will come, but we need a few more decades."

perhaps... but I think people who chase AI in general and the semantic web in particular kind of miss the point. It's about the human-computer interaction layer. I have a post on that in the works... but essentially there will ALWAYS be a gap between what computers can do, and what humans can do... this gap sometimes favors the computers, but never in any meaningful way.

The semantic web is just my latest punching bag to prove this point.

Even in a fantasy world where computers are sentient, humans will always have the edge. Why? Cyborgs. Robot athletes, robot soldiers, and robot scientists will never be able to compete with humans... because in order for that to happen, we would need drastic advances in AI, which would necessitate drastic advances in understanding consciousness, which would require vastly greater understanding of neurobiology... at which point, many people would say "who cares about AI? Just give my brain a spare memory bank and some extra CPUs! While you're at it, give me the Astrophysics and Judo modules, and an M-16 attachment."

In short, humans with embedded computers will be a much more practical option before Artificial Intelligence ever is... That never-ending drive to be "what's next" -- whether it's arrogance, vanity, or hubris -- is really the only thing that will ensure human survival.

It's also why I think the premises behind movies like The Matrix and The Terminator are fundamentally flawed...

Hi,
I completely disagree, but not because the claims are wrong, but because your interpretation of the vision and end-goal is wrong.
The semantic web is not intended to be some kind of super-brain to replace human thinking - it was never intended to be something like that. Whoever claims it should be that misses the point of the semantic web.
It is true that the term "semantic" may be a bit confusing in this sense, since it's a widely overloaded term; BTW, TBL has even stated himself that this may have been a poor choice of name. "The Data Web" may have been a better name. See also here: http://www.youtube.com/watch?v=mVFY52CH6Bc&eurl=http://sachbak.blogspot.com/search/label/semanticweb

But setting the name issue aside, the vision and current practice of the semantic web does not revolve around making machines think, but rather around letting machines make the connections between apparently disparate data sources. It's about letting computers do what they're best at - processing large quantities of data fast. Which is something that we humans will never do as well as computers. So instead of us humans working hard to make the connections and sifting through huge amounts of data to find recurring patterns or matching metadata - why not let the machines do that? And let them do it on a world-wide scale - on the internet. We're at a time where information production rates are only increasing. It's best to acknowledge that we can't really handle these huge amounts of information - this is why we build tools (machines/computers) to help us. Humankind has done exactly that through the ages - we have invented tools to help us handle our environmental challenges better.
If you take environmental and medical research as examples, you realize that data gathering and producing data is not the issue - the problem lies in going through huge amounts of data and making the necessary connections.

So if you understand the vision and direction, the question that remains is more of an engineering nature - how can this be achieved?
Enter the semantic web technologies (which you refer to as bloated XML specifications). These are essentially tools that should help us reach our goal - allowing machines to communicate data and make inferences generically enough, in internet-scale systems. This is challenging and I bet we still have a lot to learn, but I would hardly discard these tools as inappropriate or irrelevant. Granted, they may not be perfect, and there's probably a lot more work to be done, but that's usually the way with these things.
By the way, other, more qualified, people have said it even better than me: http://www.mindswap.org/blog/2007/11/21/shirkyng-my-responsibility/

The semantic web is not intended to be some kind of super-brain to replace human thinking - it was never intended to be something like that. Whoever claims it should be that misses the point of the semantic web.

Incorrect. That is exactly what the inventors and proponents of the semantic web claim it is supposed to be. They may have scaled down their dreams because a "semantic web" is pretty nearly impossible... but then they should stop calling it "semantic." I'd be all over calling it the "data web," but even that has serious limitations.

So instead of us humans working hard to make the connections and sifting through huge amounts of data to find recurring patterns or matching metadata - why not let the machines do that? And let them do it on a world-wide scale - on the internet.

Because in the past 20 years we've already tried that... twice... it failed both times for the exact same reasons... the "Semantic Web" will also fail for the exact same reasons. It's like frigging Groundhog Day...

Am I the only one alive who remembers this exact same claim being put forward by the proponents of XML? And the proponents of the "Keyword" meta tag? It was the same song and dance. They claimed that if we embedded easily parsable data structures in our web pages, then machines could more easily extract the data, and run smarter searches. Both failed for the same social reasons, and the "semantic web" has done nothing to address these limitations:

It requires a great deal of time and energy to collect and maintain good data, especially if you use automated metadata-extraction tools. Not everybody is willing to put in the extra effort... and those that do would always achieve better success by adding human information architects.

Bad metadata from spammers, hackers, and newbies pollutes the value of the entire system, causing many to no longer trust it.

Most attempts to achieve a "data web" failed miserably, and were usually scrapped. Some had marginal success in controlled environments; however, I maintain those people would have achieved greater success with an add-on database application and standard data mining tools. Such tools have been web-enabled for well over a decade... so why wait for bizarre and untested "semantic web" vaporware to be released, when you can solve this problem right now?!?!

Anyway... thanks for the links... if I find something else to power hate in them, I'll blog further...

The "semantic web" isn't about making computers actually understand data, it's about making it easier for computers to sort through and categorize data. It's about tagging data according to what they represent rather than (or in addition to) how they should be displayed. And the point isn't to evolve the WWW into some giant, self-aware, artificially intelligent consciousness, but rather to make it possible to extract useful data from web pages and re-use it in ways that the original authors never imagined.

Any of the objections you raised (copyright, spam, vandals, novices, etc.) could have been raised relative to the WWW too. Indeed, the WWW suffers to some degree from all of them. And yet, the WWW is an unbelievable, rip-roaring success because the benefits outweigh the problems.

There won't be just one technology that makes the semantic web possible. There will be many, especially in the beginning... some will use XML, some will use XHTML, some will use microformats, and some will use ideas that haven't yet been imagined. The most useful of these will become de facto standards, and at that point we'll be off to a good start.

You can continue to rail against the notion of an uber-intelligent distributed hive-mind if you like, but it'd be a lot more helpful if you'd spend a few minutes to wrap your posts in <rant></rant> tags so that the rest of us can start indexing them appropriately.

the point isn't to evolve the WWW into some giant, self-aware, artificially intelligent consciousness

You'd be surprised... I've seen some pretty insane theories from "semantic web" boosters that believe precisely this is a possibility.

The "semantic web" isn't about making computers actually understand data, it's about making it easier for computers to sort through and categorize data.

And how is that different than the failure of META tags? How is that different from the failure of XHTML? My point is that I see TONS of people saying "wouldn't it be nice if we all did X, then Y would be possible," but they never look into the PRACTICAL limitations of getting everybody on the planet to do X, do it honestly, and maintain it.

Case in point... when XML was trendy, people thought it was only a matter of time before folks made all their web content available as "easily sortable and categorizable" XML. But that never happened on a large scale, did it? Folks called XML the "Esperanto for the Web," with no apparent irony about the failure of Esperanto. Probably the best success story of XML on the web is RSS...

Some folks will get limited use from microformats, tags, etc., in controlled environments... but unless the big problems are solved by much smarter folks than the ones they've got running the W3C, then I see this being of limited value on the greater web.

(I know this is an old post but I just stumbled upon it and it's an interesting debate)

Technically, Bex is correct. The point about Gödel's Theorem is true. Predicates in facts expressed by RDF must be treated as axioms, so two facts directly contradicting each other would trigger the Principle of Explosion, making Gödel's theorem pertinent (BTW, this is not always the case). However (as with lots of the problems of the Semantic Web listed in the article) the W3C specifications only lay out what can be implied from a URI, and the predicate is the responsibility of the creator. Therefore the issue becomes one of interpreting the linked data.
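For readers who haven't met the Principle of Explosion: in classical logic, a knowledge base that asserts both a fact and its negation as axioms entails every conclusion whatsoever. A toy sketch (hand-rolled triples of my own invention, not real RDF tooling):

```python
# Toy triple store: each fact is a (subject, predicate, object) axiom.
facts = {
    ("pluto", "is_a", "planet"),
    ("pluto", "is_not_a", "planet"),   # contradictory axiom from another source
}

def contradictory(kb):
    """True if some fact is asserted alongside its direct negation."""
    return any((s, "is_not_a", o) in kb
               for (s, p, o) in kb if p == "is_a")

def entails(kb, query):
    """Classical entailment with ex falso quodlibet: an inconsistent
    knowledge base 'proves' any query, however absurd."""
    if contradictory(kb):
        return True
    return query in kb

print(entails(facts, ("moon", "is_a", "green_cheese")))  # True!
```

Which is why "interpreting the linked data" matters so much: one bad axiom from an untrusted source can poison everything a naive classical reasoner concludes.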

So yes, it would be impossible to create actual intelligence, but no, it is not impossible to automate tasks using data which is machine interpretable. So if Bex still thinks that the Semantic Web can't work then, IMHO, Lior and Caleb are correct too; you have missed the point.

So if Bex still thinks that the Semantic Web can't work then, IMHO, Lior and Caleb are correct too; you have missed the point.

The title of the post is intended to be confrontational... I meant two things by it, which I hope was clear in the post:

It is theoretically impossible for the "Semantic Web" to do everything its proponents claim it can do

It is demonstrably impractical to use it for the kinds of data mining and automation currently being done with other tools

I'm not saying it "can't work," I'm simply saying "nobody who ever used it could demonstrate it was the most efficient solution to their problem." It's possible to create a web-application with FORTRAN, but it's probably not practical.

If relational databases ever get replaced, they will probably be replaced with highly unstructured data blobs being processed by massive numbers of computers in the cloud. The whole RDF nonsense just adds layers and layers of restrictions, without any apparent value.

We surely are moving ahead in the semantic web, with newer technologies and more people joining the group -- be it Google App Engine's datastore, which is based on the EAV model, or Brainwave's interesting semantic database.

I'm totally with you on the amount of effort that will go into creating and maintaining the ontologies. Anyhow, who is to say which ontology is the de facto one if more than one is created for any term, or who is even to say that any ontology is right? The human language is ever changing and our knowledge and understanding is being constructed constantly, so we'll never be able to keep up.

Computers can only do what they are told; they never "understand" anything. There will always be a noticeable gap between how a computer works, and how a human thinks. All software programs are based on symbol manipulation, which is a far cry from processing a semantically rich paragraph about the meaning of data. Well... isn't it possible to create a software program that uses symbol manipulation to "understand" semantics? Mathematicians, psychologists, and philosophers say "hell no..."

There is no significant distinction between the electrochemical neural network (brain) and a sufficiently advanced artificial counterpart.

Searle's argument is a bogus appeal to absurdity.

Also your sweeping statement about "mathematicians, psychologists and philosophers" is questionable. Plenty disagree, I'm sure. You didn't even mention neuroscientists.

The Chinese Room thought experiment pretty clearly demonstrates that a symbol manipulation machine can never achieve true "human" intelligence. This is not to imply human brains are the only way to go... merely that if your goal is to mimic a human you're out of luck. Even worse, Gödel's Incompleteness Theorem proves that all systems of formal logic (mathematics, software, algorithms, etc.) are fundamentally error-prone.

Oracle machines, e.g., are a way around the kind of Gödelian limitation you are talking about.

There is no significant distinction between the electrochemical neural network (brain) and a sufficiently advanced artificial counterpart.

I'm talking about computers and machines; not cylons. If somebody assembled artificial neurons that behaved exactly like human brains, then yes, it's possible to create an artificial brain. But that would not be a "computer."

Searle's argument is a bogus appeal to absurdity.

Many AI researchers say so, but I've never seen a convincing reply... other than "who cares?"

I feel that Searle is just trying to remind everybody that even though Turing Machines are cool, symbol manipulation is not "real consciousness" as we define it. If Turing Completeness were the only requirement for consciousness, then scratchings in the dirt would have consciousness. By this, I mean that if you scratch "The Game Of Life" in the dirt, and give it a certain initial configuration, it behaves like a Turing Machine.

Does that mean the dirt is conscious? Or, does it mean that it is just a system that can output data that simulates consciousness?
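For reference, the "scratch it in the dirt" rule set really is that mechanical. One generation of Conway's Game of Life fits in a few lines of Python (a standard sketch, not tied to any particular Life implementation):

```python
from collections import Counter

def step(live):
    """One generation of Conway's Life; `live` is a set of (x, y) cells."""
    # Count live neighbors for every cell adjacent to a live cell.
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Birth on exactly 3 neighbors; survival on 2 or 3. Nothing here
    # "understands" anything -- it's pure symbol shuffling.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 1), (1, 1), (2, 1)}   # a horizontal bar of three cells
print(step(blinker))                 # flips to vertical: {(1, 0), (1, 1), (1, 2)}
```

With the right (enormous) initial configuration, this same update rule can emulate a Turing Machine -- which is the whole point of the dirt analogy.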

Again, at this point the only correct answer is, "who cares?" Even if it only appears to be conscious, then that's still pretty dang useful... but in my opinion... there are always easier ways to solve a computing problem, other than trying to invent AI.

Came to your posts poking around for information on the Semantic Web and Oracle, after seeing that the Semantic Web for Dummies author is an Oracle Fusion worker (like yourself, I believe) and quite a Semantic Web evangelist, at least from a business perspective. Interesting post and lots of fun, though it neglects some of the actual focus of semantic web technologies on solving particular business problems in data interoperability and reasoning over particular business objects, as they appear to be used at Oracle. There seems to be quite a focus and need now for semantic web technologies in defense and other sectors where standard database technologies prove inflexible or intractable and don't provide some capability to reason over disparate entities. Whether those specific uses will translate into addressing the larger dream of a useful self-analyzing web is TBD, probably in the "vaporware" sense. As for why it's taking so long to get these technologies off the ground: it'll take business adoption, which necessitates business need, and now is when we're seeing companies interested in sorting through seas of data, especially using more statistical data mining techniques. So I suspect you'll see increased adoption of semantic web technologies in the coming years as a way of solving particular business tasks that may not be solved using statistics (legal, compliance, insurance, etc.), maybe through the bundling of those technologies with popular products, Oracle's for example.

I feel like there SHOULD be a formal proof that "the semantic web is impossible". Unfortunately, for a proof of the mathematical or logical sort, the thing to be proved must be defined precisely - which you will not find anywhere for the semantic web.

It is like that elusive mirage in the desert - you know it exists in your mind, but does it really exist in reality?

Unfortunately, it's pretty nearly impossible to "prove" that a thing is impossible... at best you can start with the premise "the semantic web is possible" and then derive a contradiction... It's easier to prove "the semantic web is unnecessary" by taking anything it claims to do, and implementing it with less complex, easier-to-maintain tools.

But I disagree with your "no real substance" claim... the "semantic web" is a fundamentally impractical pipe dream. And since Tim Berners-Lee is claiming credit for the entire internet, it's a reasonable assumption that he'll claim credit for whatever comes next as well... and call it the "semantic web," regardless of what it is.