Posted by CmdrTaco on Tuesday April 06, 2004 @09:39AM
from the something-to-think-about dept.

rtmyers writes "A really simple yet radical idea: break web pages down into sentences, and then have the browser walk through sentences and do useful sentence-level things. This is the paradigm shift behind the product called Infowalker, which unfortunately is implemented as an IE toolbar, but would be fabulous as a feature built into Mozilla or Opera.
Currently implemented features include sentence-level interfaces for TTS, translation, large-type display, and the funkiest of all, dynamic display of an image pulled off the web based on keywords extracted from each sentence -- hey, turn all your web pages into slide shows today! Then there's the feature to show an Amazon product related to the sentence you're reading -- which presumably is the revenue model behind the product, but turns out to also be surprisingly useful.
This might not be for everyone, but it could just be the first real change in the browsing model since the earliest browsers started throwing text up on the screen more than a decade ago. And apparently, Infowalker's architecture allows for pluggable third-party sentence-level "behaviors", with the potential for the development of a whole ecosystem of sentence-level functionality in browsers. And it seems Infowalker can also be controlled by strategically placed custom CSS tags within the HTML, raising the possibility of a new class of web pages especially tuned for this sentence-based approach."
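For anyone wondering what "walk through sentences and apply behaviors" might look like in practice, here is a rough Python sketch. It is only an illustration, not Infowalker's actual code: the naive regex splitter and the names `split_sentences`, `walk`, `large_type`, and `keywords` are all made up for the example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    Assumption: a real splitter would also handle abbreviations, quotes, etc."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def large_type(sentence):
    # Hypothetical stand-in for a "large-type display" behavior.
    return sentence.upper()

def keywords(sentence):
    # Hypothetical keyword extractor: keep words longer than five letters.
    return [w for w in re.findall(r"[A-Za-z']+", sentence) if len(w) > 5]

def walk(text, behaviors):
    """Apply every named behavior to every sentence, in document order."""
    return [{name: fn(s) for name, fn in behaviors.items()}
            for s in split_sentences(text)]

results = walk("Break pages into sentences. Then do useful things!",
               {"large": large_type, "keywords": keywords})
```

Each entry in `results` holds the output of every behavior for one sentence, so a third-party "behavior" here is just another function added to the dictionary.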

If you really want to browse the web, try this: The world's best Surf Engine
IconSurf.com [iconsurf.com],
where you surf the web using website icons.
When you want to explore instead of doing targeted searches
or visiting the same old sites, this is the way to do it.
Shameless plug of my new toy, but it's pretty revolutionary as well.
I'd appreciate any feedback, as the site is just a little over 1 week old.

When harvesting icons, also harvest the home pages, and add the titles to the title attributes of the image tags; e.g., instead of title="www.iie.hva.nl", use title="www.iie.hva.nl - Hogeschool van Amsterdam: Instituut voor Information Engineering - Algemeen".

Duplicate the title information in the alt attribute for each img tag (for accessibility).

Feh. That is just as fucking stupid as the insanely moronic "toolbar" in the article, shamelessly whored by the asshat (rtmyers) [slashdot.org], who brought this little useless bastard of a "toolbar" into a meaningless, unwelcome existence without bothering to note that important fact in his goddamn annoying little marketdroid pimp-speak sell-fest story text.

I want that 5 minutes of my life back and, judging from the other posts here, I could start a class action to try to get it. Of course, that would just end up wi

Depends upon your browser. In IE 6 most load. Mozilla, Firebird and Opera virtually all load. The icons are actual icons, so some browsers do not support them. You're probably only seeing the icons that happen to be gifs or jpegs.

They're not random clicks. Presumably you click on icons that you like or that catch your eye. A nice icon will likely lead you to a nice site too. So the votes are for the icons themselves, not the sites that host the icons per se. Thanks for the bookmark, see you again soon hopefully.

But that's not what this Infowalker thing does. It simply displays the text of the page in a separate window, and optionally reads it out, with or without translation. You can adjust the display/voice speed and some other things as well.

Great for visually impaired people, but otherwise I see little use for it.

I do not support the idea of ad-fetching on a per-sentence basis, because it means more ads and interruptions (browser interstitials, really). It's totally inefficient for end users and it only gives the advertisers a hard-on because they get to really psychologically assault surfers (which is a huge turn on to advertisers because they feel like they are super-human if they can fuck with our heads... it's fucking sick if you ask me). I prefer Slashdot's method of bonus features that subscribers can get by chipping in. Why can't advertisers come up with better concepts for selling their product (perhaps by word of mouth because it's a good product, not because we're always tripping over an ad about it)?

In my books, the more ads I see about a product, the less I want the product, because the product must be sold at an inflated cost to pay for advertising, or it must be a poor product if they are pushing it so hard. Word of mouth is best.

The sentence "would be fabulous as a feature built into Mozilla or Opera" resulted in an advert for Wagner CDs, and "show an Amazon product related to the sentence you're reading" - popped up a little map of Brazil, and an advert for "Live Sentence" by Alcatrazz on CD.

Methinks they've a little more work to do if they're going to make their related advertising into an effective revenue stream... at the moment it's more like "hey, here's some random CDs with the same words as your sentence!"

Actually, like a lot of buzzwords, "paradigm shift" used to mean something. Real paradigm shifts are wondrous, exciting things. They also don't happen very often. I'd say only three have happened in computing in my lifetime: the switch from timesharing systems (mainframes and minis) to PC's as "what computers are" in the public eye, the change from CLI's to GUI's as the standard method of interacting with computers, and the way the Internet has subsumed the old hodgepod

I do not subscribe to the idea of "paradigm shifting". I think things just naturally evolve due to need or pure serendipity. Everything has a practical purpose. We are going back to mainframe systems (100% of Fortune 500 companies use Citrix for something). I have an Apple Powerbook with OS X, I have a GUI and a terminal session open at all times. I like the best of both worlds. The internet sort of grew out of necessity to link together all of the BBSes and their users. IRC comes close, but it is still a

The web grew out of a BBS-like need. The Internet is just a larger-scale implementation of the machine-to-machine networking that was already in place. Normally I wouldn't bother making the distinction, but in the context of what you're saying, it matters.

I also don't see the connection between using Citrix and using mainframes. In fact, most (probably all) Fortune 500 companies aren't going back to mainframes -- they never left in the first place. Are you perhaps trying to relate terminal usage and other

How do you not see the connection between Citrix and a mainframe? Citrix is basically a mainframe program. You have a thin client and/or a dumb terminal accessing applications and files stored on a central server. That is about as mainframe as you can get. Defining a mainframe by throughput is like defining a car by paint. I said the internet was an evolution of the existing infrastructure and community. At least I tried to imply as much.

I do not subscribe to the idea of "paradigm shifting". I think things just naturally evolve due to need or pure serendipity.

A paradigm is a model, a set of assumptions about the way the world works. A simple example is the Copernican model of the solar system. The Earth revolves around the sun. Previously, the common view was that the Sun went around the Earth.

Paradigms certainly do shift, and it's a very different process to natural evolution. When Copernicus said "Hang on a minute, the Sun doesn't go

I wish. However, I believe Thomas Kuhn and "The Structure of Scientific Revolutions" far precedes, and will far outlast, the .com bubble (first published 1962).

If you ever want to claw your own eyes out and need some motivation, just read that book. He comes up with the concept of "Paradigm Shifts" and explains them in exceptionally excruciating detail.

To be fair, it was a fairly revolutionary concept for its day -- perhaps the best proof of this point is that it took managers 30 years to latch onto the concept and suck all the usefulness out of it. Managers then, of course, proceeded to use it incessantly and inappropriately to describe any change they needed to implement, revolutionary or not.

out of a non-semantic web. As units of language, sentences are still context sensitive, so this will very quickly get mired in throwing up offensive and inappropriate results. Imagine an article 'Man driven to suicide by music of Justin Timberlake' followed by 'Buy Justin Timberlake CDs on Amazon'.

Ah yes, that beautiful word "paradigm shift." I forget the exact wording, but the general quote goes something along the lines of "any time you hear that phrase, smile and nod while slowly backing away."

...funkiest of all, dynamic display of an image pulled off the web based on keywords extracted from each sentence -- hey, turn all your web pages into slide shows today!

Sweet Zombie Jesus, I did not spend my time turning off animated gifs, turning off Flash, stopping those stupid "download this plugin" buttons [slashdot.org] from popping up, using Google instead of Antarcti.ca [antarcti.ca]'s let's-fly-over-the-web-in-a-low-flying-fighter-jet search engine, and running search-and-destroy missions on the remaining dancing baloney just to turn every web page into a goddamned sentence-by-sentence Powerpoint(tm) presentation!

Most natural language processing is done at the sentence level, so this is quite common. And plenty of work has been done on information retrieval using sentence and paragraph context.

It's less common to expose this to users, but it isn't clear that the sentence level is what should be exposed anyway. Better semantic markup of web pages into related sections or topics might be useful. But given that we can't even get authors to generate correct HTML the way it is, it's doubtful much would come of such a

Ever see dashboard [nat.org]? It takes information gathered during IRC, IM, web browsing, e-mail, and more, does a lot of backend cluepacket mojo, and returns a lot of useful information while you work. If "bug 1565" comes up during your work, it'll fetch information in dashboard about the bug without needing you to click on a bug link. Microsoft is working on the same thing, called "implicit query" or some such. Look at the Windows Longhorn screenshots so far... It looks like they are taking the classic IE information sidebar and altering it to work in this way.

Why do I get the feeling that this turns rich web pages into bite-size powerpoint bullets? (Confession: I have no Windows machines, so I have no way of testing this thing.) Maybe they will create a version that converts webpages into Flash animations -- showing... you... one... word... at... a... time.

On the other hand, this type of content decomposition technology highlights the superiority of markup languages (e.g., HTML) over page layout languages (e.g., PDF). HTML retains more of the meaning of the content while PDF is basically a fancy way of converting content into a screenshot. Try extracting sentences from a PDF; what a PITA.
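To illustrate the HTML half of that point, here is a minimal sketch that pulls sentences out of markup using only Python's standard `html.parser` module. The `TextExtractor` class, the `BLOCK_TAGS` set, and `html_sentences` are names invented for this example, and the regex sentence splitter is a deliberately naive assumption.

```python
from html.parser import HTMLParser
import re

# Assumption: a small, incomplete set of tags treated as text boundaries.
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "td"}

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")  # keep adjacent blocks from fusing

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def html_sentences(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, then split after sentence punctuation.
    text = re.sub(r"\s+", " ", "".join(parser.chunks)).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

Nothing comparable is this short for PDF, where the text first has to be reassembled from positioned glyphs.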

Interesting idea. And if you can read the intent behind each sentence then you can build on this to provide personalized or customized content to the reader based on text read off the page and compared to the user profile. I've been thinking about a tangent to this for a long time now, but can never figure out how to actually build it.

It's the idea of each thought being an atom of the content tree, captured in either a sentence, sentence group or paragraph. If each thought is a unique object, then each can

I would look at the following technologies: WordNet [princeton.edu] is well known although not that powerful. Common sense [mit.edu] is really a beta, but it's still a big database. Cyc [opencyc.org] is really cool, but not all free. Look at CycL, the language they developed.

I think a simple thing like having integrated access to Wikipedia articles or dictionary.com from the browser would be cool. Amazon, I don't know.

Seriously, can someone turn off the corpspeak, and actually state what this thing does? It sounds more like a total product plug than an actual review of anything useful. The best I can tell is that it just links each sentence of a page to a product--big whoop.

<10% of total internet users don't really count for much. By supporting another browser, they'd have to double their work to get another 2% machines it could work on. That's hardly good business for anyone.

If you step outside of the mainstream, you can't expect everyone else to follow you;)

Come on. This is stupid. This is goofy-stupid. This is dumber than "push technology" was.

Have you noticed the signal to noise ratio we're dealing with here?

This is the most tedious and worthless 'enhancement' I've heard of in at least a year.

Unless they somehow developed a scheme for automatically detecting useful content in a webpage, I'm going to keep visually skimming them with my own two eyes until I find the one tidbit of useful data mixed in with all the dross.

I can just see this tool. "Ok, no, that sentence didn't help. Nope, not that one either . . . no . . . no . . . . no . . . . god this is boring . . "

There are a large number of people who would use the sentence tag so that they could finally get two spaces after the end of a sentence and still have clean (x)html and css. Right now, the cleanest way to accomplish this is {sentence 1}{non-breaking space}{breaking space}{sentence 2}. This does not provide any separation of style from content.
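Since HTML has no sentence element, one workaround is to generate a `span` per sentence and leave the spacing to a stylesheet. A small Python sketch, assuming a made-up class name `sentence` and the same naive punctuation-based splitting as any quick hack (the `wrap_sentences` helper is hypothetical):

```python
import html
import re

def wrap_sentences(text, css_class="sentence"):
    """Wrap each sentence in a <span> so spacing can live in CSS."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(
        '<span class="%s">%s</span>' % (css_class, html.escape(s))
        for s in sentences
    )

# A stylesheet could then supply the extra gap, for example:
#   .sentence { margin-right: 0.25em; }
markup = wrap_sentences("One. Two & three!")
```

That keeps the content free of non-breaking-space hacks, at the cost of extra markup that somebody (or some tool) has to generate.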

This hogwash is the kind of pointless, ludicrous nonsense thought up by the same kind of jackass nutcase fools who come up with shit like the "whole language" method of teaching children to read*. They start with a totally unsubstantiated premise which, in their ignorance, they take as fact and base an entire line of reasoning upon it. Eventually, they end up with a sophisticated (and usually expensive) system that doesn't really do anything, much to their surprise. What am I supposed to do with a whole raft of pictures keyed to each individual sentence? What possible use is this, besides targeted marketing?

What they've come up with is an ingenious method of directing advertisements, but they've completely failed to provide any reason for consumers to use it. Hey, I've got an even better idea! Let's give away a set-top box that hooks up to your cable/satellite receiver and overlays small ads while you watch TV! Advertisers will love it because they can target ads based on what people are watching. Now all we have to do is get people to hook this box up to their TV. Perhaps if we have it overlay the time and temperature as well, people will want it for its utility....yeah...that's it...

* "whole language" is where you don't teach kids to read at the phonetic/letter level, but instead just let them learn whole words "naturally" by following along in their own book while the teacher reads aloud. If this seems ridiculous and nonsensical, that's because it is. It was dreamed up by a fool who "observed" that when one reads, one doesn't sound out individual letters, and then assemble the letters into words; no, one just reads words. The logical flaw here is the assumption that there is no letter-level parsing when, in fact, there is -- it's just not noticeable as a distinct step because we do it so efficiently.

> The logical flaw here is the assumption that there is no letter-level parsing when, in fact, there is-- it's just not noticable as a distinct step because we do it so efficiently.

Actually, we do take in whole words at a time when we already know the word, but it's largely based on recognizing the letters at the endpoints. Taht is the mian rseoan tihs snctecne is sltil smwoaht rdaelbe.
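The scrambled sentence above is easy to reproduce. A quick Python sketch that shuffles only the interior letters of each word and leaves the endpoints alone; `scramble_word` and `scramble_sentence` are names made up here, and treating trailing punctuation as part of the word is a simplifying assumption.

```python
import random

def scramble_word(word, rng):
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle in words this short
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_sentence(sentence, seed=0):
    # A fixed seed makes the output repeatable; pick another for variety.
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in sentence.split())

scrambled = scramble_sentence("That is the main reason this sentence is still somewhat readable")
```

Every word in `scrambled` keeps its original first letter, last letter, and set of letters, which is the property the readability claim rests on.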

I'm sorry to say that there's little objective information to go on as far as condemning "whole language" -- the phon

Actually, we do take in whole words at a time when we already know the word, but it's largely based on recognizing the letters at the endpoints. Taht is the mian rseoan tihs snctecne is sltil smwoaht rdaelbe. I'm sorry to say that there's little objective information to go on as far as condemning "whole language"

Well, the problem isn't limited to "word" vs. "letters" thing. Whole Language says that words only have the meanings that the reader brings to them, and that so long as the internal symbology is

This account of whole language is wrong on several counts. First of all, whole language, correctly taught, relies on a variety of strategies, including phonics. Secondly, whole language has a basis in developmental psychology, i.e. what we know about how kids learn. Thirdly, it actually works. I have ten years of experience as a parent aide in a school that uses whole language, and the results are phenomenal.

What usually happened in places where whole language "failed" is that a bunch of bureaucrats or

Their web site says, "Display the best on-line shopping sites for items mentioned in each sentence."

Swell. So it's adware. Cleverly disguised as.. uh.. adware?

Sure, it's based on a novel idea... but I'm betting this idea was spawned in a thinktank where the single goal was to find a new form of targeted advertising, and the biggest challenge they faced was giving users a reason to download it. Of all the "cool things" it says it can do, the only item that seems worth pursuit is the inline language transl

A sentence, just like a word, derives part of its meaning from context.

So far as I'm concerned, this just feeds into the "sound-bite" culture vortex that television has been sucking us into for the last 2 decades. Why do we feel the need to strip the nuance and subtlety from everything?

This study [yahoo.com] seems to confirm what I've always thought about our soul-less Info-culture. I love technology, but we need to be careful that it doesn't strip away our humanity.

This study is no shock to me, though I do wonder how much of the TV they showed the test subjects was "good" stuff like Sesame Street (I miss the Electric Company) and how much was "bad" stuff like commercial-laden Saturday morning cartoons. And I have to say, gay or not, The Teletubbies moved at a MUCH slower pace than most kids' shows, even when I WAS sober. Good stuff, I think. "La-la, BALL!";-)

Don't get me started on the writing on the web. (Whatever happened to Grammar Nazi?) My degree isn't in Com

"Our mission is to the enhance the world's web browsing experience. To do this, we bring the latest innovations from psycho-ergonomics, neuro-economics, and compu-architectonics together in the form of completely free products which are ridiculously easy to use and amazingly useful."

Yet more proof that the internet we've come to know and love will be horribly massacred and have its soul sucked out by marketing types and management looking to use it for their own disgusting means. Just like what happened in the early 90s.

A sentence is basically a linear construct. But the way our brain processes a sentence is non-linear. We skip forward, we refer back. We process the sentence in this "looping" fashion until we comprehend it (or not, sometimes we just read it in linear fashion and move on to the next sentence in hopes that it will provide more context for us).

What's really funny is that I read his post thinking "It sounds like he's selling this to us, not just telling us about it. I bet it's his." So I looked at the comments, knowing that if that were true, someone would have checked up on it.

Too much meaningless junk in the article. Here's Word's take on the matter:

"A really simple yet radical idea: break web pages down into sentences, and then have the browser walk through sentences and do useful sentence-level things.

Currently implemented features include sentence-level interfaces for TTS, translation, large-type display, and the funkiest of all, dynamic display of an image pulled off the web based on keywords extracted from each sentence -- hey, turn all your web pages into slide shows t

I don't know what these guys are doing; the blurb on their site is funny, but I don't want an IE plugin. However, it is possible they are a couple of computational linguistics grads.

I did a survey of literature, coming at it from a layman developer's angle, and it seems the one area of natural language recognition (hence their name naturally open?) where computers are trustworthy and even exceed humans, is pronoun extraction. Not semantic recognition where meaning is understood, but just getting the who/where/what of proper nouns and being able to also link pronouns to them correctly. It's somewhere around 95% accurate and apparently better than a human volunteer in average accuracy, in one test.

This is accomplished not by dividing into sentences but looking at passages of multiple sentences. Perhaps theirs does some of this too, but even a very simple product searcher could just look for words not in its dictionary and google them. So it is not obvious what the merits of their approach are. Personally I'm interested in text-based interaction and news retrieval with open NLP tools.
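The "very simple product searcher" idea fits in a dozen lines. A Python sketch under loud assumptions: `unknown_words` is a hypothetical helper, the dictionary is a toy set of lower-case words, and the step of actually sending the candidates to a search engine is left out.

```python
import re

def unknown_words(passage, dictionary):
    """Return words missing from `dictionary`, in order of first appearance."""
    seen = set()
    result = []
    for word in re.findall(r"[A-Za-z][A-Za-z'-]*", passage):
        key = word.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            result.append(word)
    return result

# Toy dictionary for illustration only; a real one would be far larger.
tiny_dictionary = {"i", "bought", "a", "new", "and", "an", "old", "the", "box"}
candidates = unknown_words(
    "I bought a new Powerbook and an old Citrix box.", tiny_dictionary)
```

Here `candidates` ends up holding the two product-like names, which are the terms such a tool would go and look up.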

I'd love to be able to process displayed web page text through an E-lisp variant prior to the page being displayed in Mozilla. I've found a project for something like that on Sourceforge but it doesn't look like it's gone anywhere.

...the business bastards don't try and use it to provide us with "new and innovative marketing experiences", I'm OK with this. We need a new right in this modern age, the right to NOT be marketed to at all. It used to be an unspoken right, but with new technologies like this one, it needs to be written down somewhere. I'd like to just be able to jump on the internet and not see one ad, get any spam, register just to read an article or participate in any surveys. Is that so wrong (said in that classic Har

That's a good idea. I'm going to make an addon for IE that does spell checking and auto-fixing. Now if we can just get the spelling and grammar trolls to use it so we don't have to listen to them grumble.

I don't get it. I copy and paste sentences into Google all the time, either to find out if something's been plagiarized or to show someone that the garbage they just sent me is yet another urban legend or internet scam. If you're using Opera (or IE with a Google toolbar, I suppose, or whatever cool new search thing is used in Mozilla), you can do it in two moves and it opens in a new tab. What am I missing here? Besides a new way for the marketing trolls to try to grab my eyeballs?