Posted by Hemos on Tuesday January 25, 2005 @08:00AM
from the finally-tipping-their-hand dept.

prostoalex writes "Google will start indexing previously aired content from ABC, PBS, Fox News and C-SPAN and offer it as part of its Web search. No fancy speech-to-text recognition, just the closed captioning provided by the television networks, and no direct links to video content either." Right now, most of the channels are SF Bay Area stations, but obviously more will be coming along. I saw a demo of this about six months or so ago - it's pretty cool, and interesting to see how far it has come.

Search engine analyst Charlene Li of Forrester Research said Google's latest innovation is likely to disappoint many people because it doesn't provide a direct link to watch the previously broadcast programming.

Google instead is displaying up to five still video images from the indexed television programs, as well as snippets from the show's narrative. The search results also will provide a breakdown on when the program aired and when an episode is scheduled to be repeated. Local programming information will be available for those who provide a ZIP code.

Hey, even that is a great service. Of course, the closed captioning is rarely very good. I never understand how, on a show that was produced weeks before it was aired, the captions are often messed up or missing key words. Captions (and DVD subtitles too) seem to be shorthand summaries of what was said, when it's usually possible for them to be exact transcripts.

Sometimes it's not a big deal, but sometimes they miss an important point or nuance.

What'd be great, though, is real honest-to-god searching of the audio. I've seen demos where you can literally type in "helicopter," and you'll get hotlinks to the exact times in the video wherever that word was said. It's fscking amazing. Not sure it's a publicly available technology yet, tho...but the capability is definitely out there, and I'm sure we're not the only people playing with this.
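A rough sketch of how that kind of word-to-timestamp lookup could work, using timed caption text as a stand-in for recognized audio. The sample captions, the `build_index` name, and the tokenizing rule are all made up for illustration; the demos mentioned above presumably run real speech recognition, which this does not attempt.

```python
import re
from collections import defaultdict


def build_index(captions):
    """Map each word to the sorted start times (in seconds) where it appears.

    `captions` is an iterable of (start_seconds, text) pairs.
    """
    index = defaultdict(list)
    for start, text in captions:
        # set() so a word repeated within one caption is listed once
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append(start)
    return {word: sorted(times) for word, times in index.items()}


# Hypothetical timed captions, not taken from any real broadcast.
captions = [
    (12.0, "The helicopter circled the stadium."),
    (47.5, "Traffic is backed up downtown."),
    (93.2, "Our helicopter is over the bridge now."),
]
index = build_index(captions)
print(index.get("helicopter", []))
```

Each returned time could then be turned into a hotlink that seeks the player to that offset.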

I have added subtitles to a few videos (I work at a video production place, and sometimes we get a video in English that a company wants subtitled in Spanish for their people to see, or a video we made for them in Spanish subtitled in English to distribute internationally to their clients), and subtitles/captions must usually be shorter than what was said (especially in fast dialogue) or most people will just not have enough time to read them. The general rule for using text in video is that it must be on screen at least long enough to read it twice at a leisurely pace. Of course, this can't be applied when doing subtitles or captions, but you can't really expect people to read as fast as people speak; more often than not they won't have finished reading by the time it switches to the next piece of text.

Not sure if you've seen it, but you should see some of the Spanish subtitles I've read... sometimes even entire pieces of conversations are changed because the correct translation would take too long to read on screen... and of course there are the odd translations that are completely off the mark (I remember a version of the Wing Commander movie I saw where the name of the main ship, the Tiger's Claw, even though it was written several times in the movie, kept being translated as the "Tiger's Clock")

Where I'm from, Mexico, people watch practically all movies subtitled. I've been living in the US for a few years and I was surprised that many, if not most Americans, really dislike subtitled movies. I've heard that they find it very difficult to watch the movie and read the subtitles at the same time.

Then I realized that my brain does an amazing job at doing both things at the same time because I have no problem whatsoever when I watch subtitled movies.

Keep in mind that not everybody is a highly trained speedreader. Sometimes you must summarize, otherwise you end up with either a screen full of text, or the captions flashing by like subliminal messages.

Of course there is no excuse for errors in subtitling if they had plenty of time for checking it.

1. Many times, lifelong deaf people cannot read as fast as hearing people.
2. Captions have limited bandwidth, usually 60 chars a second.
3. For the pop-up style captions on most recorded TV shows, there is first a build time followed by a display command. The build cannot happen during a commercial break, so you have to wait until the show starts again.
4. Doing a good job of captioning takes a long time, as much as 10 hours for one hour of captioning. Corners get cut.
5. Text takes space on the screen.

Captioning does provide a good way to search video. I would love to see a hack for, say, MythTV where it monitors CNN, MSNBC, or the news channel of your choice for key words. When it finds them, it starts to record.
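The keyword-triggered recording idea could be sketched roughly like this. `watch_captions` and the sample caption lines are hypothetical, and none of this is MythTV's actual API; a real hack would still need to pull caption text from the tuner and tell the recorder to start.

```python
def watch_captions(lines, keywords):
    """Yield (line_number, keyword, text) for each caption line that mentions a keyword."""
    wanted = [keyword.lower() for keyword in keywords]
    for number, text in enumerate(lines, start=1):
        lowered = text.lower()  # captions are often all upper case
        for keyword in wanted:
            if keyword in lowered:
                yield number, keyword, text
                break  # one hit per line is enough to trigger a recording


# Made-up caption lines standing in for a live feed.
feed = [
    "GOOD MORNING, HERE ARE TODAY'S HEADLINES.",
    "GOOGLE SAYS IT WILL INDEX TV CAPTIONS.",
    "NOW A LOOK AT THE WEATHER.",
]
hits = list(watch_captions(feed, ["google", "yahoo"]))
for number, keyword, text in hits:
    print(f"line {number}: matched {keyword!r}, would start recording here")
```

Because it is a generator, the same function works on an endless stream of lines, not just a list. Plain substring matching is the simplest choice and will also fire on longer words that contain the keyword.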

About the 60 cps: that applies only to NTSC line 21 (EIA 608) captions. In the digital TV world (ATSC), EIA 708 captions have much more bandwidth. But few people are making 708 captions directly today; generally they are produced from existing 608 captions.
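For anyone wondering where the roughly 60 cps figure comes from, here is a back-of-the-envelope sketch. It assumes line 21 carries two caption bytes per video frame for one caption field and that NTSC runs at 30000/1001 frames per second; the function names are made up, and control codes are ignored, so real throughput for visible text is somewhat lower.

```python
NTSC_FRAME_RATE = 30000 / 1001  # about 29.97 frames per second
BYTES_PER_FRAME = 2             # assumed: two caption bytes per frame on line 21


def line21_bytes_per_second():
    """Raw byte rate of one line 21 caption field."""
    return NTSC_FRAME_RATE * BYTES_PER_FRAME


def seconds_to_send(text):
    """Lower bound on the time needed to transmit `text`, ignoring control codes."""
    return len(text) / line21_bytes_per_second()


print(round(line21_bytes_per_second(), 2))
print(round(seconds_to_send("X" * 120), 2))
```

This is also why a long caption has to start building well before it is displayed.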

That is correct. Frankly, as long as there is analog TV, captioning will tend to be done at the 60 cps rate. I am also ticked that captioners still use all upper case. Mixed case is much easier to read, and some of the captioning software does a lot of the work involved with case for them. Heck, some of the software will even handle the numbers for you as well.

Here (in France) you get movies in theaters either in "VO" (version originale, original version, with subtitles) or "VF" (version française, dubbed into French). Most of the time I go see them in their original version whether I speak the language or not, because whenever I've seen both versions, I always felt the dubbing was horrible (of course I could just avoid seeing the original version).

Our mission is to organize the world's information, and that includes the thousands of programs that play on our TVs every day.

Also not as good as it sounds: apparently "the world" only extends to a few of the major US TV networks.

BBC already has video online, and they add subtitles to all content broadcast on BBC1 and BBC2, so it should have been easy to include them in the test. Given BBC's attitude towards the internet and making information freely available compared with most commercial broadcasters

Not exactly, at least for right now. If you search for a show and click the link, there is currently a line that says "Video is currently not available". Does this mean that Google eventually plans to link up the transcripts with the videos as well?
Something like that would be really useful.

A perfectly situated "I'm feeling lucky" to the torrent would mean that direct video is irrelevant.
Also, it's interesting that you can read 3 pages of a book searched by Google, yet the IP implications of putting video up would make it nearly impossible.

I know Google's mission is to index all the information out there - and they're on the right track. This is probably a step in the right direction, but IMHO it's too early.
I'd much rather have them spend time on presenting the currently indexed information. It's almost impossible to find information on any piece of hardware these days without having to wade through dozens of pages trying to sell that piece of hardware.

Then what if I'm trying to locate a dealer that sells that specific part? It's rare, but it does happen. But you're right, in many situations it would be nice to have a "No dealers, just information" checkbox.

I used to manage the Discovery Channel Canada's web site at a time when we were transforming the site from an online science news magazine to a video-on-demand supplier of Discovery Channel Canada material. One of the things a few of us were interested in doing was offering up transcripts of aired programs. Doing it was simple, even then, since most TV tuner cards were capable of grabbing the captioning info from a vertical interval and dumping it to a text file.
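As a very rough illustration of the "dump it to a text file" step, here is a sketch that turns raw line 21 byte pairs into text. It assumes each byte carries 7 data bits plus an odd-parity bit, that pairs whose first byte is below 0x20 after stripping parity are padding or control codes, and that the remaining bytes can be read as plain ASCII, which is a simplification of the real caption character set. The function names and sample bytes are made up, and getting the byte pairs out of a tuner card is not shown.

```python
def add_odd_parity(value):
    """Return the 7-bit `value` with bit 7 set so the total count of 1 bits is odd."""
    value &= 0x7F
    return value if bin(value).count("1") % 2 == 1 else value | 0x80


def has_odd_parity(byte):
    """True if the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2 == 1


def decode_pairs(pairs):
    """Extract printable text from (first, second) caption byte pairs."""
    out = []
    for first, second in pairs:
        if not (has_odd_parity(first) and has_odd_parity(second)):
            continue  # drop pairs damaged in transmission
        first &= 0x7F
        second &= 0x7F
        if first < 0x20:
            continue  # padding or a control code pair; ignored in this sketch
        out.append(chr(first))
        if second >= 0x20:
            out.append(chr(second))
    return "".join(out)


# Build a hypothetical capture: "Hi" followed by one padding pair.
raw = [(add_odd_parity(ord("H")), add_odd_parity(ord("i"))), (0x80, 0x80)]
print(decode_pairs(raw))
```

Skipping control codes loses line breaks and positioning, so the output is a bare run of text rather than a formatted transcript.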

Not only that, but I recall this being done off of laser disc-based TV archives in the early-to-mid '90s at the MIT Media Lab. Does it really take us 10 years to adapt this kind of thing to the real world?

As mentioned earlier, I think they have a ways to go. Can you understand anything on this [google.com] page? Although I guess closed captioning a live event is difficult, as I'm sure someone will attest.

Now how will C-SPAN's coverage of White House speeches deal with the great use of English literature such as the following?

Bush:
"nucular"
"abu.. abu.. abu.. abu grabby prison"

Rumsfeld:
"There are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know."

As cool as Google is, I also think Blinkx.com's search [blinkx.com] deserves mentioning. According to their white paper [blinkx.tv] they transcribe video content on the fly, and you can even set up "smart searches" which notify you when new content matching your search becomes available.

This apparently only applies to video content available on the web, but I guess it could potentially be done with TV content as well. It seems to me like this -- if it works -- is one step ahead of Google's approach.

First we hear of Google taking an interest in video distribution in the U.K., now they're showing us a completely new integration of the web and video. Google is going to be a force to be reckoned with in the media industry.

It must be nice to have, for all intents and purposes, no practical limit to your storage capacity or bandwidth.

This will be great for grabbing the latest sound bites from when newscasters completely blow their commentary. Like the woman who said that President Clinton may have been gay, when she meant to say Lincoln.

"
at 1 minute
A fox news Alert. A major capture in the war on terror. Fox news confirmed that a senior aide to Iraq's most wanted terrorist has been captured. The man who has Ben working with Abu-musab Al-zarqawi is said to be responsible for 30 or more car bombings. Freezing temperatures and thousands left without power."

Rival search engine Yahoo Inc. (YHOO) also has been tinkering with a product that finds video available for Webcasts. Hoping to counter Google's entrance into the space, Yahoo planned to step up the promotion of its video search tools Tuesday by linking to the service from the home page of its heavily trafficked Web site.
Yahoo counter Google....damn that's funny.

Unless I'm mistaken, Yahoo is just indexing the name of the video file and perhaps the surrounding text if it is on a webpage. AltaVista has done that for quite some time now too, and HotBot did it almost a decade ago, but Google's indexing of the actual video content via closed captions is slightly more impressive.

Or just not subscribe to cable. When/if I move out of my parents' house, I will likely have net + phone but no cable.

Cable [like white collar industry] is largely a scam. They basically rip off last week's ideas, cliche up a script and sell drivel as "shows" [re: anything reality labeled], then early mornings and weekends they show infomercials.

Infomercials are great [aside from being funny and overtly scammish] they pay the station money to show the commercial and I pay the station to see the commercial....

It seems like once a week there is a press release where Google has bought some obscure company so they can do some random thing no one wanted but is kinda cool. Does anyone else think they may be overextending themselves, or just doing these random things to generate a press release and make their stock go up another 2 points? I have yet to see any of their new ideas diversify their income (98% advertising or whatever).

Yeah, cool. But I'd much rather see them fix "Google Groups" (previously known as Usenet). Or - just for fun - fix Web search so that it can at least search words with flexible endings (search = searches = searching), or provide options to exclude "Buy! Buy! Buy!" spamming... I'm sure their suggestions box is full.

Speaking of Groups, has anyone here found a way to still use the old Groups interface? The new one is missing half of the results it gives when I do a search (i.e., you click one and it says it doesn't have it after all).

The results seem to be skewed when the search term is a person or character in the show: check out the search for Carson [google.com] and notice how almost every result is the Carson Daly show with hardly any news on Johnny Carson, because every second line in the closed captions is "Carson >".

It appears to me that Fox has already found a way to use this feature to inundate us with even more adverts... when you do a search for The Simpsons, the only relevant content you get is the episodes shown on UPN. Anything from Fox is a series of 5 snippets of their advertisers...

Regis and Kelly:
Kelly: let me ask you a question. Did you Google that? Caller: no, I asked my son-in-Law. Kelly: oh, you asked your son-in-Law. Oh, because sometimes googling is very useful. I'll hear suddenly people will be very silent, do you know what I mean? Caller: I don't know how to Google. Regis: we don't do googling. Kelly: I don't Google either.
You heard it here first, folks, Regis Philbin does not Google.

Not so strange. You spelled it wrong. A search [google.com] for "South Park" (note the space) turns up a few hits. If you were hoping for lots of hits from the show itself, you might want to read this page [google.com] to see which networks they're working with so far.

There is a tool that does do this called TVEyes [tveyes.com]. It is used by PR Agencies and politicians to track how they are talked about in the covered broadcasts. From what I understand, they basically have a program transcribed and searchable in about 30 seconds after airing. Pretty cool stuff. But as with most things worthwhile, it is expensive and not available for free on the Internet - just as Factiva [factiva.com] and a whole host of other services aren't.

I was going to post about TVEyes doing this, but you beat me to it. I always thought that idea was pretty neat, and I'm actually a little surprised it took Google this long to do this.

TVEyes is more based around email alerts than manual searching, but if Google offers every data analysis tool for free, what's going to be left to make money on? I'm starting to get a creepy "Google is the MS of data" feeling.

Can anyone suggest some software that will read/decode closed captioned text from television? It would be nice if there was an Open Source package that did this, however I'd be interested in commercial alternatives as well.

They are way ahead of Google in this space - they actually allow you to view the clips you can search for! Furthermore, they have an alerting service which will allow you to get an email *seconds* after the keyword you want is mentioned on TV - and then you can watch the clip!

I am doing something very similar in my apartment: an always-on Mini-ITX media server that (among other things) records free-to-air TV with teletext and provides me an interface to the teletext. While teletext isn't completely accurate, it makes for a huge body of searchable content.

Google instead is displaying up to five still video images from the indexed television programs, as well as snippets from the show's narrative. The search results also will provide a breakdown on when the program aired and when an episode is scheduled to be repeated. Local programming information will be available for those who provide a ZIP code.

I think Google is aiming to stay within fair-use boundaries (and is also avoiding taking on a needless bandwidth burden serving video).

It would be possible for people to use "Google Video Search" to identify interesting TV content outside their local area, then request snippets in a P2P manner from users whose computers are in the local area of the broadcast.

What are the fair-use guidelines for recording and sharing of free-to-air TV content, can someone say?

I worked on a similar research project more than five years ago that essentially did the same thing. It even provided links to the video files. Unfortunately, since it was research, and we didn't want a lawsuit, the system was only available to the company intranet and a few other (mostly governmental) organizations.

The system was called the Broadcast News Navigator, and more information is here:

I'm starting to get an uneasy feeling about Google. I still think they are an awesome and ethical company and I understand that their recent IPO left them flush with cash allowing them to expand their horizons. Let me be perfectly clear that I do not think they are out to do evil. Of all the companies out there, I think Google is one of the more ethical ones.

What is Google up to? Their indexing of video using closed captioning is a great and wonderful concept and I can see the value in it. With broadb

Sorry, am I the only one concerned about how much data - and markets - Google is getting into? I might be doing the "oh shit fingers-in-everything" dance here but I'm finding it hard to think of a market Google *haven't* got into... how long before a Google store springs up on your corner (that can find a bag of rice *cheap*)!

Google is making it easier for people to find those clips online. It's good for them to do this, as it makes those websites more useful. If you can't find information online, why bother putting it online in the first place?