Over the last two years I have been researching the use of Autonomy's IDOL and DRE technology, along with Softsound's, to allow voice control of services, including automatic transcription (although I have started fancying Nuance as a better contender for the voice-to-text side of this 'kit'). That in turn leads to a raft of services based on the input from the account holder: from letter writing just by talking, to asking questions and getting answers... this is the automated switchboard that links profile match with profile match, as with Autonomy's work for the Commonwealth Institute... and much, much more. My questions:
If voice-controlled services were available two years ago, why is it that I am still stuck with keyboards, or having to ask other people and hope they know the right answer?
Why can't I just talk to my phone and have my data stored and worked on, so I can find out how much of a head banger I was three months ago? Why can't I ask my phone a question and have this kit go off and find me the answer?
Why is voice such a little-used part of this new society? Why is there no way to brainstorm on your own and get feedback unless you have more than just a simple phone? Talk control means all the peoples of this earth could theoretically join in now, rather than after they all learn writing and computers... why isn't it happening? Why is the ability to respond instantly to world situations so badly coordinated? This kit has designs that focus on the routes from A to B so that, in the event of a disaster, it becomes a coordination tool with many thousands or millions of real-time coordinated inputs and outputs... Why isn't it here now? Is this promotional? I suppose it is, in that I need this kit now, and so do many, many disabled people, carers, people on low incomes, and even those of you who get to read this... Imagine talking and having things happen... I can't wait. Please help get it going soon.
Please get me some answers to the above.
Mark Aldiss projectbrainsaver


Discuss This Question: 5 Replies


You've asked several questions wrapped up in your overall query.
To start off with the simplest: voice recognition is simply not at a point (and may not be for a very long time) where casual speech is recognizable - and, more to the point, 'actionable' - meaning that some system (the 'kit', as you said) would be able to perform tasks based on the request.
One of my professors said about 30 years ago that 'A computer can do anything if you define it precisely enough.' I've yet to see him proven wrong. However, your question gets into areas which have only begun to be explored seriously.
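To make 'define it precisely enough' concrete, here is a minimal sketch of a voice-command dispatcher in Python. Only utterances that were defined in advance trigger an action; everything else falls through. The command table and phrases are invented for illustration and are not any product's API.

```python
# Toy command dispatcher: actions fire only for utterances that were
# defined precisely in advance; casual speech falls through unhandled.

ACTIONS = {
    "read my mail": lambda: "opening mailbox",
    "call home": lambda: "dialling home number",
}

def dispatch(utterance):
    # Normalise the transcription, then look for an exact predefined match.
    handler = ACTIONS.get(utterance.strip().lower())
    if handler is None:
        return "sorry, I don't understand"
    return handler()
```

`dispatch("Call home")` succeeds, while a casual paraphrase like `dispatch("gimme my email")` falls through - which is the point about casual speech not being actionable.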
I said that voice recognition was the simplest part. The question that you've asked is far more complex. To begin with, you'd have to get enough agreement overall on the desirability, practicality, and, most importantly, intent.
A project/capability on the scale you suggested would require the alignment of thousands of various agendas - many of them hostile or irreconcilable.
Additionally, many people (myself included) are going to be distrustful of the utopia/dystopia you've proposed. Your basic premise seems to assume a common human purpose - when many people's common purpose is to control or dominate others.
Don't hold your breath,
Bob

There are actually two issues here. The first has already been discussed, and that is that voice recognition is not yet ready for prime time: get a voice recognition system and try it out--you'll see what we mean.
The second is parsing the results of the voice recognition. Given the multitude of regional differences in the use of language, not to mention the overlay of the native grammar of persons who use English (or any other language) as a second language, the task becomes overwhelming for non-associative processors, like computers.
Then there are the context-sensitive situations. For example, it is 1PM and a co-worker sticks his head in the door and says "Jeet yet?" We know what he or she is saying in that particular context, but we might not understand it at, say, 9:30 AM, and a voice recognition program and parser would be at a total loss, given the current state of such software.
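The 'Jeet yet?' scenario can be sketched as a context-sensitive rule: the same sounds get a reading only inside a plausible time window, and outside it the system refuses to guess, just as a human listener might at 9:30 AM. The window and the single hard-coded phrase below are invented for illustration.

```python
from datetime import time

def interpret(utterance, now):
    # "Jeet yet?" only makes sense around lunchtime; outside that
    # window we decline to assign a meaning at all.
    if utterance == "jeet yet" and time(11, 30) <= now <= time(14, 0):
        return "did you eat yet?"
    return None
```

At `time(13, 0)` the utterance resolves to "did you eat yet?"; at `time(9, 30)` it returns `None`.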
There will, of course, be a time in the future when this is all possible, and it will require a different breed of hardware/software than we have at present. Something more like the brain, I suppose.
Cheers,
Phil

A - Voice recognition in one language for multiple speakers is still limited to simple words. To increase the vocabulary, you reduce the variables by limiting it to one speaker who trains the program to his or her voice. English is a hard language because of its synonyms and homonyms; Chinese (Mandarin) is tonal, and pitch changes meaning. The Romance languages tend to acquire the first consonant of the next word when the current word ends with a vowel sound. Try reading a passage in a book and then listening to the same thing spoken. Look how long it has taken optical character recognition to get the fourth 9 in accuracy, and by comparison that is trivial. It may take another decade-level increase in processing power to approach the adaptive recognition and noise filtering between your ears.
B - That said, you misstate your question. 'God answers prayers, even when the answer is no.' Wanting a 'positive' reaction implies we are withholding a more acceptable answer. Voice-control systems of two years ago - and still this year - are limited to numbers, yes/no, and spelled words. A vocabulary of a hundred words is hugely expensive.
C - 'Smart agents' could implement your proposal. Set up a voice recognition package for yourself, train the program to your cadence and pronunciation (avoid colds), and call your agent whenever needed. Then multiply your investment of $1,000 by each person you want to access your system. It will work. The telcos have spent millions to eliminate operators and are not quite there yet.
Good luck.

Thanks for replying.
Autonomy can offer projectbrainsaver 100% understanding of what is said and, with a minor amount of questioning, can quickly build a bank of that client's pronunciation to enable a 100% transcription rate 24/7/365. Aungate is an example of Autonomy's real-time understanding of voice content. Your example - "it is 1PM and a co-worker sticks his head in the door and says 'Jeet yet?'" - is exactly the type of example Autonomy features regarding their guessing technology, which would understand what was being said using Bayes' theorem and Shannon's law.
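The appeal to Bayes' theorem can be illustrated with a toy phrase guesser that ranks candidate phrases by prior times likelihood, where the likelihood here is a crude word-overlap stand-in for an acoustic model. The candidates, priors, and scoring rule below are invented for illustration; this is not Autonomy's actual model.

```python
def posterior_rank(noisy_words, candidates):
    """Return the candidate phrase with the highest (unnormalised)
    posterior: prior * fraction of the phrase's words that were
    heard in the noisy transcription."""
    scores = {}
    for phrase, prior in candidates.items():
        words = phrase.split()
        overlap = sum(1 for w in words if w in noisy_words) / len(words)
        scores[phrase] = prior * overlap
    return max(scores, key=scores.get)
```

Given the heard words `{"did", "you", "eat"}` and candidates "did you eat yet" (prior 0.6) versus "did you meet yet" (prior 0.4), the guesser picks "did you eat yet" - the context-appropriate reading.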
Both Nuance and Autonomy/Softsound have the ability to understand vast amounts of colloquial speech (and, in Autonomy's case, accurately guess at the rest), and both can optimise any language - a £300,000 project cost gets you an optimised language from Autonomy, enabling any user of that language to talk and achieve. It also adds the ability, in times of disaster, to use that constantly building understanding of the language to do such things as reunite members of families just by handing them a mobile phone - family members have something in common regarding their speech, father and son, mother and daughter, as do village members and regions.
Part of the build of projectbrainsaver was a firewall for human beings... a way of protecting the most vulnerable from the difficulties modern society can create around information overload, dishonest practices, misinformation, the best or worst specialist, etc., even power over people. Butterworth's law material is catalogued by Autonomy and could be built into this kit as a whole, enabling the system to know when something isn't right before it pops through an elderly person's door - as could all the laws, rules and regulations for any industry, country, organisation, etc. Aungate already covers litigation in the US, UK and other countries. Because of the way this Autonomy technology is built, it is possible for it to 'feed' itself; Autonomy can automatically configure itself - this is known as automated infrastructure technology.
One use of Autonomy technology can diagnose a patient's illness with 95% accuracy first time, just by listening to the patient's description of their symptoms.
No, this can be done now... when you add a personal account that holds a long-term databank of what you put into it - all that data to analyse regarding the actual meanings of the words that the client says... Mmm.
Filip De Graeve, ex-manager of the Belgian e-government build, knew the core of the projectbrainsaver kit could be done in February 2003 using Lernout & Hauspie technology - it's the real-time transcription that used to cause the problems, but that died with P4 processors and above: fast computers with lots of hard drive and memory.
Autonomy can offer real-time understanding of voice. So can Nuance... IBM have come out with an open-source offering that does the same - understands what I am saying in real time and with serious accuracy.
Add VoIP, SANs, data centres and call centres and you have a high-powered piece of kit that can be used cheaply by individuals and groups to enhance their lives, without having to aim for the whole-world-one-people bit, where their individuality or group offering would otherwise be lost, as it is now more often than not...
So, I reiterate: why am I typing? (-: Thanks for the replies so far.
Mark

There are a lot of people here who aren't / haven't been in the ASR business and are off base.
1 - Bad news: Nuance was sold to ScanSoft. They'll keep the name around for a while, but I would expect that to change over time.
2 - The ScanSoft ASR engine is very good - a little better than Nuance's these days. I've been on projects putting voice-activated dialling in place in the US for AT&T (now Cingular) and for Sprint's Voice Command (and almost one for the UK's T-Mobile group too). The technology has really turned the corner. There are some people who just have an odd inflection and have trouble, but generally it works very well.
3 - What you asked for faces much harder hurdles in allocating a reaction to the thousands of grammars (phrases) that can be uttered. Try writing the phrases for one specific action, and then cover all the actions you would want done. That is the start of the magnitude of understanding natural language.
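Point 3 - the explosion of phrasings for even one action - can be made concrete by multiplying the alternatives at each slot of a phrase template. The slot fillers below are invented for illustration.

```python
from itertools import product

# One action ("call home"), three slots of alternatives: the surface
# forms multiply out, and this is before dialect or word order varies.
openers = ["", "please ", "could you "]
verbs = ["call", "ring", "phone", "dial"]
targets = ["home", "my house", "the house"]

phrases = [f"{o}{v} {t}".strip() for o, v, t in product(openers, verbs, targets)]
# 3 openers x 4 verbs x 3 targets = 36 phrasings for a single action
```

Covering every action a user might want means writing out (or generating) a grammar like this for each one - which is where the real effort in "understanding" begins.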
4 - There is a service, which I believe is also available in the UK (where I assume you are from), called Angel.com - you may want to look at that for an automated switchboard.
5 - You can talk and have it captured by computer. There is a product from ScanSoft that does this (it used to be Dragon's product).
-Paul (no longer in the telecom business as of this year!)
