Posts Tagged ‘biometrics’

Finovate is one of those shows where you get up on stage and give a short intro and live demo. They are selective in who they allow to present and many applicants are rejected. Sensory demonstrated some really cutting-edge, perhaps bleeding-edge, stuff by combining animated talking avatars with text-to-speech, lip movement synchronization, natural language speech recognition, and face and voice biometrics. I don’t know of any company ever combining so many AI technologies into a single product or demo!

Speech recognition has a long history of failing on stage, and one of the ways Sensory has always differentiated itself is that our demos always work! And all our AI technologies worked here too! Even with bright backlighting, our TrulySecure face recognition was so fast and accurate that some missed it. With the microphones and echoes in the large room, our TrulyNatural speech recognition was perfect! That said, we did have a user error… before Jeff and I got on stage he put his demo phone in DND mode, which cut our audio output – but we quickly recovered from that mishap.

On the same day that Apple rolled out the iPhone X on the coolest stage of the coolest corporate campus in the world, Sensory gave a demo of an interactive talking and listening avatar that uses a biometric ID to know who’s talking to it. In Trump metrics, the event I attended had a few more attendees than Apple.

Setting aside the question of whether rogue robots will create a dystopian future, there is one point on which movie depictions of artificial intelligence (AI) all seem to converge: biometrics will take over for keys and passwords. There are over 200 movies that show the use of biometrics – here’s a list of 184 of them, and here’s a compilation of clips from several dozen movies.

Whether it’s fingerprint, voiceprint, iris, retina, face, or other biometrics, there always seems to be some sort of physical scanner in Hollywood depictions of biometrics in action. Characters have to hold their face or hand up to a device, and the device often shines a laser and makes a noise. When they speak, a pass phrase like, “My voice is my password,” is typically required. In other words, the biometrics aren’t particularly fast or easy. The devices don’t just know who people are; they need to be queried and some sort of physical analysis needs to happen after the query.

Since the beginning, Sensory has been a pioneer in advancing AI technologies for consumer electronics. Not only did Sensory implement the first commercially successful speech recognition chip, but we were also first to bring biometrics to low-cost chips, and speech recognition to Bluetooth devices. Perhaps what I am most proud of, though, is that more than a decade ago Sensory introduced its TrulyHandsfree technology and showed the world that wakeup words could really work in real devices, getting around the false accept, false reject, and power consumption issues that had plagued the industry. No longer did speech recognition devices require button presses…and it caught on quickly!

Let me go on boasting because I think Sensory has a few more claims to fame… Do you think Apple developed the first “Hey Siri” wake word? Did Google develop the first “OK Google” wake word? What about “Hey Cortana”? I believe Sensory developed these initial wake words, some as demos and some shipped in real products (like the Motorola MotoX smartphone and certain glasses). Even third-party Alexa and Cortana products today are running Sensory technology to wake up the Alexa cloud service.

Sensory’s roots are in neural nets and machine learning. I know everyone does that today, but it was quite out of favor when Sensory used machine learning to create a neural net speech recognition system in the 1990s and 2000s. Today everyone and their brother is doing deep learning (yeah, that’s tongue in cheek because my brother is doing it too: http://www.cs.colorado.edu/~mozer/index.php). And a lot of these deep learning companies are huge multi-billion-dollar businesses or extremely well-funded startups.

So, can Sensory stay ahead and continue pioneering innovation in AI now that everyone is using machine learning and doing AI? Of course, the answer is yes!

Sensory is now doing computer vision with convolutional neural nets. We are coming out with deep learning noise models to improve speech recognition performance and accuracy, and are working on small TTS systems using deep learning approaches that help them sound lifelike. And of course, we have efforts in biometrics and natural language that also use deep learning.

We are starting to combine a lot of technologies together to show that embedded systems can be quite powerful. And because we have been around longer and thought through most of these implementations years before others, we have a nice portfolio of over 3 dozen patents covering these embedded AI implementations. Hand in hand with Sensory’s improvements in AI software, companies like ARM, NVIDIA, Intel, Qualcomm and others are investing in and improving neural net chips that can perform parallel processing for specialized AI functions, so the world will continue seeing better and better AI offerings on “the edge”.

Curious about the kind of on-device AI we can create when combining a bunch of our technologies together? So were we! That’s why we created this demo that showcases Sensory’s natural language speech recognition, chatbots, text-to-speech, avatar lip-sync and animation technologies. It’s our goal to integrate biometrics and computer vision into this demo in the months ahead:

A key measure of any biometric system is the inherent accuracy of the matching algorithm. Earlier attempts at face recognition were based on traditional computer vision (CV) techniques. The first attempts involved measuring key distances on the face and comparing those across images, from which the idea of the number of “facial features” associated with an algorithm was born. This method turned out to be very brittle however, especially as the pose angle or expression varied. The next class of algorithms involved parsing the face into a grid, and analyzing each section of the grid individually via standard CV techniques, such as frequency analysis, wavelet transforms, local binary patterns (LBP), etc. Up until recently, these constituted the state of the art in face recognition. Voice recognition has a similar history in the use of traditional signal processing techniques.

Sensory’s TrulySecure uses a deep learning approach in our face and voice recognition algorithms. Deep learning (a subset of machine learning) is a modern variant of artificial neural networks, which Sensory has been using since the very beginning in 1994, and thus we have extensive experience in this area. In just the last few years, deep learning has become the primary technology for many CV applications, and especially face recognition. There have been recent announcements in the news by Google, Facebook, and others on face recognition systems they have developed that outperform humans. This is based on analyzing a data set such as Labeled Faces in the Wild, which has images captured over a very wide ranging set of conditions, especially larger angles and distances from the face. We’ve trained our network for the authentication case, which has a more limited range of conditions, using our large data set collected via AppLock and other methods. This allows us to perform better than those algorithms would do for this application, while also keeping our size and processing power requirements under control (the Google and Facebook deep learning implementations are run on arrays of servers).

One consequence of the deep learning approach is that we don’t use a number of points on the face per se. The salient features of a face are compressed down to a set of coefficients, but they do not directly correspond to physical locations or measurements of the face. Rather, these “features” are discovered by the algorithm during the training phase – the model is optimized to reduce face images to a set of coefficients that efficiently separate faces of a particular individual from faces of all others. This is a much more robust way of assessing the face than the traditional methods, and that is why we decided to utilize deep learning as opposed to traditional CV algorithms for face recognition.
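To make the idea of "a set of coefficients" concrete, here is a minimal sketch of how two such coefficient vectors might be compared. Everything in it is an assumption for illustration: the three-element vectors, the cosine similarity measure, the 0.8 threshold, and the function names are hypothetical and are not taken from TrulySecure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(enrolled, probe, threshold=0.8):
    """Accept the probe if it is close enough to the enrolled vector.

    The threshold is a hypothetical value; a real system would tune it
    against a target false accept rate.
    """
    return cosine_similarity(enrolled, probe) >= threshold

# Toy vectors standing in for the output of a trained network (assumed values).
enrolled = [0.9, 0.1, 0.4]
same_user = [0.85, 0.15, 0.38]
other_user = [0.1, 0.9, 0.2]

print(is_same_person(enrolled, same_user))
print(is_same_person(enrolled, other_user))
```

The point of the sketch is only that the comparison happens in coefficient space, not on measured distances between points on the face.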

Sensory has also developed a great deal of expertise in making these deep learning approaches work in limited memory or processing power environments (e.g., mobile devices). This combination creates a significant barrier for any competitor to try to switch to a deep learning paradigm. Optimizing neural networks for constrained environments has been part of Sensory’s DNA since the very beginning.

One of the most critical elements to creating a successful deep learning based algorithm such as the ones used in TrulySecure is the availability of a large and realistic data set. Sensory has been amassing data from a wide array of real world conditions and devices for the past several years, which has made it possible to train and independently test the TrulySecure system to a high statistical significance, even at extremely low false accept rates (FARs).

It is important to understand how Sensory’s TrulySecure fuses the face and voice biometrics when both are available. We implement two different combination strategies in our technology. In both cases, we compute a combined score that fuses face and voice information (when both are present). Convenience mode allows the use of either face or voice or the combined score to authenticate. TrulySecure mode requires both face and voice to match individually.

More specifically, Convenience mode checks for one of face, voice, or the combined score to pass the current security level setting. It assumes a willingness by the user to present both biometrics if necessary to achieve authentication, though in most cases, they will only need to present one. For example, when face alone does not succeed, the user would then try saying the passphrase. In this mode the system is extremely robust to environmental conditions, such as relying on voice instead of face when the lighting is very low. TrulySecure mode, on the other hand, requires that both face and voice meet a minimum match requirement, and that the combined score passes the current security level setting.
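The two decision rules described above can be sketched roughly as follows. This is an illustration only: the score scale, the `Scores` container, the function names, and the way a missing biometric is represented (`None`) are assumptions, and how the combined score itself is computed is not covered here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scores:
    """Match scores for one attempt; None means that biometric was not presented."""
    face: Optional[float]
    voice: Optional[float]
    combined: Optional[float]

def convenience_mode(s: Scores, security_level: float) -> bool:
    """Pass if face, voice, OR the combined score clears the security level."""
    candidates = (s.face, s.voice, s.combined)
    return any(v is not None and v >= security_level for v in candidates)

def trulysecure_mode(s: Scores, security_level: float, min_match: float) -> bool:
    """Pass only if face AND voice each meet a minimum match,
    and the combined score clears the security level."""
    if s.face is None or s.voice is None or s.combined is None:
        return False
    return (s.face >= min_match
            and s.voice >= min_match
            and s.combined >= security_level)

# Hypothetical low-light attempt: the face score is poor, the voice score is strong.
low_light = Scores(face=0.2, voice=0.9, combined=0.6)
print(convenience_mode(low_light, security_level=0.8))
print(trulysecure_mode(low_light, security_level=0.8, min_match=0.5))
```

In the low-light attempt, Convenience mode can fall back on voice alone, while TrulySecure mode rejects the attempt because the face score misses its minimum.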

TrulySecure utilizes adaptive enrollment to improve the false reject rate (FRR) with virtually no change in FAR. Sensory’s Adaptive Enrollment technology can quickly enhance a user profile from the initial single enrollment and dramatically improve the detection rate, and is able to do this seamlessly during normal use. Adaptive enrollment can produce a rapid reduction in the false rejection rate. In testing, after just 2 adaptations, we have seen almost a 40% reduction in FRR. After 6 failed authentication attempts, we see more than 60% reduction. This improvement in FRR comes with virtually no change in FAR. Additionally, adaptive enrollment alleviates the false rejects associated with users wearing sunglasses or hats, or trying to authenticate in low light, during rapid motion, at challenging angles, or with changing expressions or facial hair.

We are pleased to announce that Sensory’s TrulySecure technology has earned first place in this year’s CTIA E-Tech Awards. We believe that this recognition serves as a testament to Sensory’s devotion to developing the best embedded speech recognition and biometric security technologies available.

For those of you unfamiliar with TrulySecure – TrulySecure is the result of more than 20 years of Sensory’s industry leading and award-winning experience in the biometric space. The TrulySecure SDK allows application developers concerned about both security and convenience to quickly and easily deploy a multimodal voice and vision authentication solution for mobile phones, tablets, and PCs. TrulySecure is highly secure, environment robust, and user friendly – offering better protection and greater convenience than passwords, PINs, fingerprint readers and other biometric scanners. TrulySecure offers the industry’s best accuracy at recognizing the right user, while keeping unauthorized users out. Sensory’s advanced deep learning neural networks are fine tuned to provide verified users with instant access to protected apps and services, without the all too common false rejections of the right user associated with other biometric authentication methods. TrulySecure features a quick and easy enrollment process – capturing voice and face simultaneously in a few seconds. Authentication is on-device and almost instantaneous.

TrulySecure provides maximum security against mobile identity thieves attempting to break into a protected mobile device, while ensuring the most accurate verification rates for the actual user. According to data published by Apple, the iPhone’s thumbprint reader offers about a 1:50k chance of a false accept of the wrong user, and the probability of the wrong user getting into the device gets higher when the user enrolls more than one finger. With TrulySecure, face and voice biometrics individually offer a baseline 1:50k false accept rate, but can each be made more secure depending on the security needs of the developer. When both face and voice biometrics are required for user authentication, TrulySecure is virtually impenetrable by anybody but the actual user. TrulySecure’s face+voice authentication offers a baseline 1:100k false accept rate, but can be dialed in to offer as much as a 1:1 million false accept rate depending on security needs.
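To see why enrolling more than one finger raises the chance of a false accept, here is a back-of-the-envelope sketch. It assumes each enrolled finger is an independent chance for an impostor to match at the quoted 1:50k rate; that independence assumption and the function names are ours for illustration, not a published model.

```python
def false_accept_probability(per_template_far: float, enrolled_templates: int) -> float:
    """Chance that an impostor matches at least one of the enrolled templates,
    assuming each template is an independent opportunity to match."""
    return 1.0 - (1.0 - per_template_far) ** enrolled_templates

def as_one_in(probability: float) -> int:
    """Express a probability as the N in '1 in N'."""
    return round(1.0 / probability)

single_finger_far = 1 / 50_000  # the 1:50k figure quoted above

for fingers in (1, 2, 5):
    p = false_accept_probability(single_finger_far, fingers)
    print(f"{fingers} finger(s): about 1 in {as_one_in(p):,}")
```

Under this simple model the false accept probability grows almost linearly with the number of enrolled fingers, so five fingers is roughly five times as permissive as one.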

TrulySecure is robust to environmental challenges such as low light or high noise – it works in real-life situations that render lesser offerings useless. The proprietary speaker verification, face recognition, and biometric fusion algorithms leverage Sensory’s deep strength in speech processing, computer vision, and machine learning to continually make the user experience faster, more accurate, and more secure. The more the user uses TrulySecure, the more secure it gets.

TrulySecure offers peace-of-mind specifications: no special hardware is required – the solution uses standard microphones and cameras universally installed on today’s phones, tablets and PCs. All processing and encryption is done on-device, so personal data remains secure – no personally identifiable data is sent to the cloud. TrulySecure was also the first biometric fusion technology to be FIDO UAF Certified.

While we are truly honored to be the recipient of this prestigious award, we won’t rest on our laurels. Our engineers are already working on the next generation of TrulySecure, further improving accuracy and security, as well as refining the already excellent user experience.

Cybersecurity was an important topic at Mobile World Congress Shanghai. I was invited to join a panel with cybersecurity experts from Intel, Huawei, NEC, Nokia, and Ericsson with commentary by a McKinsey analyst. Peter O’Neil, a biometrics industry expert and CEO of FindBiometrics, led the panel. Interestingly, Peter was given a late invitation to lead a Keynote discussion on biometrics (in addition to our panel) when the GSMA decided to put more emphasis on biometrics in response to the broad interest in improving cybersecurity.

I’m about to tell you the painful irony in all this. But first, to get into China I needed a Chinese business visa, and a business visa requires an invitation from a Chinese organization. I was offered an invitation from the GSMA and they had a very effective system for filling out an online form and submitting it to them, all in the process of registering as a speaker. This quickly produced a formal invitation that I could use for my visa application.

Rich Nass and Barbara Quinlan from Open Systems Media visited Sensory on their “IoT Roadshow”.

IoT is a very interesting area. About 10 years ago we saw voice controlled IoT on the way, and we started calling the market SCIDs – Speech Controlled Internet Devices. I like IoT better, it’s certainly a more popular name for the segment! ;-)

I started our meeting off by talking about Sensory’s three products – TrulyHandsfree Voice Control, TrulySecure Authentication, and TrulyNatural large vocabulary embedded speech recognition.

Although TrulyHandsfree is best known for its “always on” capabilities, ideal for listening for key phrases (like OK Google, Hey Cortana, and Alexa), it can be used in a ton of other ways. One of them is hands-free photo taking, so no selfie stick is required. To demonstrate, I put my camera on the table and took pictures of Barbara and Rich. (Normally I might have joined the pictures, but their healthy hair, naturally good looks, and formal attire outclassed me too much for my participation).

There’s a lot of hype about IoT and Wearables and I’m a big believer in both. That said, I think Amazon’s Echo is the perfect example of a revolutionary product that showcases the use of speech recognition in the IoT space and am looking forward to some innovative uses of speech in Wearables!

Here’s the article they wrote on their visit to Sensory and an impromptu video showing TrulyNatural performing on-device navigation, as well as a demo of TrulySecure via our AppLock Face/Voice Recognition app.

If you’re designing an IoT device that requires hands-free operation, check out Sensory, just like I did while I was on OpenSystems Media’s IoT Roadshow. Sensory’s technology worked flawlessly running through the demo, as you can see in the video. We ran through two different products, one for input and one for security.

Summary: The industry is embracing biometrics faster than ever, and many CE companies and app developers are adopting face and voice biometrics to improve user experience and bolster security. Face and voice offer significant advantages over other biometric modalities, notably when it comes to convenience and, particularly in the case of our TrulySecure technology, accuracy and security.

Sensory’s TrulySecure technology has evolved dramatically since its release and recently we announced TrulySecure 2.0 that actually utilizes real world usage data collected from our “AppLock by Sensory” app on the Google Play store. By applying what we learned with AppLock, we were able to adapt a deep learning approach using convolutional neural networks to improve the accuracy of our face authentication. Additionally, we significantly improved the performance of our speaker verification in real world conditions by training better neural nets based on the collected data.

Overall, we have been able to update TrulySecure’s already excellent performance to be even better! The solution is now faster, smarter and more secure, and is the most accurate face and voice biometrics solution available.

I saw an interesting press release titled “EyeVerify Gets Positive Feedback From Curious Users”. I know this company as a fellow biometrics vendor selling into some of the same markets as Sensory. I also knew that their Google Play store rating hovered around 3/5 while our AppLock app hits around 4/5, so I was curious about what this announcement meant. It made me think of the power of all the data in the Google Play store, and I decided to take a look at biometric ratings in general to see if there were any interesting conclusions.

Here’s my methodology…I conducted searches for applications in Google Play that use biometrics to lock applications or other things. I wanted the primary review to relate to the biometric itself, so I excluded “pranks” and other apps that provided something other than biometric security. I also rejected apps with fewer than 5,000 downloads to ensure that friends, employees and families weren’t having a substantive effect on the ratings. I ran a variety of searches for four key biometrics: Eyes, Face, Fingerprint and Voice.

I did not attempt to exhaust the entire list of biometric apps. Instead, I searched under a variety of terms until I had millions of downloads and a minimum of 25,000 reviews for each category. The “eye” was the only biometric category that couldn’t meet this criterion, as I had to be satisfied with 6,884 reviews. Here’s a summary chart of my findings:

As you can see, this shows the total number of downloads, the total number of apps/companies, the number of reviews and the avg rating of reviews per biometric category. So, for example, Face had 11 applications with 1.75 million total downloads and just over 25,000 reviews with an average review rating of 3.89.
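For anyone who wants to repeat this kind of tally, the per-category summary can be sketched as below. The app records are made-up placeholders, and weighting each app’s rating by its number of reviews is an assumption about how a category average could be computed; only the 5,000-download cutoff comes from the methodology above.

```python
def summarize_category(apps, min_downloads=5_000):
    """Aggregate downloads, reviews, and a review-weighted average rating
    for one biometric category, skipping apps below the download cutoff."""
    kept = [a for a in apps if a["downloads"] >= min_downloads]
    total_reviews = sum(a["reviews"] for a in kept)
    if total_reviews == 0:
        avg_rating = None  # nothing to average
    else:
        avg_rating = sum(a["rating"] * a["reviews"] for a in kept) / total_reviews
    return {
        "apps": len(kept),
        "downloads": sum(a["downloads"] for a in kept),
        "reviews": total_reviews,
        "avg_rating": avg_rating,
    }

# Hypothetical records, not the data behind the chart.
sample_apps = [
    {"downloads": 1_000_000, "reviews": 20_000, "rating": 4.0},
    {"downloads": 750_000, "reviews": 5_000, "rating": 3.5},
    {"downloads": 1_200, "reviews": 40, "rating": 5.0},  # dropped: under the cutoff
]

print(summarize_category(sample_apps))
```

Weighting by review count keeps a small app with a handful of glowing reviews from skewing the category average.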

What’s most interesting to me about the findings is that they point to HIGHER RATINGS FOR EASIER TO USE BIOMETRICS. This is a direct correlation: Face comes in first and is clearly the easiest biometric to use. Voice is somewhat more intrusive, as a user must speak, and the rating drops by .16 to 3.73, though this segment does seem to receive the most consumer interest with more than 5 million downloads. Finger is today’s most common biometric but is often criticized for its 2-hand requirement and for failing often, which requires users to re-swipe; consumer satisfaction with fingerprint is about 3.67. Eye came in last, albeit with the least data, but numbers don’t lie, and the average consumer rating for that biometric comes in at about 3.42. If you consider the large number of reviews in this study and the narrow range of review scores (which typically range from 2.5 to 4.5), the statistically significant nature of these differences becomes apparent.

The results were not really a surprise to me. When we first developed TrulySecure, it was based on the premise that users wanted a more convenient biometric without sacrificing security, so we focused on COMBINING the two most convenient biometrics (face and voice) to produce a combined security that could match the most stringent of requirements.