Blog

Once more unto the (data) breach

by Dr Cornelius Glackin

1 in 4 companies will experience a data breach in the next 12 months according to the Ponemon[1] Institute’s “2017 Cost of Data Breach Study: Global Overview”. The perception is that the vast majority of data breaches involve on-premise infrastructure. As such, many companies prefer to employ the cloud for storing their data; it makes sense in principle to outsource cyber security to a professional cloud provider. It is also lower in cost. However, some of the largest and most costly breaches have involved cloud-based systems, e.g. Apple iCloud, Dropbox, LinkedIn, Microsoft and Yahoo[2], each resulting in millions – and in some cases billions – of accounts being compromised.

Cloud computing means organisations allow access to business-critical applications and sensitive data over the Internet. Recent advances in deep learning have revolutionised image and speech processing, making exciting new applications possible. Many of these applications rely on cloud computing infrastructure to centralise the computing power required to process video and audio data. There are numerous emerging examples of this, such as Amazon’s personal assistant Alexa, which employs cloud processing to support its voice recognition and dialogue management functionality. Whilst no breaches of this system have been reported, the implication is that unencrypted audio data must reside on the cloud to enable it to be processed, and hence carries a substantial risk.

Earlier this year, an open database containing links to more than 2 million voice messages recorded on cuddly toys was discovered[3]. Personal pictures of celebrities were stolen from Apple’s iCloud offering. In the majority of cases, cloud providers respond by urging their customers to use stronger passwords and by adding notification systems that look for suspicious activity.

Whilst personal photos of Jennifer Lawrence are seemingly of interest to hackers, the implications of a leak of audio data could be even more serious. Perhaps the largest unknown in this scenario is what effect the future capabilities of deep learning will have on the analysis of biometric signals like voice.

Dr Rita Singh from Carnegie Mellon University and her colleagues pieced together a profile of a serial US Coastguard prank caller solely from recordings of his voice[4]. This included a prediction of his height and weight, and also the size of the room he was calling from, leading to his apprehension by the authorities. Dr Singh’s team are using this research to identify a person’s use of intoxicants or other substances, and also the onset of various medical conditions the speaker may not even be aware they have. For instance, the biomarker for Parkinson’s Disease can be detected in a person’s voice long before any other symptoms arise. This raises the prospect of using voice recognition in the medical field to diagnose diseases with speech-related biomarkers.

Voice biometrics are now used by some banks to “secure” accounts. Banking has embraced voice authentication in order to make the banking customer’s experience frictionless. However, a recent BBC article detailed a voice biometric breach that occurred when a journalist gained access to his twin brother’s HSBC bank account. Whilst this flaw was attributed to legacy voice biometric solutions, one should be cautious about relying on voice as the principal mode of authentication, if for no other reason than that it is not difficult to record someone’s voice and, in the near future, to use that recording to synthesise that voice saying anything. Start-ups like Lyrebird[5] are working on ways to replicate a voice using just a minute of recorded speech. In the very near future, any sample of your voice could be used to realistically impersonate you.

The implication is that the future will feature a significant arms race between AI-equipped adversaries intent on breaching cloud-based systems and the intelligent algorithms designed to protect such systems. So, what is the answer? Well, first of all, organisations must understand the probability of being attacked, how it affects them, and even more importantly, which factors can reduce or increase the impact and cost of a data breach. One way to mitigate the effects of a breach of audio or video data in particular is to encrypt it.

For sensitive data, there is the option of using encryption for the secure storage of data in the cloud. However, while we have become increasingly good at encrypting data at rest, in order to process the data on the cloud we first need to decrypt it. This in turn excludes the possibility of using the cloud’s resources to process sensitive data, unless it can be done in a secure way. Cryptography research has made some innovative strides on this issue in recent years.

Searchable Encryption (SE) is a relatively new form of encryption that enables encrypted data to be searched with encrypted keywords. The idea is that the cloud can be used to store sensitive data that has been encrypted. An authenticated user can then search that data using search terms that are also encrypted, and the Searchable Encryption protocol residing on the cloud is able to compare the encrypted search terms and match them to the relevant encrypted data without ever learning either what was being searched for or what the data contains. It is no surprise that the seminal paper[6] from Seny Kamara, the inventor of this revolutionary cryptosystem, is one of the most-cited security papers since 1981.

Searchable Symmetric Encryption (SSE) is also the basis of Intelligent Voice’s encrypted search product CryptoSearch, with which large volumes of users’ encrypted speech transcripts and their corresponding encrypted audio can be outsourced to the cloud for storage. For review, the audio database and its associated encrypted transcripts can be searched, and once the pertinent audio file has been found it can be downloaded and decrypted behind the client’s own firewall – without the need to download everything, decrypt it, find what you are looking for, re-encrypt and re-upload. At no point does the cloud server ever see the data or the search terms in the clear. In the event of a breach, any data retrieved is encrypted and can only be decrypted with either prohibitively costly brute-force decryption or the user’s private encryption key.
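To make the idea concrete, here is a toy sketch in Python of the search flow described above. It is not the construction from the cited paper, nor how CryptoSearch works: the names `keyword_token`, `ToyServer` and `upload` are made up for illustration, the index is a plain keyed hash (HMAC) per keyword, and the files themselves are assumed to be encrypted separately on the client. The only point is that the server matches opaque tokens without ever seeing a keyword.

```python
import hashlib
import hmac
from collections import defaultdict


def keyword_token(key, keyword):
    """Client side: derive an opaque search token from a keyword and a secret key."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


class ToyServer:
    """Server side: holds only tokens and file identifiers, never keywords."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, token, file_id):
        self.index[token].add(file_id)

    def search(self, token):
        # Exact match on the token; the server cannot tell which word it stands for.
        return set(self.index.get(token, set()))


def upload(server, key, file_id, transcript):
    """Client side: index every distinct word of a transcript under its token."""
    for word in set(transcript.split()):
        server.add(keyword_token(key, word), file_id)


# Hypothetical usage: the key never leaves the client.
key = b"\x01" * 32
server = ToyServer()
upload(server, key, "call-001", "possible data breach reported")
upload(server, key, "call-002", "quarterly results call")
matches = server.search(keyword_token(key, "breach"))
```

A deterministic token like this still reveals when the same term is searched twice and which files match it. Real searchable encryption schemes are designed to limit that kind of leakage.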

Ultimately it is advances such as Searchable Symmetric Encryption and Fully Homomorphic Encryption that will be the cloud defender’s most valuable asset for safeguarding our data in the cyber security threat climate we can expect in the very near future.

These days, no-one wants to pay for anything: free email, free search, free storage, free social media, free everything. But let’s face it, it’s costing someone, somewhere.

I got thinking more about this after a chance remark from someone who visited us at the AI Finance Summit this week in Zurich (https://theaisummit.com/finance/). He said that his large insurance company was “overrun” with offers of free proof-of-concept systems from (and what he said is important) “VC-backed software vendors”. He also pointed out that a lot of people had been “burned” by these “free trials”.

What it made me realise is that not only are the VCs paying, but so are the companies who are taking these “free” products on.

Getting a company off the ground is hard, whether you have $20 in the bank or $20 million. No-one ever wants to be the first person to buy your new product, especially in the software space. Even if you convince the business that what you are offering is genuinely the best thing since sliced bread, you then must convince the IT team, who are usually wedded to the current way of doing things, and will often throw every sort of FUD known to man in your path.

And the worst thing you can do is offer to do it for free.

That may sound counter-intuitive: Surely you want to make it as frictionless as possible for your prospective customer to take on your software? You don’t want them to have to go cap in hand to their boss asking for money for something that may be completely untried.

The problem you face, though, is that people do not ascribe any value to something they get for free. So that means you cannot get the buy-in from all the stakeholders, because there is nothing at stake. Stick some money in the pot, though, and suddenly you have everyone’s attention and motivation to make the project a success.

At this point, someone will no doubt give me an example of how they have offered a free proof of concept, and how the project was a success. And yes, I have on occasion gone down that route and yes, some interesting business has come from it. But it is the exception, not the rule. Apart from once (ironically our first ever paid engagement), all other paid-for trials or PoCs we have run have turned into ongoing business.

The worst culprits? Banks.

We have had some fantastic engagement with the occasional bank (who sadly I cannot name for confidentiality reasons), but we have had some horrors as well (who sadly I also cannot name for confidentiality reasons). At one bank, we had hardware installed (that we had paid for) for two years before it became obvious that there was no project, just a consultant who was justifying his large fee by getting vendors to run endless free trials. At another, we ran a mass of data (voice, email, IM, SMS, trades) through our system at very short notice to show what the art of the possible was (and we found some scary stuff). The bank didn’t buy anything from anyone: I still cannot mention the name in the office without someone swearing loudly after all the late nights that were wasted.

We are even turning some RFPs away now, as so often our weeks are consumed with endless site visits, WebEx sessions and meetings which do not amount to anything. Sometimes you are a stalking horse for an incumbent vendor. Other times, the process is used as an excuse to make no decision at all. The problem is that the procurement rules put in place to try to guarantee the best solution often guarantee the very worst.

And that costs the customer money in terms of staff and lost opportunity.

There are some glimmers: Some companies are beginning to recognise that they need to foster new innovations, and that the best way to do that is to collaborate with vendors, and help fund the projects. This gets attention and engagement from all sides. I’m much more likely to give my absolute best for the person who provides jam today, rather than the promise of it tomorrow.

In my ideal world, we would all do a little bit of something for free: You should not just buy based on a few PowerPoint slides, and so opening the kimono just a little is a good idea. Ideally, you should have a structured engagement program, where you give all customers the same story. We, for example, offer to take customer data and run it through our system for free, and present the results back. This allows us to a) get the best out of the data, and b) set expectations.

Then we offer a partner program for resellers for an annual fee (with benefits!), or a paid engagement with the customer (in effect, a small initial installation) at a fixed cost. And if that all goes well, we go for the full roll-out.

It is so tempting to offer the world just to try to get business, but unless every engagement is properly scoped, and treated like a real project, and you don’t overstretch your own resources, it is almost always doomed to fail.

Intelligent Voice Signs Strategic Alliance with Navigant

London, 23rd October 2017 – Intelligent Voice, a leading specialist in voice and analysis solutions, is delighted to announce its new strategic alliance with top global professional services firm, Navigant.

“We are delighted to be working with Navigant,” says Ben Shellie, CEO of Intelligent Voice. “They provide state-of-the-art technology solutions, including e-discovery, forensics excellence, and deep information security capabilities required to address today’s most complex data challenges. For us, this type of full-service, highly-secure service provision is vital to protect today’s demanding clients.”

Says Richard Chalk, Director of Global Legal Technology Solutions for Navigant: “Management and review of audio has become a major headache for a number of our clients. Intelligent Voice offers a solution that can be easily integrated into almost any workflow, and provides ultra-fast accurate processing in a small footprint.”

The new strategic alliance will allow rapid and effective deployment of Intelligent Voice for regulatory and investigative matters, in particular a deep integration with leading review tools such as Relativity. Intelligent Voice offers a suite of review capabilities using speech-to-text, phonetic indexing and biometric search.

Intelligent Voice provides cloud-level accuracy from an on-premise solution for clients who do not want data processed using public cloud providers. The solution also ensures compliance with existing and future data privacy regulations including GDPR.

About Intelligent Voice®

Intelligent Voice Limited is based in London, San Francisco and New York. The company has over 25 years’ experience of delivering mission critical systems in the financial services industry, including to several of the world’s top 20 insurers and banks. Through innovations such as the SmartTranscript® and GPU-accelerated speech recognition, Intelligent Voice allows companies to understand their businesses better, with a key focus on unlocking the value in telephone and meeting room audio. For further information about Intelligent Voice, please visit http://www.intelligentvoice.com

About Navigant

Navigant Consulting, Inc. (NYSE: NCI) is a specialized, global professional services firm that helps clients take control of their future. Navigant’s professionals apply deep industry knowledge, substantive technical expertise, and an enterprising approach to help clients build, manage, and/or protect their business interests. With a focus on markets and clients facing transformational change and significant regulatory or legal pressures, the firm serves clients in the healthcare, energy, high tech and financial services industries. Across a range of advisory, consulting, outsourcing, and technology/analytics services, Navigant’s practitioners bring sharp insight that pinpoints opportunities and delivers powerful results. More information about Navigant can be found at www.navigant.com.

No cloud server or messaging system is completely secure: Just ask Hillary Clinton. Even though these systems are protected with layers of security, these layers can be hacked. Brute force attacks can crack passwords. MITM attacks using tools like sslstrip can turn secure sessions into insecure HTTP sessions. And outright manipulation of human confidence can be used to access virtually anything.

This is why homomorphic encryption is on the brink of becoming popular in cloud computing, especially when only 25% of people trust cloud providers with their data.

With homomorphic encryption, a cloud server can’t see the original content of a file. Instead of the original content being stored, a scrambled version of it is stored. And using homomorphic encryption, everything from plaintext to audio snippets can be stored, searched for, and located on the cloud server without the cloud server company seeing it (explained visually below).

For instance, if you are a doctor who has dictated sensitive patient data (as hundreds of thousands of medical professionals do every day), you could send the recording to a homomorphic speech service, then search the audio file for specific keywords. Without understanding the content of the recording, the service could locate parts of the recording with those keywords and send them back to you.

Currently, most practices send audio reports to medical transcriptionists, which is hardly secure, especially if the transcription service is outsourced and not kept in-house. At the end of the day, computers are less emotional and, therefore, more reliable with information than humans.
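What “homomorphic” means can be shown in a few lines of Python. The sketch below uses the Paillier cryptosystem, a well-known additively homomorphic scheme. It is chosen here purely as an illustration and is an assumption of this example, not the scheme behind the search described in this post. The primes are deliberately tiny, so it is in no way secure. What it shows is a server combining two encrypted numbers without being able to read either of them.

```python
import math
import secrets

# Tiny fixed primes so the arithmetic is easy to follow (demo assumption only;
# real keys use primes that are hundreds of digits long).
P, Q = 1009, 1013
N = P * Q                                          # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # private: inverse of LAM mod N (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N using the public values N and g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1           # fresh randomness for each ciphertext
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N2


# Hypothetical usage: the server only ever handles the two ciphertexts.
total = decrypt(add_encrypted(encrypt(20), encrypt(22)))
```

Only the holder of the private values can turn the combined ciphertext back into a number; the server does its work blind.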

How files are securely stored and searched for on cloud servers

At Intelligent Voice we take emails, phone calls and other communication and put them through a powerful, AI-driven analytics engine. This helps companies see what kind of conversations their team is having with customers, among other things.

The results from this, including transcripts of video files and phone calls, can now be stored securely using homomorphic encryption on cloud servers.

We can search encrypted audio transcripts without ever decrypting them. The cloud server never sees them in plaintext form and privacy is assured.

Below we’ll go over how this works with an audio file. However, the approach is the same for files that are already in plaintext.

Architecture of homomorphic-based encrypted phonetic-string-search

Data Flow

We reduce an audio or text file into symbols (which could be phonetically based). These symbols are the “content” that’s indexed on our cloud servers.

The encrypted audio and symbols are uploaded to the cloud and added to an encrypted index.

When a search for a file is initiated, the search term is encrypted using our algorithms to find matching symbols. Relevant files and file portions are returned.

Legend

Light blue: Encrypt Audio File

Blue: Cloud Server

Green: Turn Audio into phonetic symbols and encrypt

Yellow: Homomorphic representation of phonemes

Red: Client-side search preparation

Purple: Encrypted results returned

Glossary

AES encryption: A very powerful “symmetric” encryption technique, i.e. the key used to encrypt is the same as the key used to decrypt

Trapdoor: A mathematical function that is easy to compute in one direction, but very difficult to reverse engineer from just the answer

This symbol approach is important (and patent pending) because it reduces the “search space”. Technologists have found that searching encrypted data for whole words is painfully slow because of the processing power required: you might be trying to match against over a million possible combinations.

However, if we take a word or phrase and reduce it to symbols (“homomorphic” becomes HH AO MX AH MX AO RX FX IH KX, for instance), there are only dozens of available symbols. So we index these instead, across voice or text, and the search space is reduced from millions to dozens of units. Instead of looking for collections of matching words, we’re looking for matching streams.
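As a rough illustration of what matching streams could look like, the Python sketch below indexes short windows of symbols under keyed tokens and then looks for a query whose windows occur back to back. This is a guess at the general shape only, not the patent-pending method: the window size `NGRAM`, the function names and the sample streams are all assumptions, and a keyed hash (HMAC) stands in for whatever encryption the real system uses.

```python
import hashlib
import hmac
from collections import defaultdict

NGRAM = 3  # symbols per indexed window (an assumed value for this sketch)


def ngram_tokens(key, symbols):
    """Client side: turn a symbol stream into one keyed token per sliding window."""
    windows = (" ".join(symbols[i:i + NGRAM]) for i in range(len(symbols) - NGRAM + 1))
    return [hmac.new(key, w.encode("ascii"), hashlib.sha256).hexdigest() for w in windows]


def build_index(key, streams):
    """Map each token to the (file_id, offset) positions where its window occurs."""
    index = defaultdict(set)
    for file_id, symbols in streams.items():
        for offset, token in enumerate(ngram_tokens(key, symbols)):
            index[token].add((file_id, offset))
    return index


def search(index, query_tokens):
    """Server side: keep only positions where the query windows appear consecutively."""
    if not query_tokens:
        return set()
    hits = set(index.get(query_tokens[0], set()))
    for step, token in enumerate(query_tokens[1:], start=1):
        postings = index.get(token, set())
        hits = {(f, o) for (f, o) in hits if (f, o + step) in postings}
    return hits


# Hypothetical usage, reusing the symbol string quoted in the text above.
key = b"\x07" * 32
streams = {
    "call-001": "KX IH HH AO MX AH MX AO RX FX IH KX AH".split(),
    "call-002": "AO RX FX IH KX HH AO MX".split(),
}
index = build_index(key, streams)
query = ngram_tokens(key, "HH AO MX AH MX AO RX FX IH KX".split())
found = search(index, query)  # set of (file_id, start_offset) pairs
```

The server only compares tokens and offsets. Because the tokens are deterministic, it can still see which windows repeat, which is one reason a production scheme needs more than this.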

Take a banking institution for instance. While the customer service representative is asking you questions about your social security number and where you live, voice print recognition software could be working in the background for enhanced security. It would identify characteristics of your voice like pronunciation, emphasis, accent, and talking speed.

Currently, it’s harder for someone to steal someone’s unique voiceprint than it is to steal information like social security and account numbers. But it’s not impossible. An attacker could easily breach a third-party cloud server that holds your voiceprint and use voice-mimicking software to get into your financial accounts.

The recent CloudPets hack shows just how easy this is. Storing audio in homomorphically encrypted form would significantly increase the security and privacy of this data.

Conclusion

Even though homomorphic encryption was discovered decades ago, there’s only recently been enough computer processing power to make homomorphic storage and search practical. Before, it would take hours or days to do what now takes seconds.

This is good news for cloud service providers, because even though cloud servers can be hacked, it won’t matter as much if they and their customers are using homomorphic encryption to increase the overall security and privacy of their data: If the cloud has never had a “plain” version of the original data, the hacked data remains encrypted and inaccessible.

Now is the time to consider recording all of your calls and having a transcript sent direct to your email. Our new product IVNOTE lets you do just that. We developed it to replace the traditional lawyer’s Attendance Note (see https://www.linkedin.com/pulse/rip-death-attendance-note-nigel-cannings), but anyone in any industry can benefit.
