Aural Examination: Making Sense of Audio Data

Audio Data

Interview with Nigel Cannings, CTO at Intelligent Voice 

Although the written word remains recorded in solid alphabetic form, speech is ephemeral and flits past us, vanishing in an instant. And even though call centres, for example, may record our every utterance, the process generates a mind-boggling volume of data. How can we make sense of it all? Ask Nigel Cannings, CTO of Intelligent Voice.

You have over 25 years of experience in law and technology. How did you develop the idea for Intelligent Voice? Was there a lightbulb moment and how did it get started? 

After the financial crash in 2008, I thought that maybe we could start to use natural language processing techniques to help uncover unusual behaviour in trader communications, since email and chat were prevalent in much of the investigations. It was over lunch one day, chatting about some work my father had done with a US police force using speech recognition many years before, that it occurred to me that combining all forms of communications, not just text, was the key to managing risk and fraud. At the end of the day, people are much less guarded on phone and video calls than in written communications. 

What are the emerging AI trends that stand out as important developments in the space in the last year or two? 

Large language models are the headline grabbers, with an arms race to make the biggest and “best”, but already we are seeing people fall out of love with them a little, due to the sheer processing power needed to build and run them – and a realisation that they are probably a dead end on the road to artificial general intelligence. For me, the next big growth area will not be so much in the technology, but how we consume it. Edge-based inference will certainly make a greater appearance to deal with security risks associated with cloud-based implementations. We can see this capability already with on-device AI chips appearing in higher-end smartphones, for example. But the big move will be towards end-to-end encryption of inferencing payloads using enclave technology, and maybe a resurgence of interest in cryptonets. Also, we are just beginning to see a renewed interest in spiking neural networks, which are the only type of neural network that actually mimics biology, and which might unlock some of the roadblocks on the route to AGI.

How has the compliance and regulation technology industry changed over the last 10 years? How do you see it changing in the future? 

It has not moved as much as we might have hoped. We are talking about massive institutional changes, and massive infrastructure projects. The basics will always remain the same: you have to capture all of your data, and then analyse it, whether you use a human or a computer, and the techniques of analysis are only just beginning to show signs of change.  

There has been a real increase in focus on compliance technology as we have been thrust into a remote and hybrid working environment, and the recent massive wave of fines for misuse of WhatsApp underlines this. Volume has never been a friend to compliance officers, so the massive increase in data means we have to look differently at how we analyse it. Almost by default, we need to introduce machine learning techniques to help learn from how humans analyse the data, as well as natural language processing to try to “understand” the context of what is being said, in order to filter down communications. There is a lot of hype in this area of compliance technology at the moment but, as the reality catches up, we will see banks get better at capturing wrongdoing and filtering out noise.

Banks, other financial institutions, and insurance companies can identify vulnerable customers through your technology. With the current cost of living crisis in the UK, how do you see the identification of and help for vulnerable customers evolving? What role can technology play in preventing scams? 

Vulnerability works at a number of levels, some of which are easier to deal with than others.  

For companies that are regulated, and so have monitoring systems in place, you can look at individual communications to see if the person you are dealing with actually understands what is being said or sold to them. Too often, we find that clients have agreed to something that is unsuitable because they have felt pressured, so if we can identify and rectify these calls, that is extremely helpful. 

Where more could be done is in the essentially unregulated world. I know that my parents-in-law receive calls all the time, trying to sell them useless insurance they don’t need (or which replicates what they have) or phishing for bank details. We have some options here, but it needs the help of telecoms companies, with simple fixes, like more sophisticated call blocking based on customer feedback on phone numbers, or whitelists of numbers whose origin is identifiable (so that doctors and legitimate companies such as utilities can get through). Then we can look at more complicated inline solutions which monitor calls in real time to help identify those that are real, and those that are not. If one of my relatives is being asked over the phone to provide bank details or to transfer money to a “safe” account, I want to know about it straight away.

How concerning are the fraud issues facing insurers and financial institutions? How can technology be used to spot and reduce fraud?  

Unfortunately, in a recession, fraud, particularly insurance fraud, rises. So as we enter a time of greater economic uncertainty, we will see changes in people’s behaviour, with more false or inflated insurance claims being submitted. This is on top of the “business as usual” attacks on these businesses.

Contact centres are both a strong and weak point in fraud prevention. On the one hand, we can use call centre agents to help us determine the veracity of a claim. Asking a series of structured questions and analysing the answers helps us look for signs of deception in a way that can’t be done over chat or in a written form.

On the other hand, fraudsters try to manipulate call centre agents using social engineering techniques (such as, “My Dad is sick and we need to get money from his account to pay for medical fees”) in a way that would not be possible if the same attack were done on an automated system. 

What is your proudest accomplishment through your work at Intelligent Voice?

We have had a number of world firsts but, for me, it is commercialising GPU technology for speech recognition back in 2015 that I am proudest of. I had been told it was impossible, but it allowed us to develop the fastest (and cheapest-to-run) speech recognition technology, which could run as well on-premise as it could in the cloud. Today, the idea of “inferencing” using GPU for speech recognition is a no-brainer!

What would be your advice for CTOs and up-and-coming technology innovators?

Any idea that you are having, someone else is probably having the same one somewhere else! If you can, protect your best ideas. Sometimes this is just by exploiting your ideas quickly, and being first (and best) in the market. But it is also worth looking at patenting your inventions. You are never going to be able to rival the patent portfolios of the big players, but it gives you a stake in the ground. 

Be nimble and quick to market. And, something I learned the hard way, make it beautiful! It doesn’t matter how great your product is functionally, a little bit of polish and pizazz goes a long way in selling it. 

Lastly, what can we see in the future from Intelligent Voice? Do you have any exciting new developments? 

For me, three main areas: 

  1. Biometric voice identification. This is not a new technology area, but the technology that underpins it has changed completely in the last two to three years. We have identified a number of novel techniques that allow us to find people in large, unlabelled datasets, which would have been impossible even 12 months ago. It doesn’t sound very sexy, but it is a fantastic way of cleaning up metadata and finding potential links between frauds that could not be found before.
  2. Self-learning speech recognition. We have seen some amazing advances in speech recognition in the last few years, but there are a lot of difficult edge cases, be it accented speakers, poor-quality recordings, or low-resource languages. In the labs, we can now get systems to “teach” themselves how people ought to sound in these environments and self-tune with no human involvement. 
  3. Fully encrypted cloud processing. I mentioned this earlier, but it is a real game-changer. At the moment, when you send data to the cloud for processing, at some point your data ceases to be encrypted so that it can be processed or indexed. This means your sensitive data is always going to be at risk of attack. We have developed fully encrypted indexing techniques, as well as encrypted processing that survives transit through an API, which means that the only time your data is unencrypted and unprotected is when it is in the user’s hands.

Executive Profile

Nigel Cannings

Nigel Cannings is the CTO at Intelligent Voice, a global leader in the development of proactive compliance and e-discovery technology solutions for voice, video, and other media. 

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of The World Financial Review.