A voiceprint, also called a voice template, is a unique pattern of a person’s voice sounds. Based on hundreds of subtle voice characteristics, voiceprints can be used to identify a person, just like fingerprints or facial recognition.
The uniqueness of voiceprints derives from the fact that there are over 70 body parts, each of which varies from person to person, that determine the exact sound of an individual’s voice, including the pitch and tone (Figure 1). In addition, people have different rhythms when speaking. Those factors all contribute to the uniqueness of voiceprints.
Voice recognition and speech recognition are sometimes used interchangeably, but they are not the same. Voice recognition identifies who is speaking, typically for security purposes, while speech recognition identifies the words being spoken and converts them into a written transcript or uses them to support interactions with an artificial agent.
Voice recognition generally has different applications from other biometric identification techniques, such as fingerprints or facial recognition. For example, it can be implemented remotely over a phone, while facial recognition and fingerprinting typically require the individual’s physical presence.
Under the proper circumstances, voiceprints are highly accurate. However, they can also be less reliable than other authentication technologies. They can be affected by environmental interference, such as loud background noises, and by physical impairments, such as respiratory illnesses, allergies, and vocal cord injuries. Finally, voice cloning tools can be used to impersonate a speaker.
Voice authentication can be implemented using text-dependent or text-independent approaches. Text-dependent implementations require that the individual speak a specific two- or three-word passphrase. Text-independent systems can authenticate a person’s identity during a normal interaction without needing specific words to be spoken. In both cases, voice authentication requires voiceprint extraction before it can be implemented.
Voiceprint extraction
Voiceprint extraction can be active or passive. In active extraction, the individual participates by repeating specific phrases. This method can produce highly accurate voiceprints. Passive extraction captures the voiceprint during regular conversation and does not rely on the individual’s active participation. It is less intrusive but can also produce less robust results.
In either case, voiceprint extraction involves creating a digital biometric model of a person’s voice. It occurs in two steps:
- Acoustic analysis uses specialized software to measure and extract unique acoustic features, such as pitch, formant frequencies (frequency peaks in the vocal spectrum with high energy), and voice quality.
- The unique acoustic features are transformed into numerical values using statistical and artificial intelligence tools to create an enrollment voiceprint, which serves as the basis for future authentications. The process can include merging multiple enrollment voiceprints to produce higher accuracy and reliability.
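The two steps above can be sketched in a few lines of Python. This is only an illustrative toy, not how any commercial system works: it assumes a voiceprint is a two-element vector (pitch and RMS energy), estimates pitch with a plain autocorrelation search, and merges enrollment voiceprints by averaging them. The function names, the 50 to 400 Hz search range, and the synthetic test tones are all assumptions made for this example; real systems use far richer features and statistical or neural models.

```python
import math

def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch by finding the lag with the strongest autocorrelation."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def rms_energy(samples):
    """Root-mean-square level, used here as a crude stand-in for voice quality."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def extract_voiceprint(samples, sample_rate):
    """Acoustic analysis followed by conversion to a numeric feature vector."""
    return [estimate_pitch_hz(samples, sample_rate), rms_energy(samples)]

def merge_voiceprints(voiceprints):
    """Merge several enrollment voiceprints by element-wise averaging."""
    count = len(voiceprints)
    return [sum(column) / count for column in zip(*voiceprints)]

def synth_tone(freq_hz, sample_rate, duration_s=0.05, amplitude=1.0):
    """Synthetic sine tone that stands in for a recorded voice sample."""
    n = round(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

SAMPLE_RATE = 8000  # Hz, assumed telephone-quality audio
takes = [synth_tone(200.0, SAMPLE_RATE),
         synth_tone(200.0, SAMPLE_RATE, amplitude=0.8)]
enrollment = merge_voiceprints(
    [extract_voiceprint(take, SAMPLE_RATE) for take in takes]
)
print(enrollment)  # [pitch in Hz, mean RMS energy]
```

Averaging is the simplest possible merge; it is used here only to show why several enrollment samples can smooth out take-to-take variation.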
User authentication
User authentication involves a process similar to enrollment. The new voice sample is analyzed acoustically, and the extracted features are converted into a mathematical model to create a new voiceprint. The new voiceprint is then compared with the enrollment voiceprint, and the user either passes or fails (Figure 2).
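The comparison step can be illustrated with a short Python sketch. It assumes voiceprints are plain numeric vectors and scores them with cosine similarity against a fixed threshold. The metric, the 0.95 threshold, and the example vectors are assumptions chosen for illustration; production systems choose their own scoring method and tune the threshold to balance false accepts against false rejects.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no usable information
    return dot / (norm_a * norm_b)

def authenticate(new_voiceprint, enrollment_voiceprint, threshold=0.95):
    """Return (passed, score) for a new voiceprint against the enrolled one."""
    if len(new_voiceprint) != len(enrollment_voiceprint):
        raise ValueError("voiceprints must have the same number of features")
    score = cosine_similarity(new_voiceprint, enrollment_voiceprint)
    return score >= threshold, score

# Hypothetical feature vectors, invented for this example.
enrolled = [0.62, -0.31, 0.48, 0.15]
same_speaker = [0.60, -0.29, 0.50, 0.17]
other_speaker = [-0.20, 0.55, 0.10, -0.40]

print(authenticate(same_speaker, enrolled))   # passes: score is close to 1
print(authenticate(other_speaker, enrolled))  # fails: score is negative
```

Raising the threshold makes impersonation harder but also rejects more genuine users whose voices are affected by noise or illness.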
Summary
Under the right circumstances, voice recognition, like fingerprint or facial recognition, can provide highly accurate identification. Its advantage is that it can be implemented remotely using a phone, so the individual being authenticated does not need to be physically present. Voice recognition relies on voiceprint extraction, which can be implemented using active or passive techniques.