20240013772. Multi-Channel Voice Activity Detection simplified abstract (Google LLC)
Multi-Channel Voice Activity Detection
Organization Name
Google LLC
Inventor(s)
Nolan Andrew Miller of Seattle WA (US)
Ramin Mehran of Mountain View CA (US)
Multi-Channel Voice Activity Detection - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240013772 titled 'Multi-Channel Voice Activity Detection'.
Simplified Explanation
The patent application describes a method for multi-channel voice activity detection. The method receives a sequence of input frames that characterize streaming multi-channel audio captured by an array of microphones, with each channel captured by a separate dedicated microphone. A location fingerprint model uses the audio features of each channel to produce a location fingerprint, which indicates where the audio source is relative to the user device. An application-specific classifier then outputs a score indicating the likelihood that the audio corresponds to the particular audio type a given application is configured to process. Based on that score, the method accepts or rejects the audio for processing by the application.
- The method involves analyzing streaming multi-channel audio captured by an array of microphones.
- It uses a location fingerprint model to determine the location of the audio source relative to the user device.
- An application-specific classifier generates a score indicating the likelihood that the audio corresponds to a specific audio type.
- The method decides whether to accept or reject the audio for processing based on the generated score.
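The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the location fingerprint model or the classifier work, so the sketch substitutes a per-microphone energy share for the fingerprint and a logistic function for the classifier, and it assumes the classifier takes the fingerprint as its input. The names `location_fingerprint`, `first_score`, `accept`, the weights, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]  # one input frame: [channel][sample]


def location_fingerprint(frames: Sequence[Frame]) -> List[float]:
    """Hypothetical fingerprint: each microphone's share of the total energy.

    A source close to one microphone tends to be louder on that channel,
    so the normalized energy profile carries coarse location information.
    """
    totals = [0.0] * len(frames[0])
    for frame in frames:
        for i, channel in enumerate(frame):
            totals[i] += sum(s * s for s in channel) / len(channel)
    grand_total = sum(totals) or 1.0  # all-silent input: avoid dividing by zero
    return [t / grand_total for t in totals]


def first_score(fingerprint: Sequence[float],
                weights: Sequence[float], bias: float) -> float:
    """Hypothetical application-specific classifier (logistic regression)."""
    z = bias + sum(w * f for w, f in zip(weights, fingerprint))
    return 1.0 / (1.0 + math.exp(-z))


def accept(frames: Sequence[Frame], weights: Sequence[float],
           bias: float, threshold: float = 0.5) -> bool:
    """Accept the audio only if the score meets the (assumed) threshold."""
    return first_score(location_fingerprint(frames), weights, bias) >= threshold


# Two microphones, two frames of four samples each (made-up values).
near_mic0 = [[[1.0, -1.0, 1.0, -1.0], [0.2, -0.2, 0.2, -0.2]]] * 2
near_mic1 = [[[0.2, -0.2, 0.2, -0.2], [1.0, -1.0, 1.0, -1.0]]] * 2
weights, bias = [4.0, -4.0], 0.0  # hypothetical: favor sources near microphone 0

print(accept(near_mic0, weights, bias))  # True
print(accept(near_mic1, weights, bias))  # False
```

In this toy setup the application only wants audio from a source near microphone 0, so the same sound is accepted or rejected depending on which channel carries most of its energy.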
Potential Applications:
- Voice recognition systems: The method can be used in voice recognition systems to accurately detect and process voice commands or speech in multi-channel audio.
- Audio surveillance: It can be applied in audio surveillance systems to identify and analyze specific audio types or events in multi-channel audio recordings.
- Teleconferencing: A teleconferencing system could use the accept/reject decision to pass along only audio that scores as relevant, which may improve intelligibility for listeners.
Problems Solved:
- Accurate voice activity detection: The method solves the problem of accurately detecting voice activity in multi-channel audio by considering the location of the audio source and using an application-specific classifier.
- Efficient audio processing: By determining whether to accept or reject the audio for processing based on the generated score, the method optimizes the use of computational resources by only processing relevant audio.
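The resource saving described above comes from running the costly application step only on accepted audio. The sketch below illustrates that gating pattern; the segment ids, the scores, the 0.5 threshold, and the `expensive_recognizer` stand-in are all invented for illustration and do not come from the application.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Segment = Tuple[str, float]  # (segment id, first score); values are made up


def gate(segments: Iterable[Segment], threshold: float) -> Iterator[Segment]:
    """Yield only segments whose score meets the threshold; reject the rest."""
    for segment in segments:
        _, score = segment
        if score >= threshold:
            yield segment


def run_application(segments: Iterable[Segment], threshold: float,
                    process: Callable[[str], str]) -> List[str]:
    """Call the expensive `process` step only on accepted segments."""
    return [process(seg_id) for seg_id, _ in gate(segments, threshold)]


calls: List[str] = []


def expensive_recognizer(seg_id: str) -> str:
    calls.append(seg_id)  # record each invocation so the saving is visible
    return f"processed {seg_id}"


scored = [("s1", 0.91), ("s2", 0.12), ("s3", 0.40), ("s4", 0.77)]
results = run_application(scored, threshold=0.5, process=expensive_recognizer)

print(results)     # ['processed s1', 'processed s4']
print(len(calls))  # 2
```

Here the recognizer runs on two of the four segments; the two low-scoring segments are rejected before any downstream work is done.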
Benefits:
- Improved accuracy: The method improves the accuracy of voice activity detection by incorporating location information and using an application-specific classifier.
- Enhanced audio quality: By rejecting audio that scores as unlikely to be the target audio type, the method may reduce the unwanted sound that reaches downstream processing, which can improve audio quality and intelligibility for the application.
- Efficient resource utilization: The method optimizes the use of computational resources by only processing audio that is likely to be relevant, resulting in improved efficiency.
Original Abstract Submitted
A method for multi-channel voice activity detection includes receiving a sequence of input frames characterizing streaming multi-channel audio captured by an array of microphones. Each channel of the streaming multi-channel audio includes respective audio features captured by a separate dedicated microphone. The method also includes determining, using a location fingerprint model, a location fingerprint indicating a location of a source of the multi-channel audio relative to the user device based on the respective audio features of each channel of the multi-channel audio. The method also includes generating an output from an application-specific classifier. The first score indicates a likelihood that the multi-channel audio corresponds to a particular audio type that the particular application is configured to process. The method also includes determining whether to accept or reject the multi-channel audio for processing by the particular application based on the first score generated as output from the application-specific classifier.