18046041. ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)
Organization Name
Samsung Electronics Co., Ltd.
Inventor(s)
Myungjong Kim of Milpitas, CA (US)
Taeyeon Ki of Milpitas, CA (US)
Vijendra Raj Apsingekar of San Jose, CA (US)
ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING - A simplified explanation of the abstract
This abstract first appeared for US patent application 18046041, titled 'ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING'.
Simplified Explanation
The patent application describes a method for online speaker diarization, i.e., determining who spoke when in an audio stream containing speech activity. An embedding vector is generated for each segment of the stream, and the embeddings are clustered twice: within short local windows for low-latency speaker identification, and within longer global windows, each spanning two or more local windows, for refined identification.
- Obtaining an audio stream containing speech activity, divided into multiple segments
- Generating an embedding vector that represents each segment
- Clustering the embedding vectors within each local window, with different clusters corresponding to different speakers, and presenting a first sequence of speaker identities
- Clustering the embedding vectors within each global window (each spanning two or more local windows) and presenting a second, refined sequence of speaker identities
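The two-pass flow above can be sketched in code. This is a minimal illustration, not the patented implementation: the greedy cosine-similarity clustering, the window sizes, and the label format are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster(embeddings, threshold=0.8):
    """Greedy online clustering: assign each embedding to the most similar
    existing cluster centroid if similarity meets the threshold, otherwise
    start a new cluster. Returns one integer label per embedding."""
    centroids, counts, labels = [], [], []
    for e in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            s = cosine(e, c)
            if s >= best_sim:
                best, best_sim = i, s
        if best is None:
            centroids.append(list(e))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            n = counts[best]
            # Update the running-mean centroid with the new embedding.
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], e)]
            labels.append(best)
    return labels

def diarize(segment_embeddings, local_size=4, locals_per_global=2):
    """Local pass: cluster within each local window for low-latency labels.
    Global pass: re-cluster over global windows that each span several
    local windows, yielding a refined second label sequence."""
    local_labels = []
    for i in range(0, len(segment_embeddings), local_size):
        window = segment_embeddings[i:i + local_size]
        local_labels.extend(f"L{i // local_size}-S{l}" for l in cluster(window))
    global_labels = []
    gsize = local_size * locals_per_global
    for i in range(0, len(segment_embeddings), gsize):
        window = segment_embeddings[i:i + gsize]
        global_labels.extend(f"G{i // gsize}-S{l}" for l in cluster(window))
    return local_labels, global_labels
```

For example, with two well-separated speaker embeddings, the local pass labels speakers independently per window, while the global pass assigns consistent labels across the wider window.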
Potential applications of this technology:
- Speaker identification in call centers or customer service applications
- Voice recognition and authentication systems
- Forensic analysis of audio recordings
- Automatic transcription and captioning services
Problems solved by this technology:
- Efficient and accurate speaker identification in audio streams with multiple speakers
- Reducing the need for manual speaker identification and annotation
- Handling variations in speech patterns and accents
Benefits of this technology:
- Improved accuracy and reliability in speaker identification
- Time-saving and cost-effective compared to manual identification methods
- Scalable for large audio datasets
- Can be integrated into existing speech processing systems
Original Abstract Submitted
A method includes obtaining at least a portion of an audio stream containing speech activity. At least the portion of the audio stream includes multiple segments. The method also includes, for each of the multiple segments, generating an embedding vector that represents the segment. The method further includes, within each of multiple local windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Different clusters correspond to different speakers. The method also includes presenting at least one first sequence of speaker identities based on the speaker identification performed for the local windows. The method further includes, within each of multiple global windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Each global window includes two or more local windows. In addition, the method includes presenting at least one second sequence of speaker identities based on the speaker identification performed for the global windows.
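The windowing relationship stated in the abstract (each global window includes two or more local windows) can be sketched as a simple partition of segment indices. The parameters `local_size` and `locals_per_global` are illustrative assumptions; the abstract does not specify window sizes.

```python
def windows(num_segments, local_size, locals_per_global):
    """Partition segment indices into local windows, then group consecutive
    local windows into global windows, so each global window spans two or
    more local windows as described in the abstract."""
    local = [list(range(i, min(i + local_size, num_segments)))
             for i in range(0, num_segments, local_size)]
    global_ = [sum(local[i:i + locals_per_global], [])
               for i in range(0, len(local), locals_per_global)]
    return local, global_
```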