SAS Institute Inc. (20240347064). SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION simplified abstract

From WikiPatents

SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION

Organization Name

SAS Institute Inc.

Inventor(s)

Xiaolong Li of Cary NC (US)

Xiaozhuo Cheng of Cary NC (US)

Xu Yang of Cary NC (US)

SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240347064, titled 'SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION'.

Simplified Explanation: The patent application describes a system that processes speech audio from a multi-turn conversation to generate an enhanced transcript with refined utterances and speaker identification values.

• **Key Features and Innovation:**
  - Receiving speech audio of a multi-turn conversation
  - Textually segmenting speech into utterances
  - Generating a speaker diarization prompt with contextual information
  - Inputting data to a large language model
  - Obtaining an enhanced transcript with refined utterances and speaker identification values
• **Potential Applications:**
  - Speech recognition technology
  - Conversational analysis tools
  - Voice-controlled systems
• **Problems Solved:**
  - Enhancing accuracy in transcribing multi-turn conversations
  - Improving speaker identification in speech processing
• **Benefits:**
  - Increased efficiency in analyzing conversations
  - Enhanced user experience in voice interactions
  - Improved data organization in speech data processing
• **Commercial Applications:**
  - Speech-to-text software development
  - Customer service call analysis tools
  - Virtual assistant technology
• **Prior Art:**
  Further research can be conducted in the field of speech processing, diarization, and language modeling to explore existing technologies and innovations related to this patent application.
• **Frequently Updated Research:**
  Stay updated on advancements in speech recognition, natural language processing, and machine learning technologies to understand the latest developments in this field.
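The prompt-construction step listed above can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Utterance` class, `build_diarization_prompt` function, and prompt wording are hypothetical names chosen for the example, standing in for whatever speech-to-text output and LLM interface a real system would use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Utterance:
    """One textually segmented span of speech from the transcript."""
    text: str
    speaker_id: Optional[str] = None  # filled in by the LLM step


def build_diarization_prompt(speakers: Dict[str, str],
                             utterances: List[Utterance]) -> str:
    """Assemble a speaker diarization prompt: contextual information about
    the known speakers, followed by the numbered raw transcript."""
    context = "\n".join(f"- {name}: {role}" for name, role in speakers.items())
    transcript = "\n".join(f"[{i}] {u.text}" for i, u in enumerate(utterances))
    return (
        "Assign one of the known speakers to each numbered utterance.\n"
        f"Known speakers:\n{context}\n"
        f"Transcript:\n{transcript}"
    )


# Example: a two-speaker customer-service call.
speakers = {"Agent": "customer-service representative", "Caller": "customer"}
utterances = [
    Utterance("Thanks for calling, how can I help you today?"),
    Utterance("Hi, my order never arrived."),
]
prompt = build_diarization_prompt(speakers, utterances)
```

The resulting `prompt` string, together with the raw transcript, would then be passed to a large language model to obtain the enhanced transcript.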

Questions about speech audio processing technology:

1. How does the system differentiate between speakers in a multi-turn conversation?

   - The system uses speaker diarization prompts and contextual information to associate speaker identification values with refined utterances.

2. What are the potential challenges in accurately transcribing speech audio from multi-turn conversations?

   - Challenges may include background noise, overlapping speech, and variations in speech patterns among speakers.
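To make the speaker-association step above concrete, here is a hedged sketch of how an application might parse a model's speaker-labeled response into refined utterances. The `parse_enhanced_transcript` function and the `Speaker: text` line format are assumptions for illustration; the patent does not specify an output format.

```python
import re
from typing import List, Tuple


def parse_enhanced_transcript(llm_output: str) -> List[Tuple[str, str]]:
    """Parse 'Speaker: utterance' lines from a model response into
    (speaker_id, refined_utterance) pairs, skipping unlabeled lines."""
    pairs = []
    for line in llm_output.splitlines():
        m = re.match(r"\s*([A-Za-z0-9 _-]+):\s*(.+)", line)
        if m:
            pairs.append((m.group(1).strip(), m.group(2).strip()))
    return pairs


# Example model response for a two-speaker call.
sample = """Agent: Thanks for calling, how can I help you today?
Caller: Hi, my order never arrived.
Agent: I'm sorry to hear that; let me look it up."""
refined = parse_enhanced_transcript(sample)
```

Each pair associates a speaker identification value with a refined utterance, which is the shape of the enhanced transcript the abstract describes.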


Original Abstract Submitted

a system, method, and computer-program product includes receiving speech audio of a multi-turn conversation, generating, via a speech-to-text process, a transcript of the speech audio, wherein the transcript of the speech audio textually segments speech spoken during the multi-turn conversation into a plurality of utterances, generating a speaker diarization prompt that includes contextual information about a plurality of speakers participating in the multi-turn conversation, inputting, to a large language model, the speaker diarization prompt and the transcript of the speech audio, and obtaining, from the large language model, an output comprising an enhanced transcript of the speech audio, wherein the enhanced transcript of the speech audio textually segments the speech spoken during the multi-turn conversation into a plurality of refined utterances and associates a speaker identification value with each of the plurality of refined utterances.