17986516. SYSTEMS AND METHODS FOR SEMANTIC SEGMENTATION FOR SPEECH simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

SYSTEMS AND METHODS FOR SEMANTIC SEGMENTATION FOR SPEECH

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Sayan Dev Pathak of Kirkland WA (US)

Amit Kumar Agarwal of Redmond WA (US)

Amy Parag Shah of Bothell WA (US)

Sourish Chatterjee of Bothell WA (US)

Zoltan Romocsa of Kenmore WA (US)

Christopher Hakan Basoglu of Everett WA (US)

Piyush Behre of Santa Clara CA (US)

Shuangyu Chang of Davis CA (US)

Emilian Yordanov Stoimenov of Bellevue WA (US)

SYSTEMS AND METHODS FOR SEMANTIC SEGMENTATION FOR SPEECH - A simplified explanation of the abstract

This abstract first appeared for US patent application 17986516, titled 'SYSTEMS AND METHODS FOR SEMANTIC SEGMENTATION FOR SPEECH'.

Simplified Explanation

The patent application describes systems that continuously decode streaming audio data, detect linguistic boundaries in the decoded output, and apply punctuation at those boundaries. Key points:

  • Systems obtain streaming audio data containing language utterances and decode it continuously.
  • Each initial segment of decoded data is checked for a linguistic boundary.
  • When a boundary is found, punctuation is applied at that boundary.
  • Only the first portion of the segment, ending at the boundary, is output; the second portion, which comes later in time, is withheld.
  • Output is delayed until predetermined punctuation validation processes have been performed.
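
The flow in the list above can be sketched in Python. This is a minimal illustration, not the method claimed in the application: `StreamingSegmenter`, `toy_boundary_detector`, and the cue-word rule are hypothetical names and assumptions introduced here, and the "decoded" input is simply a list of word tokens standing in for the output of a speech decoder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical detector type: returns the index of the last token before a
# linguistic boundary, or None when no boundary is found in the buffer.
BoundaryDetector = Callable[[List[str]], Optional[int]]


def toy_boundary_detector(tokens: List[str]) -> Optional[int]:
    """Toy stand-in (assumption): a cue word such as 'then' starts a new segment."""
    starters = {"so", "then", "however"}
    # Start at 1 so a cue word at the very front of the buffer is not a boundary.
    for i in range(1, len(tokens)):
        if tokens[i] in starters:
            return i - 1
    return None


@dataclass
class StreamingSegmenter:
    detect: BoundaryDetector
    buffer: List[str] = field(default_factory=list)

    def feed(self, decoded_tokens: List[str]) -> Optional[str]:
        """Append newly decoded tokens; emit a punctuated segment if a boundary exists."""
        self.buffer.extend(decoded_tokens)
        idx = self.detect(self.buffer)
        if idx is None:
            return None  # no boundary yet, keep buffering
        first_portion = self.buffer[: idx + 1]
        self.buffer = self.buffer[idx + 1 :]  # second portion is withheld
        return " ".join(first_portion) + "."  # punctuation applied at the boundary


segmenter = StreamingSegmenter(detect=toy_boundary_detector)
print(segmenter.feed(["turn", "on", "the"]))       # None: no boundary yet
print(segmenter.feed(["lights", "then", "play"]))  # turn on the lights.
print(segmenter.buffer)                            # ['then', 'play']
```

A real system would replace the cue-word rule with a trained boundary model; the sketch only shows the split between the portion that is output and the portion that is held back.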

Potential Applications

This technology could be applied in various fields such as:

  • Speech recognition software
  • Language translation tools
  • Transcription services

Problems Solved

The technology addresses the following issues:

  • Improving accuracy in speech recognition
  • Enhancing the efficiency of language processing
  • Streamlining transcription processes

Benefits

The technology offers the following benefits:

  • Increased precision in identifying linguistic boundaries
  • Improved readability of transcribed audio data
  • Enhanced user experience in language-related applications

Potential Commercial Applications

This technology could be used commercially in:

  • Virtual assistants
  • Call center operations
  • Language learning platforms

Possible Prior Art

Possible prior art includes speech recognition software that uses pauses in speech to determine sentence boundaries.
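
For contrast, a pause-based approach of the kind mentioned above might look like the following sketch. It assumes the recognizer provides word-level timestamps; the `Word` tuple layout, the `segment_by_pause` name, and the 0.5 second threshold are illustrative assumptions, not details taken from any specific product.

```python
from typing import List, Tuple

# Assumed input format: (text, start_seconds, end_seconds) for each word.
Word = Tuple[str, float, float]


def segment_by_pause(words: List[Word], min_pause: float = 0.5) -> List[str]:
    """Split a word stream into sentences wherever the silence gap is long enough."""
    segments: List[str] = []
    current: List[str] = []
    for i, (text, _start, end) in enumerate(words):
        current.append(text)
        is_last = i == len(words) - 1
        # Close the segment at the end of input or when the next word starts
        # at least `min_pause` seconds after this word ends.
        if is_last or words[i + 1][1] - end >= min_pause:
            segments.append(" ".join(current) + ".")
            current = []
    return segments


words = [
    ("hello", 0.0, 0.4), ("there", 0.45, 0.8),
    ("how", 1.5, 1.7), ("are", 1.75, 1.9), ("you", 1.95, 2.2),
]
print(segment_by_pause(words))  # ['hello there.', 'how are you.']
```

The limitation is visible in the code: a speaker who pauses mid-sentence, or who runs sentences together, produces boundaries that do not match the meaning of what was said.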

Unanswered Questions

How does this technology handle different languages and accents?

The abstract does not specify how the system adapts to different languages and accents when identifying linguistic boundaries.

What is the processing speed of the system in real-time applications?

The abstract does not state the system's processing speed or latency in real-time use.


Original Abstract Submitted

Systems are configured to obtain streaming audio data comprising language utterances, continuously decode the streaming audio data in order to generate decoded streaming audio data and determine whether a linguistic boundary exists within an initial segment of decoded streaming audio data. When a linguistic boundary is determined to exist, the systems apply a punctuation at the linguistic boundary and output a first portion of the initial segment of the streaming audio data ending at the linguistic boundary while refraining from outputting a second portion of the initial segment which is located temporally subsequent to the first portion of the initial segment. Systems are also configured to delay the output until predetermined punctuation validation processes have been performed.
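
The final sentence of the abstract, delaying output until validation has run, can be pictured as a gate in front of the output step. The abstract does not say what the "predetermined punctuation validation processes" are, so the two checks below (`has_lookahead` and `not_after_function_word`) are purely hypothetical examples, as is the `emit_when_validated` helper.

```python
from typing import Callable, List, Optional

# Hypothetical validator type: given the token buffer and a candidate boundary
# index, return True if the punctuation at that boundary is accepted.
Validator = Callable[[List[str], int], bool]


def has_lookahead(min_tokens: int) -> Validator:
    """Assumed check: require some decoded context after the boundary."""
    def check(buffer: List[str], boundary: int) -> bool:
        return len(buffer) - (boundary + 1) >= min_tokens
    return check


def not_after_function_word(buffer: List[str], boundary: int) -> bool:
    """Assumed check: a sentence is unlikely to end on a word like 'the' or 'to'."""
    return buffer[boundary] not in {"the", "a", "of", "to", "and"}


def emit_when_validated(
    buffer: List[str], boundary: int, validators: List[Validator]
) -> Optional[str]:
    """Return the punctuated first portion only if every validator passes."""
    if all(v(buffer, boundary) for v in validators):
        return " ".join(buffer[: boundary + 1]) + "."
    return None  # output is delayed; the caller retries after more decoding


validators = [has_lookahead(2), not_after_function_word]
print(emit_when_validated(["call", "mom", "then"], 1, validators))          # None
print(emit_when_validated(["call", "mom", "then", "text"], 1, validators))  # call mom.
```

The design trade-off is latency against stability: each additional validator or token of lookahead delays the output slightly but reduces the chance of emitting punctuation that later context would contradict.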