US Patent Application 17815211. CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION simplified abstract

From WikiPatents

CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Wei Liu of Beijing (CN)

Padma Varadharajan of San Jose CA (US)

Piyush Behre of Santa Clara CA (US)

Nicholas Kibre of Redwood City CA (US)

Edward C. Lin of Beijing (CN)

Shuangyu Chang of Davis CA (US)

Che Zhao of Beijing (CN)

Khuram Shahid of Woodinville WA (US)

Heiko Willy Rahmel of Bellevue WA (US)

CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17815211, titled 'CUSTOM DISPLAY POST PROCESSING IN SPEECH RECOGNITION'.

Simplified Explanation

The patent application describes a solution for custom display post processing (DPP) in speech recognition (SR).

  • The solution uses a multi-stage DPP pipeline to transform a stream of SR tokens from lexical form to display form.
  • In the first transformation stage, the tokens pass in turn through an upstream filter, a base model stage, and a downstream filter. The stage converts one aspect of the tokens (e.g., disfluency, inverse text normalization (ITN), or capitalization) from lexical form to display form.
  • The upstream filter and/or the downstream filter alter the token stream to change the default behavior of the DPP pipeline into custom behavior.
  • Additional transformation stages in the pipeline allow for further customization of the output text in a display format tailored to each user.
  • This enables users to efficiently utilize a common baseline DPP pipeline to generate customized output.
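
The staged structure described above (upstream filter, base model stage, downstream filter, repeated per stage) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names Stage and run_pipeline, the toy word lists, and the rule-based "base models" are hypothetical stand-ins, not the models or interfaces of the patent application.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[str]
Transform = Callable[[Tokens], Tokens]


def identity(tokens: Tokens) -> Tokens:
    """Default filter: pass the tokens through unchanged."""
    return list(tokens)


@dataclass
class Stage:
    """One stage: upstream filter -> base model -> downstream filter (hypothetical)."""
    name: str
    base_model: Transform
    upstream: Transform = identity
    downstream: Transform = identity

    def run(self, tokens: Tokens) -> Tokens:
        return self.downstream(self.base_model(self.upstream(tokens)))


def run_pipeline(stages: List[Stage], tokens: Tokens) -> Tokens:
    """Feed the token stream through each stage in order."""
    for stage in stages:
        tokens = stage.run(tokens)
    return tokens


# Toy rule-based stand-ins for the base models (assumptions only).
FILLERS = {"um", "uh"}
NUMBERS = {"one": "1", "two": "2", "three": "3"}


def remove_disfluency(tokens: Tokens) -> Tokens:
    return [t for t in tokens if t not in FILLERS]


def toy_itn(tokens: Tokens) -> Tokens:
    return [NUMBERS.get(t, t) for t in tokens]


def capitalize_first(tokens: Tokens) -> Tokens:
    return [tokens[0].capitalize()] + tokens[1:] if tokens else []


# Hypothetical per-user customizations expressed as filters.
def drop_user_fillers(tokens: Tokens) -> Tokens:
    return [t for t in tokens if t != "basically"]


def fix_brand_names(tokens: Tokens) -> Tokens:
    return ["Contoso" if t == "contoso" else t for t in tokens]


# Common baseline pipeline shared by all users.
baseline = [
    Stage("disfluency", remove_disfluency),
    Stage("itn", toy_itn),
    Stage("capitalization", capitalize_first),
]

# Same base models, with filters changing the default behavior.
custom = [
    Stage("disfluency", remove_disfluency, upstream=drop_user_fillers),
    Stage("itn", toy_itn),
    Stage("capitalization", capitalize_first, downstream=fix_brand_names),
]

lexical = "um i basically need two tickets to contoso".split()
print(" ".join(run_pipeline(baseline, lexical)))  # I basically need 2 tickets to contoso
print(" ".join(run_pipeline(custom, lexical)))    # I need 2 tickets to Contoso
```

In this sketch the custom pipeline reuses the baseline base models unchanged and differs only in the filters attached to two stages, which is one plausible reading of how a shared baseline could yield per-user output.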


Original Abstract Submitted

Solutions for custom display post processing (DPP) in speech recognition (SR) use a customized multi-stage DPP pipeline that transforms a stream of SR tokens from lexical form to display form. A first transformation stage of the DPP pipeline receives the stream of tokens, in turn, by an upstream filter, a base model stage, and a downstream filter, and transforms a first aspect of the stream of tokens (e.g., disfluency, inverse text normalization (ITN), capitalization, etc.) from lexical form into display form. The upstream filter and/or the downstream filter alter the stream of tokens to change the default behavior of the DPP pipeline into custom behavior. Additional transformation stages of the DPP pipeline perform further transforms, allowing for outputting final text in a display format that is customized for a specific user. This permits each user to efficiently leverage a common baseline DPP pipeline to produce a custom output.