Google LLC (20240203404). ENABLING LARGE LANGUAGE MODEL-BASED SPOKEN LANGUAGE UNDERSTANDING (SLU) SYSTEMS TO LEVERAGE BOTH AUDIO DATA AND TEXTUAL DATA IN PROCESSING SPOKEN UTTERANCES simplified abstract

From WikiPatents

ENABLING LARGE LANGUAGE MODEL-BASED SPOKEN LANGUAGE UNDERSTANDING (SLU) SYSTEMS TO LEVERAGE BOTH AUDIO DATA AND TEXTUAL DATA IN PROCESSING SPOKEN UTTERANCES

Organization Name

Google LLC

Inventor(s)

Nir Shabat of Geva (IL)

Volodymyr Polosukhin of Ramat Gan (IL)

Shlomo Fruchter of Ness Ziona (IL)

Golan Pundak of New York, NY (US)

Roy Atsmon of Tel-Aviv (IL)

ENABLING LARGE LANGUAGE MODEL-BASED SPOKEN LANGUAGE UNDERSTANDING (SLU) SYSTEMS TO LEVERAGE BOTH AUDIO DATA AND TEXTUAL DATA IN PROCESSING SPOKEN UTTERANCES - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240203404, titled 'ENABLING LARGE LANGUAGE MODEL-BASED SPOKEN LANGUAGE UNDERSTANDING (SLU) SYSTEMS TO LEVERAGE BOTH AUDIO DATA AND TEXTUAL DATA IN PROCESSING SPOKEN UTTERANCES'.

Simplified Explanation: The patent application describes a method where a computing device receives audio data of a user's spoken utterance, processes it using automatic speech recognition to generate textual data, creates a semantic representation based on both the audio and textual data using a large language model, and uses this representation to fulfill the user's spoken request.

Key Features and Innovation:

  • Receiving audio data capturing a user's spoken utterance
  • Processing audio data with automatic speech recognition to generate textual data
  • Creating a semantic representation by applying both the audio data and the textual data as input to a large language model
  • Utilizing the semantic representation to fulfill the user's spoken request
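
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the abstract names no API, so every identifier here (SemanticRepresentation, handle_utterance, and the toy_asr, toy_llm and toy_fulfill stand-ins) is hypothetical, and the toy functions return canned results rather than running real models. The point it shows is that the LLM step receives both the raw audio and the ASR transcript, not the transcript alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# NOTE: all names in this sketch are hypothetical illustrations; the patent
# abstract does not specify any concrete API, model, or data format.


@dataclass(frozen=True)
class SemanticRepresentation:
    """Assumed shape of the semantic representation: an intent plus slots."""
    intent: str
    slots: Dict[str, str]


# Type aliases for the three pluggable components of the pipeline.
AsrModel = Callable[[bytes], str]
SluLlm = Callable[[bytes, str], SemanticRepresentation]
Fulfiller = Callable[[SemanticRepresentation], str]


def handle_utterance(audio: bytes, asr: AsrModel, llm: SluLlm,
                     fulfill: Fulfiller) -> str:
    """Run the steps from the abstract in order."""
    text = asr(audio)             # ASR turns the audio into textual data
    semantics = llm(audio, text)  # the LLM is given BOTH audio and text
    return fulfill(semantics)     # the representation drives fulfillment


def toy_asr(audio: bytes) -> str:
    """Stand-in ASR model that always returns the same transcript."""
    return "set a timer for ten minutes"


def toy_llm(audio: bytes, text: str) -> SemanticRepresentation:
    """Stand-in LLM. A real model would also use the audio signal itself;
    this toy only checks that audio was supplied, then parses the text."""
    if not audio:
        raise ValueError("audio input is required alongside the transcript")
    if "timer" in text:
        duration = text.split(" for ", 1)[1] if " for " in text else ""
        return SemanticRepresentation("set_timer", {"duration": duration})
    return SemanticRepresentation("unknown", {})


def toy_fulfill(sem: SemanticRepresentation) -> str:
    """Stand-in fulfillment step that formats the action it would take."""
    args = ", ".join(f"{k}={v}" for k, v in sorted(sem.slots.items()))
    return f"{sem.intent}({args})"


if __name__ == "__main__":
    print(handle_utterance(b"\x00\x01", toy_asr, toy_llm, toy_fulfill))
```

Passing the components in as callables keeps the sketch independent of any particular ASR or LLM implementation; how a real system would encode the audio for the LLM is not stated in the abstract and is left open here.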

Potential Applications: This technology can be applied in virtual assistants, customer service chatbots, language translation services, and voice-controlled devices.

Problems Solved: This technology addresses the challenges of accurately understanding and responding to spoken language, improving user interaction with computing devices.

Benefits:

  • Enhanced user experience with voice-controlled devices
  • Improved accuracy in speech recognition and response
  • Efficient communication between users and computing devices

Commercial Applications: The technology can be used in smart speakers, mobile devices, call centers, and language translation services to enhance user interaction and streamline communication processes.

Prior Art: Prior research in automatic speech recognition, natural language processing, and language models can provide insights into the development and evolution of this technology.

Frequently Updated Research: Stay updated on advancements in automatic speech recognition, language models, and semantic representation techniques to enhance the performance and capabilities of this technology.

Questions about the Technology:

  1. How does this technology improve user interaction with computing devices?
  2. What are the potential challenges in implementing this technology in real-world applications?


Original Abstract Submitted

In various implementations, a method implemented by one or more processors of a computing device can comprise receiving audio data that captures a spoken utterance of a user; processing the audio data using an automatic speech recognition (ASR) model to generate textual data corresponding to the spoken utterance; generating a semantic representation corresponding to the spoken utterance of the user based on applying both the audio data and the textual data as input across a large language model (LLM); and causing the semantic representation corresponding to the spoken utterance of the user to be utilized in fulfilling the spoken utterance.