20240029743. INTERMEDIATE DATA FOR INTER-DEVICE SPEECH PROCESSING simplified abstract (Amazon Technologies, Inc.)

From WikiPatents

INTERMEDIATE DATA FOR INTER-DEVICE SPEECH PROCESSING

Organization Name

Amazon Technologies, Inc.

Inventor(s)

Stanislaw Ignacy Pasko of Zawonia (PL)

Pawel Zelazko of Gdansk (PL)

Cagdas Bak of Gdansk (PL)

Eli Joshua Fidler of Toronto (CA)

Michal Kowalczuk of Gdansk (PL)

Andrew Oberlin of Lynnwood WA (US)

Ariya Rastrow of Seattle WA (US)

INTERMEDIATE DATA FOR INTER-DEVICE SPEECH PROCESSING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240029743 titled 'INTERMEDIATE DATA FOR INTER-DEVICE SPEECH PROCESSING'.

Simplified Explanation

The abstract of the patent application describes a speech processing system in which a first device can handle certain commands itself, instead of sending the audio data to a second device or system for processing. The first device has limited speech processing capabilities, sufficient for common language and commands. The second device (e.g., an edge device or a remote system) can call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator helps divide speech processing operations between the devices by generating a data stream. This stream includes a first-pass automatic speech recognition (ASR) output, such as a word or sub-word lattice. It also carries other characteristics of the audio data, such as whisper detection, speaker identification, and media signatures. The second device can then perform additional processing using this data stream, without needing the audio data itself. Because the audio data is processed locally rather than sent to other devices or systems, this approach may enhance privacy.
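The following minimal Python sketch shows what such an intermediate data stream might contain. The abstract does not define a format. The class names `LatticeArc` and `IntermediateData`, the field names, the JSON encoding and the example scores are all hypothetical. The sketch only illustrates the idea that the serialized payload carries a first-pass lattice and derived audio characteristics, but no audio samples.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LatticeArc:
    # Hypothetical lattice arc: one word (or sub-word) hypothesis between two nodes.
    start_node: int
    end_node: int
    token: str
    score: float  # hypothetical first-pass log score


@dataclass
class IntermediateData:
    # First-pass ASR output, represented as a word/sub-word lattice.
    lattice: List[LatticeArc]
    final_node: int
    # Other characteristics derived on-device (field names are assumptions).
    is_whisper: bool = False
    speaker_id: Optional[str] = None
    media_signature: Optional[str] = None

    def to_json(self) -> str:
        # Serialize for sending to the second device; there is no audio field.
        return json.dumps(asdict(self))


example = IntermediateData(
    lattice=[
        LatticeArc(0, 1, "play", -0.1),
        LatticeArc(1, 2, "jazz", -0.7),
        LatticeArc(1, 2, "chess", -0.9),
    ],
    final_node=2,
    is_whisper=True,
    speaker_id="speaker-1",
)
payload = example.to_json()
```

Competing arcs between the same two nodes ("jazz" and "chess") are what let a later stage revisit the first pass's choice without the audio.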

  • The system allows some speech commands to be processed on the device itself, reducing the need for sending audio data to other devices or systems.
  • The first device has limited speech processing capabilities, while the second device can leverage additional language models, entity libraries, skill components, etc.
  • An intermediate data generator generates a data stream that includes a first-pass ASR output and other characteristics of the audio data.
  • The second device can perform additional processing using this data stream, without requiring the actual audio data.
  • This approach enhances privacy by processing the audio data locally without sending it to other devices/systems.
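One way the second device could do "additional processing" without the audio is lattice rescoring. In this approach, it adds its own model scores to the first-pass arc scores and picks the best path. The abstract does not name a rescoring method. The additive log-score combination, the `weight` and `default_score` parameters, and the dictionary standing in for an additional language model or entity library are all assumptions in this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice arc: (start_node, end_node, token, first_pass_log_score).
Arc = Tuple[int, int, str, float]


def rescore_best_path(
    arcs: List[Arc],
    final_node: int,
    second_pass_scores: Dict[str, float],
    weight: float = 1.0,
    default_score: float = -5.0,
) -> Tuple[List[str], float]:
    """Best path from node 0 to final_node after adding second-pass scores.

    Assumes every arc goes from a lower node id to a higher one, so sorting
    arcs by start node visits the lattice in topological order.
    """
    best: Dict[int, float] = {0: 0.0}      # best total score reaching each node
    back: Dict[int, Tuple[int, str]] = {}  # node -> (previous node, token)
    for start, end, token, score in sorted(arcs, key=lambda arc: arc[0]):
        if start not in best:
            continue  # start node is unreachable from node 0
        # Combine the first-pass score with the (hypothetical) second-pass score.
        extra = second_pass_scores.get(token, default_score)
        total = best[start] + score + weight * extra
        if end not in best or total > best[end]:
            best[end] = total
            back[end] = (start, token)
    if final_node not in best:
        raise ValueError("final node is not reachable from node 0")
    # Follow back-pointers from the final node to recover the token sequence.
    tokens: List[str] = []
    node = final_node
    while node != 0:
        node, token = back[node]
        tokens.append(token)
    tokens.reverse()
    return tokens, best[final_node]


lattice: List[Arc] = [
    (0, 1, "play", -0.1),
    (1, 2, "string", -0.8),  # preferred by the first pass alone
    (1, 2, "sting", -1.2),
]
# Stand-in for an entity library that recognizes "sting" as an artist name.
entity_scores = {"play": -0.5, "sting": -0.3, "string": -2.0}

first_pass_only, _ = rescore_best_path(lattice, 2, entity_scores, weight=0.0)
rescored, rescored_score = rescore_best_path(lattice, 2, entity_scores)
```

With `weight=0.0`, the first-pass choice "play string" is kept. The entity scores flip the result to "play sting", and the second device reaches this using only the lattice.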

Potential Applications

  • Voice assistants and smart speakers that can handle certain commands locally without relying on cloud processing.
  • Mobile devices with limited processing capabilities that can perform basic speech processing tasks on-device.
  • Edge computing systems that can offload some speech processing tasks to a remote system while maintaining privacy.

Problems Solved

  • Reduces the need for sending audio data to other devices or systems for processing, improving efficiency and privacy.
  • Enables devices with limited processing capabilities to handle common language and commands without relying on external systems.
  • Facilitates the division of speech processing operations between devices, so that each device performs the operations suited to its capabilities.

Benefits

  • Enhanced privacy by processing audio data locally without sending it to other devices or systems.
  • Improved efficiency by handling certain speech commands on the device itself, reducing reliance on external processing.
  • Flexibility to leverage additional language models, entity libraries, skill components, etc. for performing advanced speech processing tasks.


Original Abstract Submitted

Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
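Handling "some commands on-device" implies that the first device makes a routing decision. The abstract does not say how this decision is made. This sketch assumes a small table of common commands and a confidence threshold. `LOCAL_COMMANDS`, the action names and `threshold` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of common commands the first device can handle itself.
LOCAL_COMMANDS: Dict[str, str] = {
    "stop": "media.stop",
    "volume up": "volume.increase",
    "volume down": "volume.decrease",
}


def route(
    top_hypothesis: List[str], confidence: float, threshold: float = 0.8
) -> Tuple[str, Optional[str]]:
    """Decide whether to handle locally or forward intermediate data."""
    action = LOCAL_COMMANDS.get(" ".join(top_hypothesis))
    if action is not None and confidence >= threshold:
        return ("local", action)
    # Otherwise forward the intermediate data (not the audio) to the second device.
    return ("forward", None)
```

For example, `route(["volume", "up"], 0.95)` returns `("local", "volume.increase")`. An unknown or low-confidence utterance returns `("forward", None)`.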