Adobe Inc. (20240220530). MULTI-MODAL SOUND EFFECTS RECOMMENDATION simplified abstract


MULTI-MODAL SOUND EFFECTS RECOMMENDATION

Organization Name

Adobe Inc.

Inventor(s)

Julia Lepley Wilkins of Brooklyn, NY (US)

Oriol Nieto-Caballero of Oakland, CA (US)

Justin Salamon of San Francisco, CA (US)

MULTI-MODAL SOUND EFFECTS RECOMMENDATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240220530, titled 'MULTI-MODAL SOUND EFFECTS RECOMMENDATION'.

Simplified Explanation: The patent application describes a system that recommends sound effects using a multi-modal embedding space into which visuals, text, and audio are projected. An encoder converts an input query (an image/video and/or text) into a query embedding in this space; the system then identifies the most relevant sound effect embeddings and recommends the corresponding sound effects. A minimal code sketch follows the list below.

  • Sound effects system built on a shared multi-modal embedding space for visuals, text, and audio
  • Encoder projects an input query of visuals and/or text into that space as a query embedding
  • Relevant sound effect embeddings are identified using the query embedding
  • The system recommends the sound effects corresponding to those embeddings
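The sketch below illustrates what such a query encoder could look like. It is a hypothetical illustration only: the patent does not disclose the encoder architecture, the embedding dimensionality, or how modalities are fused, so the random "encoders", the 512-dimensional space, and the averaging fusion are all assumptions made for clarity.

<pre>
"""Minimal sketch of a multi-modal query encoder (hypothetical; the patent
does not disclose the encoder architecture or fusion strategy)."""
from typing import Optional

import numpy as np

EMBED_DIM = 512  # assumed dimensionality of the shared embedding space


def _unit(v: np.ndarray) -> np.ndarray:
    """L2-normalize so embeddings can be compared by cosine similarity."""
    return v / np.linalg.norm(v)


def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a trained visual encoder that projects an image or video
    frame into the shared multi-modal space (random here, for illustration)."""
    rng = np.random.default_rng(abs(hash(image.tobytes())) % 2**32)
    return _unit(rng.normal(size=EMBED_DIM))


def encode_text(text: str) -> np.ndarray:
    """Stand-in for a trained text encoder mapping a description into the
    same shared space."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return _unit(rng.normal(size=EMBED_DIM))


def encode_query(image: Optional[np.ndarray] = None,
                 text: Optional[str] = None) -> np.ndarray:
    """Fuse whichever modalities are present into one query embedding.
    Averaging per-modality embeddings is one simple fusion choice; the
    abstract does not specify how the encoder combines modalities."""
    parts = []
    if image is not None:
        parts.append(encode_image(image))
    if text is not None:
        parts.append(encode_text(text))
    if not parts:
        raise ValueError("a query must include a visual and/or text")
    return _unit(np.mean(parts, axis=0))
</pre>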

Potential Applications: This technology could be applied in various industries such as film production, video game development, virtual reality experiences, and audiovisual content creation.

Problems Solved: This technology streamlines the process of selecting appropriate sound effects for visual and textual content, enhancing the overall audiovisual experience for users.

Benefits:

  • Improved user experience through tailored sound effects
  • Time saved when selecting sound effects for multimedia projects
  • Enhanced creativity and storytelling in audiovisual content

Commercial Applications: The technology could be utilized by multimedia production companies, video game developers, virtual reality experience creators, and content creators on platforms like YouTube and TikTok.

Questions about the Sound Effects System:

1. How does the system determine the relevance of sound effect embeddings to the input query?

2. What are the potential limitations of using a multi-modal embedding space for recommending sound effects?


Original Abstract Submitted

A sound effects system recommends sound effects using a multi-modal embedding space for projecting visuals, text, and audio. Given an input query comprising a visual (i.e., an image/video) and/or text, an encoder generates a query embedding in the multi-modal embedding space in which sound effects have been projected into sound effect embeddings. A relevant sound effect embedding in the multi-modal space is identified using the query embedding, and a recommendation is provided for a sound effect corresponding to the sound effect embedding.
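The retrieval step described in the abstract can be sketched as a nearest-neighbor lookup over pre-projected sound effect embeddings. The sketch below assumes cosine similarity as the relevance measure and a small randomly generated catalog; the abstract names neither the similarity measure nor the catalog format, so these are assumptions for illustration.

<pre>
"""Minimal sketch of the retrieval step, assuming cosine similarity in the
shared space (the abstract does not name a similarity measure)."""
from typing import List, Tuple

import numpy as np


def recommend_sound_effects(query_embedding: np.ndarray,
                            sfx_embeddings: np.ndarray,
                            sfx_names: List[str],
                            top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank pre-projected sound effect embeddings against the query embedding
    and return the top-k sound effect names with their similarity scores."""
    q = query_embedding / np.linalg.norm(query_embedding)
    sfx = sfx_embeddings / np.linalg.norm(sfx_embeddings, axis=1, keepdims=True)
    scores = sfx @ q                        # cosine similarity per catalog entry
    top = np.argsort(scores)[::-1][:top_k]
    return [(sfx_names[i], float(scores[i])) for i in top]


# Hypothetical catalog: 100 sound effects already projected into the space.
rng = np.random.default_rng(0)
catalog = rng.normal(size=(100, 512))
names = [f"sfx_{i:03d}" for i in range(100)]
query = rng.normal(size=512)                # would come from the query encoder
print(recommend_sound_effects(query, catalog, names, top_k=3))
</pre>

In practice, a large catalog would typically be served through an approximate nearest-neighbor index rather than the exhaustive dot product shown here, but the recommendation logic is the same: the sound effects whose embeddings lie closest to the query embedding are returned.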