Qualcomm Incorporated (20240134908). SOUND SEARCH simplified abstract
Organization Name
Qualcomm Incorporated
Inventor(s)
Rehana Mahfuz of San Diego, CA (US)
Yinyi Guo of San Diego, CA (US)
Erik Visser of San Diego, CA (US)
SOUND SEARCH - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240134908, titled 'SOUND SEARCH'.
Simplified Explanation
The device described in the abstract uses one or more processors to generate caption embeddings from a query, select the most similar caption embeddings from a set associated with a repository of media files using a similarity metric, and then generate search results identifying the media files associated with the selected caption embeddings.
- The device uses processors to generate query caption embeddings based on a query.
- The system selects caption embeddings from a set of media files based on similarity metrics.
- Search results are generated to identify media files associated with the selected caption embeddings.
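The three steps above can be sketched in Python. This is a minimal illustration, not the patented implementation: the `embed` function here is a hypothetical stand-in for the trained caption-embedding model, and cosine similarity is assumed as the similarity metric (the abstract does not name one).

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical text encoder standing in for the patent's
    caption-embedding model; returns a unit-length vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

def search(query: str, captions: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the media-file IDs whose sound captions score highest
    against the query under cosine similarity (dot product of
    unit vectors)."""
    q = embed(query)
    scored = [(float(q @ embed(caption)), media_id)
              for media_id, caption in captions.items()]
    scored.sort(reverse=True)                    # highest similarity first
    return [media_id for _, media_id in scored[:top_k]]

# Toy "file repository": media files paired with natural-language sound captions
captions = {
    "clip_001.wav": "a dog barking in the distance",
    "clip_002.wav": "rain falling on a tin roof",
    "clip_003.wav": "a dog barking loudly indoors",
}
results = search("dog bark", captions)
```

A production system would replace `embed` with a trained neural encoder and precompute the caption embeddings once, so only the query needs embedding at search time.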
Potential Applications
The technology described in this patent application could be applied in the following areas:
- Content recommendation systems
- Audio search engines
- Multimedia content retrieval systems
Problems Solved
The technology addresses the following issues:
- Improving search accuracy for sound-based queries
- Enhancing user experience in searching for specific audio content
Benefits
The technology offers the following benefits:
- Efficient retrieval of relevant media files based on sound descriptions
- Enhanced search capabilities for audio content
- Improved user satisfaction in finding specific sound-based information
Potential Commercial Applications
The technology could be commercially applied in:
- Music streaming services
- Podcast platforms
- Audio editing software
Possible Prior Art
One possible example of prior art for this technology is the use of text-based search algorithms in multimedia content retrieval systems.
Unanswered Questions
How does the system handle variations in sound descriptions?
The abstract does not specify how the system accounts for different ways of describing sounds in the query and caption embeddings.
What is the computational complexity of the similarity metric used in selecting caption embeddings?
The abstract does not provide information on the computational resources required for calculating the similarity metric between query and caption embeddings.
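While the abstract leaves the metric unspecified, the typical baseline cost is easy to estimate: brute-force cosine similarity over n precomputed caption embeddings of dimension d takes O(n·d) arithmetic per query. The sketch below illustrates that baseline under the assumption of cosine similarity; it is not the patent's stated method.

```python
import numpy as np

# Brute-force similarity scan: n caption embeddings of dimension d,
# so one query costs n dot products of length d, i.e. O(n * d).
n, d = 10_000, 256
rng = np.random.default_rng(0)

# Precomputed, unit-normalized caption embeddings (rows)
caption_matrix = rng.normal(size=(n, d))
caption_matrix /= np.linalg.norm(caption_matrix, axis=1, keepdims=True)

# Unit-normalized query caption embedding
query = rng.normal(size=d)
query /= np.linalg.norm(query)

# Cosine similarity of the query against every caption embedding
scores = caption_matrix @ query
best = int(np.argmax(scores))   # index of the most similar caption
```

For large repositories, approximate nearest-neighbor indexes can reduce the per-query cost well below the O(n·d) scan at some loss of exactness.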
Original Abstract Submitted
A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).