US Patent Application 18189883. System and Method to Generate Interpretable Embeddings for Domain Specific Small Corpus simplified abstract

From WikiPatents

System and Method to Generate Interpretable Embeddings for Domain Specific Small Corpus

Organization Name

Robert Bosch GmbH


Inventor(s)

Rishabh Gupta of Bareilly (IN)


System and Method to Generate Interpretable Embeddings for Domain Specific Small Corpus - A simplified explanation of the abstract

  • This abstract appeared for US patent application number 18189883, titled 'System and Method to Generate Interpretable Embeddings for Domain Specific Small Corpus'.

Simplified Explanation

This abstract describes a method and system for creating interpretable embeddings (vector representations of words) for a small, domain-specific collection of text documents. The system first performs basic cleaning on each document, then applies a technique called semantic infusion to produce a semantically infused corpus. It computes an optimal embedding dimensionality for the infused corpus and generates embeddings of that size using word2vec. It also generates baseline embeddings for comparison, so the embeddings can be evaluated on interpretability and on downstream classification performance.
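
The cleaning step mentioned above can be sketched in Python. The abstract does not say what "basic cleaning" involves, so the lowercasing, punctuation stripping, and digit removal below are assumptions, and the names `basic_clean` and `clean_corpus` are hypothetical. The sketch covers cleaning only; semantic infusion and the dimensionality computation are not specified in the abstract and are not attempted here.

```python
import re
import string

# Map every ASCII punctuation character to a space (an assumed cleaning rule).
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def basic_clean(text):
    """Return a list of lowercase word tokens from one raw document.

    Assumed rules, not taken from the patent abstract:
    lowercase, replace punctuation with spaces, drop digit runs,
    then split on whitespace.
    """
    text = text.lower()
    text = text.translate(_PUNCT_TO_SPACE)
    text = re.sub(r"\d+", " ", text)
    return text.split()


def clean_corpus(documents):
    """Clean each document and drop any that end up with no tokens."""
    cleaned = []
    for doc in documents:
        tokens = basic_clean(doc)
        if tokens:
            cleaned.append(tokens)
    return cleaned


if __name__ == "__main__":
    docs = ["Brake-pad wear: 3 mm!", "   "]
    print(clean_corpus(docs))  # [['brake', 'pad', 'wear', 'mm']]
```

The result is a list of token lists, one per document, which is the input shape that word2vec implementations commonly accept.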


Original Abstract Submitted

A method and systems for generating interpretable and embeddings for a domain-specific small corpus of text-based documents are described. A processing module may obtain the plurality of text-based documents and perform a basic cleaning of each of the plurality of text-based documents. Further, the semantic infusion module may generate the semantically infused corpus using the semantic infusion technique. An embedding generation module is configured to compute the optimal dimensionality for the infused corpus and generate the infused optimal dimensional embeddings using word2vec technique. Further, the embedding generation module is configured to generate baseline optimal dimensional embeddings which can be used to evaluate in terms of interpretability and downstream classification task performance.