17806556. MULTIMODAL DATA INFERENCE simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

MULTIMODAL DATA INFERENCE

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Andrea Giovannini of Zurich (CH)

Antonio Foncubierta Rodriguez of Zurich (CH)

Niharika Dsouza of San Jose CA (US)

Tanveer Syeda-Mahmood of Cupertino CA (US)

Hongzhi Wang of San Bruno CA (US)

MULTIMODAL DATA INFERENCE - A simplified explanation of the abstract

This abstract first appeared for US patent application 17806556, titled 'MULTIMODAL DATA INFERENCE'.

Simplified Explanation

The patent application describes a computer-implemented method for generating a machine learning model for multimodal data inference tasks. Here are the key points:

  • The method involves encoding each sample in a training dataset of multimodal data samples to produce a compressed vector representation in a k-dimensional latent space.
  • Features of the sample are perturbed to identify, for each dimension of the latent space, the active features: those whose perturbation changes the vector representation in that dimension by more than a threshold.
  • A sample graph is generated with nodes representing features and edges indicating the active features for each dimension.
  • A graph neural network model is trained using the sample graph to perform multimodal data inference tasks.
  • The application also covers multimodal data inference systems that employ models trained in this way.
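
The perturbation and graph-building steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `encode` function is a toy stand-in for a trained encoder, the additive `delta` perturbation and the `threshold` value are arbitrary choices, and linking feature nodes to latent-dimension nodes is only one possible reading of how edges "indicate the active features for each dimension". All names are hypothetical.

```python
import math


def encode(sample, weights):
    """Toy stand-in for the encoder: one tanh unit per latent dimension."""
    return [math.tanh(sum(w * x for w, x in zip(row, sample))) for row in weights]


def active_features(sample, weights, threshold=0.1, delta=1.0):
    """Return, for each latent dimension j, the indices of features whose
    perturbation changes dimension j by more than `threshold`."""
    base = encode(sample, weights)
    active = [[] for _ in base]
    for i in range(len(sample)):
        perturbed = list(sample)
        perturbed[i] += delta  # assumed perturbation: add a fixed offset
        shifted = encode(perturbed, weights)
        for j, (before, after) in enumerate(zip(base, shifted)):
            if abs(after - before) > threshold:
                active[j].append(i)
    return active


def build_sample_graph(active):
    """Assumed layout: link feature node i to latent-dimension node j
    whenever feature i is active for dimension j."""
    return {(f"feature:{i}", f"latent:{j}")
            for j, feats in enumerate(active) for i in feats}


# Hypothetical 3-feature sample and 2-dimensional latent space.
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
sample = [0.0, 0.0, 0.0]
active = active_features(sample, weights)
graph = build_sample_graph(active)
```

With these toy weights, feature 0 is the only active feature for latent dimension 0 and feature 2 is the only one for dimension 1, while feature 1 affects neither dimension and so contributes no edge.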

Potential applications of this technology:

  • Natural language processing: This method can be used to analyze and infer meaning from multimodal data sources such as text, images, and audio.
  • Autonomous vehicles: The technology can help in processing and interpreting data from various sensors to make informed decisions.
  • Healthcare: It can be applied to analyze medical data from different modalities to assist in diagnosis and treatment decisions.
  • Fraud detection: The method can be used to analyze multiple data sources to identify patterns and anomalies indicative of fraudulent activities.

Problems solved by this technology:

  • Efficient representation: The method provides a compressed vector representation of multimodal data samples, reducing the computational complexity of processing and analyzing large datasets.
  • Feature selection: By identifying active features that significantly impact the vector representation, the method helps in selecting relevant features for inference tasks, improving accuracy and reducing noise.
  • Multimodal integration: The sample graph and graph neural network model enable the integration of different modalities of data, allowing for more comprehensive and accurate inference.
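
To make the multimodal integration point more concrete, the sketch below shows one round of mean-aggregation message passing, the basic operation that lets a graph neural network mix information between connected nodes. It is a simplified illustration only: a real graph neural network layer would also apply learned weights and a nonlinearity, and the node names, values, and edges here are hypothetical rather than taken from the application.

```python
def mean_message_pass(node_values, edges):
    """One round of mean aggregation: each node's new value is the
    average of its own value and the values of its neighbours."""
    neighbours = {node: [] for node in node_values}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for node, value in node_values.items():
        pool = [value] + [node_values[m] for m in neighbours[node]]
        updated[node] = sum(pool) / len(pool)
    return updated


# Hypothetical graph: a text feature and an image feature that are both
# linked to the same latent-dimension node.
node_values = {"text:0": 1.0, "image:0": 3.0, "latent:0": 0.0}
edges = [("text:0", "latent:0"), ("image:0", "latent:0")]
updated = mean_message_pass(node_values, edges)
```

After this single round, the shared node holds a value that blends the text and image features, which is how features from different modalities can influence one another through the graph.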

Benefits of this technology:

  • Improved accuracy: By considering multiple modalities of data and selecting relevant features, the method can enhance the accuracy of machine learning models for multimodal data inference tasks.
  • Efficient processing: The compressed vector representation and feature selection help in reducing computational resources required for processing multimodal data.
  • Flexibility: The method can be applied to various domains and tasks that involve multimodal data, providing a versatile solution for inference tasks.


Original Abstract Submitted

Computer-implemented methods are provided for generating machine learning model for multimodal data inference tasks. Such a method includes, for each sample in a training dataset of multimodal data samples, encoding the sample to produce a compressed vector representation of the sample in a k-dimensional latent space, and perturbing features of the sample to identify, for each dimension of the latent space, a set of active features perturbation of each of which produces more than a threshold change in the vector representation in that dimension. The method further comprises generating a sample graph having nodes interconnected by edges, wherein the nodes comprise nodes representing respective said features of the sample and edges interconnecting nodes indicate the active features for each dimension. The sample graph is then used to train a graph neural network model to perform the multimodal data inference task. Multimodal data inference systems employing such models are also provided.
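
The active-feature criterion in the abstract can be written compactly. The notation below is introduced here for illustration and does not come from the application: x is a sample, f is the encoder, \tilde{x}^{(i)} is the sample with feature i perturbed, and \tau is the threshold.

```latex
z = f(x) \in \mathbb{R}^{k}, \qquad
A_j(x) = \left\{\, i \;:\; \left| f_j\!\left(\tilde{x}^{(i)}\right) - f_j(x) \right| > \tau \,\right\},
\qquad j = 1, \dots, k
```

Here A_j(x) is the set of active features for latent dimension j, and the sample graph is built from the sets A_1(x), ..., A_k(x).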