17961277. GENERATING IN-DISTRIBUTION SAMPLES OF TIME-SERIES OR IMAGE DATA FOR THE NEIGHBORHOOD DISTRIBUTION simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

From WikiPatents

GENERATING IN-DISTRIBUTION SAMPLES OF TIME-SERIES OR IMAGE DATA FOR THE NEIGHBORHOOD DISTRIBUTION

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Natalia Martinez Gil of Durham NC (US)

Kanthi Sarpatwar of Elmsford NY (US)

Roman Vaculin of Larchmont NY (US)

GENERATING IN-DISTRIBUTION SAMPLES OF TIME-SERIES OR IMAGE DATA FOR THE NEIGHBORHOOD DISTRIBUTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17961277 titled 'GENERATING IN-DISTRIBUTION SAMPLES OF TIME-SERIES OR IMAGE DATA FOR THE NEIGHBORHOOD DISTRIBUTION'.

Simplified Explanation

The abstract describes a method, system, and computer program product for generating in-distribution samples of data for a neighborhood distribution used by post-hoc local explanation methods. An autoencoder is trained for this purpose. Training involves mapping input data into a latent dimension to form latent codes, obtaining a mixed code by convexly combining those codes with a random coefficient, decoding the mixed code conditioned on the input data masked with interpretable features, and performing adversarial training against a discriminator to promote in-distribution samples.

  • An autoencoder is trained to generate in-distribution samples of data for a neighborhood distribution.
  • The training process involves mapping input data into a latent dimension, obtaining a mixed code, decoding with interpretable features, and adversarial training against a discriminator.
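The convex-combination step above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `mix_latent_codes` and the uniform distribution for the coefficient are assumptions; the abstract only specifies a random coefficient used to convexly combine two latent codes.

```python
import numpy as np

def mix_latent_codes(z1, z2, rng=None):
    """Convexly combine two latent codes with a random coefficient.

    Hypothetical helper illustrating the mixing step: the mixed code is
    alpha * z1 + (1 - alpha) * z2 with alpha drawn from U(0, 1) (an
    assumption; the abstract says only "a random coefficient").
    """
    rng = rng or np.random.default_rng()
    alpha = rng.uniform(0.0, 1.0)
    return alpha * z1 + (1.0 - alpha) * z2, alpha

# A convex combination stays on the line segment between z1 and z2,
# so the mixed code remains within the latent region spanned by real data.
z1 = np.array([0.0, 1.0])
z2 = np.array([2.0, 3.0])
z_mix, alpha = mix_latent_codes(z1, z2, np.random.default_rng(0))
```

Because the coefficient is constrained to [0, 1], the mixed code cannot leave the segment between the two encoded inputs, which is what keeps the generated neighborhood samples close to the data distribution before the discriminator refines them further.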

Potential Applications

This technology could be applied in various fields such as anomaly detection, fraud detection, and predictive maintenance where generating in-distribution samples is crucial for accurate analysis and decision-making.

Problems Solved

This technology addresses the challenge of generating representative in-distribution samples for post-hoc local explanation methods, improving the interpretability and reliability of the explanations provided.

Benefits

The benefits of this technology include enhanced model transparency, improved accuracy in local explanations, and better understanding of the underlying data distribution for decision-making processes.

Potential Commercial Applications

Potential commercial applications of this technology include financial services for fraud detection, healthcare for anomaly detection in patient data, and manufacturing for predictive maintenance of equipment.

Possible Prior Art

One possible piece of prior art is the use of generative adversarial networks (GANs) to generate synthetic data; however, the specific approach of training an autoencoder to generate in-distribution samples for post-hoc local explanation methods may be novel.

Unanswered Questions

How does this technology compare to existing methods for generating in-distribution samples?

This article does not provide a direct comparison to existing methods for generating in-distribution samples. It would be helpful to understand the advantages and limitations of this technology compared to other approaches.

What are the computational requirements for implementing this technology at scale?

The article does not address the computational resources needed to implement this technology at scale. Understanding the computational demands could be crucial for practical applications in real-world scenarios.


Original Abstract Submitted

A computer-implemented method, system and computer program product for generating in-distribution samples of data for a neighborhood distribution to be used by post-hoc local explanation methods. An autoencoder is trained to generate in-distribution samples of input data for the neighborhood distribution to be used by a post-hoc local explanation method. Such training includes mapping the input data (e.g., time series data) into a latent dimension (or latent space) forming a first and a second latent code. A mixed code is then obtained by convexly combining the first and second latent codes with a random coefficient. The mixed code is then decoded with the input data masked with interpretable features to obtain conditional mixed reconstructions. Adversarial training is then performed against a discriminator in order to promote in-distribution samples by computing the reconstruction losses of the conditional mixed reconstructions as well as the discriminator losses and then minimizing such losses.
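The loss computation described in the abstract can be sketched end to end with toy stand-ins. This is a hedged illustration only: the abstract does not specify architectures or how the mask conditions the decoder, so the linear encoder/decoder/discriminator, the additive conditioning on the masked input, and every variable name below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the networks (illustrative; the abstract does not
# specify architectures). Linear maps keep the sketch minimal.
W_enc = rng.normal(size=(4, 8))   # encoder: 8-dim input -> 4-dim latent
W_dec = rng.normal(size=(8, 4))   # decoder: 4-dim latent -> 8-dim output
w_disc = rng.normal(size=8)       # discriminator weights

def encode(x):
    return W_enc @ x

def decode(z, masked_x):
    # Condition on the input masked with interpretable features. Additive
    # conditioning is an assumption made for brevity.
    return W_dec @ z + masked_x

def discriminator_score(x):
    # Sigmoid score: estimated probability that x is in-distribution.
    return 1.0 / (1.0 + np.exp(-w_disc @ x))

x1, x2 = rng.normal(size=8), rng.normal(size=8)
mask = rng.integers(0, 2, size=8)  # hypothetical binary interpretable-feature mask

# Convexly combine the two latent codes with a random coefficient.
alpha = rng.uniform()
z_mix = alpha * encode(x1) + (1.0 - alpha) * encode(x2)

# Decode the mixed code conditioned on the masked input.
x_hat = decode(z_mix, mask * x1)

# Reconstruction loss: the conditional mixed reconstruction should stay
# close to the matching convex combination of the inputs (an assumed target).
recon_loss = np.mean((x_hat - (alpha * x1 + (1.0 - alpha) * x2)) ** 2)

# Generator-side adversarial loss: push the discriminator toward scoring
# the reconstruction as in-distribution.
adv_loss = -np.log(discriminator_score(x_hat) + 1e-12)

total_loss = recon_loss + adv_loss
```

In an actual implementation both losses would be minimized jointly by gradient descent over the autoencoder parameters, while the discriminator is trained in alternation to distinguish real samples from reconstructions, as is standard in adversarial training.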