Adobe Inc. (20240330682). SYSTEMS AND METHODS FOR GENERATING SYNTHETIC TABULAR DATA FOR MACHINE LEARNING AND OTHER APPLICATIONS simplified abstract

From WikiPatents

SYSTEMS AND METHODS FOR GENERATING SYNTHETIC TABULAR DATA FOR MACHINE LEARNING AND OTHER APPLICATIONS

Organization Name

Adobe Inc.

Inventor(s)

Surgan Jandial of Noida Uttar Pradesh (IN)

Siddarth Ramesh of Hyderabad Telangana (IN)

Piyush Gupta of Noida Uttar Pradesh (IN)

Gauri Gupta of Cambridge MA (US)

Balaji Krishnamurthy of Noida Uttar Pradesh (IN)

SYSTEMS AND METHODS FOR GENERATING SYNTHETIC TABULAR DATA FOR MACHINE LEARNING AND OTHER APPLICATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240330682, titled 'SYSTEMS AND METHODS FOR GENERATING SYNTHETIC TABULAR DATA FOR MACHINE LEARNING AND OTHER APPLICATIONS'.

Simplified Explanation: This patent application describes systems and methods for generating synthetic tabular data for machine learning and other applications. It involves training a variational autoencoder to learn inter-feature correlations in real tabular data, then using this information to train a generator model in a generative adversarial network (GAN) to create synthetic tabular data with similar correlations.

  • Trains a variational autoencoder to learn inter-feature correlations in real tabular data.
  • Uses the trained autoencoder to train a generator model in a GAN to generate synthetic tabular data.
  • The synthetic data exhibits the same inter-feature correlation distribution as the real data.
  • Involves receiving tabular data records, training machine learning models, and generating synthetic data based on learned correlations.
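One way to check whether synthetic records preserve the inter-feature correlations of the real records is to compare the two pairwise correlation matrices. The sketch below is only an illustration under stated assumptions: the application does not specify an evaluation metric, so the mean absolute gap used here, the helper names (`pearson`, `correlation_matrix`, `correlation_gap`, `make_rows`), and the toy two-feature data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(records):
    """Pairwise feature correlations for a list of equal-length numeric rows."""
    columns = list(zip(*records))
    return [[pearson(a, b) for b in columns] for a in columns]


def correlation_gap(real, synthetic):
    """Mean absolute difference between the two correlation matrices.

    Assumption: this metric is chosen for illustration, not taken from the
    patent application.
    """
    real_corr = correlation_matrix(real)
    synth_corr = correlation_matrix(synthetic)
    k = len(real_corr)
    total = sum(
        abs(real_corr[i][j] - synth_corr[i][j])
        for i in range(k)
        for j in range(k)
    )
    return total / (k * k)


def make_rows(rng, n, slope):
    """Hypothetical toy table: feature 1 depends linearly on feature 0."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        rows.append((x, slope * x + rng.gauss(0.0, 0.5)))
    return rows


rng = random.Random(0)
real = make_rows(rng, 500, slope=2.0)
good_synthetic = make_rows(rng, 500, slope=2.0)  # same dependency as "real"
bad_synthetic = make_rows(rng, 500, slope=0.0)   # features are independent

print("gap, correlated synthetic:", correlation_gap(real, good_synthetic))
print("gap, uncorrelated synthetic:", correlation_gap(real, bad_synthetic))
```

A small gap suggests the synthetic table reproduces the real table's linear feature dependencies; a large gap, as with the independent features above, suggests it does not. Pearson correlation only captures linear relationships, so a check like this is a rough first pass rather than a full comparison of distributions.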

Key Features and Innovation:

  • Utilizes a variational autoencoder to capture inter-feature correlations in tabular data.
  • Employs a generative adversarial network (GAN) to generate synthetic tabular data.
  • Mimics the correlation distribution of real data in the synthetic data.
  • Enables the creation of large datasets for machine learning training.

Potential Applications: This technology can be applied in various fields, such as:

  • Data augmentation for machine learning models.
  • Privacy-preserving data sharing.
  • Testing and validation of algorithms without access to real data.

Problems Solved:

  • Overcoming limitations in generating diverse and realistic synthetic data.
  • Addressing the need for large and diverse datasets for training machine learning models.
  • Providing a solution for data privacy concerns in sharing sensitive information.

Benefits:

  • Enhances the performance of machine learning models by providing diverse training data.
  • Facilitates research and development in data-driven applications.
  • Improves data privacy and security by enabling the use of synthetic data for testing and validation.

Commercial Applications: Potential commercial applications include:

  • Data science and analytics companies.
  • Healthcare and finance industries for data analysis.
  • Research institutions for algorithm development and testing.

Questions about Synthetic Tabular Data Generation:

  1. How does the use of a variational autoencoder improve the generation of synthetic tabular data?
  2. What are the key advantages of using a generative adversarial network (GAN) in this context?


Original Abstract Submitted

Systems and methods for generating synthetic tabular data for machine learning and other applications are provided. In some embodiments, a variational autoencoder is trained to learn inter-feature correlations found in tabular data collected from real data sources. The trained variational autoencoder is used to train a generator model of a generative adversarial network (GAN) to generate synthetic tabular data that exhibits the inter-feature correlation distribution found in the tabular data collected from real data sources. In some embodiments, processing devices perform operations comprising: receiving a set of tabular data records, each record comprising a plurality of features; training a first machine learning model using the tabular data records to learn correlations between the plurality of features; and training a second machine learning model, using the first machine learning model, to generate a synthetic tabular data records based at least on the one or more correlations between the plurality of features.
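The three operations listed in the abstract (receive records, train a first model on them, train a second model using the first) can be illustrated as a data flow. The sketch below is a deliberately simplified toy and not the method of the application: the first model is replaced by plain Gaussian statistics instead of a variational autoencoder, the second by a linear sampler instead of a GAN generator, and the class names, the two-feature restriction, and the "age" and "income" features are all assumptions made for illustration.

```python
import math
import random
import statistics


class CorrelationModel:
    """Toy stand-in for the first model (a VAE in the application).

    It only stores means, variances, and the covariance of two features.
    """

    def fit(self, records):
        xs, ys = zip(*records)
        self.mean_x = statistics.fmean(xs)
        self.mean_y = statistics.fmean(ys)
        self.var_x = statistics.pvariance(xs, self.mean_x)
        self.var_y = statistics.pvariance(ys, self.mean_y)
        self.cov_xy = statistics.fmean(
            [(x - self.mean_x) * (y - self.mean_y) for x, y in records]
        )
        return self

    @property
    def correlation(self):
        return self.cov_xy / math.sqrt(self.var_x * self.var_y)


class SyntheticGenerator:
    """Toy stand-in for the second model (a GAN generator in the application).

    It is configured from the first model rather than from the raw records,
    mirroring the "using the first machine learning model" step.
    """

    def fit(self, correlation_model):
        m = correlation_model
        self.source = m
        self.slope = m.cov_xy / m.var_x
        # Variance of feature 1 not explained by feature 0 (clamped at zero).
        self.noise_std = math.sqrt(max(0.0, m.var_y - self.slope * m.cov_xy))
        return self

    def sample(self, n, rng):
        m = self.source
        rows = []
        for _ in range(n):
            x = rng.gauss(m.mean_x, math.sqrt(m.var_x))
            y = m.mean_y + self.slope * (x - m.mean_x) + rng.gauss(0.0, self.noise_std)
            rows.append((x, y))
        return rows


# Step 1: receive tabular records (hypothetical "age" and "income" features).
rng = random.Random(7)
real_records = []
for _ in range(1000):
    age = rng.gauss(40.0, 10.0)
    income = 1.5 * age + rng.gauss(0.0, 8.0)
    real_records.append((age, income))

# Step 2: train the first model to learn the feature correlation.
first_model = CorrelationModel().fit(real_records)

# Step 3: train the second model using the first, then generate records.
second_model = SyntheticGenerator().fit(first_model)
synthetic_records = second_model.sample(1000, rng)

print("real correlation:", first_model.correlation)
print("synthetic correlation:", CorrelationModel().fit(synthetic_records).correlation)
```

The point of the sketch is the dependency between the stages: the generator never sees the real records directly, only what the first model has learned from them. A VAE and a GAN would capture far richer, nonlinear structure than the single covariance used here.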