18430829. SYSTEMS AND METHODS FOR DATA STREAM USING SYNTHETIC DATA GENERATION simplified abstract (Capital One Services, LLC)

From WikiPatents

SYSTEMS AND METHODS FOR DATA STREAM USING SYNTHETIC DATA GENERATION

Organization Name

Capital One Services, LLC

Inventor(s)

Anh Truong of Champaign IL (US)

Jeremy Goodsitt of Champaign IL (US)

Austin Walters of Savoy IL (US)

SYSTEMS AND METHODS FOR DATA STREAM USING SYNTHETIC DATA GENERATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18430829, titled 'SYSTEMS AND METHODS FOR DATA STREAM USING SYNTHETIC DATA GENERATION'.

Simplified Explanation

The patent application describes systems and methods for generating synthetic data from a continuous data stream using machine learning techniques. The system receives the stream from an outside source, processes it in real time, divides the range between the determined minimum and maximum values into non-overlapping bins, counts the samples in each bin according to the bin edges, and populates a dataset with synthetic data.

  • Receiving a continuous data stream and processing it in real time
  • Using machine learning techniques to generate synthetic data
  • Creating non-overlapping bins that span the range between the minimum and maximum values
  • Determining the number of samples in each bin based on the bin edges
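
The binning and counting steps listed above can be illustrated with a minimal sketch. It is not the method claimed in the application: the evenly spaced edges, the half-open bin convention, and the names `make_bin_edges`, `count_per_bin`, and `stream_chunk` are all assumptions made for illustration.

```python
from bisect import bisect_right


def make_bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges covering [lo, hi] (assumed layout)."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)]
    edges.append(hi)  # pin the last edge to the maximum exactly
    return edges


def count_per_bin(values, edges):
    """Count values per bin. Bins are half-open [edges[i], edges[i+1]),
    except the last bin, which also includes its right edge, so no two
    bins overlap and every in-range value lands in exactly one bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            raise ValueError(f"value {v} lies outside the data range")
        i = bisect_right(edges, v) - 1
        i = min(i, len(counts) - 1)  # a value equal to the maximum goes in the last bin
        counts[i] += 1
    return counts


# Hypothetical chunk of values taken from a stream.
stream_chunk = [2.0, 3.5, 3.9, 7.2, 8.8, 10.0]
edges = make_bin_edges(min(stream_chunk), max(stream_chunk), n_bins=4)
counts = count_per_bin(stream_chunk, edges)
print(edges)   # [2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [3, 0, 1, 2]
```

Because the edges are shared between neighbouring bins and each bin owns only its left edge, the bins tile the data range without overlapping, which matches the property the abstract asks for.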

Potential Applications

The technology can be applied in various fields such as finance, healthcare, and marketing for generating synthetic data for training machine learning models, testing algorithms, and conducting simulations.

Problems Solved

1. Generating synthetic data efficiently and in real time
2. Creating non-overlapping bins to organize data effectively

Benefits

1. Improved data processing speed
2. Enhanced accuracy in generating synthetic data
3. Efficient organization of data in bins

Potential Commercial Applications

Possible commercial uses include optimizing marketing campaigns, improving healthcare data analysis, and enhancing financial risk assessment models.

Possible Prior Art

One possible form of prior art is traditional data generation techniques, which may be less efficient than the system described in the patent application or may not operate in real time.

Unanswered Questions

How does the system handle outliers in the continuous data stream?

The abstract does not say how outliers in the continuous data stream are handled during synthetic data generation. This matters because the bins span the range between the minimum and maximum values, so a single extreme value can stretch that range and reduce the accuracy of the generated synthetic data.

What is the scalability of the system for handling large datasets?

The abstract does not address how the system scales when processing large datasets and generating synthetic data from them. Understanding this is essential for judging whether the system is practical in real-world scenarios with extensive data requirements.


Original Abstract Submitted

Systems and methods for synthetic data generation. A system includes at least one processor and a storage medium storing instructions that, when executed by the one or more processors, cause the at least one processor to perform operations including receiving a continuous data stream from an outside source, processing the continuous data stream in real-time, and using machine learning techniques to generating synthetic data to populate the dataset. The operations also include creating a plurality of bins, wherein the plurality of bins occupy a data range between the determined minimum and maximum values without overlapping; and determining a number of samples within each of the created bin, based on a bin edges, wherein the bin edges are bounds within the data range.
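
The abstract attributes the final step, populating the dataset with synthetic data, to unspecified machine learning techniques. As a hedged stand-in only, the sketch below draws synthetic values from binned counts: it picks a bin in proportion to its sample count and then a uniform point inside that bin. The function `sample_from_bins` and the example `edges` and `counts` are hypothetical and are not taken from the application.

```python
import random


def sample_from_bins(edges, counts, n, rng):
    """Draw n synthetic values from a histogram (illustrative stand-in).

    A bin is chosen with probability proportional to its count, then a
    value is drawn uniformly between that bin's two edges.
    """
    chosen = rng.choices(range(len(counts)), weights=counts, k=n)
    return [rng.uniform(edges[i], edges[i + 1]) for i in chosen]


# Hypothetical bin edges and per-bin sample counts.
edges = [2.0, 4.0, 6.0, 8.0, 10.0]
counts = [3, 0, 1, 2]

rng = random.Random(0)  # seeded so runs are repeatable
synthetic = sample_from_bins(edges, counts, 5, rng)
print(synthetic)
```

A bin with a count of zero has zero weight, so no synthetic value is drawn from it. A learned generative model would replace this sampling step in a fuller implementation.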