18082811. Automatic Generation of Training and Testing Data for Machine-Learning Models simplified abstract (GOOGLE LLC)

From WikiPatents
Revision as of 06:25, 8 May 2024 by Wikipatents (Creating a new page)

Automatic Generation of Training and Testing Data for Machine-Learning Models

Organization Name

GOOGLE LLC

Inventor(s)

Madhav Datt of Mountain View CA (US)

Sukriti Ramesh of Bengaluru (IN)

Automatic Generation of Training and Testing Data for Machine-Learning Models - A simplified explanation of the abstract

This abstract first appeared for US patent application 18082811, titled 'Automatic Generation of Training and Testing Data for Machine-Learning Models'.

Simplified Explanation

The abstract describes a system that generates training and testing data for machine-learning models. It queries raw data from a data store, derives signals from that raw data, joins the signals with labels, and converts the result into input data for a machine-learning pipeline.

  • Receiving signal extraction information with instructions to query a data store
  • Accessing raw data from the data store using SQL code generated based on the signal extraction information
  • Processing the raw data to generate a plurality of signals using signal configuration information
  • Joining the signals with a first label source, using SQL code, to generate training and testing data
  • Processing the training and testing data to create input data for a machine-learning pipeline
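
The first four operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not specify any schema, configuration format, or database engine, so the dictionary layouts, the table names (events, labels), the helper function names, and the use of an in-memory SQLite database are all hypothetical.

```python
import sqlite3

# Hypothetical "signal extraction information": what to query from the data store.
SIGNAL_EXTRACTION = {
    "table": "events",
    "key": "user_id",
    "columns": ["user_id", "amount"],
}

# Hypothetical "signal configuration information": signal name -> SQL aggregate.
SIGNAL_CONFIG = {
    "total_amount": "SUM(amount)",
    "event_count": "COUNT(*)",
}


def build_extraction_sql(info):
    """Generate the SQL that reads raw data from the data store."""
    cols = ", ".join(info["columns"])
    return f"SELECT {cols} FROM {info['table']}"


def build_signal_sql(info, config):
    """Generate SQL that turns raw rows into one row of signals per key."""
    signals = ", ".join(f"{expr} AS {name}" for name, expr in config.items())
    raw = build_extraction_sql(info)
    return f"SELECT {info['key']}, {signals} FROM ({raw}) GROUP BY {info['key']}"


def build_join_sql(info, config, label_table):
    """Generate SQL that joins the signals with a label source."""
    key = info["key"]
    names = ", ".join(f"s.{name}" for name in config)
    signal_sql = build_signal_sql(info, config)
    return (
        f"SELECT s.{key}, {names}, l.label "
        f"FROM ({signal_sql}) AS s "
        f"JOIN {label_table} AS l ON s.{key} = l.{key} "
        f"ORDER BY s.{key}"
    )


def run_demo():
    """Build a toy data store, run the generated SQL, and return labeled rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (user_id INTEGER, amount REAL);
        INSERT INTO events VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0);
        CREATE TABLE labels (user_id INTEGER, label INTEGER);
        INSERT INTO labels VALUES (1, 1), (2, 0);
        """
    )
    sql = build_join_sql(SIGNAL_EXTRACTION, SIGNAL_CONFIG, "labels")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

In this sketch the inner join drops user 3, who has signals but no label, so only labeled examples reach the output. The configuration values are interpolated directly into SQL strings, which is acceptable only because they are trusted constants here; a real system would need to validate them.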

Potential Applications

The abstract does not name specific application domains. Because the approach addresses data preparation in general, it could apply wherever labeled data sits in a SQL-queryable store, for example in finance, healthcare, marketing, or cybersecurity.

Problems Solved

This technology streamlines the process of generating training and testing data for machine-learning models, reducing manual effort and improving efficiency in model development.

Benefits

The system automates data preparation, which can shorten the path from raw data to model training. The abstract itself does not claim specific gains in model accuracy or performance.

Potential Commercial Applications

"Automated Data Generation for Machine-Learning Models: Revolutionizing Model Development"

Possible Prior Art

Possible prior art includes manual data extraction and processing methods for generating training and testing data for machine-learning models.

What are the potential limitations of this technology in real-world applications?

The abstract does not mention any potential limitations of the technology in real-world applications. However, some challenges that may arise include data quality issues, scalability concerns, and the need for continuous optimization of the signal extraction and processing algorithms.

How does this technology compare to existing data generation methods for machine-learning models?

The abstract does not provide a direct comparison to existing data generation methods for machine-learning models. However, this technology appears to offer automation and efficiency advantages over traditional manual data extraction and processing methods. Further analysis and comparison with existing methods would be necessary to determine the full extent of its benefits.


Original Abstract Submitted

Provided are computing systems, methods, and platforms for generating training and testing data for machine-learning models. The operations can include receiving signal extraction information that has instructions to query a data store. Additionally, the operations can include accessing, using Structured Query Language (SQL) code generated based on the signal extraction information, raw data from the data store. Moreover, the operations can include processing the raw data using signal configuration information to generate a plurality of signals. The signal configuration information can have instructions on how to generate the plurality of signals from the raw data. Furthermore, the operations can include joining, using SQL code, the plurality of signals with a first label source to generate training data and testing data. Subsequently, the operations can include processing the training data and the testing data to generate the input data. The input data being an ingestible for a machine-learning pipeline.
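
The final operation in the abstract, processing the training and testing data into input that a machine-learning pipeline can ingest, might look like the sketch below. The abstract does not say how examples are divided between training and testing or what the ingestible format is, so the hash-based split, the 20% test fraction, the JSON Lines output, and the function names are all assumptions made for illustration.

```python
import hashlib
import json


def assign_split(key, test_fraction=0.2):
    """Deterministically assign a row to 'train' or 'test' from its key.

    Hypothetical rule: hash the key to a value in [0, 1) and compare it
    with the test fraction, so the same key always lands in the same split.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


def to_pipeline_records(rows, feature_names):
    """Convert (key, feature..., label) rows into JSON Lines strings per split."""
    out = {"train": [], "test": []}
    for row in rows:
        key, *features, label = row
        record = {
            "id": key,
            "features": dict(zip(feature_names, features)),
            "label": label,
        }
        out[assign_split(key)].append(json.dumps(record, sort_keys=True))
    return out
```

Calling to_pipeline_records(rows, ["total_amount", "event_count"]) on joined rows of the form (key, signal values, label) returns two lists of serialized records, one per split. Hashing the key rather than sampling at random keeps the split reproducible across runs.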