AUTOMATED DATA EXTRACTION PIPELINE FOR LARGE LANGUAGE MODEL TRAINING

Organization Name

Salesforce, Inc.

Inventor(s)

Shruthan Radhakrishna of San Francisco CA (US)

Yazdan Jamshidi of San Francisco CA (US)

This abstract first appeared in US patent application 20250060944, titled 'AUTOMATED DATA EXTRACTION PIPELINE FOR LARGE LANGUAGE MODEL TRAINING'.

Original Abstract Submitted

An automated data extraction pipeline for large language model (LLM) training may include extracting a set of code segments from a set of natural language question-answer (Q&A) combinations, each of which includes a provided input, a provided output, and a provided code segment formatted to transform the provided input into the provided output. The data extraction pipeline may then use a first LLM to generate a predicted output from a question portion of a first natural language Q&A combination. A first extracted code segment from the extracted set of code segments may then be executed to generate a first actual output of the first extracted code segment. One or more data samples may then be generated for training a second LLM based on a comparison of the first actual output to the predicted output. The second LLM may then be trained using the one or more data samples.
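
The steps in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the applicant's implementation: the QACombination record, the convention that each code segment defines a function named transform, the first_llm callable (a stand-in for the first LLM), and the sample format are all assumptions. Here each data sample keeps the executed output as its target and records whether the first LLM's prediction agreed with it; that is only one possible reading of "based on a comparison". Training the second LLM is left out.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical record for one natural-language Q&A combination.
@dataclass
class QACombination:
    question: str         # question portion, shown to the first LLM
    answer: str           # answer text containing a fenced code segment
    provided_input: Any   # input the code segment is meant to transform
    provided_output: Any  # output the answer claims; unused in this sketch

# Match a triple-backtick fence (optionally tagged "python") and capture its body.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"(?:python)?\n(.*?)" + TICKS, re.DOTALL)

def extract_code(qa: QACombination) -> Optional[str]:
    """Return the first fenced code segment in the answer, or None."""
    match = FENCE.search(qa.answer)
    return match.group(1) if match else None

def run_code(code: str, value: Any) -> Any:
    """Execute a segment assumed (hypothetically) to define transform(x).

    WARNING: exec() is not a sandbox; a real pipeline would isolate this step.
    """
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace["transform"](value)

def build_samples(
    qas: List[QACombination],
    first_llm: Callable[[str], Any],
) -> List[Dict[str, Any]]:
    """Compare the first LLM's predicted output with the executed (actual) output."""
    samples: List[Dict[str, Any]] = []
    for qa in qas:
        code = extract_code(qa)
        if code is None:
            continue  # no code segment to execute
        predicted = first_llm(qa.question)
        try:
            actual = run_code(code, qa.provided_input)
        except Exception:
            continue  # a segment that fails to run yields no sample here
        samples.append({
            "prompt": qa.question,
            "target": actual,                   # executed output used as ground truth
            "llm_agreed": predicted == actual,  # result of the comparison
        })
    return samples
```

An alternative policy under the same assumptions would keep only the samples where llm_agreed is False, treating them as hard examples for the second LLM.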

Salesforce, Inc. (20250060944). AUTOMATED DATA EXTRACTION PIPELINE FOR LARGE LANGUAGE MODEL TRAINING
