Microsoft Technology Licensing, LLC (20240202518). PRIVACY-PRESERVING GENERATION OF SYNTHESIZED TRAINING DATA simplified abstract

From WikiPatents

PRIVACY-PRESERVING GENERATION OF SYNTHESIZED TRAINING DATA

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Jason Michael Eisner of Baltimore MD (US)

Eui Chul Shin of San Francisco CA (US)

Fatemehsadat Mireshghallah of San Diego CA (US)

Tatsunori Benjamin Hashimoto of Palo Alto CA (US)

Yu Su of Columbus OH (US)

PRIVACY-PRESERVING GENERATION OF SYNTHESIZED TRAINING DATA - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240202518, titled 'PRIVACY-PRESERVING GENERATION OF SYNTHESIZED TRAINING DATA'.

Simplified Explanation

The patent application describes a computer-implemented method for automatically synthesizing a dataset of utterances while preserving user privacy. The synthesized dataset can then be used to train a machine learning model.

Key Features and Innovation

- Training a differentially private parse tree generation model using private parse trees from a private dataset.
- Training a differentially private parse-to-utterance model using private utterances and their corresponding parse trees.
- Generating a synthesized parse tree dataset by sampling parse trees at random from the trained parse tree generation model.
- Generating a synthesized utterance dataset from the synthesized parse trees via the trained parse-to-utterance model.
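The application does not specify what kind of model the differentially private parse tree generation model is. As a minimal sketch under that uncertainty, the code below substitutes a deliberately simple stand-in: a probabilistic grammar over parse-tree node labels whose production counts are privatized with the Laplace mechanism and then sampled top-down to produce synthesized parse trees (the first and third features listed above). It assumes that the candidate productions come from a public schema rather than from the private data, and that each private tree contributes at most `max_rules` productions, which bounds the L1 sensitivity of the counts at `max_rules`. All function names, labels, and parameters are hypothetical.

```python
import random
from collections import Counter

# Hypothetical representation: a parse tree is a nested tuple
# (label, child_tree, child_tree, ...); a node with no children is a leaf.


def productions(tree):
    """Yield (parent_label, child_labels) for every node, in pre-order."""
    label, *children = tree
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from productions(child)


def laplace(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def train_dp_tree_model(private_trees, public_rules, epsilon, max_rules, rng):
    """Privatize production counts with the Laplace mechanism (assumed stand-in).

    public_rules must come from a public schema, so the model's support does
    not depend on private data. Each tree contributes at most max_rules
    productions, so the L1 sensitivity of the count vector is max_rules.
    """
    allowed = set(public_rules)
    counts = Counter()
    for tree in private_trees:
        for i, rule in enumerate(productions(tree)):
            if i >= max_rules:
                break  # cap each tree's contribution
            if rule in allowed:
                counts[rule] += 1
    scale = max_rules / epsilon
    model = {}
    for label, children in public_rules:
        # Clamping at zero is post-processing and does not weaken the guarantee.
        noisy = max(0.0, counts[(label, children)] + laplace(scale, rng))
        model.setdefault(label, []).append((children, noisy))
    return model


def sample_tree(model, label, rng, depth=0, max_depth=6):
    """Sample one synthesized tree top-down from the noisy production weights."""
    options = model.get(label, [])
    weights = [w for _, w in options]
    if depth >= max_depth or sum(weights) <= 0:
        return (label,)  # stop expanding and emit a leaf
    children = rng.choices([c for c, _ in options], weights=weights, k=1)[0]
    return (label, *(sample_tree(model, c, rng, depth + 1, max_depth) for c in children))


# Toy usage with hypothetical calendar-style labels.
rng = random.Random(0)
private = [
    ("S", ("CREATE_EVENT", ("TIME",), ("PERSON",))),
    ("S", ("FIND_EVENT", ("TIME",))),
]
rules = [
    ("S", ("CREATE_EVENT",)), ("S", ("FIND_EVENT",)),
    ("CREATE_EVENT", ("TIME", "PERSON")), ("FIND_EVENT", ("TIME",)),
    ("TIME", ()), ("PERSON", ()),
]
model = train_dp_tree_model(private, rules, epsilon=1.0, max_rules=8, rng=rng)
synthetic = [sample_tree(model, "S", rng) for _ in range(5)]
print(synthetic)
```

With only two private trees and `epsilon=1.0`, the noise scale (8) is far larger than the true counts, so the sampled trees are shaped mostly by noise rather than by the private data; a useful synthesized dataset needs many private examples per production.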

Potential Applications

- Natural language processing research and development.
- Training machine learning models for various applications.
- Enhancing privacy protection in data synthesis processes.

Problems Solved

- Preserving user privacy while generating datasets for machine learning.
- Ensuring the confidentiality of private utterances and parse trees.
- Providing a method for creating synthetic datasets for training models.

Benefits

- Improved privacy protection in data synthesis.
- Enhanced training of machine learning models.
- Facilitating research in natural language processing.

Commercial Applications

The technology could be used in industries such as healthcare, finance, and marketing, where models for personalized services and products must be trained on sensitive user data.

Questions about the Technology

1. How does the technology ensure the privacy of user data during dataset generation?
2. What are the potential limitations of using synthesized datasets in training machine learning models?

Frequently Updated Research

Stay updated on advancements in differential privacy techniques and natural language processing algorithms to enhance the effectiveness of the technology.


Original Abstract Submitted

Examples are disclosed that relate to synthesizing a dataset of utterances in an automated manner using a computer while preserving user privacy. The synthesized dataset of utterances is usable to train a machine learning model. In one example, a differentially private parse tree generation model is trained based at least on private parse trees of a private utterance-parse tree dataset. A differentially private parse-to-utterance model is trained based at least on private utterances and corresponding private parse trees of the private utterance-parse tree dataset. A synthesized parse tree dataset is generated. The synthesized parse tree dataset includes synthesized parse trees sampled at random from the trained differentially private parse tree generation model. A synthesized utterance dataset is generated via the trained differentially private parse-to-utterance model. The synthesized utterance dataset includes synthesized utterances that are generated based at least on the synthesized parse trees of the synthesized parse tree dataset.
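The abstract does not say how either model is made differentially private. A common way to train a neural model such as a parse-to-utterance generator with differential privacy is DP-SGD; the sketch below assumes that technique, which the application does not confirm, and applies it to a toy logistic-regression model in pure Python so that only the privacy-relevant update rule is visible. Each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise with standard deviation `noise_multiplier * max_norm` is added, and the result is averaged over the batch. The function names, data, and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad(w, x, y):
    """Per-example gradient of the logistic loss (toy stand-in model, y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def clip(grad, max_norm):
    """Rescale a gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(w, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    summed = [0.0] * len(w)
    for x, y in batch:
        g = clip(logistic_grad(w, x, y), max_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    sigma = noise_multiplier * max_norm  # noise std is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [wi - lr * n / len(batch) for wi, n in zip(w, noisy)]


# Toy usage: hypothetical data with a bias term plus one feature.
rng = random.Random(0)
data = [([1.0, 0.5], 1), ([1.0, -0.5], 0), ([1.0, 2.0], 1), ([1.0, -1.5], 0)]
w = [0.0, 0.0]
for _ in range(200):
    batch = rng.sample(data, 2)
    w = dp_sgd_step(w, batch, lr=0.5, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(w)
```

Clipping bounds how much any single private example can move the update, and the noise hides that bounded contribution. Converting `noise_multiplier`, the batch sampling rate, and the number of steps into a concrete (ε, δ) guarantee requires a privacy accountant, which this sketch omits; standard DP-SGD analyses also assume Poisson subsampling rather than the fixed-size `rng.sample` batches used here.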