17988315. Systems And Methods For Training Translation Models Using Source-Augmented Training Examples simplified abstract (GOOGLE LLC)

Systems And Methods For Training Translation Models Using Source-Augmented Training Examples

Organization Name

GOOGLE LLC

Inventor(s)

Jing Huang of Sunnyvale CA (US)

Apurva Shah of Burlingame CA (US)

Melvin Johnson of Sunnyvale CA (US)

Viresh Ratnakar of Los Altos CA (US)

Maxim Krikun of Castro Valley CA (US)

Systems And Methods For Training Translation Models Using Source-Augmented Training Examples - A simplified explanation of the abstract

This abstract first appeared for US patent application 17988315, titled 'Systems And Methods For Training Translation Models Using Source-Augmented Training Examples'.

Simplified Explanation

The patent application describes systems and methods for training a translation model on pairs of text sequences in two different languages, together with a label derived from the source of the second text sequence.

  • The translation model is trained using a first text sequence in one language, a second text sequence in a different language, and a label indicating the source of the second text sequence.
  • The label can include information such as an Internet domain, subdomain, URL, website name, or IP address.
  • The label may also indicate the source of the first text sequence.
  • Training examples are automatically generated by sampling the first text sequence from a page of an Internet domain, sampling the second text sequence from another page of the same domain, and generating the label based on the source data of the second page.
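The automatic generation step in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the `Page` and `TrainingExample` types and the `make_example` function are hypothetical names, the label is simply the hostname of the second page, and the two pages are assumed to be sentence-aligned so that the same index on both pages yields a translation pair.

```python
import random
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Page:
    """Hypothetical container for a crawled page."""
    url: str
    language: str
    sentences: tuple


@dataclass(frozen=True)
class TrainingExample:
    first_text: str   # text sequence in the first language
    second_text: str  # text sequence in the second language
    label: str        # derived from the source of the second page


def make_example(first_page, second_page, rng):
    """Sample one source-augmented example from two pages of one domain."""
    first_host = urlsplit(first_page.url).hostname
    second_host = urlsplit(second_page.url).hostname
    # Assumption: "same Internet domain" is approximated by equal hostnames.
    if first_host != second_host:
        raise ValueError("pages must come from the same Internet domain")
    if first_page.language == second_page.language:
        raise ValueError("pages must be in different languages")
    # Assumption: the pages are sentence-aligned and of equal length.
    if len(first_page.sentences) != len(second_page.sentences):
        raise ValueError("pages are not sentence-aligned")
    i = rng.randrange(len(first_page.sentences))
    return TrainingExample(
        first_text=first_page.sentences[i],
        second_text=second_page.sentences[i],
        label=second_host,
    )
```

How the label is then fed to the translation model is not specified in the abstract, so the sketch stops at producing the (first text, second text, label) triple.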

Potential applications of this technology:

  • Improving machine translation systems by training them on specific domains or sources.
  • Enhancing translation accuracy by incorporating information about the source of the text.
  • Enabling more targeted translation models for specific websites or IP addresses.

Problems solved by this technology:

  • Overcoming limitations of generic translation models by training them on specific domains or sources.
  • Addressing the challenge of accurately translating text between languages by taking the source of the text into account.
  • Providing a more customized and accurate translation experience for specific websites or IP addresses.

Benefits of this technology:

  • Improved translation quality and accuracy.
  • Enhanced understanding of the context and source of the text being translated.
  • Customized translation models for specific domains or sources, leading to more relevant and precise translations.


Original Abstract Submitted

Systems and methods for training a translation model based on a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on a source of the second text sequence. In some examples, the label may comprise an Internet domain, an Internet subdomain, a uniform resource locator, a website name, or an IP address. In some examples, the label may further indicate a source of the first text sequence. In some examples, each given training example may be automatically generated by sampling the first text sequence from a first page of a given Internet domain, sampling the second text sequence from a second page of the given Internet domain, and generating the label based on all or a portion of source data of the second page.
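The abstract lists several kinds of label (Internet domain, subdomain, uniform resource locator, IP address). As a rough illustration of how such labels might be derived from a page's source data, the sketch below uses Python's standard `urllib.parse` and `ipaddress` modules. The function name `label_candidates` and the dictionary keys are hypothetical, and taking the last two host labels as the domain is a naive assumption that fails for suffixes such as `co.uk`.

```python
import ipaddress
from urllib.parse import urlsplit


def label_candidates(url):
    """Return possible source labels for a page URL, coarse to fine."""
    host = urlsplit(url).hostname or ""
    try:
        # If the host is a literal IP address, use it unchanged.
        ipaddress.ip_address(host)
        domain = host
    except ValueError:
        # Naive assumption: the domain is the last two host labels.
        parts = host.split(".")
        domain = ".".join(parts[-2:]) if len(parts) >= 2 else host
    return {"domain": domain, "subdomain": host, "url": url}
```

For example, `label_candidates("https://news.example.com/fr/article-1")` gives `example.com` as the domain and `news.example.com` as the subdomain, while keeping the full URL as the finest-grained label.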