18318315. MULTI-LINGUAL NATURAL LANGUAGE GENERATION simplified abstract (Oracle International Corporation)

From WikiPatents
Revision as of 06:32, 26 April 2024 by Wikipatents (talk | contribs) (Creating a new page)

MULTI-LINGUAL NATURAL LANGUAGE GENERATION

Organization Name

Oracle International Corporation

Inventor(s)

Praneet Pabolu of Bangalore (IN)

Karan Dua of Najibabad (IN)

Sriram Chaudhury of Bangalore (IN)

MULTI-LINGUAL NATURAL LANGUAGE GENERATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18318315 titled 'MULTI-LINGUAL NATURAL LANGUAGE GENERATION'.

Simplified Explanation

The method described in the abstract trains a single multilingual base model for three related tasks: text summarization, text generation, and next-sentence generation. The base model starts from an input model pretrained on multiple languages, is constrained to a base vocabulary drawn from two of those languages, and is then fine-tuned in stages using enhanced training datasets derived from public data.

  • The method involves preparing a base model from an input model pretrained on at least three different languages, constraining it to a base vocabulary containing words from two of those languages.
  • The base model is trained using a first enhanced training dataset generated from public data to produce a text summarization model, and using a second enhanced dataset generated from the first to produce a text generation model.
  • The base model is then trained using a third enhanced dataset, generated from the second dataset together with the text summarization model, to produce a next sentence generation model.
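The vocabulary-constraining step in the first bullet can be illustrated with a minimal sketch: keep only the embedding rows of a pretrained multilingual model whose tokens belong to the reduced base vocabulary. This is an assumption about how such constraining might look in practice; the abstract does not describe the mechanism, and all names here are illustrative.

```python
import numpy as np

def constrain_to_base_vocabulary(embeddings, full_vocab, base_vocab):
    """Keep only embedding rows whose tokens appear in the base vocabulary.

    embeddings: (V, d) matrix of pretrained token embeddings.
    full_vocab: list of V tokens, row-aligned with `embeddings`.
    base_vocab: set of tokens drawn from the two retained languages.
    """
    keep = [i for i, tok in enumerate(full_vocab) if tok in base_vocab]
    new_vocab = [full_vocab[i] for i in keep]
    return embeddings[keep], new_vocab

# Toy example: a 5-token "multilingual" vocabulary reduced to 3 tokens.
emb = np.arange(10, dtype=float).reshape(5, 2)
vocab = ["hello", "bonjour", "hola", "namaste", "hallo"]
base = {"hello", "hola", "namaste"}
new_emb, new_vocab = constrain_to_base_vocabulary(emb, vocab, base)
```

Restricting the embedding table this way shrinks the model and focuses its output distribution on the retained languages, while still benefiting from the multilingual pretraining.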

Potential Applications

This technology could be applied in various fields such as natural language processing, machine translation, and content generation for automated systems.

Problems Solved

This technology addresses the challenge of improving text generation and summarization models by leveraging multilingual pretrained models and enhanced training datasets.

Benefits

The use of multilingual pretrained models and enhanced training datasets can lead to more accurate and contextually relevant text generation and summarization, improving the overall quality of automated content creation.

Potential Commercial Applications

This technology could be utilized in content generation tools, chatbots, automated customer service systems, and other applications that require natural language processing capabilities.

Possible Prior Art

Possible prior art includes the use of pretrained multilingual language models in natural language processing tasks, as well as the use of enhanced or augmented training datasets to improve model performance.

Unanswered Questions

How does this method compare to existing text generation and summarization techniques?

This article does not provide a direct comparison to existing methods in the field. It would be helpful to understand the specific advantages and limitations of this approach compared to traditional text generation and summarization techniques.

What are the potential limitations or challenges of implementing this technology in real-world applications?

The article does not address the practical considerations or challenges that may arise when implementing this technology in real-world scenarios. It would be valuable to explore the potential obstacles or limitations that could affect the adoption and effectiveness of this method.


Original Abstract Submitted

A method includes preparing a base model using an input model pretrained on at least three languages different from each other and a base vocabulary including words corresponding to two languages among the at least three languages, where the preparing the base model includes constraining the input model to the words included in the base vocabulary; training the base model using a first enhanced training dataset generated from public data, to generate a text summarization model; training the base model using a second enhanced training dataset generated from the first enhanced training dataset, to generate a text generation model; and training the base model using a third enhanced training dataset that is generated using the second enhanced training dataset and the text summarization model, to generate a next sentence generation model.
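The abstract describes a staged pipeline in which each enhanced dataset is derived from the previous one, and the third stage also consumes the summarization model. A minimal sketch of that data flow, using placeholder functions (the abstract does not specify how datasets are enhanced or how training is performed, so everything here is an illustrative stand-in):

```python
# Staged training flow from the abstract, with placeholder steps.

def train(base_model, dataset, task):
    """Placeholder fine-tuning step: records what the model was trained on."""
    return {"from": base_model, "task": task, "dataset": dataset}

def enhance(source, helper=None):
    """Placeholder dataset-enhancement step derived from a source dataset."""
    return {"source": source, "helper": helper}

base = "constrained multilingual base model"

dataset1 = enhance("public data")                 # first enhanced dataset
summarizer = train(base, dataset1, "text summarization")

dataset2 = enhance(dataset1)                      # generated from the first
generator = train(base, dataset2, "text generation")

dataset3 = enhance(dataset2, helper=summarizer)   # also uses the summarizer
next_sentence = train(base, dataset3, "next sentence generation")
```

The notable structural point is that the three models share one constrained base, and the summarization model feeds back into constructing the dataset for the next-sentence stage.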