Oracle International Corporation (20240127004). MULTI-LINGUAL NATURAL LANGUAGE GENERATION simplified abstract

From WikiPatents
Revision as of 03:07, 26 April 2024 by Wikipatents

MULTI-LINGUAL NATURAL LANGUAGE GENERATION

Organization Name

Oracle International Corporation

Inventor(s)

Praneet Pabolu of Bangalore (IN)

Karan Dua of Najibabad (IN)

Sriram Chaudhury of Bangalore (IN)

MULTI-LINGUAL NATURAL LANGUAGE GENERATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240127004 titled 'MULTI-LINGUAL NATURAL LANGUAGE GENERATION'.

Simplified Explanation

The patent application describes a method for building a dataset of keyword-text pairs: a machine learning model extracts keywords from articles in a target language, a relevance filter narrows them down, and the selected keywords are paired with the corresponding article text.

  • Obtaining article-summary pairs in a target language from a text corpus that contains article-summary pairs in multiple languages.
  • Inputting articles into a machine learning model to generate embeddings for sentences.
  • Extracting keywords from the articles with a probability that varies with the length of each sentence.
  • Outputting the extracted keywords.
  • Applying a maximal marginal relevance algorithm to select relevant keywords.
  • Generating a dataset of keyword-text pairs with relevant keywords and corresponding text.
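
The maximal marginal relevance (MMR) step in the list above can be illustrated with a minimal sketch. The abstract does not say how relevance or redundancy is measured, so everything here is an assumption: cosine similarity over toy vectors stands in for the model's embeddings, and the names `cosine`, `mmr_select`, `k`, and `lam` are hypothetical. MMR repeatedly picks the candidate with the best trade-off between similarity to the document and dissimilarity to the keywords already chosen.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(doc_vec, candidates, k=2, lam=0.5):
    """Pick up to k keywords by maximal marginal relevance.

    doc_vec    -- embedding of the article (assumed to be given)
    candidates -- dict mapping keyword -> embedding (assumed to be given)
    lam        -- 1.0 means pure relevance, lower values favour diversity
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:

        def score(word):
            relevance = cosine(remaining[word], doc_vec)
            # Redundancy is the highest similarity to any keyword already chosen.
            redundancy = max(
                (cosine(remaining[word], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data: "translation" and "translate" are near-duplicates.
doc = [1.0, 1.0]
cands = {
    "translation": [1.0, 0.9],
    "translate": [1.0, 0.8],
    "corpus": [0.2, 1.0],
}
print(mmr_select(doc, cands, k=2, lam=1.0))
print(mmr_select(doc, cands, k=2, lam=0.3))
```

With `lam=1.0` the selection is driven by relevance alone, so the two near-duplicate keywords are both chosen; lowering `lam` penalizes the duplicate and lets a less similar keyword in.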

Potential Applications

This technology could be applied in various fields such as natural language processing, information retrieval, and content analysis.

Problems Solved

This technology helps in automating the process of keyword extraction from articles, which can be time-consuming and labor-intensive when done manually.

Benefits

The benefits of this technology include improved efficiency in analyzing large amounts of text data, better organization of information, and enhanced search capabilities.

Potential Commercial Applications

One potential commercial application is tooling that helps content creators, marketers, and researchers optimize their content for search engines and improve information retrieval.

Possible Prior Art

Possible prior art includes existing keyword extraction algorithms used in natural language processing and information retrieval systems.


Original Abstract Submitted

a computer-implemented method includes obtaining, from text corpus including article-summary pairs in a plurality of languages, a plurality of article-summary pairs in a target language among the plurality of languages, to form an article-summary pairs dataset in which each article corresponds to a summary; inputting articles from the article-summary pairs to a machine learning model; generating, by the machine learning model, embeddings for sentences of the articles; extracting, by the machine learning model, keywords from the articles with a probability that varies based on lengths of the sentences, respectively; outputting, by the machine learning model, the keywords; applying a maximal marginal relevance algorithm to the extracted keywords, to select relevant keywords; and generating a keyword-text pairs dataset that includes the relevant keywords and text from the articles, the text corresponding to the relevant keywords in each of keyword-text pairs of the keyword-text pairs dataset.
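
The data flow in the abstract (filter the corpus to a target language, extract keywords with a length-dependent probability, pair keywords with text) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the abstract does not give the probability rule, so `keep_probability` uses a hypothetical linear rule in which longer sentences are more likely to yield keywords; random word sampling stands in for the machine learning model; and `Pair`, `filter_by_language`, `sample_keywords`, `build_keyword_text_pairs`, and `max_len` are invented names. The embedding and MMR steps are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Pair:
    lang: str      # language code of the article, e.g. "en"
    article: str
    summary: str


def filter_by_language(corpus, target_lang):
    """Keep only the article-summary pairs written in the target language."""
    return [p for p in corpus if p.lang == target_lang]


def keep_probability(sentence, max_len=20):
    """Hypothetical rule: probability grows linearly with word count, capped at 1.0."""
    n_words = len(sentence.split())
    return min(1.0, n_words / max_len)


def sample_keywords(sentence, rng, max_len=20):
    """Stand-in for the model: keep each word with the sentence's probability."""
    p = keep_probability(sentence, max_len)
    return [w.strip(".,").lower() for w in sentence.split() if rng.random() < p]


def build_keyword_text_pairs(pairs, rng, max_len=20):
    """Pair each sentence with the keywords sampled from it."""
    dataset = []
    for pair in pairs:
        for sentence in pair.article.split(". "):
            keywords = sample_keywords(sentence, rng, max_len)
            if keywords:  # skip sentences that produced no keywords
                dataset.append({"keywords": keywords, "text": sentence})
    return dataset


corpus = [
    Pair("en", "Oracle builds multilingual models. They generate text.", "A summary."),
    Pair("hi", "यह एक लेख है.", "सारांश."),
]
english = filter_by_language(corpus, "en")
for row in build_keyword_text_pairs(english, random.Random(0), max_len=6):
    print(row)
```

Passing a seeded `random.Random` keeps a run reproducible. In a fuller pipeline the sampled keywords would go through the relevance filter before being paired with text.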