18318327. MULTI-LINGUAL NATURAL LANGUAGE GENERATION simplified abstract (Oracle International Corporation)

From WikiPatents
Revision as of 06:31, 26 April 2024 by Wikipatents

MULTI-LINGUAL NATURAL LANGUAGE GENERATION

Organization Name

Oracle International Corporation

Inventor(s)

Praneet Pabolu of Bangalore (IN)

Karan Dua of Najibabad (IN)

Sriram Chaudhury of Bangalore (IN)

MULTI-LINGUAL NATURAL LANGUAGE GENERATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18318327, titled 'MULTI-LINGUAL NATURAL LANGUAGE GENERATION'.

Simplified Explanation

The patent application describes a method for extracting keywords from articles in a target language using a machine learning model and generating a dataset of relevant keyword-text pairs.

  • Obtaining article-summary pairs in a target language from a text corpus that contains article-summary pairs in multiple languages.
  • Inputting the articles into a machine learning model, which generates embeddings for their sentences.
  • Extracting keywords from the articles with a probability that varies with sentence length.
  • Outputting the extracted keywords and applying a maximal marginal relevance algorithm to select the relevant ones.
  • Generating a dataset of keyword-text pairs that matches each relevant keyword with its corresponding text from the articles.
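
The maximal marginal relevance (MMR) step can be illustrated with a minimal sketch. The abstract does not state which similarity measure or trade-off weight is used, so the cosine similarity, the `lambda_` parameter, and the names `mmr_select`, `doc_vec`, and `candidates` are assumptions made here for illustration only. MMR repeatedly picks the candidate that is most relevant to the document while being least similar to the keywords already chosen.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, candidates, top_k=3, lambda_=0.7):
    """Select up to top_k keywords by maximal marginal relevance.

    candidates maps each keyword to its embedding vector (hypothetical input
    format). lambda_ = 1.0 ranks by relevance only; lower values penalize
    keywords that resemble ones already selected.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < top_k:

        def score(kw):
            relevance = cosine(remaining[kw], doc_vec)
            # Highest similarity to any keyword picked so far (0.0 for the first pick).
            redundancy = max(
                (cosine(remaining[kw], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: "a" and "b" are near-duplicates, "c" points elsewhere.
doc = [1.0, 1.0]
cands = {"a": [1.0, 0.9], "b": [1.0, 0.95], "c": [0.0, 1.0]}
diverse = mmr_select(doc, cands, top_k=2, lambda_=0.5)
relevance_only = mmr_select(doc, cands, top_k=2, lambda_=1.0)
```

With the toy vectors, the diversity-weighted call skips the near-duplicate keyword in favor of the less similar one, while the relevance-only call keeps both near-duplicates.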

Potential Applications

This technology could be applied in natural language processing, information retrieval, and content summarization.

Problems Solved

This technology automatically extracts relevant keywords from articles, which can support search engine optimization and content organization.

Benefits

The benefits of this technology include improved keyword extraction accuracy, efficient content summarization, and enhanced information retrieval.

Potential Commercial Applications

Potential commercial applications include SEO tools, content management systems, and text analysis software.

Possible Prior Art

Possible prior art includes existing keyword extraction algorithms and text summarization techniques.

Unanswered Questions

How does this method handle articles with complex sentence structures?

The machine learning model may struggle to accurately extract keywords from articles with complex sentence structures, leading to potential inaccuracies in the generated keyword-text pairs.

Can this method be applied to audio or video content for keyword extraction?

This method is specifically designed for text articles, so it may not be directly applicable to audio or video content without significant modifications to account for different data formats.


Original Abstract Submitted

A computer-implemented method includes obtaining, from text corpus including article-summary pairs in a plurality of languages, a plurality of article-summary pairs in a target language among the plurality of languages, to form an article-summary pairs dataset in which each article corresponds to a summary; inputting articles from the article-summary pairs to a machine learning model; generating, by the machine learning model, embeddings for sentences of the articles; extracting, by the machine learning model, keywords from the articles with a probability that varies based on lengths of the sentences, respectively; outputting, by the machine learning model, the keywords; applying a maximal marginal relevance algorithm to the extracted keywords, to select relevant keywords; and generating a keyword-text pairs dataset that includes the relevant keywords and text from the articles, the text corresponding to the relevant keywords in each of keyword-text pairs of the keyword-text pairs dataset.
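
The abstract says keywords are extracted "with a probability that varies based on lengths of the sentences" but does not say how the probability depends on length. The sketch below is one possible reading, not the claimed method: it assumes the probability grows linearly with word count up to a cap, and it stands in simple word sampling for the machine learning model. The names `extraction_probability`, `build_keyword_text_pairs`, `saturation_len`, and `min_chars` are hypothetical.

```python
import random


def extraction_probability(sentence, saturation_len=20):
    """Assumed rule: probability rises linearly with word count, capped at 1.0."""
    return min(1.0, len(sentence.split()) / saturation_len)


def build_keyword_text_pairs(sentences, rng, saturation_len=20, min_chars=4):
    """Return (keyword, sentence) pairs.

    Each sufficiently long word is kept as a keyword with the length-dependent
    probability of its sentence, and is paired with the sentence it came from.
    """
    pairs = []
    for sentence in sentences:
        p = extraction_probability(sentence, saturation_len)
        for word in sentence.split():
            token = word.strip(".,;:!?\"'()").lower()
            if len(token) >= min_chars and rng.random() < p:
                pairs.append((token, sentence))
    return pairs


# A seeded generator keeps the sampling reproducible.
rng = random.Random(0)
article = [
    "Oracle filed a patent.",
    "The method extracts keywords from articles in a target language.",
]
pairs = build_keyword_text_pairs(article, rng)
```

In a fuller pipeline, the candidate keywords produced here would then be filtered by a relevance step such as maximal marginal relevance before the final keyword-text pairs dataset is written.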