18520714. INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM simplified abstract (CANON KABUSHIKI KAISHA)

From WikiPatents

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM

Organization Name

CANON KABUSHIKI KAISHA

Inventor(s)

Kodai Watanabe of Kanagawa (JP)

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 18520714, titled 'INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM'.

The abstract describes a method to improve the accuracy of extracting named entities representing characteristics of a document using a natural language processing model.

  • Data from the document is processed to generate a token string, which is then divided into input blocks for the natural language processing model.
  • Each input block is fed to the model, which estimates the named entity for that block.
  • The validity of each input block for extracting the named entity is determined based on the estimation results.
  • The named entity representing the document's characteristic is output based on both the estimation results and the validity determination.
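
The last two steps (judging whether a block is valid, then choosing the output) can be sketched as follows. The abstract does not say how validity is decided, so the confidence threshold, the `Entity` type, and the function names here are all hypothetical, and the estimation results are made-up values standing in for real model output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    label: str    # entity type, e.g. "ORG" or "DATE" (illustrative labels)
    text: str     # surface text of the entity
    score: float  # assumed model confidence in [0, 1]


def is_valid_block(entities: List[Entity], min_score: float = 0.5) -> bool:
    """Hypothetical validity rule: the block produced at least one confident entity."""
    return any(e.score >= min_score for e in entities)


def select_entities(per_block: List[List[Entity]], min_score: float = 0.5) -> Dict[str, str]:
    """Keep, per label, the highest-scoring confident entity found in any valid block."""
    best: Dict[str, Entity] = {}
    for entities in per_block:
        if not is_valid_block(entities, min_score):
            continue  # blocks judged invalid contribute nothing to the output
        for e in entities:
            if e.score < min_score:
                continue
            if e.label not in best or e.score > best[e.label].score:
                best[e.label] = e
    return {label: e.text for label, e in best.items()}


# Made-up estimation results for three input blocks.
estimates = [
    [Entity("ORG", "Example Corp", 0.91)],
    [Entity("ORG", "Exmple", 0.20)],  # low confidence, so this block is invalid
    [Entity("DATE", "2023-11-28", 0.88), Entity("ORG", "Example", 0.60)],
]
result = select_entities(estimates)
print(result)
```

Under these assumptions the second block is discarded, and the first block's organization name wins over the weaker candidate from the third block. A real system could use a quite different validity test.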

Key Features and Innovation

  • Utilizes a natural language processing model to extract named entities from document data.
  • Improves the accuracy of extracting named entities representing document characteristics.
  • Processes text data into token strings and input blocks for efficient analysis.
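
As a rough illustration of the token-string and input-block step, the sketch below splits text into tokens and then divides them into fixed-size blocks. The regex tokenizer is a simple stand-in for whatever tokenizer the model actually uses, and `max_len` is a hypothetical parameter representing the model's input limit; neither is specified in the abstract.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Break text into word and punctuation tokens (stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_into_blocks(tokens: List[str], max_len: int) -> List[List[str]]:
    """Divide the token string into consecutive blocks of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


tokens = tokenize("Invoice No. 1234 issued by Example Corp. on 2023-11-28.")
blocks = split_into_blocks(tokens, max_len=5)
for block in blocks:
    print(block)
```

Each block can then be passed to the model independently. Plain consecutive splitting can cut an entity in half at a block boundary, which is one reason a later validity check on each block's result is useful.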

Potential Applications

This technology can be applied in various fields such as information retrieval, document categorization, and content analysis.

Problems Solved

  • Enhances the accuracy of extracting named entities from documents.
  • Streamlines the process of identifying document characteristics.
  • Improves the efficiency of natural language processing tasks.

Benefits

  • Increases the precision of extracting named entities.
  • Enhances the overall quality of document analysis.
  • Facilitates better understanding and categorization of document content.

Commercial Applications

This technology can be utilized in industries such as data analytics, information retrieval systems, and content management platforms to improve document processing and analysis.

Questions about the Technology

1. How does this technology compare to traditional methods of named entity extraction?
2. What are the potential limitations of using a natural language processing model for this task?


Original Abstract Submitted

The accuracy of extracting a named entity representing a characteristic of a document is improved. An information processing apparatus that extracts the named entity from document data by using a natural language processing model obtains data of text from the document data, generates a token string by processing to breakdown the text into a token unit, generates input blocks by dividing the token string into blocks in a unit that can be processed by the natural language processing model, estimates the named entity for each input block by inputting each of the input blocks to the natural language processing model, determines whether each of the input blocks is valid to be used to extract the named entity representing the characteristic of the document data based on an estimation result, and outputs the named entity representing the characteristic of the document data based on the estimation result and a determination result.