US Patent Application 18313252. INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT simplified abstract

From WikiPatents

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT

Organization Name

CANON KABUSHIKI KAISHA


Inventor(s)

TOMOAKI Higo of Kanagawa (JP)

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT - A simplified explanation of the abstract

This abstract first appeared for US patent application 18313252, titled 'INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT'.

Simplified Explanation

The patent application describes an information processing apparatus that converts text data, obtained from a document image read from a document, into a token string. Based on the token string, it calculates the number of times processing must be performed in a natural language processing model. Using that number, it divides the token string into blocks so that adjacent blocks overlap at least in part. For each token in an overlap portion, it selects one of the estimation results obtained from the adjacent blocks.

  • Converts text data from a document image into a token string.
  • Calculates, from the token string, the number of times the natural language processing model must be run.
  • Divides the token string into blocks based on that number, so that adjacent blocks share an overlap portion.
  • For each token in an overlap portion, selects one of the estimation results obtained from the adjacent blocks.
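
The first three steps above can be illustrated with a minimal Python sketch. It is not taken from the patent application: the names `count_passes` and `split_into_blocks`, the parameters `max_len` (the longest block the model accepts) and `overlap`, and the fixed-stride division rule are all assumptions made for illustration.

```python
import math


def count_passes(num_tokens, max_len, overlap):
    """Return how many model runs are needed to cover num_tokens tokens.

    Assumption: every block holds at most max_len tokens and adjacent
    blocks share exactly `overlap` tokens.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if num_tokens <= max_len:
        return 1
    stride = max_len - overlap  # new tokens contributed by each later block
    return 1 + math.ceil((num_tokens - max_len) / stride)


def split_into_blocks(tokens, max_len, overlap):
    """Divide tokens into overlapping blocks, returned as (start, block) pairs."""
    passes = count_passes(len(tokens), max_len, overlap)
    stride = max_len - overlap
    blocks = []
    for i in range(passes):
        start = i * stride
        blocks.append((start, tokens[start:start + max_len]))
    return blocks


tokens = ["Canon", "Inc", ".", "is", "based", "in", "Tokyo", ",", "Japan", "."]
blocks = split_into_blocks(tokens, max_len=4, overlap=1)
```

In this example the ten tokens need three runs, and each block shares its first token with the last token of the previous block. The application only states that the division is based on the calculated number of runs, so other division rules, such as spreading tokens evenly across the blocks, are equally plausible.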


Original Abstract Submitted

An information processing apparatus for converting text data from a document image read from a document into a token string and calculates a number of processing times necessary for performing processing in a natural language processing model based on the token string. Then, at the time of division, the information processing apparatus divides the token string into blocks so that at least a portion overlaps between adjacent blocks based on the calculated number of processing times and for each token belonging to the overlap portion between the adjacent blocks, selects one of estimation results obtained from each block.
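
The final step of the abstract, selecting one estimation result for each token in an overlap portion, can be sketched as follows. The abstract does not say how the selection is made, so this sketch assumes each block yields a (label, score) pair per token and keeps the pair with the higher score. The function name `merge_block_estimates`, the data layout, and the example labels are hypothetical.

```python
def merge_block_estimates(block_estimates, num_tokens):
    """Merge per-block estimates into one label per token.

    block_estimates: list of (start, [(label, score), ...]) pairs, where
    start is the index of the block's first token in the full token string.
    Assumption: when two blocks cover the same token, the estimate with
    the higher score is selected.
    """
    best = [None] * num_tokens
    for start, estimates in block_estimates:
        for offset, (label, score) in enumerate(estimates):
            pos = start + offset
            if best[pos] is None or score > best[pos][1]:
                best[pos] = (label, score)
    # Tokens covered by no block keep None as their label.
    return [entry[0] if entry is not None else None for entry in best]


# Two blocks over three tokens; the token at index 1 lies in the overlap.
estimates = [
    (0, [("O", 0.9), ("B-ORG", 0.4)]),
    (1, [("B-PER", 0.8), ("O", 0.7)]),
]
labels = merge_block_estimates(estimates, num_tokens=3)
```

Here the overlap token receives the label from the second block because that block reported the higher score. A position-based rule, such as preferring the block in which the token sits farther from the block edge, would also fit the abstract's wording.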