20240046677. TEXT BLOCK SEGMENTATION simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

TEXT BLOCK SEGMENTATION

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Ang Yi of Beijing (CN)

Jing Zhang of Beijing (CN)

Hai Cheng Wang of Beijing (CN)

Jun Hong Zhao of ShangDi (CN)

Rajesh M. Desai of San Jose CA (US)

Yang Zhong Li of Beijing (CN)

Xue Xu of Beijing (CN)

TEXT BLOCK SEGMENTATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240046677 titled 'TEXT BLOCK SEGMENTATION'.

Simplified Explanation

The patent application describes a computer-implemented method for checking how well a text block was segmented. The method first determines which segmentation pattern was used to generate a segmented text block by comparing the block's semantic information against predefined pattern types represented in a graph. It then calculates a degree of confidence in the size of the block and treats the size as non-optimal when that confidence falls below a predetermined threshold.

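  • The method identifies the segmentation pattern behind a segmented text block by comparing the block's semantic information with predefined pattern types.
  • It calculates a degree of confidence in the size of the segmented text block.
  • It determines that the size of the segmented text block is non-optimal when the degree of confidence is below a predetermined threshold.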
  • The method utilizes a graph to indicate the types of text block segmentation patterns.
  • Semantic entities associated with the segmented text block are compared with semantic entities indicated by leaf nodes stemming from a non-leaf node in the graph.
  • The method aims to improve the accuracy of text block segmentation.
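
The steps above can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how the comparison or the degree of confidence is computed, so the sketch assumes a shared-entity count for picking the pattern and a Jaccard overlap for the confidence. The graph contents, the names `PATTERN_GRAPH`, `determine_pattern`, `size_confidence`, and `assess_block`, and the threshold of 0.6 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pattern graph: each key stands for a non-leaf node (a type of
# segmentation pattern); its value holds the semantic entities of its leaf nodes.
PATTERN_GRAPH = {
    "invoice_header": {"vendor", "invoice_number", "date"},
    "address_block": {"street", "city", "postal_code", "country"},
}


@dataclass
class SizeAssessment:
    pattern: str        # pattern type judged to have produced the block
    confidence: float   # degree of confidence in the block's size
    non_optimal: bool   # True when confidence is below the threshold


def determine_pattern(block_entities, graph):
    """Pick the pattern type sharing the most entities with the block (assumed rule)."""
    return max(graph, key=lambda name: len(block_entities & graph[name]))


def size_confidence(block_entities, leaf_entities):
    """Assumed score: Jaccard overlap between block entities and leaf entities.

    Missing entities suggest the block is too small; extra entities suggest
    it is too large. Either case lowers the score.
    """
    union = block_entities | leaf_entities
    if not union:
        return 0.0
    return len(block_entities & leaf_entities) / len(union)


def assess_block(block_entities, graph, threshold=0.6):
    """Determine the pattern, score the block size, and flag a low score."""
    pattern = determine_pattern(block_entities, graph)
    confidence = size_confidence(block_entities, graph[pattern])
    return SizeAssessment(pattern, confidence, confidence < threshold)


if __name__ == "__main__":
    # A block that captured only half of an address.
    print(assess_block({"street", "city"}, PATTERN_GRAPH))
```

In this example the block matches the hypothetical `address_block` pattern but contains only two of its four expected entities, so the confidence is 0.5 and the size is flagged as non-optimal. A real system would presumably derive the entities from a semantic analysis step rather than from hand-written sets.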

Potential Applications:

  • Text analysis and processing in various industries such as legal, finance, and healthcare.
  • Document management systems to enhance organization and retrieval of information.
  • Natural language processing applications for improved understanding and extraction of text content.

Problems Solved:

  • Inaccurate or suboptimal text block segmentation in automated systems.
  • Difficulty in accurately determining the size and boundaries of text blocks.
  • Challenges in extracting meaningful information from unstructured text.

Benefits:

  • Improved accuracy in segmenting text blocks based on semantic information.
  • Enhanced efficiency in text analysis and processing.
  • Better organization and retrieval of information from documents.
  • Increased understanding and extraction of relevant content from unstructured text.


Original Abstract Submitted

A computer-implemented method for text block segmentation includes determining a first text block segmentation pattern utilized to generate a segmented text block based, at least in part, on a comparison of semantic information associated with the segmented text block and a plurality of predefined types of text block segmentation patterns indicated by a graph; calculating a first degree of confidence in a size of the segmented text block based, at least in part, on comparing semantic entities associated with the segmented text block with semantic entities indicated by leaf nodes stemming from a first non-leaf node included in the graph and representative of the first type of text block segmentation pattern; and determining that the size of the segmented text block is non-optimal based on the calculated degree of confidence in the size of the segmented text block being below a predetermined threshold.