20240046677. TEXT BLOCK SEGMENTATION simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)
TEXT BLOCK SEGMENTATION
Organization Name
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor(s)
Hai Cheng Wang of Beijing (CN)
Rajesh M. Desai of San Jose, CA (US)
TEXT BLOCK SEGMENTATION - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240046677 titled 'TEXT BLOCK SEGMENTATION'.
Simplified Explanation
The patent application describes a computer-implemented method for evaluating a segmented text block. The method determines which text block segmentation pattern was used to generate the block by comparing semantic information associated with the block against predefined types of segmentation patterns indicated by a graph. It then calculates a degree of confidence in the size of the segmented text block and treats the size as non-optimal when that confidence falls below a predetermined threshold.
- The method identifies the segmentation pattern behind a segmented text block by comparing the block's semantic information with predefined pattern types.
- It calculates a degree of confidence in the size of the segmented text block.
- It determines that the size of the segmented text block is non-optimal when the calculated confidence is below a predetermined threshold.
- The method utilizes a graph to indicate the types of text block segmentation patterns.
- Semantic entities associated with the segmented text block are compared with semantic entities indicated by leaf nodes stemming from a non-leaf node in the graph.
- The method aims to improve the accuracy of text block segmentation.
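The steps above can be sketched in Python. This is only an illustrative sketch, not the method claimed in the application: the abstract does not say how semantic information is compared or how the degree of confidence is computed, so the overlap ratio used here is an assumption, as are all names (`PatternNode`, `evaluate_block`), the example entities, and the threshold value of 0.75.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """A non-leaf node of the graph: one predefined segmentation pattern type.

    `leaf_entities` stands in for the semantic entities indicated by the
    leaf nodes stemming from this node (hypothetical representation).
    """
    name: str
    leaf_entities: set[str] = field(default_factory=set)


def overlap(block_entities: set[str], node: PatternNode) -> float:
    """Assumed metric: share of the pattern's entities found in the block."""
    if not node.leaf_entities:
        return 0.0
    return len(block_entities & node.leaf_entities) / len(node.leaf_entities)


def evaluate_block(
    block_entities: set[str],
    graph: list[PatternNode],
    threshold: float = 0.75,  # assumed value for the predetermined threshold
) -> tuple[str, float, bool]:
    """Return (pattern name, size confidence, size is non-optimal)."""
    # Step 1: pick the pattern type whose entities best match the block.
    node = max(graph, key=lambda n: overlap(block_entities, n))
    # Step 2: confidence in the block size, from the entity comparison.
    confidence = overlap(block_entities, node)
    # Step 3: the size is non-optimal if confidence is below the threshold.
    return node.name, confidence, confidence < threshold


# Hypothetical graph with two pattern types.
graph = [
    PatternNode("address", {"street", "city", "postal_code", "country"}),
    PatternNode("invoice_header", {"invoice_number", "date", "vendor"}),
]

# A block that captured only half of an address.
print(evaluate_block({"street", "city"}, graph))
# A block that captured the whole address.
print(evaluate_block({"street", "city", "postal_code", "country"}, graph))
```

In this sketch a block containing only two of the four expected address entities scores 0.5 and is flagged as non-optimal, which suggests the segment was cut too small. A real system would likely need a metric that also penalizes blocks that are too large.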
Potential Applications:
- Text analysis and processing in various industries such as legal, finance, and healthcare.
- Document management systems to enhance organization and retrieval of information.
- Natural language processing applications for improved understanding and extraction of text content.
Problems Solved:
- Inaccurate or suboptimal text block segmentation in automated systems.
- Difficulty in accurately determining the size and boundaries of text blocks.
- Challenges in extracting meaningful information from unstructured text.
Benefits:
- Improved accuracy in segmenting text blocks based on semantic information.
- Enhanced efficiency in text analysis and processing.
- Better organization and retrieval of information from documents.
- Increased understanding and extraction of relevant content from unstructured text.
Original Abstract Submitted
A computer-implemented method for text block segmentation includes determining a first text block segmentation pattern utilized to generate a segmented text block based, at least in part, on a comparison of semantic information associated with the segmented text block and a plurality of predefined types of text block segmentation patterns indicated by a graph; calculating a first degree of confidence in a size of the segmented text block based, at least in part, on comparing semantic entities associated with the segmented text block with semantic entities indicated by leaf nodes stemming from a first non-leaf node included in the graph and representative of the first type of text block segmentation pattern; and determining that the size of the segmented text block is non-optimal based on the calculated degree of confidence in the size of the segmented text block being below a predetermined threshold.