17534610. ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS simplified abstract (International Business Machines Corporation)
Organization Name
International Business Machines Corporation
Inventor(s)
Daiki Tsuzuku of Kawasaki-shi (JP)
Shunsuke Ishikawa of Shinjuku-ku (JP)
Yasumasa Kajinaga of Funabashi-shi (JP)
Masaki Komedani of Yokohama-shi (JP)
Keisuke Nitta of Koshigaya-shi (JP)
Tohru Hasegawa of Tokyo (JP)
This abstract first appeared for US patent application 17534610, titled 'ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS'.
Simplified Explanation
The abstract describes a system and method that iteratively update the structure of a digital document file to remediate disconnected (fragmented) text in the blocks of that structure. The method analyzes the document to determine its structure, identifies blocks whose text is disconnected, determines the order of the blocks, pairs disconnected blocks according to that order, uses natural language processing to check whether each pair forms a complete sentence, and consolidates qualifying pairs into new blocks.
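- A document understanding analysis determines the structure of a digital document file, made up of blocks of elements that contain text.
- Each block is checked for disconnected text, and the affected blocks are collected into a list.
- The order of the blocks in the document is determined.
- Two blocks from that list are paired, ordered according to the determined block order.
- Natural language processing is used to determine whether the text of the block pair forms a complete sentence.
- A block pair whose text forms a complete sentence is consolidated into a new block, and the document structure is updated iteratively in this way.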
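The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not IBM's implementation: the `Block` type, the punctuation-based `is_disconnected` test, and `forms_complete_sentence` are hypothetical, and the last one is a crude heuristic standing in for the natural language processing the patent describes. Blocks are assumed to arrive already in reading order. Each pass pairs neighboring entries in the list of disconnected blocks, merges the first pair that forms a complete sentence, and then rebuilds the list, so the structure is updated iteratively until no pair qualifies.

```python
from dataclasses import dataclass

# Assumption: these characters end a sentence.
TERMINAL = (".", "!", "?")


@dataclass
class Block:
    """Hypothetical element block holding one run of extracted text."""
    text: str


def is_disconnected(block: Block) -> bool:
    """Heuristic check (assumption): treat text as a fragment if it lacks
    closing punctuation or starts with a lowercase letter."""
    text = block.text.strip()
    if not text:
        return False
    return not text.endswith(TERMINAL) or text[0].islower()


def forms_complete_sentence(first: Block, second: Block) -> bool:
    """Crude stand-in for an NLP sentence check (assumption): the joined
    text must start with an uppercase letter and end with closing punctuation."""
    joined = f"{first.text.strip()} {second.text.strip()}"
    return joined[0].isupper() and joined.endswith(TERMINAL)


def consolidate(blocks: list[Block]) -> list[Block]:
    """Iteratively merge pairs of disconnected blocks given in reading order."""
    blocks = list(blocks)
    changed = True
    while changed:
        changed = False
        # Positions of the disconnected blocks, kept in reading order.
        fragments = [pos for pos, blk in enumerate(blocks) if is_disconnected(blk)]
        for i, j in zip(fragments, fragments[1:]):
            if forms_complete_sentence(blocks[i], blocks[j]):
                # The new block takes the place of the first block of the pair.
                blocks[i] = Block(f"{blocks[i].text.strip()} {blocks[j].text.strip()}")
                del blocks[j]  # j > i, so index i is unaffected
                changed = True
                break  # structure changed: rebuild the fragment list
    return blocks


if __name__ == "__main__":
    doc = [
        Block("Results are"),
        Block("Figure 2: accuracy per epoch."),
        Block("shown in the table."),
        Block("The model converges."),
    ]
    for block in consolidate(doc):
        print(block.text)
    # Results are shown in the table.
    # Figure 2: accuracy per epoch.
    # The model converges.
```

Because only consecutive entries of the disconnected list are paired, the figure caption sitting between the two fragments does not prevent the merge. A sentence split across three or more blocks would not be joined by this simple pairwise check; the sketch makes no claim about how the patented method handles that case.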
Potential applications of this technology:
- Document editing and formatting software.
- Content management systems.
- Digital publishing platforms.
- Document conversion tools.
Problems solved by this technology:
- Fixing disconnected or fragmented text in digital documents.
- Improving the readability and coherence of documents.
- Streamlining the editing and formatting process for documents.
Benefits of this technology:
- Saves time and effort in manually fixing disconnected text.
- Enhances the overall quality and professionalism of digital documents.
- Improves the user experience when reading or editing documents.
- Increases productivity in document management and publishing workflows.
Original Abstract Submitted
A system and method iteratively update a determined structure of a digital document file to remediate disconnected text in blocks of the determined structure. In embodiments, a method includes determining a structure of a digital document file using a document understanding analysis, the structure including blocks of elements having text information; determining for each of the blocks of the digital document file whether text information in the block is disconnected; determining an order of the blocks in the digital document file; pairing two blocks from a list of blocks with disconnected text information to form a block pair, wherein the two blocks are ordered based on the determined order of the blocks; determining that the text information of the block pair forms a complete sentence using natural language processing; and consolidating the block pair to form a new block.
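The abstract does not say how the order of the blocks is determined. One common layout heuristic, shown here only as an assumption-laden sketch, sorts blocks by page, then by column, then from top to bottom. `LayoutBlock`, `reading_order`, the fixed column count, and the top-left coordinate origin are all hypothetical choices for illustration; documents with mixed layouts would need real column detection rather than a fixed split of the page width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutBlock:
    """Hypothetical block with the top-left corner of its bounding box.
    Assumes the origin is the page's top-left corner and y grows downward."""
    page: int
    x0: float
    y0: float
    text: str


def reading_order(blocks: list[LayoutBlock], page_width: float,
                  columns: int = 2) -> list[LayoutBlock]:
    """Order blocks by page, then column (left to right), then top to bottom."""
    col_width = page_width / columns

    def key(block: LayoutBlock) -> tuple:
        # Assign a column from the block's left edge; clamp at the right margin.
        col = min(int(block.x0 // col_width), columns - 1)
        return (block.page, col, block.y0, block.x0)

    return sorted(blocks, key=key)


if __name__ == "__main__":
    blocks = [
        LayoutBlock(1, 320, 100, "right top"),
        LayoutBlock(1, 40, 500, "left bottom"),
        LayoutBlock(1, 40, 80, "left top"),
        LayoutBlock(2, 40, 60, "page two"),
        LayoutBlock(1, 320, 400, "right bottom"),
    ]
    print([b.text for b in reading_order(blocks, page_width=600)])
    # ['left top', 'left bottom', 'right top', 'right bottom', 'page two']
```

An ordering like this is what a pairing step would consume: a sentence that runs from the bottom of the left column to the top of the right column ends up as two adjacent entries, making them a candidate block pair.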
- Tohru Hasegawa of Tokyo (JP)
Classification codes
- G06F40/166
- G06F40/20
- G06F16/33
- G06V30/414
- G06V30/416