17534610. ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS simplified abstract (International Business Machines Corporation)

ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS

Organization Name

International Business Machines Corporation

Inventor(s)

Daiki Tsuzuku of Kawasaki-shi (JP)

Shunsuke Ishikawa of Shinjuku-ku (JP)

Yasumasa Kajinaga of Funabashi-shi (JP)

Masaki Komedani of Yokohama-shi (JP)

Keisuke Nitta of Koshigaya-shi (JP)

Tohru Hasegawa of Tokyo (JP)

ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17534610, titled 'ITERATIVELY UPDATING A DOCUMENT STRUCTURE TO RESOLVE DISCONNECTED TEXT IN ELEMENT BLOCKS'.

Simplified Explanation

The abstract describes a system and method for updating the determined structure of a digital document file to fix disconnected text in the document's blocks. The method involves analyzing the document to determine its structure, identifying blocks with disconnected text, determining the order of the blocks, pairing two disconnected blocks in that order, using natural language processing to confirm that the pair forms a complete sentence, and consolidating the pair into a new block.

  • The system and method analyze the structure of a digital document file.
  • Blocks with disconnected text are identified and flagged.
  • The order of the blocks in the document is determined.
  • Disconnected blocks are paired based on their order.
  • Natural language processing is used to determine if the paired blocks form complete sentences.
  • Paired blocks are consolidated to form new blocks with connected text.
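The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method claimed in the application: the `Block` type, the function names, and the capitalization and punctuation heuristics are all hypothetical stand-ins. In particular, the heuristics take the place of the document understanding analysis and the natural language processing step, which the abstract does not specify.

```python
from dataclasses import dataclass

# Assumption: terminal punctuation approximates "sentence is finished".
# A real system would rely on an NLP model rather than this check.
SENTENCE_END = (".", "!", "?")


@dataclass
class Block:
    order: int  # position in the determined block order (hypothetical field)
    text: str


def is_disconnected(block: Block) -> bool:
    """Flag a block whose text appears to start or stop mid-sentence."""
    text = block.text.strip()
    if not text:
        return False
    starts_mid = text[0].islower()
    ends_mid = not text.endswith(SENTENCE_END)
    return starts_mid or ends_mid


def forms_complete_sentence(text: str) -> bool:
    """Toy stand-in for the NLP check on a paired block."""
    text = text.strip()
    return bool(text) and text[0].isupper() and text.endswith(SENTENCE_END)


def consolidate_once(blocks: list[Block]) -> list[Block]:
    """Pair neighbouring flagged blocks in order; merge the first pair that
    reads as a complete sentence and return the updated block list."""
    ordered = sorted(blocks, key=lambda b: b.order)
    flagged = [b for b in ordered if is_disconnected(b)]
    for first, second in zip(flagged, flagged[1:]):
        joined = f"{first.text.strip()} {second.text.strip()}"
        if forms_complete_sentence(joined):
            merged = Block(order=first.order, text=joined)
            rest = [b for b in ordered if b is not first and b is not second]
            return sorted(rest + [merged], key=lambda b: b.order)
    return ordered  # nothing could be consolidated


blocks = [
    Block(2, "jumps over the dog."),
    Block(0, "This block is already complete."),
    Block(1, "The quick brown fox"),
]
result = consolidate_once(blocks)
```

Here the two fragments with orders 1 and 2 are flagged, paired in order, and replaced by a single block, while the complete block is left untouched.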

Potential applications of this technology:

  • Document editing and formatting software.
  • Content management systems.
  • Digital publishing platforms.
  • Document conversion tools.

Problems solved by this technology:

  • Fixing disconnected or fragmented text in digital documents.
  • Improving the readability and coherence of documents.
  • Streamlining the editing and formatting process for documents.

Benefits of this technology:

  • Saves the time and effort of manually fixing disconnected text.
  • Enhances the overall quality and professionalism of digital documents.
  • Improves the user experience when reading or editing documents.
  • Increases productivity in document management and publishing workflows.


Original Abstract Submitted

A system and method iteratively update a determined structure of a digital document file to remediate disconnected text in blocks of the determined structure. In embodiments, a method includes determining a structure of a digital document file using a document understanding analysis, the structure including blocks of elements having text information; determining for each of the blocks of the digital document file whether text information in the block is disconnected; determining an order of the blocks in the digital document file; pairing two blocks from a list of blocks with disconnected text information to form a block pair, wherein the two blocks are ordered based on the determined order of the blocks; determining that the text information of the block pair forms a complete sentence using natural language processing; and consolidating the block pair to form a new block.
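The iterative aspect of the abstract can be illustrated with a loop that consolidates one block pair per round and then re-examines the updated structure. The sketch below is hypothetical: it assumes the blocks arrive as plain strings already in reading order, pairs only directly adjacent blocks, and uses a placeholder `looks_complete` function where the application refers to natural language processing. The `max_rounds` limit is likewise an assumption added as a safety bound.

```python
from typing import Callable, List


def looks_complete(text: str) -> bool:
    """Hypothetical placeholder for the NLP complete-sentence check."""
    text = text.strip()
    return bool(text) and text[0].isupper() and text[-1] in ".!?"


def resolve_iteratively(
    texts: List[str],
    is_complete: Callable[[str], bool] = looks_complete,
    max_rounds: int = 100,
) -> List[str]:
    """Consolidate one adjacent pair of disconnected blocks per round,
    repeating until a round makes no change."""
    blocks = list(texts)  # assumed to be in the determined order already
    for _ in range(max_rounds):
        merged_any = False
        for i in range(len(blocks) - 1):
            left, right = blocks[i], blocks[i + 1]
            if is_complete(left) or is_complete(right):
                continue  # only pair blocks that are both disconnected
            candidate = f"{left.strip()} {right.strip()}"
            if is_complete(candidate):
                blocks[i : i + 2] = [candidate]  # consolidate the pair
                merged_any = True
                break  # the structure changed, so scan again from the start
        if not merged_any:
            break
    return blocks


texts = [
    "Pages were split",
    "across two columns.",
    "A full sentence.",
    "Another line was",
    "cut here.",
]
resolved = resolve_iteratively(texts)
```

Passing the sentence check as a parameter keeps the loop independent of whichever language model or parser a real system would use.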