18170070. DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM simplified abstract (KABUSHIKI KAISHA TOSHIBA)

From WikiPatents

DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM

Organization Name

KABUSHIKI KAISHA TOSHIBA

Inventor(s)

Kosei Fume of Kawasaki Kanagawa (JP)

DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 18170070, titled 'DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM'.

Simplified Explanation

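The document classification apparatus described in the abstract builds a separate word embedded space (a vector space of word embeddings) for each group of logical elements in a semi-structured document, refines one space using a word it shares with another space, and classifies the document data using the refined space.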

  • The processing circuit acquires text content for logical elements in semi-structured document data.
  • Logical elements are selected and grouped into sets for analysis.
  • Word embedded spaces are constructed for each logical element set.
  • A first word embedded space and a second word embedded space that share a common word are selected, and the first space is updated based on similarity to that common word in the second space.
  • The classification result of the document data is output using the updated first word embedded space and embedding information of a feature quantity of a classification target.
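
The first three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the sample document, the grouping in `element_sets`, and the co-occurrence counting in `build_space` are all assumptions chosen for illustration, since the abstract does not say how the word embedded spaces are constructed.

```python
from collections import Counter
import math

# Hypothetical semi-structured document: logical element name -> text content.
document = {
    "title": "invoice payment overdue notice",
    "summary": "payment reminder for overdue invoice",
    "body": "please send the payment for the invoice before the deadline",
}

# Hypothetical grouping of logical elements into sets (the selection rule is assumed).
element_sets = {
    "header": ["title", "summary"],
    "content": ["body"],
}

def build_space(texts, window=2):
    """Build a toy word embedded space: each word maps to co-occurrence counts."""
    space = {}
    for text in texts:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            context = space.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[tokens[j]] += 1
    return space

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# One word embedded space per logical element set.
spaces = {
    name: build_space([document[e] for e in elements])
    for name, elements in element_sets.items()
}

# Words that appear in both spaces are the candidates for the later update step.
common_words = sorted(set(spaces["header"]) & set(spaces["content"]))
print(common_words)  # ['for', 'invoice', 'payment']
```

A real system would more likely train dense embeddings per logical element set; raw co-occurrence counts are used here only to keep the sketch dependency-free.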

Potential Applications

This technology can be applied in various fields such as information retrieval, data mining, and natural language processing for efficient document classification and organization.

Problems Solved

This technology addresses the challenge of accurately classifying semi-structured document data by utilizing word embedded spaces and common word similarities to improve classification results.

Benefits

By refining one word embedded space with information from another space through their common words, the approach is intended to improve classification accuracy for semi-structured documents, which in turn supports better data organization and retrieval.

Potential Commercial Applications

Potential commercial applications of this technology include document management systems, search engines, and content recommendation platforms for enhanced data categorization and retrieval.

Possible Prior Art

Prior art in document classification and text analysis includes techniques such as machine learning algorithms, natural language processing tools, and semantic analysis methods used for similar purposes.

Unanswered Questions

How does the processing circuit handle large volumes of text data for classification?

The abstract does not provide details on the scalability of the processing circuit for handling substantial amounts of text content.

What is the computational complexity of the word embedded space construction process?

The abstract does not specify the computational resources required for constructing word embedded spaces and analyzing text content for classification.


Original Abstract Submitted

According to one embodiment, a document classification apparatus includes a processing circuit. The processing circuit is configured to: acquire text content for each of logical elements for semi-structured document data including text data stored for each of the logical elements; select logical elements from the logical elements and generating logical element sets each including the logical elements; analyze text contents for the respective logical element sets and constructing respective word embedded spaces; select a first word embedded space and a second word embedded space including a common word shared with the first word embedded space from the word embedded spaces, and update the first word embedded space based on similarity to the common word in the second word embedded space; and output a classification result of the document data using the first word embedded space and embedding information of a feature quantity of a classification target.
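
The update and classification steps in the abstract can be sketched as follows. This is one possible reading, not the method claimed in the application: the hand-written vectors, the `rate` parameter, the rule of pulling a word toward a common word in proportion to their similarity in the second space, and the use of class keyword lists as the "feature quantity of a classification target" are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is empty or zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def update_space(first, second, rate=0.5):
    """Assumed update rule: in the first space, pull each common word toward every
    other common word, scaled by how similar the two words are in the second space."""
    common = sorted(set(first) & set(second))
    updated = {w: list(vec) for w, vec in first.items()}
    for c in common:
        for w in common:
            if w == c:
                continue
            sim = max(0.0, cosine(second[w], second[c]))
            updated[w] = [x + rate * sim * (t - x) for x, t in zip(updated[w], first[c])]
    return updated

def embed(space, words):
    """Average the vectors of the words that exist in the space."""
    vectors = [space[w] for w in words if w in space]
    if not vectors:
        return []
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(space, doc_words, class_features):
    """Return the class whose embedded feature words are closest to the document."""
    doc_vec = embed(space, doc_words)
    scores = {label: cosine(doc_vec, embed(space, feats))
              for label, feats in class_features.items()}
    return max(scores, key=scores.get)

# Hypothetical toy spaces; "invoice" and "payment" are the common words.
first = {"invoice": [1.0, 0.0], "payment": [0.0, 1.0], "agenda": [-1.0, 0.2]}
second = {"invoice": [0.9, 0.1], "payment": [0.8, 0.3], "deadline": [0.1, 0.9]}

# Hypothetical classification targets described by feature words.
class_features = {"billing": ["invoice"], "scheduling": ["agenda"]}
doc_words = ["payment"]

updated = update_space(first, second)
print(classify(first, doc_words, class_features),
      classify(updated, doc_words, class_features))  # scheduling billing
```

In this toy data, "payment" and "invoice" are unrelated in the first space but close in the second, so the update moves them together and the document is assigned to "billing" instead of "scheduling".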