18170070. DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM simplified abstract (KABUSHIKI KAISHA TOSHIBA)
Contents
- 1 DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM
- 1.1 Organization Name
- 1.2 Inventor(s)
- 1.3 DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM - A simplified explanation of the abstract
- 1.4 Simplified Explanation
- 1.5 Potential Applications
- 1.6 Problems Solved
- 1.7 Benefits
- 1.8 Potential Commercial Applications
- 1.9 Possible Prior Art
- 1.10 Unanswered Questions
- 1.11 Original Abstract Submitted
DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM
Organization Name
KABUSHIKI KAISHA TOSHIBA
Inventor(s)
Kosei Fume of Kawasaki Kanagawa (JP)
DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM - A simplified explanation of the abstract
This abstract first appeared for US patent application 18170070 titled 'DOCUMENT CLASSIFICATION APPARATUS, METHOD, AND STORAGE MEDIUM'.
Simplified Explanation
The document classification apparatus described in the abstract analyzes text content using word embedded spaces and classifies document data based on similarity to common words shared between those spaces.
- The processing circuit acquires text content for logical elements in semi-structured document data.
- Logical elements are selected and grouped into sets for analysis.
- Word embedded spaces are constructed for each logical element set.
- Common words shared between two word embedded spaces are used to update one space based on word similarity in the other, refining the classification process.
- The classification result of the document data is output using the updated word embedded spaces and embedding information of a feature quantity of a classification target.
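The pipeline above can be sketched in a few functions. This is a minimal illustration, not the patented method: it assumes toy random vectors in place of trained embeddings, and the update rule (pulling a common word's vector in one space toward its vector in another, weighted by their current dissimilarity) and the centroid-based scoring are hypothetical simplifications of the abstract's "update based on similarity to the common word" and "output a classification result" steps.

```python
import numpy as np

def build_space(texts, dim=8, seed=0):
    """Toy stand-in for constructing a word embedded space for one
    logical element set: random vectors keyed by vocabulary. A real
    system would train word embeddings on the texts."""
    rng = np.random.default_rng(seed)
    vocab = {w for t in texts for w in t.lower().split()}
    return {w: rng.standard_normal(dim) for w in sorted(vocab)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def update_with_common_words(space_a, space_b, lr=0.5):
    """Pull each common word's vector in space_a toward its vector in
    space_b; the step is larger when the two vectors disagree more."""
    updated = dict(space_a)
    for w in space_a.keys() & space_b.keys():
        sim = cosine(space_a[w], space_b[w])
        updated[w] = space_a[w] + lr * (1.0 - sim) * (space_b[w] - space_a[w])
    return updated

def classify(doc_text, spaces, labels):
    """Score the document against each space by the mean similarity of
    its words to that space's centroid; return the best-scoring label."""
    words = doc_text.lower().split()
    best, best_score = None, -np.inf
    for label, space in zip(labels, spaces):
        centroid = np.mean(list(space.values()), axis=0)
        vecs = [space[w] for w in words if w in space]
        if not vecs:
            continue
        score = float(np.mean([cosine(v, centroid) for v in vecs]))
        if score > best_score:
            best, best_score = label, score
    return best
```

After `update_with_common_words`, the vectors of shared words in the first space lie closer to their counterparts in the second space, which is the alignment effect the abstract's update step appears to target.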
Potential Applications
This technology can be applied in various fields such as information retrieval, data mining, and natural language processing for efficient document classification and organization.
Problems Solved
This technology addresses the challenge of accurately classifying semi-structured document data by utilizing word embedded spaces and common word similarities to improve classification results.
Benefits
The use of word embedded spaces and common word similarities enhances the accuracy and efficiency of document classification, leading to improved data organization and retrieval.
Potential Commercial Applications
Potential commercial applications of this technology include document management systems, search engines, and content recommendation platforms for enhanced data categorization and retrieval.
Possible Prior Art
Prior art in document classification and text analysis includes techniques such as machine learning algorithms, natural language processing tools, and semantic analysis methods used for similar purposes.
Unanswered Questions
How does the processing circuit handle large volumes of text data for classification?
The abstract does not provide details on the scalability of the processing circuit for handling substantial amounts of text content.
What is the computational complexity of the word embedded space construction process?
The abstract does not specify the computational resources required for constructing word embedded spaces and analyzing text content for classification.
Original Abstract Submitted
According to one embodiment, a document classification apparatus includes a processing circuit. The processing circuit is configured to: acquire text content for each of logical elements for semi-structured document data including text data stored for each of the logical elements; select logical elements from the logical elements and generating logical element sets each including the logical elements; analyze text contents for the respective logical element sets and constructing respective word embedded spaces; select a first word embedded space and a second word embedded space including a common word shared with the first word embedded space from the word embedded spaces, and update the first word embedded space based on similarity to the common word in the second word embedded space; and output a classification result of the document data using the first word embedded space and embedding information of a feature quantity of a classification target.