17895730. TEXT MINING USING A RELATIVELY LOWER DIMENSION REPRESENTATION OF DOCUMENTS simplified abstract (International Business Machines Corporation)

From WikiPatents

TEXT MINING USING A RELATIVELY LOWER DIMENSION REPRESENTATION OF DOCUMENTS

Organization Name

International Business Machines Corporation

Inventor(s)

Jia Li Yun of Beijing (CN)

Yin Xiang Xiong of Beijing (CN)

Shan Gu of Beijing (CN)

Yan Bin Hu of Beijing (CN)

Yao Zhang of Beijing (CN)

TEXT MINING USING A RELATIVELY LOWER DIMENSION REPRESENTATION OF DOCUMENTS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17895730, titled 'TEXT MINING USING A RELATIVELY LOWER DIMENSION REPRESENTATION OF DOCUMENTS'.

Simplified Explanation

The abstract describes a computer-implemented method for text mining using word clustering based on deduplication chunks.

  • The method generates a first matrix from words extracted from the documents and a second matrix from deduplication chunks
  • Word clustering is performed on the second matrix; each cluster represents a feature of at least one document
  • A third matrix is generated from the first matrix and the clusters, and text mining is performed using it

Potential Applications

  • Data analysis in research fields
  • Information retrieval in search engines
  • Sentiment analysis in social media

Problems Solved

  • Efficient organization and analysis of large amounts of text data
  • Improved accuracy in identifying key features in documents

Benefits

  • Enhanced data processing capabilities
  • Increased efficiency in text mining tasks
  • Improved accuracy in document analysis


Original Abstract Submitted

A computer-implemented method according to one embodiment includes generating a first matrix based on words extracted from documents, and generating a second matrix based on deduplication chunks. The deduplication chunks include words of the documents. Word clustering is performed based on an analysis performed on the second matrix. Each cluster of the words represents a feature of at least one of the documents. The method further includes generating a third matrix based on the first matrix and the clusters, and performing text mining using the third matrix. A computer program product according to another embodiment includes a computer readable storage medium having program instructions embodied therewith. The program instructions are readable and/or executable by a computer to cause the computer to perform the foregoing method.
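
The three-matrix flow in the abstract can be illustrated with a small sketch. The abstract does not say how the matrices are populated, which analysis is run on the second matrix, or how the clusters are combined with the first matrix, so everything below is an assumption made for illustration: word counts fill both matrices, words that share a deduplication chunk are placed in the same cluster (a simple connected-components grouping standing in for the unspecified analysis), and the third matrix sums each document's word counts per cluster. The sample documents, the sample chunks, and all function names are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the abstract names none)."""
    return text.lower().split()


def count_matrix(token_lists, vocab):
    """Rows are items (documents or chunks), columns are vocabulary words."""
    index = {word: j for j, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in token_lists]
    for i, tokens in enumerate(token_lists):
        for word in tokens:
            if word in index:
                matrix[i][index[word]] += 1
    return matrix


def cluster_words(chunk_matrix, n_words):
    """Assumed stand-in for the clustering analysis: words that appear in the
    same chunk are merged into one cluster using union-find."""
    parent = list(range(n_words))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for row in chunk_matrix:
        present = [j for j, count in enumerate(row) if count > 0]
        for j in present[1:]:
            parent[find(j)] = find(present[0])

    roots = sorted({find(j) for j in range(n_words)})
    label = {root: k for k, root in enumerate(roots)}
    return [label[find(j)] for j in range(n_words)]


def reduce_matrix(doc_matrix, word_cluster, n_clusters):
    """Third matrix: documents x clusters, summing word counts per cluster."""
    reduced = [[0] * n_clusters for _ in doc_matrix]
    for i, row in enumerate(doc_matrix):
        for j, count in enumerate(row):
            reduced[i][word_cluster[j]] += count
    return reduced


# Hypothetical sample data, not taken from the patent application.
documents = [
    "disk backup failed disk error",
    "backup failed",
    "invoice payment overdue",
    "payment overdue reminder",
]
chunks = [
    "disk backup failed",
    "disk error",
    "invoice payment overdue",
    "payment reminder",
]

doc_tokens = [tokenize(d) for d in documents]
chunk_tokens = [tokenize(c) for c in chunks]
vocab = sorted({w for tokens in doc_tokens for w in tokens})

first_matrix = count_matrix(doc_tokens, vocab)       # documents x words
second_matrix = count_matrix(chunk_tokens, vocab)    # chunks x words
word_cluster = cluster_words(second_matrix, len(vocab))
n_clusters = max(word_cluster) + 1
third_matrix = reduce_matrix(first_matrix, word_cluster, n_clusters)

# A toy "text mining" step on the reduced matrix: dominant feature per document.
dominant = [row.index(max(row)) for row in third_matrix]
print(third_matrix, dominant)
```

In this toy run the eight vocabulary words collapse into two clusters, so each document is described by two columns instead of eight. Any downstream mining step (similarity search, classification, grouping) would then operate on the smaller third matrix; the dominant-feature lookup at the end is only a placeholder for such a step.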