17528215. Classifying Parts of a Markup Language Document, and Applications Thereof simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)


Classifying Parts of a Markup Language Document, and Applications Thereof

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Siarhei Alonichau of Seattle WA (US)

Saksham Gupta of Bothell WA (US)

Aliaksei Bondarionok of Redmond WA (US)

Classifying Parts of a Markup Language Document, and Applications Thereof - A simplified explanation of the abstract

This abstract first appeared for US patent application 17528215, titled 'Classifying Parts of a Markup Language Document, and Applications Thereof'.

Simplified Explanation

Abstract

A link-analyzing system (LAS) extracts information from a markup language (ML) document associated with a web page link. The LAS generates feature information based on the extracted information and uses a classification model to assess the link. The LAS can control a crawling engine based on the assessment and revise low-confidence assessments based on similar links.

  • The system extracts information from a markup language document associated with a web page link.
  • It analyzes the extracted information to generate feature information.
  • A classification model is used to assess the link based on the feature information.
  • The system can control a crawling engine based on the assessment.
  • Low-confidence assessments can be revised based on similar links.
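The steps listed above can be illustrated with a minimal Python sketch. It is not the implementation described in the application. It assumes HTML as the markup language, uses a small hypothetical keyword list (`ARTICLE_HINTS`) in place of a trained classification model, and treats "control a crawling engine" as a simple yes/no crawl decision. All function names, labels, and confidence values are illustrative assumptions.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlparse


@dataclass
class Link:
    href: str  # destination address of the link
    text: str  # anchor text: associated with the link, not part of the address


class LinkExtractor(HTMLParser):
    """Collects each <a href=...> element together with its anchor text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> element that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(Link(self._href, "".join(self._text).strip()))
            self._href = None


def make_features(link):
    """Feature information: lowercase tokens from the URL path and anchor text."""
    path_tokens = [t for t in urlparse(link.href).path.lower().split("/") if t]
    text_tokens = link.text.lower().split()
    return {"path_tokens": path_tokens, "text_tokens": text_tokens}


# Hypothetical keyword list standing in for a trained classification model.
ARTICLE_HINTS = {"news", "article", "story", "read"}


def classify(features):
    """Return (label, confidence). A toy rule, not the model in the application."""
    tokens = set(features["path_tokens"]) | set(features["text_tokens"])
    hits = len(tokens & ARTICLE_HINTS)
    if hits >= 2:
        return "article", 0.9
    if hits == 1:
        return "article", 0.6
    return "other", 0.7


def should_crawl(label, confidence, threshold=0.5):
    """Crawl control: follow only links assessed as articles with enough confidence."""
    return label == "article" and confidence >= threshold


def assess_document(html):
    """Run extraction, feature generation, classification, and the crawl decision."""
    parser = LinkExtractor()
    parser.feed(html)
    results = []
    for link in parser.links:
        label, conf = classify(make_features(link))
        results.append((link.href, label, conf, should_crawl(label, conf)))
    return results
```

Calling `assess_document` on a page fragment returns one tuple per link, holding the address, the assessed label, the confidence, and the crawl decision. A real system would replace `classify` with a trained model and would likely use richer features than bare tokens.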

Potential Applications

  • Web page link analysis and classification.
  • Improving the accuracy of link assessments.
  • Enhancing web crawling and indexing processes.

Problems Solved

  • Inaccurate or unreliable link assessments.
  • Difficulty in determining the relevance or trustworthiness of web page links.
  • Inefficient web crawling and indexing processes.

Benefits

  • Improved accuracy in link assessments.
  • Enhanced control over web crawling and indexing.
  • More efficient and effective analysis of web page links.


Original Abstract Submitted

A link-analyzing system (LAS) extracts information from a markup language (ML) document associated with a web page link. In some implementations, the information that is extracted includes at least: a) address content that is part of the link's destination address; and b) text that is associated with the link but that is not part of the destination address itself. The LAS generates feature information based on the address content and the text, and then uses a classification model to make a classification assessment for the link based on the feature information. In some implementations, the LAS can control a crawling engine based on the classification assessment. In some implementations, the LAS can revise a low-confidence classification assessment based on an examination of the classification assessments of a group of similar links described by the ML document. Other implementations use the above-described functionality to classify other parts of an ML document.
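
The revision of low-confidence assessments can be sketched in Python as follows. The abstract does not say how similar links are grouped or what rule drives the revision, so this sketch makes its own assumptions: each assessment carries a hypothetical `group` key (for example, the link's position in the document tree), and a low-confidence label is replaced by the label that most confident links in the same group share. The thresholds `low` and `min_agreement` are illustrative values, not figures from the application.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Assessment:
    href: str
    group: str         # hypothetical similarity key, e.g. the link's DOM path
    label: str
    confidence: float


def revise_low_confidence(assessments, low=0.6, min_agreement=0.75):
    """Relabel low-confidence links to match the confident majority of their group."""
    # Tally labels per group, counting only confident assessments.
    votes = {}
    for a in assessments:
        if a.confidence >= low:
            votes.setdefault(a.group, Counter())[a.label] += 1

    revised = []
    for a in assessments:
        counter = votes.get(a.group)
        if a.confidence < low and counter:
            label, count = counter.most_common(1)[0]
            # Revise only when the confident links in the group largely agree.
            if count / sum(counter.values()) >= min_agreement:
                a = Assessment(a.href, a.group, label, a.confidence)
        revised.append(a)
    return revised
```

In this sketch, a link with no confident neighbours in its group keeps its original label, and the input list is left unmodified because revised entries are created as new objects.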