US Patent Application 18345834. INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE simplified abstract
INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Siarhei Alonichau of Seattle WA (US)
Aliaksei Bondarionok of Redmond WA (US)
Junaid Ahmed of Bellevue WA (US)
INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE - A simplified explanation of the abstract
- This abstract appeared in US patent application number 18345834, titled 'INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE'.
Simplified Explanation
This abstract describes technologies that infer information about a webpage from the semantics of its URL. The URL is broken down into a sequence of tokens, and from these tokens an embedding is generated that represents the meaning or context of the URL. Using this embedding, information about the webpage that the URL points to is inferred. The webpage is then retrieved, and information is extracted from it based on what was inferred about the webpage.
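The tokenization and embedding steps can be sketched as follows. Neither the abstract nor this summary specifies a tokenizer or an embedding model, so both are assumptions here: the hypothetical `tokenize_url` splits the scheme, host labels, path segments, and query string on punctuation and camelCase boundaries, and the hypothetical `embed_tokens` uses signed feature hashing as a simple stand-in for a learned encoder.

```python
import hashlib
import math
import re
from urllib.parse import parse_qsl, urlsplit


def tokenize_url(url: str) -> list[str]:
    """Hypothetical tokenizer: split a URL into lowercase, word-like tokens."""
    parts = urlsplit(url)
    raw = [parts.scheme]
    # Host labels: "www.example.com" -> ["www", "example", "com"]
    raw += (parts.hostname or "").split(".")
    # Non-empty path segments: "/products/running-shoes" -> ["products", "running-shoes"]
    raw += [seg for seg in parts.path.split("/") if seg]
    # Query-string keys and values, in order
    for key, value in parse_qsl(parts.query):
        raw += [key, value]
    tokens = []
    for piece in raw:
        # Insert a space at camelCase boundaries, then split on non-alphanumeric runs
        piece = re.sub(r"([a-z])([A-Z])", r"\1 \2", piece)
        tokens += [t.lower() for t in re.split(r"[^A-Za-z0-9]+", piece) if t]
    return tokens


def embed_tokens(tokens: list[str], dim: int = 64) -> list[float]:
    """Signed feature-hashing embedding, L2-normalized.

    An assumed stand-in for a learned encoder: it records which tokens
    occur, but not their order or any learned meaning.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


tokens = tokenize_url("https://www.example.com/products/running-shoes?color=Blue")
print(tokens)
# ['https', 'www', 'example', 'com', 'products', 'running', 'shoes', 'color', 'blue']
embedding = embed_tokens(tokens)
print(len(embedding))  # 64
```

A learned model over the token sequence would be needed to capture semantics beyond token identity; the hashed vector here only shows the shape of the step (tokens in, fixed-length vector out).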
Original Abstract Submitted
Described herein are technologies related to inferring information about a webpage based upon semantics of a uniform resource locator (URL) of the webpage. The URL is tokenized to create a sequence of tokens. An embedding for the URL is generated based upon the sequence of tokens, wherein the embedding is representative of semantics of the URL. Based upon the embedding for the URL, information about the webpage pointed to by the URL is inferred, the webpage is retrieved, and information is extracted from the webpage based upon the information inferred about the webpage.
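The inference, retrieval, and extraction steps can be illustrated with a minimal sketch under stated assumptions. The abstract does not say what information is inferred or how. Here, hypothetically, the inferred information is a page type ("product" or "article"), chosen by nearest-centroid cosine similarity against a few made-up labeled URLs, and that type selects which extraction rules run on the retrieved HTML. All names (`embed_url`, `infer_page_type`, `fetch`, `extract`, `LABELED`, `EXTRACTORS`) are hypothetical, a simplified tokenizer and hashed embedding are included so the block runs on its own, the regex extractors are deliberately crude, and the usage passes an HTML string directly instead of calling `fetch` so it runs offline.

```python
import hashlib
import math
import re
import urllib.request

DIM = 1024  # hypothetical embedding size; large enough to make hash collisions rare


def tokenize_url(url: str) -> list[str]:
    # Simplified tokenizer: lowercase, then split on runs of non-alphanumerics
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def embed_url(url: str) -> list[float]:
    # Hashed bag-of-tokens embedding, L2-normalized (stand-in for a learned URL encoder)
    vec = [0.0] * DIM
    for tok in tokenize_url(url):
        vec[int.from_bytes(hashlib.md5(tok.encode()).digest()[:4], "little") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical labeled URLs used to build one centroid embedding per page type
LABELED = {
    "product": ["https://shop.example.com/product/blue-running-shoes",
                "https://shop.example.com/product/red-rain-jacket"],
    "article": ["https://news.example.com/2023/06/article/city-council-vote",
                "https://news.example.com/2023/07/article/storm-update"],
}
CENTROIDS = {
    label: [sum(col) / len(urls) for col in zip(*(embed_url(u) for u in urls))]
    for label, urls in LABELED.items()
}

# Hypothetical per-type extraction rules, selected by the inferred page type
EXTRACTORS = {
    "product": {"price": re.compile(r"\$\d+(?:\.\d{2})?")},
    "article": {"headline": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S)},
}


def infer_page_type(url: str) -> str:
    # Inference step: nearest centroid by cosine similarity, using only the URL
    emb = embed_url(url)
    return max(CENTROIDS, key=lambda label: cosine(emb, CENTROIDS[label]))


def fetch(url: str) -> str:
    # Retrieval step (needs network access; not exercised in the usage below)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(url: str, html: str) -> tuple[str, dict[str, str]]:
    # Extraction step: apply only the rules for the page type inferred from the URL
    page_type = infer_page_type(url)
    fields = {}
    for name, pattern in EXTRACTORS[page_type].items():
        match = pattern.search(html)
        if match:
            # Use the capture group if the pattern has one, else the whole match
            fields[name] = match.group(match.lastindex or 0)
    return page_type, fields


url = "https://shop.example.com/product/green-hiking-boots"
html = "<h1>Green Hiking Boots</h1><span class='price'>$89.99</span>"  # stands in for fetch(url)
print(extract(url, html))  # ('product', {'price': '$89.99'})
```

The ordering is the point the sketch tries to mirror from the abstract: the page type is decided from the URL embedding alone, before any HTML is parsed, so the extraction rules can be chosen up front.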