18473877. Crawl Algorithm simplified abstract (GOOGLE LLC)

From WikiPatents

Crawl Algorithm

Organization Name

GOOGLE LLC

Inventor(s)

Linhai Qiu of Palo Alto CA (US)

Robert Istvan Busa-Fekete of Chatham NJ (US)

Julian Ulf Zimmert of Berlin (DE)

Andras Gyorgy

Hao Shen of Mountain View CA (US)

Hyomin Choi of Mountain View CA (US)

Sharmila Vijay of Mountain View CA (US)

Xiao Li of San Francisco CA (US)

Crawl Algorithm - A simplified explanation of the abstract

This abstract first appeared for US patent application 18473877, titled 'Crawl Algorithm'.

Simplified Explanation

The abstract describes a method for a crawl algorithm that involves obtaining web pages for a web crawler to crawl, determining available bandwidth, calculating crawl values for web pages based on bandwidth, and updating web pages in a cache memory if their crawl values meet a threshold.

  • The method involves obtaining a plurality of web pages for a web crawler to crawl.
  • The available bandwidth for the web crawler is determined.
  • For each web page, a crawl value is calculated based on the available bandwidth.
  • If the crawl value of a web page meets a threshold, the web page is updated in a cache memory.
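
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not say how a crawl value is computed, what the threshold is, or what "satisfies" means. The names `Page`, `staleness`, `crawl_value`, `crawl`, and `fetch`, the scoring formula, and the reading of "satisfies" as "greater than or equal to" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    url: str
    staleness: float  # hypothetical signal in [0, 1]: how likely the cached copy is outdated


def crawl_value(page: Page, available_bandwidth: float, demand: float) -> float:
    """Hypothetical scoring rule; the abstract only says the value is based on bandwidth."""
    # Fraction of the total demand that the available bandwidth can serve, capped at 1.
    capacity = min(1.0, available_bandwidth / demand) if demand > 0 else 1.0
    return page.staleness * capacity


def crawl(
    pages: List[Page],
    available_bandwidth: float,
    threshold: float,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
) -> List[str]:
    """Update the cache for every page whose crawl value satisfies the threshold."""
    demand = float(len(pages))  # assumption: each page costs one unit of bandwidth
    updated = []
    for page in pages:
        value = crawl_value(page, available_bandwidth, demand)
        if value >= threshold:  # assumption: "satisfies" means >=
            cache[page.url] = fetch(page.url)
            updated.append(page.url)
    return updated


# Usage with a stand-in fetcher instead of real network access.
pages = [Page("a", 0.9), Page("b", 0.2), Page("c", 0.6)]
cache: Dict[str, str] = {}
updated = crawl(pages, available_bandwidth=3.0, threshold=0.5,
                cache=cache, fetch=lambda url: f"content of {url}")
```

In this sketch, lowering `available_bandwidth` shrinks every crawl value, so fewer pages clear the same threshold and fewer cache updates are made.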

Potential Applications

This technology could be applied in web crawling, search engine indexing, and content caching systems.

Problems Solved

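This technology addresses how to spend limited crawl bandwidth. Each page receives a crawl value computed from the available bandwidth, and only pages whose value satisfies the threshold are refreshed in the cache, so bandwidth is not spent on lower-value pages when crawling large sets of pages.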

Benefits

By refreshing only the pages whose crawl value satisfies the threshold, the method may reduce unnecessary fetches and keep the cache more current for a given amount of bandwidth.

Potential Commercial Applications

Potential commercial users include search engine companies, web scraping services, and content delivery networks.

Possible Prior Art

Possible prior art includes web crawling algorithms that assign crawl values, or priority scores, to pages and use them to decide which pages to crawl.

Unanswered Questions

How does this method handle dynamic changes in available bandwidth during the crawling process?

The method does not specify how it adapts to changes in available bandwidth while crawling.
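
Because the abstract is silent on this point, the following Python sketch shows only one assumed way a crawler could react to changing bandwidth: re-measuring it before each batch of pages. It is not taken from the application. The function `crawl_in_batches`, the `measure_bandwidth` probe, the staleness scores, and the scoring formula are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def crawl_in_batches(
    pages: List[Tuple[str, float]],          # (url, hypothetical staleness score in [0, 1])
    measure_bandwidth: Callable[[], float],  # hypothetical probe, called once per batch
    threshold: float,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
    batch_size: int = 2,
) -> List[str]:
    """Re-read the available bandwidth before each batch and score pages against it."""
    updated = []
    for start in range(0, len(pages), batch_size):
        batch = pages[start:start + batch_size]
        bandwidth = measure_bandwidth()  # later batches see bandwidth changes
        # Assumption: each page costs one unit; capacity is the share that can be served.
        capacity = min(1.0, bandwidth / len(batch))
        for url, staleness in batch:
            if staleness * capacity >= threshold:  # assumption: "satisfies" means >=
                cache[url] = fetch(url)
                updated.append(url)
    return updated


# Usage: bandwidth drops from 2.0 to 0.5 between the first and second batch.
readings = iter([2.0, 0.5])
cache: Dict[str, str] = {}
updated = crawl_in_batches(
    [("a", 0.9), ("b", 0.6), ("c", 0.9), ("d", 0.6)],
    measure_bandwidth=lambda: next(readings),
    threshold=0.5,
    cache=cache,
    fetch=lambda url: f"content of {url}",
)
```

In this sketch the second batch is scored against the reduced bandwidth, so pages that would have been refreshed earlier no longer clear the threshold.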

What impact does this method have on the overall performance of the web crawler in terms of speed and efficiency?

The abstract does not provide information on the performance improvements achieved by implementing this method.


Original Abstract Submitted

A method for a crawl algorithm includes obtaining a plurality of web pages for a web crawler to crawl. The method also includes determining an available bandwidth for the web crawler. The method includes, for each respective web page of the plurality of web pages, determining a respective crawl value for the respective web page based on the available bandwidth and determining that the respective crawl value of the respective web page satisfies a threshold value. The method includes, in response to determining that the respective crawl value of the respective web page satisfies the threshold value, updating the respective web page in a cache memory.