Google LLC (20240111819). Crawl Algorithm simplified abstract

From WikiPatents

Crawl Algorithm

Organization Name

Google LLC

Inventor(s)

Linhai Qiu of Palo Alto CA (US)

Robert Istvan Busa-Fekete of Chatham NJ (US)

Julian Ulf Zimmert of Berlin (DE)

Andras Gyorgy

Hao Shen of Mountain View CA (US)

Hyomin Choi of Mountain View CA (US)

Sharmila Vijay of Mountain View CA (US)

Xiao Li of San Francisco CA (US)

Crawl Algorithm - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240111819 titled 'Crawl Algorithm'.

Simplified Explanation

The abstract describes a crawl-algorithm method that obtains a set of web pages for a web crawler to crawl, determines the bandwidth available to the crawler, computes a crawl value for each page based on that bandwidth, and updates a page in cache memory when its crawl value satisfies a threshold.

  • Obtaining web pages for a web crawler to crawl
  • Determining available bandwidth for the web crawler
  • Calculating crawl values for each web page based on available bandwidth
  • Updating web pages in a cache memory if the crawl value satisfies a threshold
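The four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not say how a crawl value is computed, so the `WebPage` fields (`size_bytes`, `change_probability`), the scoring formula in `crawl_value`, the helper `run_crawl_step`, and the reading of "satisfies a threshold" as "greater than or equal to" are all hypothetical. Fetching is passed in as a callable so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WebPage:
    url: str
    size_bytes: int            # hypothetical: estimated download size
    change_probability: float  # hypothetical: chance the cached copy is stale (0..1)


def crawl_value(page: WebPage, available_bandwidth: int) -> float:
    """Hypothetical score: staleness weighted by the fraction of the page
    the available bandwidth could fetch. Not the patent's formula."""
    if page.size_bytes <= 0:
        return 0.0
    fetch_fraction = min(1.0, available_bandwidth / page.size_bytes)
    return page.change_probability * fetch_fraction


def run_crawl_step(
    pages: List[WebPage],
    available_bandwidth: int,
    threshold: float,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
) -> List[str]:
    """Score every page against the same bandwidth figure and refresh the
    cache entry of each page whose value meets the threshold (>= assumed).
    Bandwidth is not decremented per fetch; the abstract does not say."""
    updated: List[str] = []
    for page in pages:
        if crawl_value(page, available_bandwidth) >= threshold:
            cache[page.url] = fetch(page.url)  # update the page in cache memory
            updated.append(page.url)
    return updated


if __name__ == "__main__":
    pages = [
        WebPage("https://example.com/news", 50_000, 0.9),
        WebPage("https://example.com/about", 20_000, 0.05),
        WebPage("https://example.com/video", 5_000_000, 0.8),
    ]
    cache: Dict[str, str] = {}
    updated = run_crawl_step(
        pages,
        available_bandwidth=100_000,  # assumed to be measured elsewhere
        threshold=0.5,
        cache=cache,
        fetch=lambda url: f"<html>{url}</html>",  # stub instead of a real HTTP fetch
    )
    print(updated)  # ['https://example.com/news']
```

With a 100,000-byte budget and a threshold of 0.5, only the news page qualifies. The video page is likely stale but too large for the current bandwidth, so its hypothetical score drops to 0.8 × 0.02 = 0.016.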

Potential Applications

This technology can be applied in web crawling, search engine optimization, content indexing, and data mining.

Problems Solved

This technology helps optimize the crawling process by prioritizing web pages based on available bandwidth, ensuring efficient use of resources.
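Prioritizing by bandwidth can also be illustrated as a budgeting problem. The sketch below swaps in a different, well-known technique rather than the patent's per-page threshold test: a greedy heuristic that ranks candidate pages by value per byte and admits them until a hypothetical bandwidth budget is used up. The `(url, size_bytes, value)` tuples and the name `select_within_budget` are illustrative assumptions; greedy selection is fast but not guaranteed optimal for this 0/1 knapsack-style problem.

```python
from typing import List, Tuple

# Hypothetical candidate record: (url, size_bytes, crawl_value)
Candidate = Tuple[str, int, float]


def select_within_budget(candidates: List[Candidate], bandwidth_budget: int) -> List[str]:
    """Greedy value-density heuristic (illustrative, not from the patent).

    Ranks pages by crawl value per byte and admits each page that still
    fits in the remaining bandwidth budget.
    """
    # Drop entries that cannot be ranked per byte or add no value.
    usable = [c for c in candidates if c[1] > 0 and c[2] > 0]
    ranked = sorted(usable, key=lambda c: c[2] / c[1], reverse=True)

    chosen: List[str] = []
    remaining = bandwidth_budget
    for url, size, _value in ranked:
        if size <= remaining:  # page fits in what is left of the budget
            chosen.append(url)
            remaining -= size
    return chosen


if __name__ == "__main__":
    pages = [
        ("a", 40_000, 0.8),
        ("b", 10_000, 0.5),
        ("c", 60_000, 0.9),
        ("d", 30_000, 0.1),
    ]
    print(select_within_budget(pages, bandwidth_budget=60_000))  # ['b', 'a']
```

Page "c" has the highest raw value but the lowest value per byte among the large pages, so once "b" and "a" consume 50,000 of the 60,000 bytes it no longer fits.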

Benefits

By spending bandwidth only on pages whose crawl value satisfies the threshold, the method aims to make web crawling more efficient, reduce bandwidth spent on low-value fetches, and improve the overall performance of the web crawler.

Potential Commercial Applications

Potential commercial users include search engine operators, web scraping services, and online marketing firms.

Possible Prior Art

Possible prior art includes caching mechanisms in web crawlers that improve performance and reduce redundant data retrieval.

Unanswered Questions

How does this method handle dynamic web pages that may change frequently?

The abstract does not address how the method handles dynamic web pages whose content may change between crawls. This could affect the accuracy of the crawl values assigned to those pages.

What impact does this method have on the overall speed and efficiency of the web crawler?

The abstract does not provide information on how this method affects the speed and efficiency of the web crawler compared to traditional crawling methods.


Original Abstract Submitted

A method for a crawl algorithm includes obtaining a plurality of web pages for a web crawler to crawl. The method also includes determining an available bandwidth for the web crawler. The method includes, for each respective web page of the plurality of web pages, determining a respective crawl value for the respective web page based on the available bandwidth and determining that the respective crawl value of the respective web page satisfies a threshold value. The method includes, in response to determining that the respective crawl value of the respective web page satisfies the threshold value, updating the respective web page in a cache memory.