
Sai Mun Lee

How Google's Search Works!

Google Search can be broken down into three steps:

  1. Crawling
  2. Indexing
  3. Searching

Crawling

Crawling is the process of scanning the web for new and updated content. Google must constantly look for new and updated pages and add them to its list of known pages, a process called "URL discovery". The program that does the fetching is called Googlebot (also known as a crawler or spider). It fetches pages and extracts information from them, such as the page addresses (URLs), which is later stored in an index. Some pages are discovered when Googlebot follows a link from a known page to a new one. Google also discovers pages when you submit a list of pages (a sitemap) for it to crawl. A sitemap is a file where you provide information about the pages, videos, and other files on your site and the relationships between them. For a video, this can include the runtime, rating, and age-appropriateness rating. For an article, it can include the title and publication date.
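As a rough illustration of URL discovery, here is a minimal sketch in Python. It is not how Googlebot is implemented. The three-page site, the `discover_urls` function, and the `fetch` callback are all hypothetical, and pages are read from an in-memory dictionary rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_urls(seed_urls, fetch):
    """Breadth-first URL discovery starting from seed_urls.

    fetch(url) is assumed to return the page HTML, or None if the
    page cannot be fetched.
    """
    known = list(seed_urls)   # the "list of known pages", in discovery order
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to extract
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                known.append(absolute)
                queue.append(absolute)
    return known


# Hypothetical site held in memory instead of being fetched over HTTP.
FAKE_WEB = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": '<a href="/about">About</a>',
}

discovered = discover_urls(["https://example.com/"], FAKE_WEB.get)
print(discovered)
```

Starting from a single seed URL, the sketch finds the other two pages by following links, and the `seen` set keeps it from queuing the same page twice.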

However, Googlebot doesn't crawl every page it discovers. Some issues that prevent Googlebot from accessing sites include:

  • Problems with the server handling the site
  • Network issues
  • robots.txt directives preventing Googlebot's access to the page
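The robots.txt case can be tried out with Python's standard `urllib.robotparser` module. The snippet below is only a sketch: the robots.txt content, the URLs, and the `may_crawl` helper are made up for illustration, and it says nothing about how Googlebot itself evaluates the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of /private/,
# every other crawler is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent, url):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("Googlebot", "https://example.com/blog/post"))     # True
print(may_crawl("Googlebot", "https://example.com/private/data"))  # False
print(may_crawl("OtherBot", "https://example.com/blog/post"))      # False
```

A crawler that respects robots.txt would run a check like this before fetching each URL and skip any page for which it returns False.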

Indexing

Indexing is the process by which Google tries to understand what a page is about. It includes processing and analyzing the text content and key content tags and attributes, such as <title> elements and alt attributes, images, videos, and more. Google also determines whether a page is a duplicate of another page on the internet. If it is, Google selects one page from the group of duplicates as the canonical version. The canonical page is the one Google considers most representative of the group, and it is the one that may be shown in search results. Google also collects signals about the page, such as its language, the country the content is local to, and its usability. All the collected information is stored in the Google index, a large database hosted on thousands of computers.
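To make duplicate detection and the index more concrete, here is a toy sketch in Python. It only catches exact duplicates (after lowercasing and stripping punctuation), and it picks the canonical URL with an arbitrary rule (shortest URL wins). Both are assumptions made for this example, not Google's actual method. The page data and all function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hash of the normalized tokens; identical content gives an identical hash."""
    normalized = " ".join(tokenize(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_canonicals(pages):
    """Group exact duplicates and keep one URL per group.

    Assumption for this sketch: the shortest URL (ties broken
    alphabetically) is treated as the canonical one.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return sorted(min(urls, key=lambda u: (len(u), u)) for urls in groups.values())


def build_index(pages, canonical_urls):
    """Inverted index: token -> set of canonical URLs that contain it."""
    index = defaultdict(set)
    for url in canonical_urls:
        for token in tokenize(pages[url]):
            index[token].add(url)
    return index


# Hypothetical crawled pages: the first two have identical content.
PAGES = {
    "https://example.com/shoes": "Red running shoes for sale",
    "https://example.com/shoes?ref=home": "Red running shoes for sale",
    "https://example.com/hats": "Blue hats for sale",
}

canonicals = pick_canonicals(PAGES)
index = build_index(PAGES, canonicals)
print(canonicals)
print(sorted(index["sale"]))
print(sorted(index["red"]))
```

Only the canonical pages are indexed here, so the `?ref=home` duplicate never appears in a lookup. An inverted index like this maps each word to the pages containing it, which is what makes lookups by query term fast.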

Searching

When a user enters a query, Google searches the index for matching pages and ranks the results. The ranking depends on several factors, such as the language used, the location of the user, and the search query itself, with the aim of returning the most relevant results to the user.
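The idea that the same query can rank differently depending on language and location can be sketched as follows. The scoring formula, the bonus weights, and the documents are all invented for this example. Google's real ranking uses far more signals and is not public in this form.

```python
from collections import Counter

# Hypothetical indexed documents with language and country metadata.
DOCUMENTS = [
    {"url": "https://example.com/en/coffee", "lang": "en", "country": "US",
     "text": "coffee brewing guide coffee beans"},
    {"url": "https://example.com/en/tea", "lang": "en", "country": "GB",
     "text": "tea brewing guide"},
    {"url": "https://example.com/de/kaffee", "lang": "de", "country": "DE",
     "text": "kaffee coffee guide"},
]

# Made-up weights, chosen only to make the effect visible.
LANGUAGE_BONUS = 1.5
COUNTRY_BONUS = 0.5


def score(doc, query_terms, user_lang, user_country):
    """Term-frequency relevance plus bonuses for matching language and country."""
    counts = Counter(doc["text"].split())
    relevance = sum(counts[term] for term in query_terms)
    if relevance == 0:
        return 0.0  # no query term appears: not a result at all
    bonus = 0.0
    if doc["lang"] == user_lang:
        bonus += LANGUAGE_BONUS
    if doc["country"] == user_country:
        bonus += COUNTRY_BONUS
    return relevance + bonus


def search(query, user_lang, user_country):
    """Return matching URLs, highest score first (ties broken by URL)."""
    terms = query.lower().split()
    hits = []
    for doc in DOCUMENTS:
        s = score(doc, terms, user_lang, user_country)
        if s > 0:
            hits.append((s, doc["url"]))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in hits]


print(search("coffee guide", "en", "US"))
print(search("coffee guide", "de", "DE"))
```

With these made-up weights, the English coffee page ranks first for an English-speaking user in the US, while the German page moves to the top for a German-speaking user in Germany, even though the query is identical.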

Summary

Google Search Summary

Notes