This post is largely inspired by the SMX West 2016 keynote by Paul Haahr (@haahr), a ranking engineer at Google. I am writing it because I think that any digital marketer who wants to get the most out of an SEO strategy ought to understand more clearly how things actually work on the other side: the search engine's side, Google's side.
Google uses its own software, called ‘Googlebot’, to collect documents from across the web, or at least most of it, and build the indices from which Google’s Search Engine Results Pages (SERPs) are produced. A Search Engine Results Page is the page displayed in response to a searcher’s query.
Crawling does not start when a query is submitted; it happens continuously, ahead of time. Googlebot crawlers, also called ‘spiders’, start from a set of seed websites or webpages and extract the outgoing links from them. The spiders then follow these links, land on a whole different set of websites, and extract the outgoing links from those in turn. This is how Googlebot gathers documents and uses links to understand the structure of the web and find relevant pages.
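To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawl. It is not how Googlebot is implemented; the `TOY_WEB` dictionary and the `crawl` function are hypothetical names, and the "web" is just an in-memory map from each page to its outgoing links so that the example runs without any network access.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(seeds, web):
    """Breadth-first crawl: visit the seeds, follow outgoing links, skip repeats."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        # Extract the outgoing links and queue any page not seen before.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl(["a.example"], TOY_WEB))
```

Starting from a single seed, the crawl reaches every page that is linked to, directly or indirectly, which is why a page with no inbound links is hard for a spider to discover.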
After the spiders have crawled enough of the web, Google builds an index, a keyword-based ‘inverted index’, that records which pages contain which words and phrases. At query time, this index is used to work out which of the documents are most relevant to the user’s query. Along with relevance, Google also weighs other factors (signals), such as how popular, reliable and trustworthy a page is. PageRank, which scores a page by the links pointing to it, is one of these signals. To understand the PageRank algorithm, please read our blog What Google Humms..these days..
Until a few years ago, Google's analysis of crawled pages consisted mainly of extracting outgoing links, along with a few other things such as rendering content. Over the last few years, Google has improved how it analyzes web pages and builds indices from that analysis. The added capability is known as ‘semantic search’.
Semantic search improves the accuracy of search results by understanding the meaning of the query in the context of the searcher’s intent or actual need.
According to Google, building a ‘web index’ is just like building the index of a book. In short, a web index is, for every word, a list of the pages that word appears on. Because the web is far too large to index in one piece, Google breaks it into groups of millions of pages, and each group is called an ‘index shard’. The web is divided into thousands of these index shards, and building them is a substantial part of Google’s index-building process.
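The book-index analogy is easy to sketch in a few lines of Python. This is only an illustration under simple assumptions: `build_index`, `build_shards` and the four sample pages are hypothetical, words are split on whitespace, and a "shard" here holds two pages rather than millions.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}

def build_shards(pages, shard_size):
    """Split the pages into groups of shard_size and index each group separately."""
    ids = sorted(pages)
    shards = []
    for start in range(0, len(ids), shard_size):
        group = {pid: pages[pid] for pid in ids[start:start + shard_size]}
        shards.append(build_index(group))
    return shards

# Hypothetical sample pages, keyed by page id.
PAGES = {
    1: "google search index",
    2: "book index example",
    3: "search engine results",
    4: "google crawls the web",
}

shards = build_shards(PAGES, shard_size=2)
print(len(shards))         # number of shards
print(shards[0]["index"])  # pages in the first shard that contain "index"
```

Each shard is a complete, small index of its own group of pages, so the shards can be built and searched independently of one another.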
What does Google do after it gets a query?
It first understands the query, which is the semantic search part, if you will. It asks several questions, such as: “Does the query name any known entities? Are there any useful synonyms? In what context should the user’s input be treated?” After working out what the user is actually looking for, Google sends the query to all the ‘shards’. Each index shard finds the matching pages, computes a score for each of them against the query, and sends back its top pages sorted by score. The returned pages are then combined and adjusted using further ‘signals’: checking for spam and duplication, clustering sitelinks, and demoting or promoting sites on that basis.
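The flow described above (expand the query, send it to every shard, let each shard score its own pages, then merge the top results) can be sketched as follows. Everything here is an assumption made for illustration: the `SYNONYMS` table, the sample `SHARDS`, and the scoring rule, which simply counts how often the query terms appear on a page. Google's real scoring uses many more signals.

```python
import heapq

# Hypothetical synonym table used to expand the query.
SYNONYMS = {"car": ["automobile"]}

# Hypothetical shards: each one maps page ids to page text.
SHARDS = [
    {1: "cheap car rental", 2: "automobile repair guide"},
    {3: "car and automobile history", 4: "bread recipes"},
]

def expand(query):
    """Return the query words plus any known synonyms."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

def search_shard(shard, terms, k):
    """Score every matching page in one shard and return its top k (score, page_id) pairs."""
    scored = []
    for page_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)  # toy score: term frequency
        if score > 0:
            scored.append((score, page_id))
    return heapq.nlargest(k, scored)

def search(query, shards, k=3):
    """Send the query to all shards, then merge their results by score."""
    terms = expand(query)
    candidates = []
    for shard in shards:
        candidates.extend(search_shard(shard, terms, k))
    # Highest score first; ties are broken by the lower page id.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for _, page_id in candidates[:k]]

print(search("car", SHARDS))
```

Because each shard only returns its own top pages, the merge step handles a small list of candidates no matter how many pages the shards hold in total.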
There are two kinds of signals identified by Google:

- Query-independent signals, which include PageRank, language and mobile-friendliness
- Query-dependent signals, which depend on features of both the page and the query, such as keyword hits, synonyms and proximity

Page quality is judged on three things:
- Expertise
- Authoritativeness
- Trustworthiness

A page is considered low quality when:
- The quality of the main content is low
- There is an unsatisfying amount of main content on the page
- The author does not have expertise or is not trustworthy or authoritative on the topic
- The website has a negative reputation
- The secondary content is distracting or unhelpful