Understanding Search Engines and How They Work
Anyone who’s ever looked up anything online has probably wondered how search engines work their magic and how they always manage to deliver exactly what we’re looking for.
Search engines, to a very large extent, dictate the success of websites, and therefore of entire online businesses. If you’re not visible and ranked high on search engine results pages (SERPs), your website traffic is sure to suffer. And lower traffic means lower profits.
SEO, or search engine optimization, is a field dedicated entirely to understanding the way search engines function. This understanding allows us to improve our websites, optimize their content, strengthen their technical aspects, and increase their ranking within the complex network of other websites. In these efforts, we can use SEO WordPress themes that are already technically optimized – to some level, at least. There are also tons of great and often free SEO tools designed to make optimization easier for those who are not exactly marketing wizards. All with the end goal of ranking as high as possible on SERPs.
Because of this, it is vital to have at least a basic understanding of what search engines are and how they work, which is exactly what we’re going to cover in this article.
A search engine is basically any computer program used for finding specific information, either on a single computer or on a network of interlinked machines. But when you hear someone talking about search engines these days, what they’re usually referring to are internet search engines.
Internet search engines are complex software systems specifically designed to search through the vast amounts of data on the World Wide Web. They help us, their users, find out what we need to know by providing a list of the most relevant websites that contain the specific word or phrase we searched for.
For most of us, search engines are basic web tools. Without them, we would have to remember the exact URLs for every single website or page we want to reach. And while this might seem inconceivable to most, there actually was a time when the internet worked like that. And boy did it suck.
Luckily, things changed. Today we’re so accustomed to the convenience of search engines that it’s hard to imagine life without them.
Just think about how painful it would be to teach your grandma to find out what time the neighborhood pharmacy opens on Monday if we didn’t have search engines. First, you’d have to get her to memorize the pharmacy website’s URL. Then you’d have to show her where to type in that URL and how to browse through the site to find their “About” page. And finally, you’d have to explain that she now needs to read through the entire page just to find the working hours.
Today, however, all you have to do is show her how to open her browser and type “George’s Pharmacy opening hours” in the address bar. If George’s Pharmacy has a properly optimized website, the results Grandma needs will pop up immediately. “A miracle!” For Grandma, Google is like one big magical answer machine. A strange, non-physical temple of information on anything and everything in this world. And that’s not too far from the truth, either.
For most people, search engines = Google. It is, after all, the most widely used search engine in the world, and it has been so for quite some time. Google is so ubiquitous in our lives, it has even become a verb for searching for something on the Internet: “Lemme just google that real quick!”
So, in most cases, when we talk about search engines, we are really talking about Google, not Bing, Yahoo! or Baidu. Not to mention the extinct ones, like good old AltaVista. There are tons of search engines out there, some even older than Google. Yet the vast majority of SEO efforts are aimed at the Big G.
Now, that’s not to say you shouldn’t pay attention to other search engines. But for starters, it is vital to make sure Google treats you well. Once you get that right you might want to dive deeper and explore the potential that comes with other engines’ algorithms.
Search engines perform three basic operations: they crawl, index and rank websites.
You might have heard people talk about spiders, spiderbots or crawlers in relation to search engines. These exotic terms are just names for special pieces of software used by search engines to discover new websites and web pages.
Crawlers, as their name implies, crawl the web by using hyperlinks to jump from one web page to another. Once they find a new page, these hardworking little bots index it in a database.
When you search for something online, the search engine goes through its database of indexed pages and provides you with the most relevant content, depending on your query. This is what ranking is all about – displaying the pages in order of relevance to the query.
Now let’s take a closer look at each of these main search engine functions:
Crawling
As we saw earlier, crawling is the first step towards getting your website displayed on SERPs. Naturally, you want to rank as high as possible, and later on we’ll take a look at how you can make that happen. But for now, let’s see how to make sure crawlers actually see your pages, pick them up, and add them to the index.
While crawling is an automatic process, it’s not uncommon for a website to have just some – and not all – of its pages crawled. Sometimes this happens by accident, because of a mistake made by the website’s administrator. Obviously, this is far from desirable. But on occasion we might actually want the crawler to skip some of the pages on our site.
Making Sure the Right Pages Get Crawled
In SEO, there’s this thing called “crawl budget.” It refers to the average number of pages a bot crawls before leaving a site. So, if you have 50 pages on your site, and the crawl budget is 30, you want to make sure the bot crawls all the right pages, without wasting time on less important ones.
This is where robots.txt comes in. This handy little text file is used by webmasters to tell crawlers which pages to crawl and how to crawl them. By using the allow and disallow directives, webmasters can define exactly which pages (or entire folders) will be visible to certain crawlers.
There are certain types of pages that should never be crawled. For example, you don’t want search engines indexing the wp-admin page on your WordPress website, or any pages that contain member lists or other sensitive data. In such cases, you want to hide that WordPress page or post from search engines so it doesn’t get crawled. Which pages should be treated as “unimportant” by crawlers will generally depend on your website’s specific nature and purpose.
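To make this a bit more concrete, here is a minimal robots.txt sketch for a WordPress site – the sitemap URL is just a placeholder, and the exact rules will of course depend on your own setup:

    # Rules for all crawlers
    User-agent: *
    # Keep bots out of the admin area...
    Disallow: /wp-admin/
    # ...but allow the AJAX endpoint some themes and plugins rely on
    Allow: /wp-admin/admin-ajax.php
    # Point crawlers to the sitemap (placeholder URL)
    Sitemap: https://example.com/sitemap.xml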
Besides a well-defined robots.txt file, you should also make sure your site navigation and architecture enable crawlers to do their job properly. This isn’t always the case, so you need to check that:
- The HTML navigation throughout the site is clean and clear
- Your content is properly organized and labeled
- You have a sitemap in place (see the minimal sketch after this list)
- Crawlers are not running into errors (client errors, server errors, bad redirects).
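To illustrate the sitemap point from the checklist above, this is roughly what a bare-bones XML sitemap looks like – the URLs and dates are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/</loc>
        <lastmod>2019-06-01</lastmod>
      </url>
      <url>
        <loc>https://example.com/about/</loc>
        <lastmod>2019-05-20</lastmod>
      </url>
    </urlset>

Most WordPress SEO plugins can generate and update a sitemap like this automatically, so you rarely have to write one by hand.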
Indexing
After search engine crawlers have discovered your pages, it’s time for indexing. This basically means all the crawled pages are going to be stored (along with all their information) in a massive database from which they can later be retrieved and displayed as search results.
Reindexing
It’s important to note that it’s not enough to have the search engine index your pages just once. Your pages also need to be re-indexed regularly. This is because you are most likely going to add new pages to your site, or update the content of existing ones. Those changes need to be crawled and indexed too, which is why you need crawlers to visit (and index) your website frequently.
Large, well-established sites get indexed quite frequently, but brand new ones sometimes have to wait a while for the crawlers to come back. Some of the factors that influence how quickly and how frequently a website will be indexed include:
- Domain authority (DA) – a measure of how relevant and trusted a website is in its particular niche
- Page authority (PA) – similar to DA, but on the page level
- Content schedule or frequency of updates
- Popularity
Removed Pages
It’s also worth mentioning that sometimes a page can be removed from the index. This happens, among other reasons, because:
- the link returns an error
- the page contains a noindex tag
- the URL has been penalized or blocked.
If you want a page to be reindexed after being removed, you can manually submit it to the search engine, but more on that later.
Meta Directives
Similarly to the crawler directives we discussed earlier (allow and disallow in robots.txt), you can use robots meta directives – tags placed in your pages’ HTML – to tell search engines how to index your pages (there’s a short sketch after this list):
- index/noindex tells search engines whether you want a specific page to be indexed and stored or not
- follow/nofollow tells crawlers whether to follow the links on a page and, more importantly, whether to pass link equity (also called link juice or authority) to the linked pages
- noarchive tells the engines not to keep a cached version of your page; it’s mainly used for pages whose content changes frequently, such as ecommerce pages.
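Here’s a short, hypothetical HTML sketch showing how these directives typically appear in practice – the URLs are placeholders:

    <!-- In the page's <head>: don't index this page and don't keep a cached copy -->
    <meta name="robots" content="noindex, noarchive">

    <!-- In the page body: a normal link vs. one that shouldn't pass link equity -->
    <a href="https://example.com/partner">A link crawlers may follow</a>
    <a href="https://example.com/sponsored" rel="nofollow">A link marked nofollow</a>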
Submitting Content to Search Engines
As we mentioned earlier, if you feel like Google is not indexing your pages fast enough, or that it’s skipping them for some reason, you can manually submit your pages and your content for indexation.
The world’s largest search engine has an amazing tool for this purpose, called Google Search Console. You can use it to ask Google to come back and recrawl pages you’ve added or updated, so the changes get picked up as soon as possible.
Google Search Console also has a URL Inspection tool, which can be used to manually submit URLs to the search engine. In case you have a lot of URLs to submit, Google will prefer it if you send an updated sitemap instead.
Ranking
Search engine ranking represents the position a URL holds on a SERP. Obviously, the higher the position, the better, since users are more likely to click on top-ranked links on the first page.
Getting high up on page one of Google is the ultimate goal for every website, but it’s easier said than done.
Before we look into some of the factors that affect rankings, let’s first see how search engines even know which results are relevant for a searcher’s query.
Search engines use algorithms to determine the relevance and the position of a website in the rankings. Ranking pages based on their popularity, so to speak, goes back to the late 1990s, when Google’s co-founders Larry Page and Sergey Brin came up with PageRank, a formula for determining the value of a page based on the number and value of the links pointing to it.
PageRank as a public-facing metric was eventually retired, but its main principle lives on at the core of every Google algorithm update, as we will see later on.
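As a rough sketch of the original idea (simplified from the formula Page and Brin published), a page’s PageRank can be written as:

    PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

where T1…Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor, commonly set around 0.85. In plain English: a page is valuable if valuable pages link to it, and a link counts for more when it comes from a page that doesn’t link out to everything under the sun.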
In the dynamic and complex environment of SEO, it’s hard to establish with certainty what specific factors guarantee a high SERP position.
This means that you can’t just do a one-time SEO job and call it a day. You need to go back, tweak, fix and improve things just the way Google wants.
Factors Affecting Rankings
Ranking factors can roughly be divided into on-page and off-page factors. On-page factors are, as their name implies, SEO factors related to the specific page you are optimizing. Since all on-page factors live on your own website, you have complete control over them. Off-page factors, on the other hand, exist on other sites but still affect the ranking of your pages.
Let’s take a closer look at both of these categories.
- On-Page Factors include:
- Quality content. This means content that is relevant to the subject. It should be well-written and long-form, if possible.
- Title tags. A title tag is an HTML element that represents the title of your page and the clickable title listed on SERPs. A title tag should provide an accurate description of the content it’s linked to. According to best SEO practices, title tags should be between 50 and 60 characters long, and typically structured as primary keyword + secondary keyword | brand name (see the HTML sketch after this list).
- Meta description tags. These are brief summaries of the page displayed below the title tag on SERPs. Google keeps changing the preferred length of meta descriptions, but it’s generally around 150 characters.
- URL structure. An SEO-friendly URL should be short, concise, with words separated by hyphens, and in lowercase. The structure should, ideally, be: subdomain.domain.top-level domain/folder or path/page/anchor.
- Keyword density. This represents the number of times you used your keyword or keyphrase compared to the total number of words in your content. Ideally, it should be between 0.5% and 3%.
- XML sitemaps. A sitemap is a list of all the URLs on your site. It serves as a roadmap for search engine crawlers.
- Alt tags. Also known as an alt attribute or alt description, an alt tag is an HTML attribute that’s added to an image with the purpose of providing a textual description of that image for search engines.
- Internal linking. This is the practice of adding links on your pages that point to other pages on the same website.
- Heading tags (H1 to H6). These HTML tags are used to identify headings and subheadings. They help search engines read and understand your pages, and they improve user experience.
- Off-Page Factors include:
- Link building. The process of getting other websites to link back to your website. The best backlinks come from websites that meet the E-A-T criteria: Expertise, Authoritativeness and Trustworthiness. Proper placement of links in a natural context and with adequate anchor text is also essential. Guest blogging is one of the most important tactics for increasing the number of links to your website.
- Social media marketing. Content shared on social media can significantly boost page rankings because shared content gets indexed, builds authority and, most importantly, involves real people.
- Influencer marketing. This is a form of marketing in which you focus particular attention on people with a strong reputation and a lot of followers in your industry. Under the right circumstances, it can do wonders for a site’s ranking.
- Video. Video content has become particularly important for SEO over the last few years. Off-site videos (such as videos hosted on YouTube or Vimeo) boost traffic and generate social shares, as well as backlinks.
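To make a few of those on-page elements more concrete, here’s a minimal, made-up HTML sketch – the keywords, brand name and file names are just placeholders:

    <head>
      <!-- Title tag: primary keyword + secondary keyword | brand name -->
      <title>WordPress SEO Basics &amp; Tips | Example Brand</title>
      <!-- Meta description: a roughly 150-character summary shown on SERPs -->
      <meta name="description" content="Learn how search engines crawl, index and rank your WordPress site, plus simple tips to help your pages climb the results.">
    </head>
    <body>
      <!-- Heading tags structure the content for users and crawlers -->
      <h1>WordPress SEO Basics</h1>
      <h2>How Search Engines Crawl Your Site</h2>
      <!-- Alt attribute describing the image for search engines -->
      <img src="crawler-diagram.png" alt="Diagram of a search engine crawler following links between pages">
      <!-- Internal link pointing to another page on the same site -->
      <p>New to the terminology? Start with our <a href="/seo-glossary/">SEO glossary</a>.</p>
    </body>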
User Engagement and Ranking
User engagement is a particularly important factor in improving a website’s search engine ranking.
It represents user behavior around your website. What your visitors look for, how long they stay, which pages they skip, where they click, which pages they leave immediately – all these metrics should be taken into consideration.
Basically, you want to know how your visitors interact with your website and then do everything you can to improve their experience. Better user experience means better user engagement, and this, in turn, translates into better rankings. This is particularly important for websites that get plenty of traffic but no conversions.
Certain engagement factors have a direct impact on rankings:
- Traffic
- Page views (the number of pages clicked on and viewed during a session)
- Bounce rate (the percentage of visitors who land on a page and leave without viewing any other pages)
- Brand mentions (instances of your brand being mentioned online)
- Mobile-responsive design (because if your site is not responsive, your bounce rate is going to be sky-high – see the quick sketch after this list)
- Technical SEO (as it is crucial for good user experience).
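As promised above, here’s the single line that usually marks the starting point of a mobile-responsive layout – it tells the browser to scale the page to the device’s screen width (the rest, of course, is up to your theme’s CSS):

    <meta name="viewport" content="width=device-width, initial-scale=1">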
In addition, there are certain engagement metrics you should keep a keen eye on:
- Clicks on links
- Social shares
- Scroll depth
- Form submissions.
Of course, these are just some of the engagement factors and metrics that contribute to search engine ranking. It’s a very complex area of SEO that requires special attention and will be covered in more depth in future articles.
We already mentioned that search engines, Google in particular, use extremely complex ranking algorithms. And these algorithms are updated quite frequently – Google rolls out between 500 and 600 updates per year. Most of these are minor modifications to the algorithm, and as such represent little to no concern for webmasters. But occasionally, Google rolls out major updates that are serious game-changers when it comes to understanding how search engines work and what they want from us.
These are the most important algorithm updates that you should know about:
Google Panda (2011). The first major core update with a significant impact on SEO. Cracking down on keyword stuffing, “content farms” and other black hat tactics, Panda introduced quality as the #1 ranking factor and reminded website owners that user experience should be their primary concern.
Google Penguin (2012). Penguin continued to target keyword stuffing but focused heavily on link quality. It marked the demise of spammy and fake links and favored quality links from reputable sources, as well as proper internal linking.
Google Pirate (2013). This update targeted pirated content and websites involved in copyright infringement, including torrent hosts, with multiple takedown requests filed through Google’s DMCA system.
Google Hummingbird (2013). Another big update, bringing a full overhaul of the core algorithm instead of just serving as a filter, like many of the previous ones. Hummingbird put a bigger focus on user intent and optimization for humans, not robots.
Google Pigeon (2014). Pigeon favored websites optimized for local search. Based on distance and location, the update increased the ranking of local listings in search results, aiming to provide more accurate and relevant results for local queries.
Google Mobilegeddon (2015). As its name suggests, this update favored websites with pages that display well on smaller screens. Its purpose was to make sure users get mobile-ready results for their mobile queries, and did not affect searches made from desktop or laptop computers.
Google RankBrain (2015). Another critical update, in that it introduced machine learning to the way Google processes and ranks URLs. In essence, it uses artificial intelligence to determine the most relevant search results, finding the best fit for the search query.
Google Possum (2016) mostly targeted businesses relying on local search and improved local packs and local search results.
Google Fred (2017) is more of an umbrella term for a series of continuous updates focusing on site quality and penalizing websites that don’t follow Google’s Webmaster Guidelines.
Now that we’ve seen how Google does its thing, let’s take a quick look at some of the other major search engines and the differences in their functioning.
In 2019, the most popular search engines in the world, based on market share, included Google, Baidu, Bing, Yahoo! and Yandex.
Let’s take a look at those five. We already talked plenty about Google, so let’s move on to Baidu. Since Google is banned in China, the country developed its own search engine giant. In addition to being the Chinese version of Google, Baidu is also important for Western businesses with a particular interest in the Chinese market.
As for SEO, Baidu’s algorithm is somewhat slower and more thorough in checking the content of websites. Plus, it has to check whether the content is compliant with China’s special content rules. As a result, it takes longer for new websites and pages to be indexed and ranked. On the other hand, Baidu is quicker than Google to recognize SEO changes made to a site.
When it comes to ranking, Baidu seems to pay more attention to the number of backlinks than Google, which prioritizes link quality over quantity. As for the content, the “Chinese Google” favors videos, images, music and similar rich media content. In essence, the more “fun” your pages appear, the better they will rank.
Bing is the search engine that represents Google’s most dangerous contender outside of markets like China and Russia. The main difference between the two is in how they look at keywords. Specifically, Bing still uses targeted keywords as one of the main ranking factors, which is something Google dropped a few years back.
Also, while Google no longer pays particular attention to meta keywords (HTML elements or tags added to a page indicating the page’s main keywords), they are still an important SEO factor for Bing.
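For the sake of illustration, this is what a meta keywords tag looks like – the keywords themselves are placeholders, and keep in mind that Google largely ignores this tag even if Bing may still read it:

    <meta name="keywords" content="wordpress themes, seo basics, search engines">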
There are some differences in terms of backlinks, too. For instance, Google prefers links from websites with high domain authority and is suspicious of large amounts of links from sites with low DA. Bing, on the other hand, treats quantity and quality equally, and has a preference for links from .edu, .gov and .org domains.
Yahoo! has been powered by Bing since 2011 but that doesn’t mean you shouldn’t track both of them, as there are some differences.
Compared to Google, Yahoo! pays more attention to domain age, associating it with authority and authenticity. Because of this, it sometimes takes quite a while for new websites to show up on Yahoo!.
Similarly to Bing, Yahoo! likes targeted keywords, as well as meta keywords.
Yandex is the number one internet property in Russia, a major tech company and the most popular search engine in that enormous market. When it comes to SEO, the Russian giant, unsurprisingly, follows its own set of rules. Yandex pays more attention to user behavior as a ranking factor than any other search engine out there. User behavior is more important than backlinks, and high-quality content is even more important for Yandex than it is for Google (as impossible as that may sound). Also, Yandex is slower to show new websites and openly prefers older ones.
Final Thoughts
Now that we’ve covered most of the basic functions of search engines and the way they work, you should have a clearer idea of how to optimize your website so it performs well and ranks high.
Just keep the basics in mind: make sure your pages can be crawled and indexed, take care of your on-page elements, earn quality links, and keep improving the experience of the people who actually visit your site.
Best of luck!