How Search Engines Operate

Search engines perform a short list of critical operations that allow them to provide relevant web results when searchers use their system to find information.

1. Crawling the Web: Search engines run automated programs, called “bots” or “spiders,” that use the hyperlink structure of the web to “crawl” the pages and documents that make up the World Wide Web. Estimates are that, of the approximately 20 billion existing pages, search engines have crawled between 8 and 10 billion.

2. Indexing Documents: Once a page has been crawled, its contents can be “indexed,” that is, stored in the giant database of documents that makes up the search engine’s “index.” This index must be tightly managed so that queries requiring a search and sort of billions of documents can be completed in fractions of a second.

3. Processing Queries: When a request for information comes into the search engine (hundreds of millions arrive each day), the engine retrieves from its index all the documents that match the query. A document counts as a match if the terms or phrase are found on the page in the manner specified by the user. For example, a search for car and driver magazine at Google returns 8.25 million results, but a search for the same phrase in quotes (“car and driver magazine”) returns only 166 thousand results. In the first search, which uses what is commonly called “Findall” mode, Google returned all documents containing the terms “car,” “driver,” and “magazine” (it ignores the term “and” because it does not help narrow the results), while in the second search, only pages containing the exact phrase “car and driver magazine” were returned. Other advanced operators (Google has a list of 11) can change which results a search engine will consider a match for a given query.

4. Ranking Results: Once the search engine has determined which results match the query, the engine’s algorithm (a set of mathematical calculations used to sort results) scores each result to determine which is most relevant to the query. The results are then displayed on the results pages in order from most to least relevant, so that users can choose which to select.
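Step 1 (crawling) can be sketched as a breadth-first traversal of the link graph. This is a minimal sketch under stated assumptions: the hypothetical `TOY_WEB` dictionary stands in for real pages, so nothing is fetched over the network. A real crawler would also have to download and parse HTML, honor robots.txt, and schedule revisits.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the URLs it links to.
# A real crawler would fetch pages over HTTP and extract links from the HTML.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # nothing links here, so it is never reached
}


def crawl(seeds, links, max_pages=100):
    """Breadth-first crawl: visit each reachable page once by following links."""
    frontier = deque(seeds)  # URLs waiting to be "fetched"
    seen = set(seeds)        # URLs already queued, so none is visited twice
    order = []               # pages in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


crawled = crawl(["a.example/"], TOY_WEB)
print(crawled)  # ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

Because no crawled page links to `d.example/`, the crawl never reaches it, which is one simple reason the set of crawled pages can be smaller than the web as a whole.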
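Step 2 (indexing) is commonly built around an inverted index, which maps each term to the documents that contain it, so a query can look up matching pages directly instead of scanning every page. The sketch below assumes a few hypothetical documents in `DOCS` and a simple lowercase word tokenizer; it also records each word's position, which phrase matching can use. Production indexes are far larger and more elaborate than this.

```python
import re
from collections import defaultdict

# Hypothetical crawled documents: doc id -> page text.
DOCS = {
    1: "Car and Driver magazine reviews the new sedan",
    2: "The driver parked the car next to the magazine stand",
    3: "A magazine about boats",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: term -> {doc_id: [positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


index = build_index(DOCS)
print(index["magazine"])  # {1: [3], 2: [8], 3: [1]}
```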
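The contrast in step 3 between “Findall” mode and a quoted phrase search can be illustrated on a tiny collection. In this sketch the documents and the `STOPWORDS` set are assumptions for illustration (not Google's actual stopword list), and pages are scanned directly for brevity where a real engine would consult its index.

```python
import re

# Hypothetical documents; a real engine would look terms up in its index.
DOCS = {
    1: "Car and Driver magazine reviews the new sedan",
    2: "The driver parked the car next to the magazine stand",
    3: "A magazine about boats",
}
STOPWORDS = {"and", "the", "a"}  # assumed stopword list, for illustration only


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_all(query, docs):
    """'Findall' mode: every non-stopword query term must appear somewhere on the page."""
    terms = {t for t in tokenize(query) if t not in STOPWORDS}
    return [d for d, text in docs.items() if terms <= set(tokenize(text))]


def phrase_match(query, docs):
    """Quoted search: the query's words must appear consecutively and in order."""
    phrase = tokenize(query)
    n = len(phrase)
    hits = []
    for d, text in docs.items():
        words = tokenize(text)
        if any(words[i:i + n] == phrase for i in range(len(words) - n + 1)):
            hits.append(d)
    return hits


print(find_all("car and driver magazine", DOCS))      # [1, 2]
print(phrase_match("car and driver magazine", DOCS))  # [1]
```

Page 2 contains all three terms but not the exact phrase, so it matches only in find-all mode; this is the same effect that makes the unquoted search return far more results than the quoted one.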
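The text does not describe how any engine actually scores pages, so this sketch of step 4 swaps in TF-IDF, a classic textbook weighting scheme, purely as an illustration; it is not the ranking algorithm of Google or any other engine named here. The documents and the matched ids `{1, 2}` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical collection; ids 1 and 2 are assumed to be the matches from step 3.
DOCS = {
    1: "Car and Driver magazine reviews the new sedan",
    2: "The driver parked the car next to the magazine stand",
    3: "A magazine about boats",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, matches, docs):
    """Sort matched doc ids by TF-IDF score, most relevant first."""
    tokenized = {d: tokenize(t) for d, t in docs.items()}
    n_docs = len(docs)
    terms = set(tokenize(query))
    # Inverse document frequency: terms found on fewer pages weigh more.
    idf = {}
    for term in terms:
        df = sum(term in words for words in tokenized.values())
        idf[term] = math.log(n_docs / df) if df else 0.0

    def score(doc_id):
        words = tokenized[doc_id]
        counts = Counter(words)
        # Term frequency (normalized by page length) times IDF, summed over query terms.
        return sum(counts[t] / len(words) * idf[t] for t in terms)

    scored = [(d, score(d)) for d in matches]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = rank("car driver magazine", {1, 2}, DOCS)
print([doc_id for doc_id, _ in results])  # [1, 2]
```

“magazine” appears on every page, so its IDF is log(3/3) = 0 and it does nothing to separate the results; page 1 ranks first because the query terms make up a larger share of its shorter text.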

Although a search engine’s core operations are few, systems like Google, Yahoo!, AskJeeves and MSN are among the most complex, processing-intensive computer systems in the world, performing millions of calculations each second and delivering information to an enormous number of users.
