Do you have enough pages in the search index?

By Mike Moran. Filed in Organic Search, Search Marketing

In some ways, you never have enough pages in the search index, because every extra page that sneaks in there is a lottery ticket in the search sweepstakes: you’ve got to be in it to win it. So, the more pages you have in the search index, the more chances you have to be found. But clearly there is some number of pages that suggests you are doing OK and a different number that seems bad; zero, for instance, would be bad. How do you figure out how many pages you have in the search index, and how do you know if that number is OK?

First off, you need to understand that there is no single search index–each search engine has its own search index. Google has its own, Bing has its own, and so do many other search engines. So, you need to know which search engines are worth worrying about–in the U.S., it’s Google and Bing.

So how do you find out how many pages are in Google’s index and how many are in Bing’s?

Both Google and Bing have a tool called the “site:” command. You can just enter into each one the word “site:” along with your domain name (such as “site:biznology.com”). For some sites, this handy command works just fine and you can see how many pages are stored in each index. If your results look right, great. But sometimes the results just look nuts. For example, “site:ibm.com” yields 2.8 million pages on Bing but a crazy 12.2 million pages on Google.

To avoid such inaccuracies, use each search engine’s Webmaster Tools sites. Both Google and Bing will tell your Webmaster exactly how many pages are in the index and will even let you know which pages they are having trouble grabbing. It’s possible that the IBM Webmaster is aware that there actually is a big discrepancy between Google and Bing, which might be just fine or might be something they are working on.

I’ve spoken to a few experts and they have varying theories. One told me that Bing stops crawling when more than 1% of the pages get errors–the Bing Webmaster site will clue you in on this. Another speculated that Bing is only returning counts of pages that get search visits, not every page in their index. No one I spoke with knew for sure why this is happening, but it shows you the importance of checking your numbers.

Likewise, big swings in indexed pages (1,000 pages indexed in Google today vs. 5,000 yesterday) mean that you should look into it. And, in general, an inclusion ratio (pages indexed divided by actual pages) below 70% is something that should give you pause, although with these Bing errors, who knows what a good inclusion ratio is for Bing right now.
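If you want to keep an eye on these numbers yourself, the two checks above are easy to script. What follows is only a rough sketch, not a tool from either search engine: the function names are made up for illustration, the 70% floor comes from the rule of thumb above, and the 50% day-over-day swing threshold is purely my assumption, so tune it to your own site. You would feed it the counts you read off each search engine's Webmaster Tools.

```python
def inclusion_ratio(indexed_pages, actual_pages):
    """Pages indexed divided by actual pages on the site."""
    if actual_pages <= 0:
        raise ValueError("actual_pages must be positive")
    return indexed_pages / actual_pages


def check_index_health(indexed_today, indexed_yesterday, actual_pages,
                       min_ratio=0.70, max_swing=0.50):
    """Return a list of warnings for one search engine's index counts.

    min_ratio is the 70% rule of thumb; max_swing (50%) is an assumed
    threshold for what counts as a "big swing" between two days.
    """
    warnings = []

    # Check 1: is the inclusion ratio below the rule-of-thumb floor?
    ratio = inclusion_ratio(indexed_today, actual_pages)
    if ratio < min_ratio:
        warnings.append(
            f"Inclusion ratio is {ratio:.0%}, below {min_ratio:.0%}"
        )

    # Check 2: did the indexed count swing sharply since yesterday?
    if indexed_yesterday > 0:
        swing = abs(indexed_today - indexed_yesterday) / indexed_yesterday
        if swing > max_swing:
            warnings.append(
                f"Indexed pages changed by {swing:.0%} since yesterday"
            )

    return warnings


if __name__ == "__main__":
    # The swing example from above: 1,000 pages today vs. 5,000 yesterday,
    # on a hypothetical site with 6,000 actual pages.
    for warning in check_index_health(1000, 5000, 6000):
        print(warning)
```

Run it once per search engine, since each one has its own index and its own counts.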

Regardless, knowing how many pages are indexed is the first step to seeing if you have a problem.


2 Comments

  1. Comment by Rowdy Rhodes:

    “there is no single search index”

    What about DMOZ.org? They are the main feed to the main search engines, which in turn trickle down to smaller engines or engines that they own under different names. If I had to pick one engine that feeds the ‘net, it would definitely be DMOZ. For example, go to the Search Engine Decoder Relationship Chart at http://www.search-this.com/search-engine-decoder/ and it will display the informational feeds and relationships between search engines. Click on any of the names and you will see arrows displayed showing information flow.

    • Comment by Mike Moran:

      Thanks for the comment Rowdy.

      DMOZ is a directory and getting a link from it can be helpful for both higher ranking and getting indexed, but the decoder chart is very out of date. The relationships noted on that chart show which search engines make use of the DMOZ directory (Google used it in its old Google Directory interface), not which search indexes are fed. You can tell for yourself how old that chart is when you see that Bing is not listed (it’s still called MSN) and it shows Yahoo! feeding results to many sites instead of being fed by Bing, which has been true for more than a year.

      Google and Bing both have their own ways of following links to create their own indexes, and while a link from DMOZ is good, it’s hardly necessary to get indexed. Thanks for asking such a great question.
