Saturday, July 28, 2007

Google’s Supplemental Index

The Big Daddy update of late 2005 to early 2006 was largely about installing a new Supplemental index. The new version is so different to the old version that it shouldn’t now be called the Supplemental index. The old Supplemental index was a repository for garbage webpages and such, and was accessed for the search results only when a reasonable number of results couldn’t be found in the regular index. The new version is very different because many millions of perfectly good pages are put in it.

Many, perhaps most, websites have plenty of their pages in the Supplemental index because their linkage profiles don’t score well enough. Even Google has pages in there - hundreds of thousands of them. A site’s linkage profile is an evaluation of the links into and out of the site. Things like linking to off-topic sites, or having too high a percentage of the site’s inbound links be reciprocals, lower the score of the linkage profile. A lower score reduces the number of pages that the site can have in the Regular index, which means that more of its pages are placed in the Supplemental index. Improving the linkage profile brings pages out of the Supplemental index and into the Regular one.

Before Big Daddy, pages in the Supplemental index had been given the kiss of death - they rarely came out, and were rarely seen in the search results. But that has changed, and is continuing to change. It is now possible to bring pages out of the Supplemental index by getting some good links to the site, and the continued improvement is in the way that the Supplemental index is used by Google’s system.

Right now, most of the datacenters are using the new Supplemental index in the same way as the old one was used; i.e. get a results set from the Regular index and, if the set isn’t large enough, add to it from the Supplemental index. The quality of the results from the Regular index doesn’t come into it. If the results set is large enough, the Supplemental index is ignored.
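The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration of the logic as the article describes it, not Google's actual code: the function name `search_old_style`, the `min_results` threshold, and the use of plain lists to stand in for the two indexes are all assumptions made for the example.

```python
def search_old_style(regular_hits, supplemental_hits, min_results):
    """Return Regular-index hits, topped up from the Supplemental index
    only when the Regular index did not supply enough results.

    Hypothetical sketch: both arguments are lists of already-matched pages,
    and min_results is an assumed threshold for "large enough".
    """
    results = list(regular_hits)

    # If the set is large enough, the Supplemental index is ignored.
    # Note that the quality of the Regular results is never examined.
    if len(results) >= min_results:
        return results

    # Otherwise add just enough Supplemental hits to make up the shortfall.
    shortfall = min_results - len(results)
    results.extend(supplemental_hits[:shortfall])
    return results
```

For example, with three Regular hits and `min_results=3` the Supplemental list is never consulted, while a single Regular hit would be padded with two Supplemental ones.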

But at least one datacenter operates differently. It works along these lines: get a results set from the Regular index; sometimes many of those results will be poor-quality matches (e.g. they match only one word of a three-word query), so get some better matches from the Supplemental index. Use of the Supplemental index in something like this way is likely to spread across the datacenters in 2007.
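A rough Python sketch of this quality-aware approach follows. It is a guess at the general shape of the logic, not a description of how Google implements it. The scoring rule (the fraction of query words found in the page text), the `min_score` cut-off, the function names, and the ordering of the blended results are all assumptions introduced for illustration.

```python
def match_score(query, text):
    """Fraction of query words that appear in the page text (assumed metric)."""
    words = query.lower().split()
    if not words:
        return 0.0
    page_words = set(text.lower().split())
    return sum(w in page_words for w in words) / len(words)


def search_quality_aware(query, regular_pages, supplemental_pages, min_score=0.5):
    """Blend in Supplemental pages when some Regular results match poorly.

    Hypothetical sketch: pages are plain strings, and a page is a "poor"
    match when its score falls below min_score (e.g. one word of three).
    """
    good = [p for p in regular_pages if match_score(query, p) >= min_score]
    poor = [p for p in regular_pages if match_score(query, p) < min_score]

    # Every Regular result is a decent match: no need for the Supplemental index.
    if not poor:
        return good

    # Some Regular results are poor, so fetch better matches from Supplemental.
    better = [p for p in supplemental_pages if match_score(query, p) >= min_score]

    # Assumed ordering: good Regular matches, then Supplemental matches,
    # then the poor Regular matches last.
    return good + better + poor
```

With the query `"cheap red widgets"`, a Regular page that mentions only "widgets" scores one in three and counts as poor, so a Supplemental page containing all three words would be placed ahead of it.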

The new way makes a lot of sense. Many of the results acquired from the Regular index are poor matches for the query, and millions of perfectly good pages are now stored in the Supplemental index, some of which will be good matches for many queries. It therefore makes good sense to pull results from the Supplemental index when the Regular index returns poor matches.

It’s good news for website owners who have large numbers of pages in the Supplemental index. As the new way of operating spreads, more of their pages will rightly find their way into the search results, even though they are in the Supplemental index.

Source: www.webworkshop.net