Google’s Radically Transparent (Almost) About Its Search Algorithm
Over at the Official Google Blog, search quality tzar Udi Manber peels back the curtain on Google’s coveted search engine technology and gives us a peek at what’s been happening recently.
It’s a long post, so here are the snippets that caught my attention:
We also need to understand the queries people pose, which are on average fewer than three words, and map them to our understanding of all documents.
PageRank is still in use today, but it is now a part of a much larger system.
Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it’s not just the language, it’s how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing).
…improve the user experience. This is not the main goal, it is the only goal.
There are automated evaluations every minute (to make sure nothing goes wrong), periodic evaluations of our overall quality, and, most importantly, evaluations of specific algorithmic improvements.
In 2007, we launched more than 450 new improvements, about 9 per week on the average.
…we made significant changes to the PageRank algorithm in January.
…we have a large set of volunteers from all parts of Google who speak different languages and help us improve search.
The UI team is helped by a team of usability experts who conduct user studies and evaluate new features. They travel all over the world, and they even go to people’s homes to see users in their natural habitat.
There is a whole team that concentrates on fighting webspam and other types of abuse… The team spots new spam trends and works to counter those trends in scalable ways; like all other teams, they do it internationally.
Matt Cutts offers some additional insight on the post.