Posted October 31, 2008 10:20 am by with 25 comments


How much processing power does Google have at its fingertips? It must be a lot because the search engine giant will now include scanned documents in its search results.

Google explains the change in its announcement: "In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document – so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words – words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful."

The possibilities here are endless. For one, the document imaging business is huge, and I’m sure its players are hoping Google Apps doesn’t start offering companies the ability to scan and archive all of their printed documents, receipts, and so on.

How well does it work? See for yourself! Here’s the scanned document and here’s the Google-converted one.