Posted October 31, 2008 10:20 am by with 25 comments


How much processing power does Google have at its fingertips? It must be a lot because the search engine giant will now include scanned documents in its search results.

In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document– so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful.

The possibilities here are endless. One idea: the document imaging business is huge, and I’m sure its vendors are hoping Google Apps doesn’t start offering companies the ability to scan and archive all of their printed documents, receipts, etc.

How well does it work? See for yourself! Here’s the scanned document and here’s the Google-converted one.

  • Nick Stamoulis

    Wow that is amazing. This will open a pretty large door for a variety of SEO techniques.

  • Xbox

    Google should be indexing my site not messing with scanned stuff

  • Busby SEO

    New SEO Technique, and time to learn the new technique 😀

    Busby SEO’s last blog post..By: Busby SEO Test | Move to New Server

  • Andy Beal

    @Busby & Nick – the sales of invisible ink pens just went up! 😉

  • PS3

    Forget all Google’s problems, it just staggers me how much power/technology they must have to cope with all this stuff. How many servers, a building full!!

  • rajesh

    wow!!! google is always the best

  • prabakaran

    amazing google …always brings new things

  • Armen Shirvanian

    This would appear to be a beneficial improvement in their indexing, as it would increase the amount of information that would be readily available from a Google search. It seems to be the case that more and more information that is normally not disclosed is becoming part of the public archive. We may later see more and more statistics about our days, such as details about our breathing patterns and more.

    Armen Shirvanian’s last blog post..Using Long-term Thinking To Reduce Regret

  • brandon alan scofield

    Google needs billions of tons of servers to index scanned pages

    brandon alan scofield’s last blog post..First post

  • Saad Kamal

    Well this sounds good. I don’t think this technology is absolutely new. Evernote & Microsoft OneNote can actually read through scanned images or even low-res pictures taken by your phone.

    Saad Kamal’s last blog post..Google Experiments with Social Voting within Search

  • Helen

    I remember Google indexed pdf, doc and txt files before.

  • iPod

    This is a step forward. I still wonder whether it’ll be harmful with so much information being publicly available. Google has search data from millions of users. This gives them incredible power. We are pretty sure they are acting ethically now, but what if that changed? The things they could do with all this data would be incredible.

    I find this quite scary actually.


  • Utah SEO Pro

    It’d make natural sense for Google to create an OCR document where they could store the files for you. That’s data they don’t have control of right now.

    Utah SEO Pro’s last blog post..Link Metrics for SEO

  • GoScript

    It was already detecting objects in images, so this was an “easy” thing for mighty Google :)

    GoScript’s last blog post..WordPress Uniquefier Plugin v3.0

  • MB Web Design

    It’ll be interesting to see how – if at all – scanned documents have an effect on SEO.

    There will come a day when you Google for your car keys.

    MB Web Design’s last blog post..H2D Hair Straighteners

  • Rika Susan’s Home Improvement News

    Interesting development. Will be waiting to see what effect indexing scanned documents will have on SEO and listings. Thanks for heads up.

  • Zurpit

    Wow this is a huge step for Google search engines, I wonder how this will affect SEO

  • Nicole Price

    Until we see how it works in practice, it is all in the realm of theory. At the end of it all, only G is likely to laugh all the way to the bank.

    Nicole Price’s last blog post..Green Tax Breaks

  • Diamonds

    Nice. Hopefully, their character recognition is accurate.

    Diamonds’s last blog post..EGL USA versus EGL International or EGL Israel

  • Chaitanya Patel

    Yeah, I saw it a couple of days ago. You won’t believe it, but one of my websites was getting a backlink from an uploaded .xls file.

    Chaitanya Patel’s last blog post..Highest Run Scorer in Test Cricket – Sachin Tendulkar Now

  • Pingback: Google to Show Information from Scanned Pages | Website Promotion, Internet Marketing

  • Galin

    That is the major reason we used, always use and will use Google. It never stops delighting its users!! Google is the best indeed.

  • Brett Gian

    How is it displayed in search: as an image or as text?


    Brett Gian’s last blog post..Grand Theft Auto- Vice City Game Review

  • Pingback: Google is Using OCR to Expand its Reach | FREE DOWNLOAD GAMES

  • Seotest

    yeah. Google is the best SE.

    Seotest’s last blog post..Busby SEO Test My Opinion