Friday, October 31st, 2008 by Andy Beal


Google Now Indexing Scanned Documents

How much processing power does Google have at its fingertips? It must be a lot because the search engine giant will now include scanned documents in its search results.

In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document, so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words: words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful.

The possibilities here are endless. One idea: the document imaging business is huge, and I’m sure its players are hoping Google Apps doesn’t start offering companies the ability to scan and archive all of their printed documents, receipts, etc.

How well does it work? See for yourself! Here’s the scanned document and here’s the Google-converted one.



25 comments on “Google Now Indexing Scanned Documents”

  1. Nick Stamoulis Says:

    October 31st, 2008 at 11:38 am

    Wow that is amazing. This will open a pretty large door for a variety of SEO techniques.

  2. Xbox Says:

    October 31st, 2008 at 11:41 am

    Google should be indexing my site not messing with scanned stuff

  3. Busby SEO Says:

    October 31st, 2008 at 12:17 pm

    New SEO Technique, and time to learn the new technique :D

    Busby SEO’s last blog post..By: Busby SEO Test | Move to New Server

  4. Andy Beal Says:

    October 31st, 2008 at 12:19 pm

    @Busby & Nick – the sales of invisible ink pens just went up! ;-)

  5. PS3 Says:

    November 1st, 2008 at 5:20 am

    Forget all Google’s problems, it just staggers me how much power/technology they must have to cope with all this stuff. How many servers, a building full!!

  6. rajesh Says:

    November 1st, 2008 at 7:15 am

    wow!!! google is always the best

  7. prabakaran Says:

    November 1st, 2008 at 7:16 am

    amazing google …always brings new things

  8. Armen Shirvanian Says:

    November 1st, 2008 at 11:02 pm

    This would appear to be a beneficial improvement in their indexing, as it would increase the amount of information that would be readily available from a Google search. It seems to be the case that more and more information that is normally not disclosed is becoming part of the public archive. We may later see more and more statistics about our days, such as details about our breathing patterns and more.

    Armen Shirvanian’s last blog post..Using Long-term Thinking To Reduce Regret

  9. brandon alan scofield Says:

    November 2nd, 2008 at 4:01 am

    Google needs billions of tons of servers to index scanned pages

    brandon alan scofield’s last blog post..First post

  10. Saad Kamal Says:

    November 2nd, 2008 at 6:11 am

    Well this sounds good. I don’t think this technology is absolutely new. Evernote & Microsoft OneNote can actually read through scanned images or even low-res pictures taken by your phone.

    Saad Kamal’s last blog post..Google Experiments with Social Voting within Search

  11. Helen Says:

    November 2nd, 2008 at 8:38 am

    I remember Google indexed pdf, doc and txt files before.

  12. iPod Says:

    November 2nd, 2008 at 11:12 am

    This is a step forward. I still wonder whether it’ll be harmful with so much information being publicly available. Google has search data from millions of users. This gives them incredible power. We are pretty sure they are acting ethically now, but what if that changed? The things they could do with all this data would be incredible.

    I find this quite scary actually.

    John

  13. Utah SEO Pro Says:

    November 2nd, 2008 at 2:50 pm

    It’d make natural sense for Google to create an OCR document where they could store the files for you. That’s data they don’t have control of right now.

    Utah SEO Pro’s last blog post..Link Metrics for SEO

  14. GoScript Says:

    November 2nd, 2008 at 3:36 pm

    It was already detecting objects in images, so this was an “easy” thing for mighty Google :)

    GoScript’s last blog post..WordPress Uniquefier Plugin v3.0

  15. MB Web Design Says:

    November 2nd, 2008 at 6:13 pm

    It’ll be interesting to see how – if at all – scanned documents have an effect on SEO.

    There will come a day when you Google for your car keys.

    MB Web Design’s last blog post..H2D Hair Straighteners

  16. Rika Susan's Home Improvement News Says:

    November 3rd, 2008 at 4:54 am

    Interesting development. Will be waiting to see what effect indexing scanned documents will have on SEO and listings. Thanks for heads up.

  17. Zurpit Says:

    November 3rd, 2008 at 8:40 am

    Wow, this is a huge step for Google’s search engine. I wonder how this will affect SEO.

  18. Nicole Price Says:

    November 3rd, 2008 at 10:40 am

    Until we see how it works in practice, it is all in the realm of theory. At the end of it all, only G is likely to laugh all the way to the bank.

    Nicole Price’s last blog post..Green Tax Breaks

  19. Diamonds Says:

    November 3rd, 2008 at 2:43 pm

    Nice. Hopefully, their character recognition is accurate.

    Diamonds’s last blog post..EGL USA versus EGL International or EGL Israel

  20. Chaitanya Patel Says:

    November 3rd, 2008 at 3:31 pm

    Yeah, I saw this a couple of days ago. You won’t believe it, but one of my websites was getting a backlink from an uploaded .xls file.

    Chaitanya Patel’s last blog post..Highest Run Scorer in Test Cricket – Sachin Tendulkar Now

  21. Google will display information from scanned pages | Website promotion, Internet marketing Says:

    November 4th, 2008 at 4:53 am

    [...] the Internet marketingpilgrim.com reports that, in an effort to improve search results, Google will display information coming [...]

  22. Galin Says:

    November 4th, 2008 at 5:41 am

    That is the major reason we have used, still use, and will keep using Google. It really never stops delighting its users!! Google is the best indeed.

  23. Brett Gian Says:

    November 10th, 2008 at 6:22 am

    How is it displayed in search results: as an image or as text?

    Thanks
    http://seoandwebdesignblog.wordpress.com/

    Brett Gian’s last blog post..Grand Theft Auto- Vice City Game Review

  24. Google is Using OCR to Expand its Reach | FREE DOWNLOAD GAMES Says:

    November 11th, 2008 at 8:58 am

    [...] Andy alludes to the huge potential behind this seemingly small step for Google: Imagine if Google would develop a service for companies to scan and archive their important documents. [...]

  25. Seotest Says:

    November 22nd, 2008 at 8:00 pm

    yeah. Google is the best SE.

    Seotest’s last blog post..Busby SEO Test My Opinion