Marketing Pilgrim's "Search News" Channel

Google Now Indexing Scanned Documents


How much processing power does Google have at its fingertips? It must be a lot because the search engine giant will now include scanned documents in its search results.

In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document– so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful.
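To make the "searched and indexed" part concrete, here is a rough sketch of what can happen once OCR has turned a scan into plain text. It is not Google's method, just a toy inverted index. The file names, the sample text, and the helper names `build_index` and `search` are all made up for illustration, and the OCR step itself is assumed to have already run.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output for two scanned PDFs (sample text is invented).
ocr_text = {
    "scan-001.pdf": "Quarterly sales report for the northern region",
    "scan-002.pdf": "Meeting notes: sales targets and hiring plan",
}
index = build_index(ocr_text)
```

Calling `search(index, "sales report")` looks up each query word and intersects the matching document sets, which is why a scan with no machine-readable text could not be matched on its content before OCR.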

The possibilities here are endless. For one, the document imaging business is huge, and I’m sure its vendors are hoping Google Apps doesn’t start offering companies the ability to scan and archive all of their printed documents, receipts, etc.

How well does it work? See for yourself! Here’s the scanned document and here’s the Google-converted one.


Trackbacks

  1. [...] the Internet marketingpilgrim.com reports that, in an effort to improve its search results, Google will display information coming [...]

  2. [...] Andy alludes to the huge potential behind this seemingly small step for Google: Imagine if Google would develop a service for companies to scan and archive their important documents. [...]

Comments

  1. Wow that is amazing. This will open a pretty large door for a variety of SEO techniques.

  2. Xbox says:

    Google should be indexing my site not messing with scanned stuff

  3. Busby SEO says:

    New SEO Technique, and time to learn the new technique :D

    Busby SEO’s last blog post..By: Busby SEO Test | Move to New Server

  4. Andy Beal says:

    @Busby & Nick – the sales of invisible ink pens just went up! ;-)

  5. PS3 says:

    Forget all Google’s problems, it just staggers me how much power/technology they must have to cope with all this stuff. How many servers, a building full!!

  6. rajesh says:

    wow!!! google is always the best

  7. prabakaran says:

    amazing google …always brings new things

  8. This would appear to be a beneficial improvement in their indexing, as it would increase the amount of information that would be readily available from a Google search. It seems to be the case that more and more information that is normally not disclosed is becoming part of the public archive. We may later see more and more statistics about our days, such as details about our breathing patterns and more.

    Armen Shirvanian’s last blog post..Using Long-term Thinking To Reduce Regret

  9. Google needs billions of tons of servers to index scanned pages

    brandon alan scofield’s last blog post..First post

  10. Saad Kamal says:

    Well this sounds good. I don’t think this technology is absolutely new. Evernote & Microsoft OneNote can actually read through scanned images or even low-res pictures taken by your phone.

    Saad Kamal’s last blog post..Google Experiments with Social Voting within Search

  11. Helen says:

    I remember Google indexed pdf, doc and txt files before.

  12. iPod says:

    This is a step forward. I still wonder whether it’ll be harmful with so much information being publicly available. Google has search data from millions of users. This gives them incredible power. We are pretty sure they are acting ethically now, but what if that changed? The things they could do with all this data would be incredible.

    I find this quite scary actually.

    John

  13. Utah SEO Pro says:

    It’d make natural sense for Google to create an OCR document where they could store the files for you. That’s data they don’t have control of right now.

    Utah SEO Pro’s last blog post..Link Metrics for SEO

  14. GoScript says:

    It was already detecting objects in images, so this was an “easy” thing for mighty Google :)

    GoScript’s last blog post..WordPress Uniquefier Plugin v3.0

  15. It’ll be interesting to see how – if at all – scanned documents have an effect on SEO.

    There will come a day when you Google for your car keys.

    MB Web Design’s last blog post..H2D Hair Straighteners

  16. Interesting development. Will be waiting to see what effect indexing scanned documents will have on SEO and listings. Thanks for the heads-up.

  17. Zurpit says:

    Wow this is a huge step for Google search engines, I wonder how this will affect SEO

  18. Nicole Price says:

    Until we see how it works in practice, it is all in the realm of theory. At the end of it all, only G is likely to laugh all the way to the bank.

    Nicole Price’s last blog post..Green Tax Breaks

  19. Diamonds says:

    Nice. Hopefully, their character recognition is accurate.

    Diamonds’s last blog post..EGL USA versus EGL International or EGL Israel

  20. Yeah, I saw this a couple of days ago. You won’t believe it, but one of my websites was getting a backlink from an uploaded .xls file.

    Chaitanya Patel’s last blog post..Highest Run Scorer in Test Cricket – Sachin Tendulkar Now

  21. Galin says:

    That is the major reason we have used, still use, and will keep using Google. It never stops delighting its users!! Google is the best indeed.

  22. Brett Gian says:

    How is it displayed in search results: as an image or as text?

    Thanks
    http://seoandwebdesignblog.wordpress.com/

    Brett Gian’s last blog post..Grand Theft Auto- Vice City Game Review

  23. Seotest says:

    yeah. Google is the best SE.

    Seotest’s last blog post..Busby SEO Test My Opinion