Discovering the Rest of the Internet Iceberg


Did you know that the visible portion of an iceberg only represents about 20% of its actual size? Beneath the water surface lies the other 80%. Imagine if the captain of the Titanic had that tidbit of information. Well, the Internet is similar in many ways. The portion of the Internet that is still inaccessible to the search engines and their crawlers is quite amazing. Even as Google indexed its one trillionth (with a T) web address last summer, it appears there is so much more out there.

A New York Times article introduces this concept like this:

Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.

The challenges that the major search engines face in penetrating this so-called Deep Web go a long way toward explaining why they still can’t provide satisfying answers to questions like “What’s the best fare from New York to London next Thursday?” The answers are readily available — if only the search engines knew how to find them.

Since there is so much more out there, you would suspect that there are folks trying to find it. There are, and of course Google is among them. Some wonder, though, what Google would do with the information. Some speculate that it may require a different presentation of search results, which until now has been untouchable save the occasional intrusion of universal search and personalized results.

“Google faces a real challenge,” said Chris Sherman, executive editor of Search Engine Land. “They want to make the experience better, but they have to be supercautious with making changes for fear of alienating their users.”

While this may not be at the front of the news and on everyone’s mind all the time, it is very real. When you learn that a company like Kosmix, which is doing work in this area, is backed in part by Jeff Bezos of Amazon, it’s hard not to raise an eyebrow and think that the next generation of search may be closer than we thought. Even if it isn’t close, there is a race on to get there first, and winning it could mean ridiculous amounts of money and power.

For a little deeper look, it appears that the information not currently being found by traditional crawlers sits in the databases of the Deep Web. Personally, I find it hard to grasp the sheer amount of data that exists on the web and how it is presented right now. Taking a look at this whole matter, though, is certainly interesting. We have been trained that what is given to us by Google search is the definitive answer (which it is at the moment), and we even ignore anything past the first 5 or so results as not worthy. With this impatient approach to results, will it even matter if we are given more data? Will it now mean that the first 20 results are SO relevant that we have to start slowing down and making decisions on our own rather than having the engines think for us?

Read the article and get the details because it is interesting for sure. Here’s a final thought to leave you with.

“The huge thing is the ability to connect disparate data sources,” said Mike Bergman, a computer scientist and consultant who is credited with coining the term Deep Web. Mr. Bergman said the long-term impact of Deep Web search had more to do with transforming business than with satisfying the whims of Web surfers.

The whims of web surfers? Is that all we are? Well, actually I guess it is in this context. How’s that for making you feel significant on this fine Monday!

  • http://www.marketingpilgrim.com Andy Beal

    I think you touched on the key point Frank. I don’t care about the depth of the iceberg, I only care about that top 20%–or as you say, the first 5 results in Google.

    Or, to use another analogy: I don’t want to see how hard the swan’s legs are furiously kicking below the surface of the water, just show me the graceful, white bird on top! ;-)

  • http://www.invesp.com/blog/ Rachel Burkot

    You quote the editor of Search Engine Land as saying that by finding the hidden information on the internet, Google risks alienating users. I don’t see how this would come to be. Even if internet users have to search through more legitimate options on the SERP, I don’t see how this would alienate users – since when does having more options turn someone off of searching? Until they can get the definite answer they’re searching for – like an end-all to queries such as cheapest flights – I think more options will be unquestionably appreciated by users.

  • http://michellesblog.net Michelle Greer

    It makes me wonder if we will ever have search within a search or vertical search engines. Seems like they would be quite useful.

  • http://www.lizmicik.com Liz Micik

    I don’t wonder that we’re close to finally being able to index another 20, or 30 or even the 80 percent of the data stored on the different levels and branches of the Internet. Back in the 1990s, we talked about how small the limb that held the world wide web really was.

    I wonder if we’re one step closer to being able to harness and use all that data than we were in 1998, or 2003. And I sincerely hope not.

    Remember when people were so concerned about the security of their personal information that they were reluctant to shop online? We calmed their fears by pointing out that even though there was (and is) a huge mound of information regarding any one person’s surfing and spending habits, there is no efficient or effective way to tie all that information into a package, or product, or Internet Smart Bomb that could actually do anything.

    This is another facet of search that remains hidden beneath the surface. If Jeff Bezos et al. are getting close to creating a way to tug on a single query string and follow a single person’s trek through the www, then I’ll sit up and install a proxy server right quick. If not, they’ll just be shutting small business out of this limb of the Internet the same way small business was shut out of million dollar advertising slots on TV. Money will rule the Google results, and the web will move somewhere else.

  • http://makingmoneyontheinternetwithgoogle.blogspot.com Brent small business trainer

    I read somewhere recently that one of Google’s engineers stated that fully 50% of searches in any one day are using unique keywords.

    That translates into something like 500,000 unique phrases every day. Obviously the users are looking for things, and those of us who make money online and use keywords to get targeted visitors could be gaining those visitors by ignoring all keyword and SEO work, just writing about our own special area and waiting for the visitors to catch up.

    Google also stated that they had indexed something less than 50% of all available web pages and were possibly losing ground. There is a message there for all of us.
