Google Indexing Private Data




By now you should know that anything you post publicly online is fair game for Google’s crawler. But a recent incident involving the indexing of social security numbers and test scores for 619 students at public schools in Catawba County, N.C., shows there is still a grey area when it comes to password-protected information.

While Google eventually took down the information – and claims it can’t crawl secure data – the incident once again raises the issue of where the burden should lie.

Should a webmaster be held responsible for knowing all of the procedures necessary to prevent Google from spidering, or should the burden be on Google?

It’s a tough call, but I can’t help thinking that the Internet existed before Google, so the onus should be on Google to NOT crawl secure content – even if it finds a loophole. Of course, accepting that responsibility would be a huge drain on the company’s resources, so Google would rather we all accommodate it than the other way round.

  • http://www.blogger.com/profile/1463441 epc

    The onus should be on the webmaster and site operator. If Google could access and index the alleged private data, so could anyone else with access to the internet. “Password protected” is meaningless if the password is encoded in the URL: anyone who can see the URL can then access the site. If a site needs to provide truly confidential, private data over the internet, then the site needs to take steps to ensure that no one can access the site without the proper credentials.

  • http://www.acorg.com MikeOK

    Most spiders are still very basic. They start by looking at a single page on a web site. From there, they collect every link found and try to access those pages. If a page loads, they collect the links on it as well. These links are typically followed with no special programming.
    So my vote is that the webmaster is 100% to blame over these issues. In this case, either a link that could reach the secured area was published somewhere, on the site or off it, or the area was not secure.
    Bottom line for me is don’t publish information to the internet that you do not want people to see, even if it’s password protected. What I don’t get is why anyone would publish social security numbers at all?

  • http://www.blogger.com/profile/3103921 Randy Charles Morin

    You have got to be kidding. You are saying that Google is responsible for educating every webmaster in the world. That’s an impossible task. And if you want Google to do that, then Yahoo!, MSN and Ask must do it too! Since that’s an impossible task, you might as well shut down all search engines and, for that matter, shut the Internet down too, because the next step is to say the ISP is responsible for securing the data.

  • http://www.blogger.com/profile/1685318 Andy Beal

    I agree with all the posts above, in that it would be virtually impossible for Google to proactively educate webmasters, but I don’t think the webmaster should take 100% of the responsibility.

    We don’t know how Google managed to index the private data. What if it was made public for just 10 minutes, then taken down by the webmaster? Google spiders the site, caches the content and keeps it available for days. Shouldn’t Google take responsibility for providing a quick and easy way to remove content? Why does it take a court order?

    Please don’t suggest the webmaster should use the “no cache” command; it is unfair to expect them to know about it. Also, don’t try telling me it would be impossible for Google to know whether a request for content removal is legit or not. They can easily verify that.

    As I said, if you put it online, you should expect Google to find it. That being said, as the Internet becomes the place to store all kinds of information, we need Google (and Yahoo, MSN, Ask) to figure out a way to protect private data – this index-all-and-be-damned attitude, while legal, is hardly endearing.
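
For readers who want to see the link-following behaviour MikeOK describes in the comments, here is a minimal sketch in Python. It is not Google’s crawler or anyone’s real spider. The site, its URLs, the `pw=secret` parameter and the `fetch` helper are all hypothetical, and the pages live in memory so the example runs without network access. It only illustrates the general idea: start at one page, collect every link, and visit whatever loads, including a “protected” page whose password is embedded in a published link.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site held in memory (assumption: no real host is involved).
SITE = {
    "http://example.test/": '<a href="/news.html">News</a> <a href="/about.html">About</a>',
    "http://example.test/news.html": '<a href="/reports/scores.html?pw=secret">Scores</a>',
    "http://example.test/about.html": '<a href="/">Home</a>',
    "http://example.test/reports/scores.html?pw=secret": "Student records",
}


def fetch(url):
    """Stand-in for an HTTP GET: returns the page body, or None if it does not load."""
    return SITE.get(url)


def crawl(start_url):
    """Breadth-first crawl: visit a page, queue every new link on it, repeat."""
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue  # page did not load, so there is nothing to index or follow
        indexed.append(url)
        parser = LinkCollector()
        parser.feed(body)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the current page
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return indexed
```

Calling `crawl("http://example.test/")` returns every page reachable from the home page, and that list includes the scores page, because the only thing “protecting” it was a password carried in a link the crawler could read like any other.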