Google's new data retention policy
Google announced yesterday that it is changing its data retention policy for searches. It will now anonymize its search logs so that the data is "much more anonymous" after 18-24 months. The skeptic in me wants to know what "much more anonymous" means, since AOL thought it had anonymized its searches when it released user search information to researchers. So I dug a bit deeper and looked at their log retention policy FAQ. It seems that they plan to change some bits in the IP address and in the cookies they send out. That will make it harder to trace specific searches back to a particular person by IP address. They make no mention of preventing an AOL-style disclosure snafu by ensuring that individual searches by the same person are anonymized differently from each other. If they are all anonymized in the same way (e.g., IP address 1.1.1.1 goes to 1.1.2.2 each and every time), the searches aren't really anonymous, because all of one person's searches can still be grouped together, and people tend to search on their own names, SSNs, addresses, etc.
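To make the linkage worry concrete, here is a minimal Python sketch. It is not Google's method, which the FAQ does not spell out; the function names, the key, and the sample log are all made up for illustration. It compares a consistent mapping (same IP, same token every time) with a fresh random token per record.

```python
import hashlib
import secrets
from collections import defaultdict

# Hypothetical secret and sample log, invented for this illustration.
KEY = b"hypothetical-secret"
LOG = [
    ("1.1.1.1", "jane doe springfield"),
    ("1.1.1.1", "symptoms of flu"),
    ("2.2.2.2", "cheap flights"),
    ("1.1.1.1", "123 elm street"),
]

def consistent_pseudonym(ip, key=KEY):
    """Same IP always maps to the same token (deterministic)."""
    return hashlib.sha256(key + ip.encode()).hexdigest()[:12]

def per_record_pseudonym(ip):
    """Every record gets a fresh random token, unrelated to the IP."""
    return secrets.token_hex(16)

def group_by_token(log, anonymize):
    """Group queries by the token that replaces the IP address."""
    groups = defaultdict(list)
    for ip, query in log:
        groups[anonymize(ip)].append(query)
    return dict(groups)

linked = group_by_token(LOG, consistent_pseudonym)
unlinked = group_by_token(LOG, per_record_pseudonym)

# With the consistent mapping, one token still collects every query
# from 1.1.1.1, including the name and the street address.
for token, queries in linked.items():
    print(token, queries)
```

With the consistent mapping, the three searches from 1.1.1.1 stay bundled under one token, which is exactly the AOL problem. The per-record version breaks that link, at the cost of making the logs less useful for analysis that depends on following one user over time.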
Skepticism aside, this move doesn't just satisfy privacy advocates; it also addresses the very real threat of legislation that would require communication service providers to capture and store data for up to two years. Google has also been the subject of a number of subpoenas from the US federal government for search information. A new, clear policy on data retention helps Google here, too, because by adhering to its policy it may be able to fight subpoenas that are either inappropriate or that ask for data that is too old (IANAL). The lesson here is that if you have no policy and delete data willy-nilly, you can get in trouble for deleting the data that the government wanted. It can look suspicious. Why did you delete that particular data and not others? But if your company has a data disposal/anonymization standard, so that all data older than X is deleted, then it's just standard business practice that the information is no longer around. It is then not suspicious that you don't have that information if someone asks for it. So this solution works well for Google and is a boon for privacy advocates, too.
While I was surfing around for information, I also noticed that Google's privacy policy states that they will take "appropriate security measures to protect against unauthorized access." This makes me want to do a quick survey of other Web sites and their privacy policies. How do most companies state that they will protect information? Are they using "appropriate and reasonable" means, or are they over-promising to their consumers? If your company makes claims like the following, as Petco.com did, perhaps you'll be visited by the FTC, too.
At PETCO.com our customers’ data is strictly protected against any unauthorized access. PETCO.com also provides a “100% Safeguard Your Shopping Experience Guarantee” so you never have to worry about the safety of your credit card information.

I was just reading about this on the privacy blog http://www.globalpov.com. Personally I think Google is just trying to pull a reversal on the recent bad publicity. We'll see if it works. My guess? No.
Posted by: jay smith | March 16, 2007 at 06:25 PM
Are you referring to the negative publicity of Google sharing search data because of court order? (http://www.google.com/press/images/subpoena_20060317.pdf) I agree with the other blog you mentioned that this step is likely not just for the sake of privacy, but for protection from the same kind of court orders. But it still ends up improving privacy overall. Digging through Yahoo.com I can't even find them talking about how long they will keep information/how soon they will delete it.
Posted by: Jen Albornoz Mulligan | March 16, 2007 at 06:46 PM