WASHINGTON (AP) — Credit card data isn't quite as anonymous as promised, a new study says.
Scientists showed they can identify you with more than 90 percent accuracy by looking at just four purchases, three if the price is included — and this is after companies "anonymized" the transaction records, saying they wiped away names and other personal details. The study out of the Massachusetts Institute of Technology, published Thursday in the journal Science, examined three months of credit card records for 1.1 million people.
"We are showing that the privacy we are told that we have isn't real," study co-author Alex "Sandy" Pentland of MIT said in an email. His research found that adding just a glimmer of information about a person from an outside source was enough to identify him or her in the trove of financial transactions they studied.
Companies routinely strip away personal identifiers from credit card data when they share information with outsiders, saying the data is now safe because it is "anonymized." But the MIT researchers showed that anonymized isn't quite the same as anonymous.
Drawing upon a sea of data in an unnamed developed country, the researchers pieced together available information to see how easily they could identify somebody. They looked at time-stamped information from 10,000 shops and calculated how many pieces of data it would take on average to find somebody, said study lead author Yves-Alexandre de Montjoye, also of MIT.
In this case the experts needed only four pieces, three if price is involved.
As an example, the researchers described searching data from September 23 and 24 for someone who went to a bakery one day and a restaurant the other. Searching through the data set, they found there could be only one person who fits the bill, whom they called Scott. The study said, "and we now know all of his other transactions, such as the fact that he went shopping for shoes and groceries on 23 September, and how much he spent."
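The kind of lookup described in the Scott example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the records, user IDs, shop names, prices and the helper functions `matching_users` and `transactions_of` are all hypothetical, invented only to show how a couple of outside facts can narrow an "anonymized" data set to one person.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: (user_id, date, shop, price).
# Names are gone, but each person still has a consistent ID.
RECORDS = [
    ("u1", "09-23", "bakery", 4.50),
    ("u1", "09-24", "restaurant", 32.00),
    ("u1", "09-23", "shoe store", 80.00),
    ("u1", "09-23", "grocery", 25.10),
    ("u2", "09-23", "bakery", 6.00),
    ("u2", "09-24", "grocery", 41.75),
    ("u3", "09-24", "restaurant", 18.20),
    ("u3", "09-23", "grocery", 12.30),
]


def matching_users(records, known_points):
    """Return the IDs whose records include every known (date, shop) pair."""
    visits = defaultdict(set)
    for user, date, shop, _price in records:
        visits[user].add((date, shop))
    wanted = set(known_points)
    return {user for user, seen in visits.items() if wanted <= seen}


def transactions_of(records, user):
    """Return every (date, shop, price) entry recorded for one ID."""
    return [record[1:] for record in records if record[0] == user]


# Outside knowledge: a bakery visit one day, a restaurant visit the next.
known = [("09-23", "bakery"), ("09-24", "restaurant")]
candidates = matching_users(RECORDS, known)

if len(candidates) == 1:
    (user,) = candidates
    # Once the ID is unique, every other purchase under it is exposed.
    for date, shop, price in transactions_of(RECORDS, user):
        print(user, date, shop, price)
```

In this toy data, the bakery visit alone matches two IDs; adding the restaurant visit leaves only one, and that ID's shoe and grocery purchases come along with it.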
It's easier to identify women, but the research couldn't explain why, de Montjoye said.
The study shows that the privacy we think we have when our data is collected is really just an "illusion," said Eugene Spafford, director of Purdue University's Center for Education and Research in Information Assurance and Security. Spafford, who wasn't part of the study, said it makes "one wonder what our expectation of privacy should be anymore."
"It is not surprising to those of us who spend our time doing privacy research," said outside expert Lorrie Faith Cranor, director of the CyLab Usable Privacy and Security Laboratory at Carnegie Mellon University. "But I expect it would be surprising to most people, including companies who may be routinely releasing de-identified transaction data, thinking it is safe to do so."
Credit card companies and industry officials either declined comment or did not respond to requests for comment.
The once-obscure concept of metadata, or basic transactional information, became mainstream in recent years following revelations by former National Security Agency contractor Edward Snowden. Those disclosures from once-top-secret U.S. government documents revealed that the NSA was collecting the records of digital communications from millions of Americans not suspected of a crime.
The use of so-called "big data" has been a lucrative prospect for private companies aiming to cash in on the trove of personal information about their consumers. Retail purchases, online web browsing activity and a host of other digital breadcrumbs can provide firms with a wealth of data about you, which is then used in sophisticated advertising and marketing campaigns. And big data-mining was used extensively in the 2012 presidential election to win over voters or seek out prospective donors.
"While government surveillance has been getting a lot of press, and certainly the revelations warrant such scrutiny, a large number of corporations have been quietly expanding their use of data," said privacy consultant and author Rebecca Herold. Studies like this show "how metadata can be used to pinpoint specific individuals. This also raises the question of how such data would be used within insurance actuarial calculations, insurance claims and adjustments, loan and mortgage application considerations, divorce proceedings."
Journal Science: http://www.sciencemag.org
Seth Borenstein can be followed at http://twitter.com/borenbears
Jack Gillum can be followed at https://twitter.com/jackgillum