Facebook vs. Cambridge Analytica Shows Data Science’s Increasingly Complex Social Impact

Data science has grown from a niche, budding field just a few years ago into a mainstay of nearly every industry. Entire companies now rely solely on collecting and sorting data, and data science is being put to use in industries as obvious as advertising and as far afield as lobbying.

Through the use of multifaceted statistical analysis, even the most seemingly innocuous data can be used to predict the most far-fetched information with shocking accuracy. Data science and data mining, while sounding like intimidating terms to many, are merely analytics taken to the extreme.

As one hypothetical example, the timing of your visits to gas stations may, combined with other factors, be found to correlate in some way with your likelihood to prefer Coke or Pepsi. While that sounds absurd on its face, it is precisely the kind of analysis that is becoming standard for companies and even governments across the board.
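To make the hypothetical above concrete, here is a minimal sketch in Python that measures how strongly one seemingly innocuous variable tracks another. Everything in it is an assumption made for illustration: the customers are simulated, the rule linking visit time to soda preference is invented, and the function names (`simulate_customers`, `pearson`) are hypothetical. Real analyses would combine many more signals and far more sophisticated models.

```python
import math
import random


def simulate_customers(n, seed=0):
    """Generate made-up (visit_hour, prefers_coke) pairs.

    The link between visit time and soda preference is invented
    purely for illustration; it does not describe real behavior.
    """
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        hour = rng.uniform(6, 22)  # visit time between 6:00 and 22:00
        # Assumed rule: later visitors are somewhat more likely to prefer Coke.
        p_coke = 0.3 + 0.4 * (hour - 6) / 16
        prefers_coke = 1 if rng.random() < p_coke else 0
        customers.append((hour, prefers_coke))
    return customers


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


customers = simulate_customers(5000)
hours = [hour for hour, _ in customers]
prefs = [pref for _, pref in customers]
r = pearson(hours, prefs)
print(f"Correlation between visit hour and Coke preference: {r:.2f}")
```

A weak correlation like this one says little about any single person, but across millions of records and combined with other factors it can be enough to shift which ad or message someone is shown.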

However, the events of the past two weeks have shown how data science’s prevalence is increasingly causing tension, not only with a public that seems to be slowly and inevitably losing its last bits of privacy, but also between companies.

Facebook announced it was suspending Cambridge Analytica, its parent company, and several sister companies from the platform. Cambridge Analytica had allegedly received, and failed to delete, vast amounts of user information that was collected through an app created by a University of Cambridge professor and then improperly shared with the firm.

Cambridge Analytica played a major role in the 2016 election, first in the primary with the campaigns of Ben Carson and Senator Ted Cruz (R-TX), and later with Donald Trump through the general election. The firm was said to be extremely effective, using data to micro-target ads and other political material. That targeting was considered essential to Cruz’s rise during the primary and to Trump’s surprising victory in several battleground states.

According to Facebook, around 270,000 people downloaded a personality prediction app created by a University of Cambridge professor. The app collected wide-ranging information on their Facebook habits as well as those of their friends. Facebook claims that the professor then shared the information with Cambridge Analytica, giving the company information on over 50 million Facebook users during the 2016 election.

It may initially seem that this dispute centers on privacy and consent, as Facebook claims the professor’s sharing of the app’s information with the company violated the terms of the user agreement for information collected on its platform. However, the underlying fight is also about the growing tug of war over data between different companies and firms.

Look at Facebook, for example. The company’s revenue essentially relies on a combination of user traffic and activity on the platform and data that allows advertisers to micro-target advertising there using a wide array of characteristics such as location, travel, food habits, recreation, and more. That is why Facebook is now worth well over $500 billion.

Essentially, Facebook’s revenue model also relies on having exclusive control over the data that allows advertisers to micro-target. Many companies that collect data do not directly “sell” the information, but rather operate platforms that allow licensed or semi-removed use. Otherwise, the information would be a one-and-done sale rather than a sustainable model.

This leads to a situation where companies struggle to gain access to treasure troves of data. Even in our supposedly privacy-free age, users are still reluctant to provide their essential private information knowing that it will be used for business purposes, which is why the proper collection of data is a slow, grueling, multi-front, and expensive process. As allegedly happened with Cambridge Analytica and Facebook, sometimes companies take shortcuts.

Current trends suggest that data science will not only continue to grow but become essentially ubiquitous. In just a few years we may be living in a time with self-driving cars that not only take us where we want to go, but also collect extremely detailed information to help sell us products, predict our health risks, and even predict our voting habits.

On one hand, data science has great potential to streamline industries, iron out inefficiencies, and deliver personalized marketing and solutions. On the other hand, privacy will essentially be a thing of the past.

Furthermore, as we see with the near-monthly mega data breaches, such as the 2017 Equifax breach in which the personal information of over 143 million Americans was taken, there are great personal and societal risks to such data collection as well.

Whether the market will see customers willing to pay a premium for data protection, or whether the public policy and regulatory world will intervene more in some way, remains to be seen.