Ethics With Data Collection

You’ve likely experienced the murkier side of data collection before. It’s commonplace for sites we’ve visited to show us advertisements further down the line, but what about connections that are a bit more tenuous?

Perhaps your housemate is interested in a product and suddenly you’re being targeted with ads for it as well. Or you get someone’s number and, without it being linked to your contacts, Facebook is suddenly suggesting them as someone “you might know”.

Now, data ethics is the topic everyone’s talking about. So where did it come from and why is it now so necessary for modern businesses to understand?

Data collection became an arms race

Data is everywhere. 2.5 quintillion bytes of data are created every day. Data is incredibly powerful, and so it’s no wonder that businesses are looking for ways to innovate with and take advantage of the hot new currency in the digital age. Information is power when it comes to locating, understanding, and connecting with potential customers. And in the race to get ahead of the competition, many businesses have decided that the more data they could collect, the better.


The bubble burst in 2018

Just how far this approach had gone was brought to startling light by events like the Cambridge Analytica scandal. No longer were tech giants like Facebook the cool, caring, innovative firms making our lives richer—they were corporate giants like all the others, unscrupulously out for our money.

In the aftermath of the shocking revelations about how personal data was used by Cambridge Analytica, the ethics of data collection and analysis became a hot topic. It saw a raft of regulatory reforms brought in across the world, with the EU General Data Protection Regulation (GDPR) perhaps the best known and among the strictest. GDPR insisted on the ethical use of personal data, which meant collecting what was necessary, using it as expected, and keeping it safe.

Ethical use of data is no longer a choice

This stance has been more or less adopted globally, though some countries have taken slightly softer views. Australian regulations, while updated to come more into line with GDPR, are more granular and lenient in places than their European counterparts. But GDPR is enforceable against any company based in the EU or handling the personal data of individuals in the EU, and for international firms that’s likely to be almost everyone. Understanding the regulations and complying with them is crucial to reputation management as well as to avoiding serious sanctions.

Ethical data collection

Businesses must understand consent and transparency beyond tick boxes

One of the most obvious changes GDPR brought about for users is the rise of opt-in messages that greet them as they enter sites. The definition of consent has been made far stricter: visiting a site is no longer reason enough to assume that a user wants their personal data harvested by it. Obtaining explicit consent for collecting data, and being more open and honest about what is being collected, is part of a more ethical approach to gathering data.
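One way to make “consent beyond tick boxes” concrete is to record each grant explicitly, bound to a single purpose and a timestamp, and to treat silence as refusal. The sketch below is illustrative only; the ConsentRecord class and may_process helper are hypothetical names, not part of GDPR or any library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """One explicit, purpose-bound consent event for a single user."""
    user_id: str
    purpose: str          # a specific purpose, e.g. "email_marketing", never "all"
    granted: bool
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def may_process(records: list[ConsentRecord], user_id: str, purpose: str) -> bool:
    """Processing is allowed only if the most recent record for this exact
    purpose is an explicit grant; absence or silence means no."""
    relevant = [r for r in records if r.user_id == user_id and r.purpose == purpose]
    if not relevant:
        return False
    latest = max(relevant, key=lambda r: r.recorded_at)
    return latest.granted
```

The design choice worth noting is that consent is keyed to a purpose, not to the user as a whole: opting in to a newsletter says nothing about analytics.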

Avoid collecting more data than necessary

It used to be the case that marketers would collect as much data as they possibly could, striving for any advantage that might be gained through insights small and large into a customer’s profile and behaviour. Most new regulations insist that only data relevant to a specific processing purpose can be collected. Defining data uses and working out the minimum amount of collection required to achieve outcomes is a new ethical issue for those handling personal data.
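In practice, data minimisation can be enforced at the point of intake: each processing purpose declares the minimum set of fields it needs, and everything else is dropped before storage. This is a minimal sketch under that assumption; the PURPOSE_FIELDS mapping and minimise function are hypothetical, and real field lists would come from a documented data-use review.

```python
# Hypothetical allow-lists: each purpose names the minimum fields it needs.
PURPOSE_FIELDS = {
    "order_fulfilment": {"name", "shipping_address", "email"},
    "newsletter": {"email"},
}

def minimise(submitted: dict, purpose: str) -> dict:
    """Keep only the fields the stated purpose requires; an unknown
    purpose has no allow-list, so nothing is retained."""
    allowed = PURPOSE_FIELDS.get(purpose, set())
    return {k: v for k, v in submitted.items() if k in allowed}
```

For example, a newsletter signup that submits a name and phone number alongside an email would retain only the email.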

Respect and use data only for its intended purpose

Linked to what data is being collected is the purpose for which it is being collected. For services that are data-driven, companies need to clearly understand what they need to deliver that service and be upfront with users about what they need and why.

PII needs to stay private

Even with a minimal amount of data being collected, privacy is still an important ethical concept to consider. People can be easily tracked by personally identifiable information (PII), and ensuring that this is anonymised before use is often essential for compliant processing.

Judging what constitutes PII is not always straightforward. PII is personal data, but not all personal data is PII: some data points can be collected and aggregated without making it clear which individual they belong to, while other combinations of data points can identify an individual from just a few pieces of information. Aggregation further clouds the issue, as combining records of non-identifiable personal data can sometimes create new PII within databases. Processors of such data should take care when collating data to ensure this does not happen unintentionally.
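Both halves of this problem can be sketched in code: replacing direct identifiers with keyed hashes (pseudonymisation), and flagging records whose remaining quasi-identifiers could still single someone out. This is an illustrative sketch, not a compliance recipe; the SECRET_KEY, QUASI_IDENTIFIERS set, and record_is_risky threshold are all assumptions chosen for the example.

```python
import hashlib
import hmac

# Hypothetical secret, held separately from the analytics database. A keyed
# HMAC is used rather than a plain hash because low-entropy inputs such as
# phone numbers can be reversed from an unkeyed hash by brute-force guessing.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymise(value: str) -> str:
    """Replace a direct identifier (email, phone) with a stable keyed hash,
    so records can still be joined without exposing the raw identifier."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

# Quasi-identifiers: individually harmless, jointly identifying.
QUASI_IDENTIFIERS = {"postcode", "date_of_birth", "gender"}

def record_is_risky(record: dict) -> bool:
    """Flag records retaining enough quasi-identifiers that an individual
    could be singled out even with direct identifiers removed."""
    present = QUASI_IDENTIFIERS & set(record)
    return len(present) >= 2
```

The second function is the aggregation point: a postcode alone reveals little, but a postcode plus a date of birth narrows the field dramatically, which is exactly how non-identifiable records can combine into new PII.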

Safety is paramount

Data security is of the utmost importance, and it’s likely that large businesses will already have robust procedures in place for protecting their data. Even so, there were thousands of data breaches in 2018, affecting hundreds of millions of individuals. If even those companies with the most money to throw at cybersecurity are vulnerable to hacks and attacks, what chance do smaller firms have? This makes for tricky decisions: while collecting and analysing personal data can provide useful customer insights and even improve the levels of service companies offer those customers, it can only be considered if the data collected can be kept safe.

Emerging issues – data inclusivity

Looking forward, it’s clear that community and collaboration will be key factors in how new ethical guidelines are produced. Programs like the Regional Big Data Innovation Hubs and Spokes event, which reviews data ethics principles produced by a variety of working groups, are steps in the right direction. There is talk of going as far as having data scientists sign up for a Hippocratic Oath like that taken by doctors to “do no harm”. Data even has the power to challenge underlying societal norms, as in the case of human and algorithmic biases discovered in certain facial recognition software programs. This has led to calls to reform the algorithms and code libraries used as underlying building blocks of the tools built for data analysis.

