Why ethical reasoning should be an essential capability for Data Science teams

Category: IT Technology · Published: 3 years ago

And two concrete actions to kickstart your team on ethical knowledge

Jul 23 · 6 min read

Image source: mbolina via iStock

Wherever new technology is introduced, ethics and legislation trail behind the applications. From a technical point of view, data science can no longer be called new, but it has not yet reached maturity in terms of ethics and legislation. As a result, the field is especially prone to harmful ethical missteps.

How do we prevent these missteps right now, while we wait for (or, even better, work on) ethical and legislative maturity?

I propose that the solution lies in taking responsibility yourself, as a data scientist. Before I reach this conclusion, I will give a brief introduction to data ethics and legislation. I will also share a best practice from my own team, with concrete actions to make your team ethics-ready.

“But data and models are neutral in themselves, so why worry about good and bad?”

Image source: Kirill_Savenko via iStock

If 2012 marked the kickoff of the golden age of data science applications, through the crowning of data science as the ‘Sexiest job of the 21st century’, then 2018 might be the year of data ethics. It is the year in which the whole world started forming an opinion on how data may and may not be used.

Cambridge Analytica's goal of influencing politics clearly fell into the ‘may not’ camp.

This scandal opened up a major discussion about the ethics of data use. Multiple articles have since discussed situations where the harm done by algorithms outweighed the good. The many examples include image recognition AI erroneously labelling humans as gorillas, the chatbot Tay, which became too offensive for Twitter within 24 hours, and male-preferring HR algorithms (which raises the question: is data science the sexiest, or the most sexist, job of the 21st century?).

Clearly, data applications have left neutral ground.

In addition to attention from the public, or maybe because of it, large organisations and governmental bodies such as Google, the EU and the UN now also see the importance of data ethics. Many guidelines on data, AI and ML have been published, which can provide ethical guidance when working with data and analytics.

It is not necessary to take on the time-consuming endeavour of reading every single one of these. A meta-study of guidelines by 39 different authors shows a strong overlap in the following topics:

  1. Privacy
  2. Accountability
  3. Safety and security
  4. Transparency and explainability
  5. Fairness and non-discrimination

This is a good list of topics to start thinking and reading about. I highly encourage you to investigate them more deeply yourself, as this article will not explain them in the depth their importance deserves.

Legal governance, are we there yet?

Image source: serts via iStock

The discussion on the ethics of data is an important step in the journey towards appropriate data regulation. Ideally, laws are based on shared values, which can be found by thinking and talking about data ethics. To write legislation without prior philosophical contemplation would be like blindly pressing some numbers at a vending machine, and hoping your favourite snack comes out.

Some first pieces of legislation aimed at the ethics of data are already in place. Think of the GDPR, which regulates data privacy in the EU. Even though this regulation is not (yet) fully capable of strictly governing privacy, it does propel privacy, and data ethics as a whole, to the center of the debate. It is not the endpoint, but an important step in the right direction.

At this moment, we find ourselves in an in-between situation in the embedding of modern data technology in society:

  • Technically, we are capable of many potentially worthwhile applications.
  • Ethically, we are reaching the point where we can mostly agree on what is and what is not acceptable.
  • However, legally, we are not in a place where we can suitably ensure that the harmful applications of data are prevented: most data-ethical scandals are solved in the public domain, and not yet in the legal domain.

Responsibility currently (mostly) rests on the shoulders of Data Scientists

Image source: Asergieiev via iStock

So, the field of data cannot (yet) be ethically governed through legislation. I think that the most promising alternative is self-regulation by those with the most expertise in the field: data science teams themselves.

You might argue that self-regulation brings up the problem of partiality. I do, however, propose it as an in-between solution for the in-between situation we find ourselves in. As soon as legislation on data use is more mature, less (but never zero) self-regulation will be necessary.

Another struggle is that many data scientists find themselves torn between acting ethically and creating the most accurate model. By taking ethical responsibility, data scientists also take on the responsibility to resolve this tension.
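To make this tension concrete, here is a minimal sketch of reporting a fairness indicator next to accuracy when comparing two models. Everything in it is an illustrative assumption rather than a method from this article: the toy labels, the group names `a` and `b`, the helper functions, and the choice of the gap in positive-prediction rates between groups (often called demographic parity difference) as the indicator. Other fairness measures may suit a given case better.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def selection_rates(y_pred, groups):
    """Share of positive (1) predictions within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(y_pred, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: eight candidates from two groups, "a" and "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]

# Two hypothetical models: the first is more accurate,
# the second selects both groups at the same rate.
model_1 = [1, 1, 0, 0, 1, 0, 0, 0]
model_2 = [1, 0, 0, 0, 1, 0, 0, 0]

for name, y_pred in [("model_1", model_1), ("model_2", model_2)]:
    print(name, "accuracy:", accuracy(y_true, y_pred),
          "parity gap:", parity_gap(y_pred, groups))
```

On this toy data the more accurate model also has the larger gap between groups. Printing both numbers side by side does not settle which model to pick, but it puts the trade-off in front of the team instead of leaving it hidden.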

I can be persuaded by the argument that the unethical alternative might be more expensive in terms of money (e.g. GDPR fines) or damage to the company's image. Your employer or client may be harder to convince. “How to persuade your stakeholders to use data ethically” sounds like a good topic for a future article.

My proposal has an important consequence for data science teams: next to technical skills, they also need knowledge of data ethics. This knowledge cannot be assumed to be present automatically: software firm Anaconda found that just 18% of data science students say they received education on data ethics in their studies.

Moreover, a single person with ethical knowledge would not be enough: every practitioner of data science must have basic skill in identifying the potential ethical threats of their work. Otherwise, the risk of ethical accidents remains substantial. But how do you build ethical know-how across your whole team?

Two concrete actions towards ethical knowledge

Image source: davidf via iStock

Within my own team, we take a two-step approach:

  1. Hold a group-wide discussion on what each member finds ethically important when dealing with data and algorithms
  2. Construct a group-wide accepted ethical doctrine based on this discussion

In the first step we educate the group on the current state of data ethics in both academia and business. This includes discussing problems of data ethics in the news, explaining the most prevalent ethical frameworks, and talking about how ethical problems may arise in daily work. This should enable each individual member to form an opinion on data ethics.

The team-wide ethical data guidelines constructed in the second step should give our data scientists a strong grounding in identifying potential threats. The guidelines shouldn’t be constructed top down; the individual input that comes out of the group-wide discussions forms a much better basis. This way, general guidelines that represent every data scientist can be constructed.

The doctrine will not succeed if constructed as a detailed step-by-step list. Instead, it should serve as a general guideline that helps to identify which individual cases should be further discussed.

Precisely that should be a task of the data scientist: to ensure that potentially unethical data usage does not go unnoticed, whether by data scientists or by any other colleague who uses data in their work. This way, awareness of data ethics is raised, which enables companies to responsibly leverage the power of data.

In short: start talking about data ethics

We are technically capable of life-changing data applications, but a safety net in the form of legislation is not yet in place. Data scientists walk a tightrope over a deep valley of harmful applications, where overall knowledge of ethics acts as the pole that helps them balance. By initiating the proper discussion, your data science team has the tools to prevent expensive ethical missteps.

As I argue in this article, discussion on data ethics propels the field towards maturity, such that we can arrive at a “rigorous and complex ethical examination” of data science. So, engage in discussion: be critical about this content, form an opinion, talk about it, and change your opinion often as you encounter novel information. This not only makes you a better data scientist; it makes the whole field better.

