How internal politics interfere with data science


How companies can help data scientists do their job

May 28 · 4 min read


Source: Pexels

Machine learning in production is a fairly new phenomenon, and as such, the playbook for managing data science teams that build production ML pipelines is still being written. As a result, well-intentioned companies often get things wrong, and accidentally institute policies that, while designed to improve things, actually hamper their data scientists.

I work with a lot of data science teams (for context, I contribute to Cortex, an open source model serving platform), and I constantly hear stories about weird internal policies that, while designed to make things smoother, make their day-to-day harder.

Policies like…

1. All cloud resources need a formal request

If your team doesn’t use local machines, then just about every operation you do requires cloud resources. Processing data, training and evaluating models, experimenting in notebooks, deploying models to production — everything requires a server.

I’ve worked with more than one team in which any request for cloud resources needed to be submitted formally for approval. That means nearly any new ML-related operation required a request process.

Typically, companies adopt this setup for security and control purposes. The fewer people with cloud access, the thinking goes, the fewer opportunities for mistakes. The problem is that development slows to a crawl in these environments. Instead of exploring new ideas or building new models, data scientists spend days navigating the red tape required to get an EC2 instance.

Giving data scientists cloud privileges, at least enough that they can independently do the basic cloud operations required for their job, would significantly increase the speed at which teams move.

2. GPUs are fine — but only for training

One data science team we worked with was actually given their own AWS accounts, one for dev and another for prod. Their dev account, where they did model training, had access to GPUs. Their prod account, where models were deployed, did not.

On some level, you can kind of see where the DevOps team was coming from. They were held accountable for cloud spend, and GPU inference can get expensive. If data scientists were getting by without it before, why did they need it now?

In order to get GPUs, the data science team would need to prove that the models they were deploying couldn’t generate predictions with acceptable latency without GPUs. That’s a lot of friction—particularly when you realize they were already running GPU instances in dev.

3. Notebook instances must be monitored nonstop

One data science team had some weird policies when it came to notebooks.

Basically, the IT department was vigilant about unnecessary spending, specifically on notebook instances. They pushed to have instances deleted as soon as possible, and their sign-off was required for any new notebook instance.

In theory, holding someone responsible for how resources are managed is a good thing. However, holding someone accountable for managing another team’s resources is clearly a recipe for disaster. Instead of ending up with a lean, financially responsible data science team, these companies burn money on wasted time, as data scientists spend hours twiddling their thumbs waiting for approval from IT.

In addition, this is terrible for morale. Imagine needing to file a ticket just to open a notebook, or having to answer probing questions about any long-running notebook instance.
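An infrastructure-side alternative to manual sign-off is to detect and stop idle notebook instances automatically. The sketch below shows only the decision logic; the instance records and the two-hour idle threshold are hypothetical, not something the team above actually used:

```python
from datetime import datetime, timedelta

# Assumed threshold: tune per team. Records are (name, last_activity) pairs.
IDLE_LIMIT = timedelta(hours=2)

def instances_to_stop(instances, now):
    """Return the names of notebook instances idle longer than IDLE_LIMIT."""
    return [
        name
        for name, last_activity in instances
        if now - last_activity > IDLE_LIMIT
    ]

now = datetime(2020, 5, 28, 12, 0)
instances = [
    ("exploration-nb", datetime(2020, 5, 28, 11, 30)),  # active 30 min ago
    ("forgotten-nb", datetime(2020, 5, 27, 9, 0)),      # idle since yesterday
]
print(instances_to_stop(instances, now))  # ['forgotten-nb']
```

In a real deployment, the stop itself could be one `boto3` SageMaker `stop_notebook_instance` call per returned name, run on a schedule, so nobody has to police anybody.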

Infrastructure makes a better safeguard than policy

As frustrating and bizarre as these policies can seem, they’re all designed for legitimate purposes. Companies want to institute safeguards to control costs, create accountability, and ensure security.

The mistake, however, is that these safeguards are instituted through management policies when they should be baked into the infrastructure.

For example, instead of having a policy for requesting cloud resources, some of the best teams we work with simply give data scientists IAM roles with tightly scoped permissions. Similarly, instead of having another team constantly monitoring data scientists, many companies give the data science team a provisioned-but-limited sandbox to experiment in.
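To make "tightly scoped" concrete, here is a hypothetical IAM policy document, written as a Python dict. It lets a data scientist launch and stop EC2 instances only for resources tagged with their team; the actions and the tag condition are illustrative assumptions, not the exact policies of the teams mentioned above:

```python
import json

# Hypothetical scoped policy: allow EC2 lifecycle actions only on resources
# tagged team=data-science. The tag key/value are illustrative.
DATA_SCIENCE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:ResourceTag/team": "data-science"}
            },
        }
    ],
}

# The document must round-trip through JSON to be usable with the IAM API.
policy_json = json.dumps(DATA_SCIENCE_POLICY)
print(len(DATA_SCIENCE_POLICY["Statement"]))  # 1
```

Attaching it is then a one-time `iam.create_policy` / `iam.attach_role_policy` operation per role, after which data scientists can work inside those bounds without filing requests.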

We think about how to solve these issues via infrastructure all the time when building our model serving platform. For example, we want to lower inference costs, but we also want to give data scientists the latitude to use the best tool for the job. Instead of putting limits around GPU inference, we built out spot instance support. Now, teams can use GPU instances with ~90% discounts.
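For a sense of what a spot-backed GPU instance looks like at the API level, here is a hedged sketch that builds the arguments for EC2's `run_instances` call with the spot market option. The AMI ID and instance type are placeholders, and Cortex's actual provisioning logic is more involved than this:

```python
def spot_gpu_request(ami_id, instance_type="g4dn.xlarge", count=1):
    """Build kwargs for boto3 ec2.run_instances() requesting spot capacity.

    The instance type and AMI are placeholders for illustration only.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # Spot: same hardware at a steep discount, in exchange for the
        # possibility of interruption.
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    }

req = spot_gpu_request("ami-0123456789abcdef0")
print(req["InstanceMarketOptions"]["MarketType"])  # spot
```

Executing it would be `boto3.client("ec2").run_instances(**req)`. Because a spot instance can be reclaimed, this model suits stateless inference behind a load balancer, where an interrupted replica can simply be rescheduled.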

When you try to enforce these safeguards through management policies, you introduce a complex web of human dependencies and a hierarchy of authority. Inevitably, this slows progress and introduces friction.

When you institute these safeguards as features of your infrastructure, however, you enable your team to move independently and quickly—just with some guardrails.

