Full Stack Data Scientist: a Jack-of-All-Trades

Category: IT Technology · Published: 3 years ago

What is a Full Stack Data Scientist?

The scope of the role and skills required

Photo by freestocks on Unsplash

A full stack data scientist is a jack-of-all-trades who engineers and works on each stage of the data science lifecycle, from beginning to end.

The scope of a full stack data scientist covers every component of a data science business initiative, from identifying to training to deploying machine learning models that provide benefit to stakeholders.

Basic stages in the data science life cycle

Basic stages in the data science lifecycle that can be owned by a full stack data scientist:

  1. Business problem. Unless research-oriented, all data science projects should start with a problem that adds value to a business either through efficiency gains, automation, or new capabilities.
  2. Data collection/identification. Machine learning requires quality data to build a quality model for use.
  3. Data exploration and analysis. The data must be analyzed and understood before a model can be built.
  4. Machine learning. Train a model to solve the business problem given the data.
  5. Model analysis and acceptance. Analyze the model results and behavior. Share with stakeholders for approval.
  6. Model deployment. Make the model accessible to the end-user.
  7. Model monitoring. Ensure that the model behaves as expected in the future.
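
The monitoring stage can take many forms. As one minimal sketch, assuming that a shift in the mean of an input feature is a useful warning sign, the check below compares live values against the training data. The function names, the sample numbers, and the `max_shift` threshold of 2.0 are illustrative assumptions, not a standard.

```python
import statistics


def mean_shift(training_values, live_values):
    """Shift of the live mean from the training mean, in training std units.

    Assumes the training values are not all identical (non-zero std).
    """
    train_mean = statistics.mean(training_values)
    train_std = statistics.stdev(training_values)
    return abs(statistics.mean(live_values) - train_mean) / train_std


def needs_review(training_values, live_values, max_shift=2.0):
    """Flag the feature for review when the live mean has moved too far."""
    return mean_shift(training_values, live_values) > max_shift


# Hypothetical feature values seen at training time and in production.
TRAINING = [10, 12, 11, 9, 13]
LIVE_STABLE = [11, 12, 10]
LIVE_SHIFTED = [20, 21, 19]
```

Here `needs_review(TRAINING, LIVE_SHIFTED)` flags the feature while `needs_review(TRAINING, LIVE_STABLE)` does not. A production setup would usually also track prediction distributions and model accuracy over time.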

A Jack of all Trades: the Skillset

The high-level skills listed below are also keys to successful data science initiatives. It is worth highlighting the soft skills, without which data science technology may not provide value.

Business Acumen

A full stack data scientist must be able to identify and understand business problems (or opportunities) that can be solved using the data science toolkit.

To prioritize the projects and process flows with the most value to their organization, they must understand its needs and goals.

Ultimately the business doesn’t care how cool or accurate a model is if it provides no value.

Collaboration

Full stack data scientists do not work in a vacuum. They must collaborate with stakeholders to identify existing problems or inefficiencies that can be solved with data science. Once problems are identified, collaboration is essential to ensure that the result is acceptable and meets the stakeholders' needs. Further, collaboration with subject matter experts (SMEs) enables them to work quickly, for example when finding data sources in the organization.

Communication

Effective communication with the business via oral and written mediums allows for better collaboration and “selling” the model to the end-users. This means tailoring data science ideas, results, and value in plain language to non-technical audiences. In some cases, the end-user must understand and trust the model before they choose to use it.

Identifying Data Sources and ETL

Models cannot be trained if there is no data. Oftentimes data is not readily available; it needs to be found, extracted, transformed, and loaded to the right place.
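
As a minimal sketch of the extract, transform, and load steps described above, the following uses only Python's standard library. The CSV content, the column names (`customer_id`, `amount`), and the `sales` table are hypothetical placeholders rather than part of any real system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """customer_id,amount
1, 19.99
2,
3, 5.00
"""


def extract(raw_text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["customer_id"]), float(amount)))
    return cleaned


def load(records, connection):
    """Write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", records)
    connection.commit()


def run_etl(raw_text, connection):
    """Run the three steps in order."""
    load(transform(extract(raw_text)), connection)
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads the two complete rows and skips the one with a missing amount. Keeping each step as a separate function makes it easier to swap the source or the destination later.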

Programming

A full stack data scientist must be able to write clean, efficient object-oriented code that works reliably in production. Ideally, such code will be modular and each function or class validated by unit tests.
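
To illustrate what "modular and validated by unit tests" can look like, here is a small sketch: one preprocessing function and a test case for it, using Python's built-in `unittest` module. The function `min_max_scale` is a hypothetical example chosen for brevity, not a prescribed component.

```python
import unittest


def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_interval(self):
        self.assertEqual(min_max_scale([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(min_max_scale([3, 3]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            min_max_scale([])
```

Saved in a file, the tests can be run with `python -m unittest`. The edge cases (constant and empty input) are the kind of behavior that tends to surface only in production if it is not tested up front.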

Data Analysis and Exploration

This skill is essential because useful machine learning models cannot be built without understanding the data.

Machine Learning and Statistics

Perhaps this is a given — without machine learning or statistics, the work is not “data science”. A full stack data scientist must be able to experiment with appropriate machine learning algorithms to solve machine learning problems.

It is worth highlighting that sometimes implementing logical or business rules over a machine learning solution results in immediate value to the business, despite the simplicity. A machine learning model can take weeks or months to get right whereas a business rule may be “good enough” for now.
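
A business rule like this can be sketched in a few lines and measured against historical outcomes, which also gives any later model a baseline to beat. The churn scenario, the 30-day threshold, and the sample history below are all made-up assumptions for illustration.

```python
def rule_flags_churn(days_since_last_login, threshold_days=30):
    """Business rule: flag a customer as at risk after long inactivity."""
    return days_since_last_login > threshold_days


def accuracy(predictions, labels):
    """Share of predictions that match the known outcomes."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical labelled history: (days since last login, actually churned?)
HISTORY = [(2, False), (45, True), (31, True), (10, False), (60, False)]

predictions = [rule_flags_churn(days) for days, _ in HISTORY]
labels = [churned for _, churned in HISTORY]
baseline_accuracy = accuracy(predictions, labels)
```

On this toy history the rule gets four of five cases right. If a trained model cannot clearly beat a baseline of this kind, the rule may be the better choice for now.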

Model Deployment / Data Engineering

Lastly, a full stack data scientist must have the skill to deploy model pipelines to production. Model pipelines allow the end-user to query a model with data or access pre-generated model results in a desired way. If no deployment mechanism exists, they must be able to design and set up this pipeline.
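
What "querying a model with data" involves depends heavily on the organization's infrastructure. As a rough sketch of the core idea only, the function below turns a JSON request into a JSON response. `ThresholdModel` is a hypothetical stand-in for a trained model, and the `score` field is an assumed input; a real deployment would wrap something like `handle_request` in a web framework or batch job.

```python
import json


class ThresholdModel:
    """Stand-in for a trained model; a real one would be loaded from disk."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, features):
        return int(features["score"] >= self.threshold)


def handle_request(model, request_body):
    """Turn a JSON request into a JSON response containing the prediction."""
    try:
        features = json.loads(request_body)
        prediction = model.predict(features)
    except (json.JSONDecodeError, KeyError, TypeError) as error:
        # Malformed JSON or missing/invalid fields produce an error response.
        return json.dumps({"error": f"bad request: {error!r}"})
    return json.dumps({"prediction": prediction})
```

For example, `handle_request(ThresholdModel(0.5), '{"score": 0.7}')` returns a JSON string with a prediction of 1, while malformed input returns an error message instead of crashing the service.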

If a model is neither deployed nor at least presented in a business analysis, it is not useful and provides no business value.

Master of None: the Challenges

The skills listed in the previous section are diverse and varied.

With such varied requirements, it is not possible for a full stack data scientist to master all the skills, especially as technology, algorithms, and tools advance. Instead, this person must pick and choose which elements are most useful to the projects at hand and most interesting to focus on.

Two key underlying skills of the full stack data scientist are the ability to design a system or process and the ability to quickly pick up new technologies.

The Benefit

On the flip side, a full stack data scientist is a data science team rolled up into one (or two).

For an organization new to data science, such a person can create business value without the organization immediately building up a full team. To be most effective, the full stack data scientist should be given the ability to select and apply the right tools.

Takeaways

A full stack data scientist goes above and beyond the typical data scientist role in two ways:

  1. Links business needs to machine learning (or non-machine-learning) solutions
  2. Deploys models to “production”

These two elements are the keys to any organization extracting value from data science: solving the right problems and making the solutions accessible to the end-user.
