Full Stack Data Scientist: a Jack-of-All-Trades

Category: IT Technology · Published: 3 years ago

What is a Full Stack Data Scientist?

The scope of the role and skills required

Photo by freestocks on Unsplash

A full-stack data scientist is a jack-of-all-trades who engineers and works on each stage in the data science lifecycle, from beginning to end.

The scope of a full stack data scientist covers every component of a data science business initiative, from identifying the problem to training and deploying machine learning models that benefit stakeholders.

Basic stages in the data science life cycle

Basic stages in the data science lifecycle that can be owned by a full stack data scientist:

Business problem. Unless research-oriented, all data science projects should start with a problem that adds value to a business either through efficiency gains, automation, or new capabilities.
Data collection/identification. Machine learning requires quality data to build a quality model.
Data exploration and analysis. The data must be analyzed and understood before a model can be built.
Machine learning. Train a model to solve the business problem given the data.
Model analysis and acceptance. Analyze the model results and behavior. Share with stakeholders for approval.
Model deployment. Make the model accessible to the end-user.
Model monitoring. Ensure that the model continues to behave as expected after deployment.

A Jack of all Trades: the Skillset

The high-level skills listed below are also keys to successful data science initiatives. It is worth highlighting the soft skills, without which data science technology may not provide value.

Business Acumen

A full stack data scientist must be able to identify and understand business problems (or opportunities) that can be solved using the data science toolkit.

To prioritize the projects and process flows with the most value to their organization, they must understand its needs and goals.

Ultimately the business doesn’t care how cool or accurate a model is if it provides no value.

Collaboration

Full stack data scientists do not work in a vacuum. They must collaborate with stakeholders to identify existing problems or inefficiencies that can be solved with data science. Once problems are identified, collaboration is essential to ensure that the result is acceptable and meets the stakeholders' needs. Further, collaboration with subject matter experts (SMEs) enables them to work quickly, for example by locating data sources within the organization.

Communication

Effective communication with the business via oral and written mediums allows for better collaboration and “selling” the model to the end-users. This means tailoring data science ideas, results, and value in plain language to non-technical audiences. In some cases, the end-user must understand and trust the model before they choose to use it.

Identifying Data Sources and ETL

Models cannot be trained if there is no data. Oftentimes data is not readily available; it needs to be found, extracted, transformed, and loaded to the right place.
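
The extract, transform, and load steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a production pipeline: the sample data, the column names, and the `customers` table are all hypothetical, and an in-memory SQLite database stands in for "the right place".

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a file, an API, or a database.
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2021-01-05, 42.50
2,2021-02-11,
3,2021-03-20,17.00
"""


def extract(raw_text):
    """Extract: read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: strip whitespace, cast types, and drop rows with no spend value."""
    cleaned = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # skip rows where the spend is missing
        cleaned.append(
            (int(row["customer_id"]), row["signup_date"].strip(), float(spend))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a (hypothetical) customers table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer_id, monthly_spend FROM customers ORDER BY customer_id"
).fetchall()
print(loaded)  # [(1, 42.5), (3, 17.0)]
```

Keeping each step in its own function makes it easier to swap the source or destination later without touching the cleaning logic.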

Programming

A full stack data scientist must be able to write clean, efficient object-oriented code that works reliably in production. Ideally, such code will be modular and each function or class validated by unit tests.
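
As a small illustration of modular code validated by unit tests, the sketch below pairs one function with a test case from Python's built-in `unittest` module. The `normalize` function is a hypothetical example chosen for brevity; any preprocessing or feature-engineering step could be tested the same way.

```python
import unittest


def normalize(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant input has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_returns_zeros(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


# Run the tests programmatically so the script does not exit when they finish.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test covers an edge case (constant input) that is easy to miss and that would otherwise only surface as a division-by-zero error in production.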

Data Analysis and Exploration

This skill is essential because useful machine learning models cannot be built without understanding the data.

Machine Learning and Statistics

Perhaps this is a given — without machine learning or statistics, the work is not “data science”. A full stack data scientist must be able to experiment with appropriate machine learning algorithms to solve machine learning problems.

It is worth highlighting that sometimes implementing logical or business rules over a machine learning solution results in immediate value to the business, despite the simplicity. A machine learning model can take weeks or months to get right whereas a business rule may be “good enough” for now.
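
To make the idea of a "good enough" business rule concrete, here is a minimal sketch that scores a one-line rule against labelled history. Everything in it is assumed for illustration: the churn scenario, the eight made-up records, and the 40-day threshold. The resulting accuracy gives a baseline that any later machine learning model would need to beat to justify its cost.

```python
# Hypothetical labelled history: (days_since_last_purchase, churned)
HISTORY = [
    (5, False), (12, False), (45, True), (60, True),
    (30, False), (90, True), (20, True), (75, False),
]


def rule_predicts_churn(days_since_last_purchase, threshold=40):
    """Business rule: flag a customer as churned after `threshold` inactive days."""
    return days_since_last_purchase > threshold


def accuracy(predict, history):
    """Fraction of records where the prediction matches the recorded outcome."""
    correct = sum(predict(days) == churned for days, churned in history)
    return correct / len(history)


baseline_accuracy = accuracy(rule_predicts_churn, HISTORY)
print(f"Rule accuracy: {baseline_accuracy:.2f}")  # Rule accuracy: 0.75
```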

Model Deployment / Data Engineering

Lastly, a full stack data scientist must have the skill to deploy model pipelines to production. Model pipelines allow the end-user to query a model with data or access pre-generated model results in a desired way. If no deployment mechanism exists, they must be able to design and set up this pipeline.
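
One simple way to let an end-user query a model with data is to put it behind a small HTTP endpoint. The sketch below uses only the Python standard library and is an assumption-laden illustration, not a recommended production setup: `predict` is a stand-in for a trained model with made-up coefficients, and the feature names `x1` and `x2`, the `serve` helper, and port 8000 are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a hypothetical linear score."""
    return 0.5 * features["x1"] + 2.0 * features["x2"]


def handle_payload(body):
    """Parse a JSON request body and return the prediction as a JSON string."""
    features = json.loads(body)
    return json.dumps({"prediction": predict(features)})


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client declared.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the server; blocks until interrupted, so it is not called here."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()


# Exercise the request logic directly, without starting the server.
print(handle_payload('{"x1": 2, "x2": 1.5}'))  # {"prediction": 4.0}
```

Separating `handle_payload` from the HTTP handler keeps the prediction logic testable on its own. A real deployment would also need input validation, error handling, and authentication.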

If a model is not deployed (or at least presented in a business analysis), it is not useful and does not provide business value.

Master of None: the Challenges

The skills listed in the previous section are diverse and varied.

With such varied requirements, it is not possible for a full stack data scientist to master all the skills, especially as technology, algorithms, and tools advance. Instead, this person must choose which skills to focus on, favoring those that are most useful to the projects at hand and most interesting to them.

Two key underlying skills of the full stack data scientist are the ability to design a system or process and the ability to quickly pick up new technologies.

The Benefit

On the flip side, a full stack data scientist is a data science team rolled up into one (or two).

Organizations new to data science can use such a person to create business value without immediately building up a full team. To be most effective, the full stack data scientist should be given the freedom to select and apply the right tools.

Takeaways

A full stack data scientist goes above and beyond the typical data scientist role in two ways:

Links business needs to machine learning (or non-machine-learning) solutions
Deploys models to “production”

These two elements are the keys to any organization extracting value from data science: solving the right problems and making the solutions accessible to the end-user.