Risk And Counter-intuition In Data Science

Category: IT Technology · Published: 5 years ago

You will always hear this, and it’s also in Google’s 43 Rules of ML.

“Rule #4: Keep the first model simple and get the infrastructure right.”

With some opinions floating in the market, I feel it’s a good time to spark a discussion about this topic. Otherwise, the opinions of the popular will just drown other ideas. You too should write and share on this! :grinning:

Note: I work in NLP and these opinions are focused on NLP applications. I cannot guarantee they hold for tabular and image problems.

Problems with simple models

A simple system can require heavy feature engineering and hacks. I will share my experience at the end.

The analogy for feature engineering is how deep learning changed computer vision and how the new language models put an end to statistical NLP.

A simple system suits companies that want good-enough automation. If you want to truly win the competition or delight your customers, you need to learn to work with complex models (when they make technical and financial sense). Don’t choose complex for marginal gains of 1%.
If the 2nd iteration of the system is too different from the 1st, the code has to be changed again. You end up wishing you had gone slowly, experimented more and built the 2nd directly.
If the 1st iteration has good but not great metrics, it can put the whole project at risk of the question “what is the value addition of ML?”. Also, whenever something is automated, someone always feels insecure and will look for numbers to attack.

We don’t need to rely only on domain experts for features. Automated feature engineering libraries are also on the rise.

First of all, there is no such thing as a true blackbox, because there are tools to understand any model.
Secondly, blackboxes are not necessarily bad if you test them heavily. But testing gets neglected because it happens at the end of a project, when we are tired, impatient to get to production, short of test data, or without the resources to create test data.

Mental models at play :boar:

Thinking fast and slow

We often resort to short-circuit fast thinking.

Thinking, Fast and Slow is a best-selling book published in 2011 by Daniel Kahneman, laureate of the Nobel Memorial Prize in Economic Sciences. The central thesis is a dichotomy between two modes of thought: “System 1” is fast, instinctive and emotional; “System 2” is slower, more deliberative, and more logical.

Nowadays, when work pressure is high, we don’t want to think slowly, so we fall straight back on the old advice of building a simple ML system first.

Why argue when it actually makes your work easy?

Goodhart’s law

“When a metric becomes a target, it ceases to be a good metric.”

When your seniors hold the ‘simple model’ philosophy, going along with it lets you avoid conflict and build a good rapport.

To be frank, many heads of data science are appointed because of long job experience rather than their expertise in algorithms and ML strategy. I can’t name names, but I hope you get the drift.

Many have never actually moved past the logistic regression and SVM era. They prefer simple because that’s what they understand. They also fear project failure to the point of panic.

I see many argue against putting transformer NLP models in production because they are heavy and more of a blackbox, which is actually not true.

TinyBERT has 15M parameters while AWD-LSTM of ULMFiT has 24M.
Also you can interpret transformer models using Captum by PyTorch.
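
To give a feel for what such interpretation tools compute, here is a minimal sketch of integrated gradients, one of the attribution methods Captum provides. It does not use Captum or a transformer: the logistic model, weights and input below are hypothetical stand-ins so the example runs in plain Python.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy 'blackbox': logistic regression over a feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight line from the baseline
    to the input, then scaled by (input - baseline) per feature.
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical weights and input, chosen only for illustration.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = 0.1
x = [1.0, 0.5, 0.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, WEIGHTS, BIAS)
for name, score in zip(["feature_0", "feature_1", "feature_2"], attributions):
    print(f"{name}: {score:+.4f}")
```

The scores sum (approximately) to the difference between the model output on the input and on the baseline, which makes them easy to sanity-check. A feature that does not differ from the baseline gets zero attribution.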

So far, I have seen good heads only in Tier 1 companies and some startups. The rest of the startups are doomed to mediocrity. They might make money from simple models but they will not disrupt.

Conformity kills creativity

The philosophy of the group is a function of the majority.

As you might have seen while choosing a restaurant — we have to go with the majority because if we took the risk of suggesting a restaurant and forcing people to eat there, the high expectations of people will put immense stress on us.

Why not just chill instead of taking the risk? There is no clear-cut reward for taking a risk unless it’s tied to our individual SMART goals!

Have you ever seen ‘take risk’ in anyone’s SMART goals? :joy:

High attrition side effects :hankey:

Data scientists are changing jobs very frequently and have difficulty building trust in the new place. They will reduce the risk of failure of the first project because the risk is proportional to the complexity of the project.

In my experience, your new ideas will hold value only when people trust you and you have a history of execution. People tend to reject the ideas of a new person talking ‘complex’ when the group is in favour of simple.

To change the group’s thinking, the team needs an established person, with a track record in the same or a previous organisation, who has an appetite for challenging the status quo.
Conformity kills creativity. :smile:

My experience :construction_worker:

Let me tell you about my recent experience. I just deployed a transformer model to production for semantic search. Earlier we had a non-contextual model with a complex pipeline of processing, external APIs and rules to handle edge cases.

Now our new search system is extremely simple and scalable. It’s not only better at understanding queries but also faster, because it relies less on external APIs. It also costs less now because we don’t have to use paid APIs anymore.
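
The shape of an embedding-based search system can be sketched in a few lines. This is an illustrative sketch, not the system described above: the `embed` function is a bag-of-words placeholder standing in for a transformer sentence encoder, and the documents and query are made up. The point is only that the pipeline reduces to encode, compare and rank, with no hand-written rules or external API calls.

```python
import math
from collections import Counter


def embed(text):
    """Placeholder encoder: a bag-of-words count vector.

    Assumption: a real system would call a transformer sentence
    encoder here and return a dense vector instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Rank documents by similarity to the query, highest first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up documents and query, for illustration only.
DOCS = [
    "reset your account password",
    "track the status of an order",
    "update billing and payment details",
]

results = search("how do I reset my password", DOCS, top_k=1)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

In practice the document vectors would be computed once and stored rather than re-encoded on every query; swapping the placeholder for a contextual encoder leaves the rest of the code unchanged.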

Choosing a heavier model saves us current and future troubles :moneybag:

This is what happens with many companies. Adding a rule to the ML pipeline for an edge case always seems saner than building a new system. But if these cases keep popping up every time, you might want to reconsider.

Companies like Google understand this. That’s why they make better language models because they don’t have time to make rules in the age of hyper-personalisation.

Sweet point of complexity :chart_with_upwards_trend:

The current problem of data science is the ‘search space’. Everything from extremely simple models to 17B-parameter models is freely available.

To pick the best model for production, we should ideally evaluate all of them with hyperparameter tuning and then come up with a prescription. But who has time for such experimentation when your competitor is growing fast and you need to put ‘something’ into production ASAP? :confused:

Most people panic in such a scenario. I would too. But over time, I have run an immense number of experiments, from simple to complex, and learnt the sweet point of complexity. :ghost:

ML is so empirical that my assumptions turn out wrong on a regular basis. Hence it’s very important to keep trying new methodologies and learn from them.

Don’t start the research when the project starts.

Be proactive. Experiment and read papers.

Set up your own research lab on Colab.

Takeaway: Hire engineers who do research to build up a dictionary of solutions, and use their fast thinking to select a solution for production.

Conclusion :boom:

My goal is not to convince you to waste time researching and then deploying big models. My aim is to throw light on the importance of innovation and thinking from the basics.

Don’t use time constraints as an excuse to do simple modelling.

I work in a startup myself and hence I understand all types of constraints such as time, data, manpower and compute.

Google doesn’t shy away from putting heavy models into production, be it for machine translation or YouTube’s recommendation system. They have always believed in using the state of the art.

AI is their strategy. Not a part of strategy.

If the cost of production doesn’t work out, they push the limits by doing hardware innovation like TPUs.

My heroes :bow:

MapReduce was written by just 2 engineers, Sanjay Ghemawat and Jeff Dean.
Keras was made by François Chollet to make deep learning easy for everyone.
Jeremy Howard discovered a way to make transfer learning work for NLP while doing his personal experiments and shared it with everyone.

All these efforts took time to develop but had a global impact. Don’t stop your engineers from experimenting and open-sourcing.

Find your own 100x engineers who can think fast and slow.

Give them freedom.

And they will take you further than you imagined. :rocket:

If you have suggestions, please DM me on LinkedIn or Twitter.

