Risk And Counter-intuition In Data Science

Category: IT Technology · Published: 4 years ago

Risk-takers | 100x engineers | Astronauts | From Unsplash

You will always hear this, and it’s also in Google’s 43 Rules of ML:

“Rule #4: Keep the first model simple and get the infrastructure right.”

With some opinions floating around in the market, I feel it’s a good time to spark a discussion on this topic. Otherwise, the opinions of the popular will just drown out other ideas. You too should write and share on this! :grinning:

Note: I work in NLP, and these opinions are focused on NLP applications. I cannot guarantee that they hold for tabular and image problems.

Problems with simple models

  • A simple system can require heavy feature engineering and hacks. I will share my experience at the end.

The analogy for feature engineering is how deep learning changed computer vision and how the new language models put an end to statistical NLP.

  • A simple model suits companies that want good-enough automation. If you want to truly win the competition or delight your customers, you need to learn to work with complex models (when they make technical and financial sense). Don’t choose complexity for marginal gains of 1%.
  • If the 2nd-iteration system is too different from the 1st, the code has to be changed all over again. You end up wishing you had gone slowly, experimented more and built the 2nd one directly.
  • If the 1st iteration has good but not great metrics, it can put the whole project at risk, with people asking “what is the value addition of ML?”. Also, whenever something is automated, someone always feels insecure and will look for numbers to attack.
  • We don’t need to rely only on domain experts for features. Automated feature engineering libraries are also on the rise.
  • First of all, there is no such thing as a true blackbox, because there are tools to interpret any model.
  • Secondly, blackboxes are not necessarily bad if you test them heavily. But testing gets boring because it happens at the end of a project, when we are tired, impatient to go to production, short on test data, or lacking the resources to create it.

Mental models at play :boar:

Thinking fast and slow

We often resort to short-circuit fast thinking.

Thinking, Fast and Slow is a best-selling book published in 2011 by Daniel Kahneman, laureate of the Nobel Memorial Prize in Economic Sciences. The central thesis is a dichotomy between two modes of thought: “System 1” is fast, instinctive and emotional; “System 2” is slower, more deliberative, and more logical.

Nowadays, when work pressure can be high, we don’t want to think slowly, so we fall straight back on the old advice of building a simple ML system first.

Why argue when it actually makes your work easy?

Goodhart’s law

“When a metric becomes a target, it ceases to be a good metric.”

When your seniors hold the ‘simple model’ philosophy, going along with it avoids conflict and builds a good rapport.

To be frank, many heads of data science are appointed because of long job experience rather than their expertise in algorithms and ML strategy. I can’t name names, but I hope you get the drift.

Many have never actually moved past the logistic regression and SVM era. They prefer simple because that’s what they understand. Also, they fear project failure to the point of panic.

I see many people argue against putting transformer NLP models in production because they are heavy and more of a blackbox, which is actually not true.

TinyBERT has about 15M parameters, while the AWD-LSTM used in ULMFiT has 24M.
Also, you can interpret transformer models using Captum, an interpretability library for PyTorch.
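
To sanity-check the size claim, here is a rough back-of-the-envelope count for a BERT-style encoder. The sizes used (4 layers, hidden size 312, feed-forward size 1200, 30522-token vocabulary) are my assumption of the commonly reported 4-layer TinyBERT configuration, not something stated in this post, so treat the result as a ballpark only.

```python
# Rough parameter count for a BERT-style encoder (stdlib only).
# ASSUMPTION: the sizes passed in below are the commonly reported
# 4-layer TinyBERT configuration; they are not taken from this article.

def bert_param_count(vocab, hidden, layers, intermediate,
                     max_pos=512, type_vocab=2):
    layer_norm = 2 * hidden  # scale + bias
    # Token, position and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up and down projection), then LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    # Dense pooling layer on top of the first token.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler

tinybert_params = bert_param_count(vocab=30522, hidden=312,
                                   layers=4, intermediate=1200)
print(f"{tinybert_params / 1e6:.1f}M parameters")
```

Under these assumptions the count lands between 14M and 15M, and most of it sits in the embedding table rather than in the transformer layers.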

So far, I have seen good heads only in Tier 1 companies and some startups. The rest of the startups are doomed to mediocrity. They might make money from simple models, but they will not disrupt.

Conformity kills creativity

The philosophy of the group is a function of the majority.

You might have seen this while choosing a restaurant: we go with the majority, because if we took the risk of suggesting a restaurant and pushing people to eat there, their high expectations would put immense stress on us.

Why not just chill instead of taking the risk? There is no clear-cut reward for taking a risk unless it’s tied to our individual SMART goals!

Have you ever seen ‘take risk’ in anyone’s SMART goals? :joy:

High attrition side effects :hankey:

Data scientists change jobs very frequently and have difficulty building trust in a new place. They reduce the risk of failing on their first project by keeping it simple, because the risk is proportional to the complexity of the project.

In my experience, your new ideas will hold value only when people trust you and you have a history of execution. People tend to reject the ideas of a new person talking ‘complex’ when the group is in favour of simple.

To change the group’s thinking, the team needs an established person, with a track record in the current or a previous organisation, who has an appetite for challenging the status quo.
Conformity kills creativity. :smile:

My experience :construction_worker:

Let me tell you about my recent experience. I just deployed a transformer model to production for semantic search. Earlier we had a non-contextual model with a complex pipeline of processing, external APIs and rules to handle edge cases.

Now our new search system is extremely simple and scalable. It’s not only better at understanding queries but also faster, thanks to less reliance on external APIs. It also costs less now because we don’t have to use paid APIs anymore.
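
To show why such a system can be simple, here is a minimal sketch of the ranking step in an embedding-based semantic search. It is not my production code: the 3-dimensional vectors are hand-written stand-ins for what a transformer encoder would output, and `cosine` and `search` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_k]]

# Toy vectors standing in for encoder output (illustrative values only).
doc_vecs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do I get my money back"

results = search(query_vec, doc_vecs)
print(results)
```

With a contextual encoder producing the vectors, the whole pipeline reduces to "embed, compare, sort", with no hand-written rule per edge case.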

Choosing a heavier model saves us current and future troubles :moneybag:

This is what happens at many companies. Adding a rule to the ML pipeline for an edge case always seems saner than building a new system. But if such cases keep popping up, you might want to reconsider.

Companies like Google understand this. That’s why they build better language models: they don’t have time to write rules in the age of hyper-personalisation.

Sweet point of complexity :chart_with_upwards_trend:

The current problem of data science is the ‘search space’. Everything from extremely simple models to 17B-parameter models is available to use for free.

To pick the best model for production, we should ideally evaluate all of them with hyperparameter tuning and then come up with a prescription. But who has time for such experimentation when your competitor is growing fast and you need to put ‘something’ into production ASAP? :confused:

Most people panic in such a scenario. I would too. But over time, I have run an immense number of experiments, from simple to complex, and learnt the sweet point of complexity. :ghost:
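
One way to make the ‘sweet point’ idea concrete is a simple rule of thumb: walk through the candidates from cheapest to most expensive, and only step up when the gain is more than marginal. The sketch below assumes a hypothetical `pick_model` helper and a 1% threshold; the candidate names, costs and scores are made-up placeholders, not measurements from any experiment.

```python
def pick_model(candidates, min_gain=0.01):
    """Pick a model from (name, relative_cost, metric) tuples.

    Starts from the cheapest candidate and moves to a costlier one only
    when it beats the current choice by more than min_gain.
    """
    ordered = sorted(candidates, key=lambda c: c[1])  # cheapest first
    chosen = ordered[0]
    for candidate in ordered[1:]:
        if candidate[2] - chosen[2] > min_gain:
            chosen = candidate
    return chosen

# Placeholder numbers for illustration only.
candidates = [
    ("large transformer", 100, 0.885),
    ("tf-idf + logistic regression", 1, 0.80),
    ("distilled transformer", 10, 0.88),
]

best = pick_model(candidates)
print(best[0])
```

With these placeholder values the rule skips the simplest model (the jump is large) but also skips the largest one (the extra gain is under the threshold). The threshold itself is a judgment call that depends on what a point of the metric is worth to your product.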

ML is so empirical that my assumptions turn out wrong on a regular basis. Hence it’s very important to keep trying new methodologies and learn from them.

Don’t start the research when the project starts.

Be proactive. Experiment and read papers.

Set up your own research lab on Colab.

Takeaway: Hire engineers who do research to build up a dictionary of solutions, and use their fast thinking to select a solution for production.

Conclusion :boom:

My goal is not to convince you to waste time researching and then deploying big models. My aim is to throw light on the importance of innovation and thinking from the basics.

Don’t use time constraints as an excuse to do simple modelling.

I work in a startup myself and hence I understand all types of constraints such as time, data, manpower and compute.

Google doesn’t shy away from putting heavy models into production, be it for machine translation or YouTube’s recommendation system. They have always believed in using the state of the art.

AI is their strategy. Not a part of strategy.

If the cost of production doesn’t work out, they push the limits by doing hardware innovation like TPUs.

My heroes :bow:

  • MapReduce was written by just 2 engineers, Sanjay Ghemawat and Jeff Dean.
  • Keras was made by François Chollet to make deep learning easy for everyone.
  • Jeremy Howard discovered a way to make transfer learning work for NLP while doing his personal experiments and shared it with everyone.

All these efforts took time to develop but had a global impact. Don’t stop your engineers from experimenting and open-sourcing.

Find your own 100x engineers who can think fast and slow.

Give them freedom.

And they will take you further than you imagined. :rocket:

If you have suggestions, please DM me on LinkedIn or Twitter.

