How a simple textual explanation can add value to your data science results

Category: IT Technology · Published: 3 years ago

Enhance the power of your data exploration using textual explanations

The popular saying “A picture is worth a thousand words” may be wrong when it comes to data science. Take the example of Uber's Expected Time of Arrival (ETA) algorithm, which tells the user when the ride is expected to arrive.

Behind the ETA there are many complex predictive algorithms and cutting-edge visualisations, with the map being updated in real time. But all of this is of no use without the single line of text that says “The closest driver is approximately 1 min away”.

Uber Expected Time of Arrival (ETA) algorithm in action

A data scientist or data analyst produces a lot of data visualisations during the data exploration phase. These visualisations look great, but you can greatly enhance their value with short textual explanations. In many cases, visualisations alone are simply not sufficient.

Visualisations without explanations are a source of misinterpretation

Take the simple example of a histogram. Shown below is a histogram of a stock's closing price.

Just by looking at this visualisation, one can make many interpretations, such as:

Interpretation 1: The most frequently occurring value is between 13 and (something…).

Interpretation 2: The lowest value seems to be between 5 something and 10 something.

Stock trading is an area where values have to be very precise. If the interpretation is not precise, the visualisation alone does not help.
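To make the point concrete, here is a minimal sketch of a function that states in words the exact figures a histogram only hints at. The function name `explain_histogram`, the bin width, and the sample prices are assumptions made for illustration; they are not taken from the chart above.

```python
from collections import Counter


def explain_histogram(values, bin_width=1.0, label="closing price"):
    """Return a sentence with the exact range and the most frequent bin."""
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    # Bins start at the lowest observed value and are bin_width wide.
    counts = Counter(int((v - lowest) // bin_width) for v in values)
    top_bin, top_count = counts.most_common(1)[0]
    lo_edge = lowest + top_bin * bin_width
    hi_edge = lo_edge + bin_width
    share = 100.0 * top_count / len(values)
    return (
        f"The {label} ranges from {lowest:.2f} to {highest:.2f}. "
        f"The most frequent bin is {lo_edge:.2f} to {hi_edge:.2f}, "
        f"holding {top_count} of {len(values)} observations ({share:.0f}%)."
    )


# Made-up prices, for illustration only.
prices = [11.2, 11.4, 12.1, 13.3, 13.6, 13.8, 14.2, 14.9, 13.1, 11.8]
print(explain_histogram(prices))
```

Printing this sentence next to the chart removes the guesswork about where the bars start and end.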

Data story-telling is a compulsion because visualisations alone do not do the job

For many years, data story-telling has been a must-have skill for data scientists. Strictly speaking, though, it is a compulsion, because visualisations alone cannot convey the story.

A very simple visualisation can have a great story behind it. But unless it is told, it never surfaces. Take the histogram visualisation which was shown above.

The real story behind the histogram is that the stock price swings between 11 and 15 and stays at 12 for only a very short time, so the buying opportunity at 12 is very brief. This kind of story is impossible to capture in a visualisation and needs to be told explicitly. Even if advanced visualisations such as animations are used, someone still has to tell the story.

This is where the power of an explanation comes into play. Adding a short textual explanation enhances the value of a visualisation. You go from merely showing a visualisation to conveying something meaningful.

Let us now look at some examples where explanations enhance the interpretation of a visualisation.

Explaining a correlation matrix to avoid the stress of a “color-maze”

A correlation matrix looks visually stunning. However, because it contains many different shades of color, one has to look hard to interpret it. Adding just a few lines of textual explanation makes the correlation matrix vastly easier to interpret. The text can explain which data are most correlated, as well as what the different shades of color mean.

Shown below is a correlation matrix based on car data. As you can see, adding a small explanation clearly enhances the value of the nice-looking correlation matrix. It saves your users from “eye-balling” the matrix to see which data are most correlated.

Example of text explanation of correlation matrix
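A text explanation like this can be generated automatically. The sketch below ranks column pairs by the absolute value of their Pearson correlation and turns the strongest ones into sentences. The function names, the column names, and the numbers are hypothetical; the car data behind the figure is not reproduced here, and the code assumes no column is constant.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def explain_correlations(columns, top_n=3):
    """Describe the most strongly correlated column pairs in plain sentences."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Rank by absolute strength so strong negative links are not missed.
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    lines = []
    for a, b, r in pairs[:top_n]:
        direction = "positively" if r > 0 else "negatively"
        lines.append(f"{a} and {b} are {direction} correlated (r = {r:+.2f}).")
    return lines


# Made-up car data, for illustration only.
cars = {
    "horsepower": [90, 110, 150, 200, 250],
    "engine_size": [1.2, 1.6, 2.0, 3.0, 4.0],
    "mpg": [40, 36, 30, 22, 18],
}
for line in explain_correlations(cars, top_n=2):
    print(line)
```

Sorting by absolute value is a deliberate choice: a strong negative correlation is just as worth mentioning as a strong positive one.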

Explaining a cumulative distribution to avoid “eye-balling” the x and y axes

Cumulative distributions are very important for showing how a numeric value is distributed. They are also a creative way of focusing on important thresholds of a numeric column.

However, just showing the cumulative distribution without any explanation makes for a painful eye-balling exercise. A short explanatory text about the different threshold levels immediately takes the cumulative distribution to the next level, and it starts making sense.

Shown below is the cumulative distribution of a stock price. Text explanations of the thresholds (for example, 80% of close prices are less than 79.31) clearly enhance the value of a cumulative distribution visualisation.

Example of text explanation of cumulative distribution
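Threshold sentences of this kind can be produced with a few lines of code. The sketch below uses the nearest-rank percentile method, which is one of several possible definitions and is an assumption here; the function names, the chosen thresholds, and the sample prices are also made up for illustration.

```python
import math


def value_at_percentile(values, pct):
    """Smallest observed value v such that at least pct% of values are <= v."""
    ordered = sorted(values)
    # Nearest-rank method: no interpolation, so the result is a real data point.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def explain_cumulative(values, thresholds=(50, 80, 95), label="close prices"):
    """One sentence per threshold, read straight off the cumulative distribution."""
    return [
        f"{pct}% of {label} are at or below "
        f"{value_at_percentile(values, pct):.2f}."
        for pct in thresholds
    ]


# Made-up close prices, for illustration only.
closes = [61.5, 64.0, 66.2, 70.1, 72.8, 75.0, 77.4, 79.3, 84.6, 91.0]
for sentence in explain_cumulative(closes):
    print(sentence)
```

Because nearest-rank never interpolates, every number in the text is a price that actually occurred, which is easier to defend in front of an audience.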

Explaining the results of clustering to avoid any guesswork

Clustering is a very powerful tool for any data exploration activity. However, it can be one of the most misinterpreted techniques if it is not clearly explained. The result of clustering is generally a scatter plot with the clusters shown in different colors. The catch is that a 2D scatter plot shows only two columns of your data, whereas the clustering itself resulted from many more columns.

So, in order to explain the clustering results correctly, you need a textual explanation that includes the feature importance of the clustering results.

Example of text explanation of clustering
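The article does not say how feature importance is measured, so the sketch below makes an assumption: it scores each feature by the share of its variance that is explained by cluster membership (between-cluster sum of squares divided by total sum of squares). The cluster labels are taken as given from whatever clustering algorithm was run; the function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def cluster_feature_importance(rows, labels):
    """Share of each feature's variance explained by cluster membership (0 to 1)."""
    scores = {}
    for feature in rows[0].keys():
        values = [row[feature] for row in rows]
        overall_mean = sum(values) / len(values)
        total_ss = sum((v - overall_mean) ** 2 for v in values)
        groups = defaultdict(list)
        for v, label in zip(values, labels):
            groups[label].append(v)
        between_ss = sum(
            len(g) * (sum(g) / len(g) - overall_mean) ** 2
            for g in groups.values()
        )
        # A constant feature cannot separate clusters, so it scores zero.
        scores[feature] = between_ss / total_ss if total_ss else 0.0
    return scores


def explain_clusters(rows, labels, top_n=2):
    """Name the features that separate the clusters most strongly."""
    scores = cluster_feature_importance(rows, labels)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ", ".join(f"{name} ({score:.0%})" for name, score in ranked[:top_n])
    return (
        f"The data splits into {len(set(labels))} clusters. "
        f"The features that separate them most are: {top}."
    )


# Made-up rows and labels, for illustration only.
rows = [
    {"price": 12.0, "volume": 300, "volatility": 0.20},
    {"price": 12.5, "volume": 900, "volatility": 0.22},
    {"price": 30.0, "volume": 350, "volatility": 0.60},
    {"price": 31.0, "volume": 800, "volatility": 0.65},
]
labels = [0, 0, 1, 1]
print(explain_clusters(rows, labels))
```

A sentence like this tells the reader which columns drive the grouping, including columns that never appear on the two axes of the scatter plot.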

Including text generation functions in your developments

As data scientists, we focus on coding for all activities: data preparation, feature engineering, hyperparameter tuning, modeling and visualisation. But most of us do not focus on automatically generating textual explanations of results. So it is a good idea to make a habit of including functions that generate textual explanations in your code.

As more and more algorithms are packaged into products meant for end users, the need for textual explanations of results is becoming very evident. Such explanations will make your data science work more appealing to a wider audience.
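One way to build this habit, sketched here under assumptions, is to have analysis functions return the numbers and the sentence together. The `ExplainedResult` class and the `summarise` function are hypothetical names invented for this example, and the wording of the generated sentence is only a suggestion.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class ExplainedResult:
    """A computed result bundled with the sentence that explains it."""
    value: dict
    explanation: str


def summarise(values, label="value"):
    """Compute the mean and median and describe them in one sentence."""
    stats = {"mean": mean(values), "median": median(values)}
    if stats["mean"] > stats["median"]:
        shape = "which suggests a few high values pull the average up"
    elif stats["mean"] < stats["median"]:
        shape = "which suggests a few low values pull the average down"
    else:
        shape = "so the mean and the median coincide"
    text = (
        f"The typical {label} is {stats['median']:.2f} (median), "
        f"with a mean of {stats['mean']:.2f}, {shape}."
    )
    return ExplainedResult(stats, text)


# Made-up prices, for illustration only.
result = summarise([11.8, 12.1, 12.4, 13.0, 19.5], label="closing price")
print(result.explanation)
```

Returning both parts from the same function keeps the text in step with the numbers: when the data changes, the explanation changes with it.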

