Ridgeline plots in pure matplotlib

栏目: IT技术 · 发布时间: 5年前

内容简介:A Ridgeline plot (also called Joyplot) allows us to compare several statistical distributions. In this plot each distribution is shown with a density plot, and all the distributions are aligned to the same horizontal axis and, sometimes, presented with a s

A Ridgeline plot (also called Joyplot) allows us to compare several statistical distributions. In this plot each distribution is shown with a density plot, and all the distributions are aligned to the same horizontal axis and, sometimes, presented with a slight overlap.

There are many options to make a Ridgeline plot in Python ( joypy

being one of them) but I decided to make my own function using matplotlib to have full flexibility and minimal dependencies:

from scipy.stats.kde import gaussian_kde
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

def ridgeline(data, overlap=0, fill=True, labels=None, n_points=150):
    """
    Creates a standard ridgeline plot.

    data, list of lists.
    overlap, overlap between distributions. 1 max overlap, 0 no overlap.
    fill, matplotlib color to fill the distributions.
    n_points, number of points to evaluate each distribution function.
    labels, values to place on the y axis to describe the distributions.
    """
    if overlap > 1 or overlap < 0:
        raise ValueError('overlap must be in [0 1]')
    xx = np.linspace(np.min(np.concatenate(data)),
                     np.max(np.concatenate(data)), n_points)
    curves = []
    ys = []
    for i, d in enumerate(data):
        pdf = gaussian_kde(d)
        y = i*(1.0-overlap)
        ys.append(y)
        curve = pdf(xx)
        if fill:
            plt.fill_between(xx, np.ones(n_points)*y, 
                             curve+y, zorder=len(data)-i+1, color=fill)
        plt.plot(xx, curve+y, c='k', zorder=len(data)-i+1)
    if labels:
        plt.yticks(ys, labels)

The function takes in input a list of datasets where each dataset contains the values to derive a single distribution. Each distribution is estimated using Kernel Density Estimation, just as we've seenpreviously, and plotted increasing the y value.

Let's generate data from few normal distributions with different means and have a look at the output of the function:

data = [norm.rvs(loc=i, scale=2, size=50) for i in range(8)]
ridgeline(data, overlap=.85, fill='y')

Ridgeline plots in pure matplotlib

Not too bad, we can clearly see that each distribution has a different mean. Let's apply the function on real world data:

import pandas as pd
data_url = 'ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt'
co2_data = pd.read_csv(data_url, sep='\s+', comment='#', na_values=-999.99,
                       names=['year', 'month', 'day', 'decimal', 'ppm', 
                       'days', '1_yr_ago',  '10_yr_ago', 'since_1800'])
co2_data = co2_data[co2_data.year >= 2000]
co2_data = co2_data[co2_data.year != 2020]

plt.figure(figsize=(8, 10))
grouped = [(y, g.ppm.dropna().values) for y, g in co2_data.groupby('year')]
years, data = zip(*grouped)
ridgeline(data, labels=years, overlap=.85, fill='tomato')
plt.title('Distribution of CO2 levels per year since 2000',
          loc='left', fontsize=18, color='gray')
plt.gca().spines['left'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.xlabel('ppm')
plt.xlim((co2_data.ppm.min(), co2_data.ppm.max()))
plt.ylim((0, 3.1))
plt.grid(zorder=0)
plt.show()

Ridgeline plots in pure matplotlib

In the snippet above we downloaded the measurements of the concentration of CO2 in the atmosphere, the same data was also usedhere, and grouped the values by year. Then, we generated a Ridgeline plot that shows the distribution of CO2 levels each year since 2000. We easily note that the average concentration went from 370ppm to 420pmm gradually increasing over the 19 years abserved. We also note that the span of each distribution is approximatively 10ppm.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Web前端黑客技术揭秘

Web前端黑客技术揭秘

钟晨鸣、徐少培 / 电子工业出版社 / 2013-1 / 59.00元

Web前端的黑客攻防技术是一门非常新颖且有趣的黑客技术,主要包含Web前端安全的跨站脚本(XSS)、跨站请求伪造(CSRF)、界面操作劫持这三大类,涉及的知识点涵盖信任与信任关系、Cookie安全、Flash安全、DOM渲染、字符集、跨域、原生态攻击、高级钓鱼、蠕虫思想等,这些都是研究前端安全的人必备的知识点。本书作者深入剖析了许多经典的攻防技巧,并给出了许多独到的安全见解。 本书适合前端工......一起来看看 《Web前端黑客技术揭秘》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

随机密码生成器
随机密码生成器

多种字符组合密码

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具