4 Less Known Pandas Functions That Can Make Your Work Easier

Category: IT Technology · Published: 6 years ago

Learn Pandas for Data Science

Supercharge your data science projects

Photo by Jérémy Stenuit on Unsplash

Many data scientists have been using Python as their programming language of choice. As an open-source language, Python has gained considerable popularity by providing a variety of data science-related libraries. Particularly, the pandas library is arguably the most prevalent toolbox among Python-based data scientists.

The pandas library is so well developed that it provides a very large collection of functions for various operations. The drawback of such a powerful toolbox is that some useful functions remain little known to beginners. In this article, I would like to share four such functions.

1. The where() Function

Most datasets need some conversion before they are in an analyzable format. The where() function is useful for replacing the values that don’t satisfy a condition. Let’s consider the following example of its usage. As with all data manipulation steps, we first need to import pandas and numpy.

Use where() With Series

In the above figure, we created a Series and applied the where() function. The basic call is where(condition, other). The condition argument evaluates to boolean values: where they’re True, the original values are kept, and where they’re False, the value specified by the other argument is used instead. In our case, any values below 1000 were kept, while the ones equal to or greater than 1000 were set to 1000.
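The capping behavior described above can be sketched as follows. This is a minimal illustration, not the original figure's code: the values in the Series are made up, and only the threshold of 1000 comes from the text.

```python
import pandas as pd

# Hypothetical values; the data from the original figure is not available.
s = pd.Series([200, 850, 1000, 1500, 3200])

# Where the condition (s < 1000) is True the value is kept;
# where it is False the value is replaced with `other` (here 1000).
capped = s.where(s < 1000, 1000)

print(capped.tolist())  # [200, 850, 1000, 1000, 1000]
```

Note that where() returns a new Series and leaves s unchanged.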

This function can be used not only with Series but also with DataFrame. Let’s see a similar usage with a DataFrame. In the example below, every odd number in the DataFrame df0 is incremented by 1, and the even values are kept.

Use where() With DataFrame
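A possible version of the DataFrame case is sketched below. The name df0 comes from the text, but its contents and the result name df1 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical contents for df0; only the name comes from the article.
df0 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The condition is True for even values, so those are kept.
# Odd values fail the condition and are taken from `other`,
# which here is the element-wise result of df0 + 1.
df1 = df0.where(df0 % 2 == 0, df0 + 1)

print(df1)
```

Here the other argument is a DataFrame of the same shape rather than a scalar, so each replaced cell gets the value at the matching position.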

2. The pivot_table() Function

Unlike the where() function, the pivot_table() function is available only for DataFrame. It creates a spreadsheet-style pivot table, which makes it a great tool for summarizing, analyzing, and presenting data in a straightforward manner. Its power is best shown with a more realistic example.

Use pivot_table() With DataFrame

In the above figure, we created a DataFrame that consisted of salary and bonus records together with the employees’ gender and department information. We then created a pivot table using the pivot_table() function. Specifically, we set the salary and bonus columns to the values argument, set the department to the index argument, set the gender to the columns argument, and set [np.mean, np.median, np.amax] to the aggfunc argument.

In the output, you can see that we have a pivot table showing us the 2 (gender) by 2 (department) tables in mean, median, and maximum values for the salary and bonus variables. Some interesting observations include that in Department A, women have higher salaries than men, while the pattern is opposite in Department B. In both departments, women and men have similar bonuses.
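The pivot table described above could be built roughly like this. The records are invented for illustration and merely arranged to echo the pattern noted in the text; the column names department, gender, salary, and bonus are assumptions. String names are passed to aggfunc in place of np.mean, np.median, and np.amax, because recent pandas versions warn when NumPy callables are passed there.

```python
import pandas as pd

# Invented records; not the data from the original figure.
df = pd.DataFrame({
    "department": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "gender":     ["F", "F", "M", "M", "F", "F", "M", "M"],
    "salary":     [72000, 76000, 65000, 69000, 61000, 63000, 70000, 74000],
    "bonus":      [5000, 5200, 5100, 5100, 4000, 4200, 4100, 4100],
})

# Rows are departments, columns are genders, and every cell is
# aggregated three ways for both salary and bonus.
pivot = df.pivot_table(
    values=["salary", "bonus"],
    index="department",
    columns="gender",
    aggfunc=["mean", "median", "max"],
)

print(pivot)

# The columns form a three-level index: (function, variable, gender).
print(pivot.loc["A", ("mean", "salary", "F")])
```

With a list of aggregation functions, the function name becomes the outermost column level, so a single statistic can be selected with a tuple as in the last line.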

3. The qcut() Function

When we have a dataset that involves numeric or ordinal measures, it sometimes makes more sense to bin them into quantile-based categories to identify possible patterns instead of examining the raw measures parametrically. In theory, we can calculate the quantile cutoffs ourselves and map the data onto these cutoffs to create the new categorical variable.

However, this operation is easily done with the qcut() function, which discretizes a variable into equal-sized buckets (e.g., quartiles or deciles) based on rank. Let’s see how this function works with the following example.

Use qcut() With DataFrame

In the above figure, we created a DataFrame with 3 columns. We were interested in generating quartiles for the var2 column, so we set the q argument to 4 (it can be 10 if you want deciles). We also passed a list to the labels argument to name these quartiles.
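A small sketch of the quartile binning follows. The column name var2 and q=4 come from the text; the other column names, the values, the label strings, and the new column name var2_q are assumptions.

```python
import pandas as pd

# Hypothetical three-column DataFrame; only the name var2 is from the article.
df = pd.DataFrame({
    "var1": list(range(1, 9)),
    "var2": [15, 3, 42, 8, 27, 36, 11, 50],
    "var3": list("abcdefgh"),
})

# q=4 splits var2 into four rank-based buckets of (roughly) equal size;
# labels names the buckets from lowest to highest.
df["var2_q"] = pd.qcut(df["var2"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(df)
print(df["var2_q"].value_counts().sort_index())
```

The result is an ordered categorical column, so the labels keep their low-to-high order when sorted or grouped.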

4. The melt() Function

Depending on the tools that data scientists use, some prefer the “wide” format (e.g., one subject one row with multiple variables), while some others prefer the “long” format (e.g., one subject multiple rows with one variable). Thus, it’s not uncommon that we need to do data transformation between these formats.

Unlike the T attribute, which transposes the entire DataFrame, the melt() function is particularly useful for converting data from the wide to the long format. Let’s see how it works with the following example.

Use melt() With DataFrame

In the above figure, we created a DataFrame in the wide format, with two measures per subject: one before and one after taking the medicine. We then used the melt() function to produce a long-format DataFrame. We specified SubjectID as the id_vars, the two measures as the value_vars, and renamed the resulting columns to be more meaningful.
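The wide-to-long conversion might look like the sketch below. SubjectID, id_vars, and value_vars come from the text; the measure column names, the values, and the use of var_name and value_name to rename the output columns are assumptions about how the original did it.

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject.
wide_df = pd.DataFrame({
    "SubjectID": [1, 2, 3],
    "before_med": [120, 135, 128],
    "after_med": [115, 130, 121],
})

# Each subject now gets one row per measure. var_name and value_name
# replace the default column names "variable" and "value".
long_df = wide_df.melt(
    id_vars="SubjectID",
    value_vars=["before_med", "after_med"],
    var_name="condition",
    value_name="measure",
)

print(long_df)
```

The three subjects with two measures each become six rows, with all before_med rows listed first, followed by the after_med rows.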

Before You Go

There are many more functions in pandas that we can explore. In this article, we just learned four functions that some of us don’t know too well, but they can be very useful in our daily data manipulation work.

I hope that you enjoyed reading this piece. You can find the code on GitHub.

About the Author

I write blogs about Python and data processing and analysis. Just in case you’ve missed some of my earlier blogs, here are the links to some articles that are relevant to the current one.

