Python in Computationally-Intensive Areas: Machine Learning

Category: IT Technology · Published: 3 years ago

Even the most ingenious learning algorithm will not suffice if it never completes.

Machine learning tends to be a computationally-intensive task for many practical use cases. It is vital that your learning algorithm performs well, or at the very least, completes.

Do not get me wrong, there are many practical algorithms and ideas that arise from Computational Learning Theory.

In fact, I have written about a few of them: Defining Goodness in Machine Learning Algorithms and What To Do If Learning Fails.

If performance is key in practical machine learning use cases, why is Python one of the most commonly used languages in data science?

Introduction

Let us set the scene before we dive into answering this question. Back when I was a student in my introductory computer science course, the primary language we learned was Java (a compiled language). A year later, the same course was now teaching Python (an interpreted language) as its primary language. Why the switch?

Since Python is not the fastest for every problem, my hypothesis is that Python is just easier to learn and use. This hypothesis can be extended to data science, where people with diverse engineering backgrounds are creating great machine learning models, often through a back-and-forth process of prototyping and running experiments.

Still, performance is important. So let us dive into how we can use Python in a computationally-intensive area like machine learning.

Global Temperature Prediction using Least Squares Polynomial Fit with NumPy, SciPy and Matplotlib

Let us step through the creation of a simple Global Temperature Predictor, where we will stop along the way to discuss how Python’s libraries are key in assisting us in machine learning.

To start off, we will be using NumPy.

NumPy is an extension package to Python for multi-dimensional arrays. It is designed for scientific computation and provides a memory-efficient container with fast numerical operations. Since NumPy is mostly written in C (a very fast compiled language), it can offload its computationally-intensive tasks to that lower layer.

Here is a quick comparison of loop performance:

Python: 1000 loops, best of 3: 237 µs per loop.
NumPy: 1000000 loops, best of 3: 1.22 µs per loop.
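The code behind these timings is not shown here, so the following is only a sketch of how such a comparison could be set up. The array size `N`, the squaring workload, and the helper names are assumptions made for illustration, and the numbers it prints will differ from those above.

```python
import timeit

import numpy as np

N = 1000  # hypothetical array size; the original benchmark's size is not stated
REPEATS = 1000

py_list = list(range(N))
np_array = np.arange(N)


def square_with_loop(values):
    """Square every element with a pure-Python loop."""
    return [v * v for v in values]


def square_with_numpy(values):
    """Square every element with a single vectorized NumPy operation."""
    return values * values


loop_time = timeit.timeit(lambda: square_with_loop(py_list), number=REPEATS)
numpy_time = timeit.timeit(lambda: square_with_numpy(np_array), number=REPEATS)

print(f"Python loop: {loop_time / REPEATS * 1e6:.2f} microseconds per call")
print(f"NumPy:       {numpy_time / REPEATS * 1e6:.2f} microseconds per call")
```

Both functions compute the same result; the difference is that the NumPy version runs its loop inside compiled code rather than in the interpreter.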

Aside from the potentially increased performance, NumPy has a plethora of useful tools. Here are some of my favorites:

numpy.reshape: Gives a new shape to an array without changing its data.

numpy.copy: Returns a true (independent) copy of an array.

numpy.ndarray.flatten: Returns a copy of the array collapsed into one dimension.

numpy.empty: Creates an array without initializing its values, so it may be marginally faster than numpy.zeros.

numpy.ma: Provides masked arrays for dealing with (the propagation of) missing data.

numpy.genfromtxt: Loads data from text files, handling missing values along the way.

numpy.linspace: Returns evenly spaced numbers over a specified interval.

numpy.clip: Limits values to a given interval, which can be used to trim outliers.
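A short tour of the tools listed above might look like the sketch below. All input values are made up for illustration, and the variable names are arbitrary.

```python
import io

import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
grid = a.reshape(2, 3)           # same data arranged as 2 rows x 3 columns
independent = grid.copy()        # true copy: editing it leaves `grid` untouched
independent[0, 0] = 99
flat = grid.flatten()            # one-dimensional copy of the 2-D array

scratch = np.empty(3)            # uninitialized: contents are arbitrary until assigned
scratch[:] = 0.0

masked = np.ma.masked_invalid([1.0, np.nan, 3.0])  # masks the NaN entry
mean_without_nan = masked.mean()                   # the mean ignores the masked value

text = io.StringIO("1.0,2.0\n,4.0\n")       # second row has an empty first field
table = np.genfromtxt(text, delimiter=",")  # the empty field is read as NaN

ticks = np.linspace(0.0, 1.0, 5)            # five evenly spaced points from 0 to 1
clipped = np.clip([-5, 0, 5, 50], 0, 10)    # values forced into the range [0, 10]
```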

We will begin by using sample data before we get to the real data. Here we are generating temperature data as a function of month of the year.
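The sample data itself is not reproduced in this text, so here is one way such data could be generated. The yearly mean, the amplitude, the warmest month, and the noise level are all hypothetical values picked only to produce a plausible seasonal curve.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sample is reproducible

months = np.arange(12)  # 0 = January, ..., 11 = December

# Hypothetical seasonal parameters, chosen only for illustration.
MEAN_TEMP = 10.0      # yearly average in degrees Celsius
AMPLITUDE = 12.0      # half of the summer-to-winter swing
WARMEST_MONTH = 6     # index of the warmest month (July)

# A cosine with a 12-month period peaks at the warmest month.
seasonal = MEAN_TEMP + AMPLITUDE * np.cos(2 * np.pi * (months - WARMEST_MONTH) / 12)

# Add Gaussian noise so the data looks like real measurements.
temperatures = seasonal + rng.normal(scale=1.5, size=months.size)
```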

We can use Matplotlib, a Python library, to easily visualize our data.

We can then use SciPy to fit our data to a periodic function using its optimize module. No need to reinvent the wheel here.

scipy: A scientific toolkit for linear algebra, interpolation, optimization and fitting, statistics and random numbers, numerical integration, fast Fourier transforms, signal processing, and image manipulation.
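As a rough sketch of fitting a periodic function, the example below uses scipy.optimize.curve_fit. The model function `yearly_temps`, its parameters, and the synthetic observations are assumptions for illustration; the original notebook may use a different model or routine.

```python
import numpy as np
from scipy import optimize


def yearly_temps(month, mean, amplitude, shift):
    """Hypothetical periodic model with a fixed 12-month period."""
    return mean + amplitude * np.cos(2 * np.pi * (month - shift) / 12)


# Synthetic, noise-free observations standing in for the sample data.
months = np.arange(12)
observed = yearly_temps(months, 10.0, 12.0, 6.0)

# p0 is a rough initial guess; curve_fit refines it with nonlinear least squares.
params, covariance = optimize.curve_fit(
    yearly_temps, months, observed, p0=[8.0, 10.0, 5.0]
)

# Evaluate the fitted model at the observed months.
fitted = yearly_temps(months, *params)
```

A reasonable initial guess matters for periodic models, since a poor one can lead the optimizer to a different local minimum.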

Let us extend the idea of our global temperature model to real data. NumPy can easily load the real data from the NASA Global Land-Ocean Temperature Index, which is given in units of 0.01 degrees Celsius relative to the 1951–1980 base period. Some entries in this data are missing (NaN), and NumPy handles them while loading without extra effort on our part.
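Loading such a file could look like the sketch below. The excerpt is a made-up layout with invented values, and the `***` missing-value marker is an assumption about the file format; check the actual file before reusing these settings.

```python
import io

import numpy as np

# Hypothetical excerpt: a year column followed by three monthly anomalies
# in units of 0.01 degrees Celsius. The values are invented.
raw = io.StringIO(
    "Year,Jan,Feb,Mar\n"
    "2001,40,45,***\n"
    "2002,70,***,88\n"
)

data = np.genfromtxt(
    raw,
    delimiter=",",
    skip_header=1,          # skip the row of column names
    missing_values="***",   # treat this marker as missing
    filling_values=np.nan,  # and store NaN in its place
)

years = data[:, 0].astype(int)
anomalies = data[:, 1:] / 100.0            # convert to degrees Celsius
row_means = np.nanmean(anomalies, axis=1)  # NaN-aware mean for each year
```

Functions such as np.nanmean skip NaN entries, whereas the plain np.mean would return NaN for any row that contains one.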

We can plot a heat map with Matplotlib to get intuition about the trends in our data.
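A heat map of this kind could be drawn roughly as follows. The table here is invented stand-in data (30 years by 12 months with an upward drift), and the colormap and axis labels are choices made for this sketch, not taken from the original notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Invented stand-in for the real table: one row per year, one column per month.
years = np.arange(1990, 2020)
trend = np.linspace(0.2, 1.0, years.size)[:, None]
anomalies = trend + rng.normal(scale=0.1, size=(years.size, 12))

fig, ax = plt.subplots(figsize=(6, 4))
image = ax.imshow(
    anomalies,
    aspect="auto",
    cmap="coolwarm",
    # extent is [left, right, bottom, top]; the first year sits at the top.
    extent=[0.5, 12.5, years[-1] + 0.5, years[0] - 0.5],
)
ax.set_xlabel("Month")
ax.set_ylabel("Year")
fig.colorbar(image, ax=ax, label="Temperature anomaly (degrees C)")
```

In a notebook, the figure is displayed automatically; in a script, call fig.savefig with a file name of your choice.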

We can flatten our data with NumPy to get a simple one-dimensional data set to work with. Next, we split the data into training and test sets, preserving the chronological order so that we can make predictions about the “future”. Error is calculated as the squared distance between the model’s predictions and the real data.
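The flatten, split, and error steps can be sketched as below. The small table, the 75/25 split ratio, the decision to drop NaN readings, and the name `squared_error` are all assumptions for illustration.

```python
import numpy as np

# Invented stand-in for the year-by-month table (3 rows x 4 columns here).
table = np.array([
    [0.10, 0.20, np.nan, 0.30],
    [0.40, 0.50, 0.60, 0.70],
    [0.80, 0.90, 1.00, 1.10],
])

y = table.flatten()          # row-major order keeps the readings chronological
x = np.arange(y.size)        # time index counted from the first reading
keep = ~np.isnan(y)          # assumption: drop missing readings before fitting
x, y = x[keep], y[keep]

split = int(0.75 * x.size)   # hypothetical 75/25 split with no shuffling
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]


def squared_error(model, x, y):
    """Sum of squared distances between the model's predictions and the data."""
    return np.sum((model(x) - y) ** 2)
```

Slicing instead of shuffling guarantees that every test point lies later in time than every training point, which is what makes the test a prediction about the "future".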

Finally, we can train our model and plot the results. Here we train with SciPy's least squares polynomial fit, which returns the polynomial that minimizes the sum of the squared distances between the model’s predictions and the real data. The polynomial's coefficients fully define the model that we use for predictions. In the resulting graph, the higher-degree polynomial performs best on the test set.
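The training step can be sketched as follows. This example uses numpy.polyfit as a stand-in for the least squares polynomial fit, and it runs on an invented series, so the ranking of degrees it prints says nothing about the real temperature data. The degrees tried and the split point are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Invented series: a gently accelerating trend plus noise.
x = np.arange(120, dtype=float)
y = 0.00005 * x ** 2 + 0.001 * x + rng.normal(scale=0.05, size=x.size)

split = 90  # earlier readings train the model, later ones test it
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]


def squared_error(model, x, y):
    """Sum of squared distances between the model's predictions and the data."""
    return np.sum((model(x) - y) ** 2)


test_errors = {}
for degree in (1, 2, 3):
    coefficients = np.polyfit(x_train, y_train, degree)  # least squares fit
    model = np.poly1d(coefficients)                      # callable polynomial
    test_errors[degree] = squared_error(model, x_test, y_test)

for degree, error in test_errors.items():
    print(f"degree {degree}: test error {error:.3f}")
```

Comparing errors on the held-out test set, rather than on the training set, is what guards against picking a degree that merely memorizes the training data.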

Conclusion

What Python lacks in performance, it makes up for in ease of use with its robust libraries. In addition, these libraries improve Python's performance in many use cases.

Please see the linked Colab Notebook for the associated Python source code.

References

http://scipy-lectures.org

Building Machine Learning Systems with Python by Willi Richert and Luis Pedro Coelho

