Python in Computationally-Intensive Areas: Machine Learning

Category: IT Technology · Published: 5 years ago

Even the most ingenious learning algorithm will not suffice if it never completes.

Machine learning tends to be a computationally-intensive task for many practical use cases. It is vital that your learning algorithm performs well, or at the very least, completes.

Do not get me wrong: many practical algorithms and ideas arise from Computational Learning Theory.

In fact, I have written about a few of them: Defining Goodness in Machine Learning Algorithms and What To Do If Learning Fails.

If performance is key in practical machine learning use cases, why is Python one of the most commonly used languages in data science?

Introduction

Let us set the scene before we dive into answering this question. Back when I was a student in my introductory computer science course, the primary language we learned was Java (a compiled language). A year later, the same course was teaching Python (an interpreted language) as its primary language. Why the switch?

Since Python is not the fastest language for every problem, my hypothesis is that Python is simply easier to learn and use. This hypothesis extends to data science, where people with diverse engineering backgrounds create great machine learning models, often through a back-and-forth process of prototyping and running experiments.

Still, performance is important. So let us dive into how we can use Python in a computationally-intensive area like machine learning.

Global Temperature Prediction using Least Squares Polynomial Fit with NumPy, SciPy and Matplotlib

Let us step through the creation of a simple Global Temperature Predictor, where we will stop along the way to discuss how Python’s libraries are key in assisting us in machine learning.

To start off, we will be using NumPy.

NumPy is an extension package to Python for multi-dimensional arrays. It is designed for scientific computation and is a memory-efficient container that provides fast numerical operations. Since NumPy is mostly written in C (a very fast compiled language), it is able to off-load its computationally-intensive tasks to that lower layer.

Here is a quick comparison of loop performance:

Python: 1000 loops, best of 3: 237 µs per loop.
NumPy: 1000000 loops, best of 3: 1.22 µs per loop.
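The source does not say exactly what was timed, so the following is only a sketch of how such a comparison could be reproduced. It assumes the benchmark squares 1,000 integers, once with a pure Python list comprehension and once with a vectorized NumPy expression. The function names are hypothetical, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1000                      # assumed problem size, not stated in the article
py_list = list(range(n))
np_array = np.arange(n)


def square_with_loop(values):
    """Square every element with an explicit Python-level loop."""
    return [v ** 2 for v in values]


def square_with_numpy(values):
    """Square every element with one vectorized NumPy operation."""
    return values ** 2


repeats = 1000
loop_time = timeit.timeit(lambda: square_with_loop(py_list), number=repeats)
numpy_time = timeit.timeit(lambda: square_with_numpy(np_array), number=repeats)

# Report the average time per call in microseconds.
print(f"Python loop: {loop_time / repeats * 1e6:.2f} µs per call")
print(f"NumPy:       {numpy_time / repeats * 1e6:.2f} µs per call")
```

Both functions produce the same values; only the place where the loop runs (the Python interpreter versus compiled code inside NumPy) changes.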

Aside from the potentially increased performance, NumPy has a plethora of useful tools. Here are some of my favorites:

numpy.reshape: Gives a new shape to an array without changing its data.

numpy.copy: Returns a true copy of an array rather than a view.

numpy.ndarray.flatten: Returns a copy of the array collapsed into one dimension.

numpy.empty: Creates an array without initializing its values (for example to zero), and may therefore be marginally faster than numpy.zeros.

numpy.ma: Provides masked arrays for dealing with (the propagation of) missing data.

numpy.genfromtxt: Loads data from text files and handles missing values.

numpy.linspace: Returns evenly spaced numbers over a specified interval.

numpy.clip: Limits values to a given interval, which can be used to trim outliers.
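As a quick illustration of the tools listed above, here is a minimal sketch that exercises most of them on tiny made-up arrays. The variable names and values are placeholders chosen for this example only.

```python
import numpy as np

a = np.arange(6)                     # [0 1 2 3 4 5]

# reshape: same data, new shape (typically a view on the original array).
grid = np.reshape(a, (2, 3))

# copy: an independent array, so changing it leaves `a` untouched.
independent = np.copy(grid)
independent[0, 0] = 99

# flatten: a one-dimensional copy of the array.
flat = grid.flatten()

# empty: allocated but uninitialized, so its contents are arbitrary.
scratch = np.empty(3)

# linspace: 5 evenly spaced numbers from 0 to 1 inclusive.
spaced = np.linspace(0.0, 1.0, 5)

# clip: values outside [0, 10] are pulled back to the nearest bound.
clipped = np.clip(np.array([-5, 2, 50]), 0, 10)

# ma: mask invalid entries (NaN) so they are ignored in computations.
masked = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))

print(grid.shape, flat, spaced, clipped, masked.mean())
```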

We will begin by using sample data before we get to the real data. Here we are generating temperature data as a function of month of the year.

We can use Matplotlib, a Python library, to easily visualize our data.
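The sample data itself is not reproduced in this article, so the sketch below generates a synthetic stand-in: a cosine-shaped yearly cycle plus random noise. The average, amplitude, and noise level are assumptions picked purely for illustration, not the author's values.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic monthly temperatures: coldest in month 0, warmest in month 6.
months = np.arange(12)
true_temps = 10.0 + 15.0 * np.cos(2 * np.pi * (months - 6) / 12)
temps = true_temps + rng.normal(scale=1.5, size=months.size)

fig, ax = plt.subplots()
ax.plot(months, temps, "o", label="sample temperature")
ax.set_xlabel("Month (0 = January)")
ax.set_ylabel("Temperature (°C)")
ax.legend()
# Call plt.show() or fig.savefig(...) to view the figure.
```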

We can then use SciPy to fit our data to a periodic function using the scipy.optimize module. No need to reinvent the wheel here.

SciPy: A scientific toolkit for linear algebra, interpolation, optimization and fitting, statistics and random numbers, numerical integration, fast Fourier transforms, signal processing, and image manipulation.
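One way to fit a periodic function with scipy.optimize is curve_fit. The sketch below is an assumption about how this step might look: the model function yearly_temps, the synthetic data, and the initial guess p0 are all hypothetical and are not taken from the author's notebook.

```python
import numpy as np
from scipy import optimize


def yearly_temps(month, avg, ampl, time_offset):
    """Cosine with a 12-month period: average, amplitude, and phase shift."""
    return avg + ampl * np.cos((month + time_offset) * 2 * np.pi / 12)


rng = np.random.default_rng(0)
months = np.arange(12)

# Synthetic observations generated from known parameters plus noise.
true_params = (10.0, 15.0, -7.0)
temps = yearly_temps(months, *true_params) + rng.normal(scale=1.0, size=12)

# curve_fit minimizes the squared residuals, starting from a rough guess.
params, covariance = optimize.curve_fit(
    yearly_temps, months, temps, p0=[12.0, 10.0, -6.0]
)

fitted = yearly_temps(months, *params)
rmse = np.sqrt(np.mean((fitted - temps) ** 2))
print("fitted parameters:", params)
print("root mean squared error:", rmse)
```

The optimizer only needs the model function, the data, and a starting point; the search itself is handled by SciPy.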

Let us extend the idea of our global temperature model to real data. NumPy can easily load our real data from the NASA Global Land-Ocean Temperature Index, which is given in units of 0.01 degrees Celsius relative to the base period 1951–1980. Some entries are missing and load as NaN values, which NumPy's loaders and NaN-aware functions can handle with little extra work.
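To show how missing entries can end up as NaN, here is a small sketch using numpy.genfromtxt on an in-memory table. The numbers are invented placeholders, not NASA data, and the column layout and the "***" missing-value marker are assumptions made for this example.

```python
import io

import numpy as np

# Tiny made-up table in units of 0.01 °C; "***" marks a missing entry.
raw = io.StringIO(
    "Year,Jan,Feb,Mar\n"
    "2000,10,20,30\n"
    "2001,15,***,35\n"
)

table = np.genfromtxt(
    raw,
    delimiter=",",
    skip_header=1,            # skip the column names
    missing_values="***",     # treat this marker as missing
    filling_values=np.nan,    # and store NaN in its place
)

years = table[:, 0].astype(int)
anomalies = table[:, 1:] * 0.01   # convert from 0.01 °C units to °C

# NaN-aware reductions skip the missing entry instead of returning NaN.
print("missing entries:", int(np.isnan(anomalies).sum()))
print("mean anomaly (°C):", np.nanmean(anomalies))
```

For a real file, the StringIO object would be replaced by a path to the downloaded data.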

We can plot a heat map with Matplotlib to get intuition about the trends in our data.
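A heat map of this kind can be sketched with Matplotlib's imshow, with one row per year and one column per month. The data below is synthetic (a slow upward trend plus noise) and the year range is arbitrary; both are assumptions standing in for the real table.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic anomalies: rows are years, columns are months.
years = np.arange(1980, 2000)
trend = np.linspace(0.0, 0.5, years.size)[:, np.newaxis]
anomalies = trend + rng.normal(scale=0.1, size=(years.size, 12))

fig, ax = plt.subplots()
image = ax.imshow(
    anomalies,
    aspect="auto",
    cmap="coolwarm",
    # extent is [left, right, bottom, top]; the first year is drawn at the top.
    extent=[0.5, 12.5, years[-1] + 0.5, years[0] - 0.5],
)
ax.set_xlabel("Month")
ax.set_ylabel("Year")
fig.colorbar(image, ax=ax, label="Temperature anomaly (°C)")
# Call plt.show() or fig.savefig(...) to view the figure.
```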

We can flatten our data with NumPy to get a simple one-dimensional data set to work with. Next, we split the data into training and test sets, preserving the chronological order so that we can make predictions about the "future". The error is calculated as the squared distance between the model's predictions and the real data.
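These three steps can be sketched as follows. The placeholder data is random, and the 80/20 split ratio is an assumption, since the article does not state the ratio used. The helper name squared_error is also hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder table: 10 years by 12 months of made-up anomalies.
anomalies = rng.normal(size=(10, 12))

# Flatten row by row so the values stay in chronological order.
y = anomalies.flatten()
x = np.arange(y.size)

# Drop any missing values before fitting.
keep = ~np.isnan(y)
x, y = x[keep], y[keep]

# Chronological split: train on the past, test on the "future".
split = int(0.8 * y.size)     # assumed 80/20 ratio
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]


def squared_error(model, x, y):
    """Sum of squared distances between model predictions and real data."""
    return np.sum((model(x) - y) ** 2)


print(x_train.size, "training points,", x_test.size, "test points")
```

No shuffling is applied, because shuffling would leak later observations into the training set.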

Finally, we can train our model and plot the results. Here we train with a SciPy least squares polynomial fit, which returns the polynomial that minimizes the sum of the squared distances between the model's predictions and the real data. The polynomial's coefficients uniquely define the model that we use to make predictions. In the resulting graph, the higher-degree polynomial performs best on the test set.
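The training step can be sketched as below. Note one substitution: the article attributes the fit to SciPy, while this sketch uses numpy.polyfit, which also performs a least squares polynomial fit. The synthetic data, the split point, and the chosen degrees are assumptions, so the errors printed here say nothing about the author's results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series: slow warming trend, yearly cycle, and noise.
x = np.arange(120, dtype=float)
y = 0.002 * x + 0.1 * np.sin(2 * np.pi * x / 12)
y = y + rng.normal(scale=0.05, size=x.size)

# Chronological split (assumed 80/20).
split = 96
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]


def squared_error(model, x, y):
    """Sum of squared distances between model predictions and real data."""
    return np.sum((model(x) - y) ** 2)


models = {}
train_errors = {}
test_errors = {}
for degree in (1, 2, 3):
    # polyfit returns the coefficients that minimize the squared error.
    coefficients = np.polyfit(x_train, y_train, degree)
    models[degree] = np.poly1d(coefficients)   # callable polynomial
    train_errors[degree] = squared_error(models[degree], x_train, y_train)
    test_errors[degree] = squared_error(models[degree], x_test, y_test)
    print(degree, train_errors[degree], test_errors[degree])
```

Raising the degree can only lower the training error, so the test error is the number to compare when choosing between models.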

Conclusion

What Python lacks in raw performance, it makes up for in ease of use and robust libraries. In addition, these libraries improve Python's performance in many use cases.

Please see the linked Colab Notebook for the associated Python source code.

References

http://scipy-lectures.org

Building Machine Learning Systems with Python by Willi Richert and Luis Pedro Coelho

