Step-by-Step Data Analysis with Python

Category: Python · Published: 4 years ago

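Editor's note:
This article comes from cnblogs (博客园). It covers setting up a programming environment and then learning how to use the IPython notebook. We hope it helps with your studies.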

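You have decided to learn Python, but you have no prior programming experience. So you are often unsure where to start, given how much there is to learn about Python. These are common questions among beginners who are starting out with Python for data analysis:

How long does it take to learn Python?

How much Python do I need to learn before I can do data analysis?

What are the best books or courses for learning Python?

Do I need to become an expert Python programmer to work with data sets?

These are understandable confusions when you start learning a new technology, as the author of "Learn Anything in 20 Hours" puts it. Don't be afraid: I will show you how to get started quickly without having to become a Python programming "ninja".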

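Don't make the mistake I made

Before I started using Python, I had a misconception about using it for data analysis: that I had to be highly proficient in Python programming. So I took Udacity's introductory Python programming course, completed the Python tutorial on Codecademy, and read several Python programming books. This went on for 3 months (3 hours a day on average), during which I learned Python by building small software projects. Writing code was fun, but my goal was not to become a Python developer; it was to use Python for data analysis. Later I realized that I had spent a lot of time learning software development in Python rather than data analysis.

After a few hours of careful thought, I found that I needed to learn 5 Python libraries to solve a range of data analysis problems effectively. Then I started learning these libraries one by one.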

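Learning path

Start with Codecademy and complete all of its exercises. At 3 hours a day, you should finish them within 20 days. Codecademy covers the basic concepts of Python. However, it is not as project-oriented as Udacity; that is fine, because your goal is to do data science, not to develop software with Python.

After completing the Codecademy exercises, take a look at this IPython notebook:

Python Essentials Tutorial (I have provided the download link in the summary section).

It covers some concepts that Codecademy does not mention. You can finish this tutorial in 1 to 2 hours.

Now you know enough of the basics to learn the Python libraries.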

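Numpy

First, start with Numpy, because it is the fundamental package for scientific computing with Python. A good grasp of Numpy will help you use other tools such as Pandas effectively.

I have prepared an IPython notebook that covers some basic Numpy concepts. The tutorial includes the most frequently used Numpy operations, for example N-dimensional arrays, indexing, array slicing, integer indexing, array transformations, universal functions, processing data with arrays, common statistical methods, and so on.

Numpy Basics Tutorial

Index Numpy: look up the usage of unfamiliar Numpy functions here. Recommended!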
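
As a rough illustration of the operations listed above, here is a minimal sketch. The array contents and variable names are made up for this example and are not taken from the tutorial notebook.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
a = np.arange(12).reshape(3, 4)   # a 2-D array with 3 rows and 4 columns

first_row = a[0]                  # basic indexing: row 0
middle = a[1:, 1:3]               # slicing: rows 1-2, columns 1-2
picked = a[[0, 2], [1, 3]]        # integer indexing: elements (0, 1) and (2, 3)
transposed = a.T                  # array transformation: shape becomes (4, 3)
roots = np.sqrt(a)                # universal function, applied element-wise
col_means = a.mean(axis=0)        # a common statistic, computed per column

print(first_row, picked, col_means)
```

Each line corresponds to one item in the list above, so it can serve as a checklist while working through the notebook.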

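Pandas

Pandas provides high-level data structures and manipulation tools that make data analysis in Python faster and easier.

The tutorial covers Series, DataFrames, dropping data from an axis, handling missing data, and so on.

Pandas Basics Tutorial

Index Pandas: look up the usage of unfamiliar functions here. Recommended!

pandas tutorial (Baidu Jingyan)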
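
The topics named above can be sketched in a few lines. This is only an illustration: the data, labels and variable names are invented here and do not come from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])    # a labeled 1-D array
df = pd.DataFrame({"x": [1.0, 2.0, np.nan], "y": [4.0, 5.0, 6.0]},
                  index=["r1", "r2", "r3"])                  # a labeled 2-D table

without_row = df.drop("r1", axis=0)      # drop a row (axis 0)
without_col = df.drop("y", axis=1)       # drop a column (axis 1)
n_missing = int(df["x"].isna().sum())    # count missing values in column x
filled = df.fillna(0.0)                  # replace missing values with 0
complete = df.dropna()                   # or drop the rows that contain missing values

print(without_row, without_col, n_missing, sep="\n")
```

Whether to fill or drop missing values depends on the data set; the sketch shows both options without recommending either.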

Matplotlib

This is a Matplotlib tutorial in four parts.

Part 1:

Part 1 introduces Matplotlib's basic features and basic figure types.

Simple Plotting example

In [113]:
 %matplotlib inline 
 import matplotlib.pyplot as plt #importing matplot lib library
 import numpy as np 
 x = range(100) 
 #print x, print and check what is x
 y =[val**2 for val in x] 
 #print y
 plt.plot(x,y) #plotting x and y
 Out[113]:
 [<matplotlib.lines.Line2D at 0x7857bb0>] 

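fig, axes = plt.subplots(nrows=1, ncols=2) # assumed setup, missing from the original: two side-by-side axes
for ax in axes:
    ax.plot(x, y, 'r')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title('title')

fig.tight_layout() # keeps the labels of neighbouring axes from overlapping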

x = np.linspace(0, 5, 10) # assumed sample data, missing from the original; x must be a NumPy array so that x**2 works
fig, ax = plt.subplots() # assumed setup, missing from the original
ax.plot(x, x**2, label="y = x**2")
ax.plot(x, x**3, label="y = x**3")
ax.legend(loc=2) # upper left corner
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')

fig, axes = plt.subplots(1, 2, figsize=(10,4))
axes[0].plot(x, x**2, x, np.exp(x))
axes[1].plot(x, x**2, x, np.exp(x))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)")

n = np.array([0,1,2,3,4,5])

In [47]:
xx = np.linspace(-0.75, 1., 100) # assumed sample data, missing from the original
fig, axes = plt.subplots(1, 4, figsize=(12,3)) # assumed setup: the code below uses four axes
axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")
axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")
axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")
axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5)
axes[3].set_title("fill_between")

Using Numpy

In [17]:
 x = np.linspace(0, 2*np.pi, 100)
 y =np.sin(x)
 plt.plot(x,y)
 Out[17]:
 [<matplotlib.lines.Line2D at 0x579aef0>] 


In [24]:
 x= np.linspace(-3,2, 200)
 Y = x ** 2 - 2 * x + 1.
 plt.plot(x,Y)
 Out[24]:
 [<matplotlib.lines.Line2D at 0x6ffb310>] 


In [32]:

# plotting multiple plots

x =np.linspace(0, 2 * np.pi, 100)

y = np.sin(x)

z = np.cos(x)

plt.plot(x,y)

plt.plot(x,z)

# Matplot lib picks different colors for different plot.


In [35]:
 cd C:\Users\tk\Desktop\Matplot
 
 C:\Users\tk\Desktop\Matplot
 In [39]:
 data = np.loadtxt('numpy.txt')
 plt.plot(data[:,0], data[:,1]) # plotting column 1 vs column 2
 # The text in the numpy.txt should look like this
 # 0 0
 # 1 1
 # 2 4
 # 4 16
 # 5 25
 # 6 36
 Out[39]:
 [<matplotlib.lines.Line2D at 0x740f090>] 


In [56]:
data1 = np.loadtxt('scipy.txt') # load the file
for val in data1.T: # loop over each row of data1.T, i.e. each column of the file
    plt.plot(data1[:,0], val) # data1[:,0] is the first column of the file (the first row of data1.T)
# data in scipy.txt looks like this:
# 0 0 6
# 1 1 5
# 2 4 4
# 4 16 3
# 5 25 2
# 6 36 1
# data1.T is then:
[[ 0. 1. 2. 4. 5. 6.]
 [ 0. 1. 4. 16. 25. 36.]
 [ 6. 5. 4. 3. 2. 1.]]

Scatter Plots and Bar Graphs

In [64]:
 sct = np.random.rand(20, 2)
 print(sct)
 plt.scatter(sct[:,0], sct[:,1]) # I am plotting a scatter plot.
 
 [[ 0.51454542 0.61859101]
 [ 0.45115993 0.69774873]
 [ 0.29051205 0.28594808]
 [ 0.73240446 0.41905186]
 [ 0.23869394 0.5238878 ]
 [ 0.38422814 0.31108919]
 [ 0.52218967 0.56526379]
 [ 0.60760426 0.80247073]
 [ 0.37239096 0.51279078]
 [ 0.45864677 0.28952167]
 [ 0.8325996 0.28479446]
 [ 0.14609382 0.8275477 ]
 [ 0.86338279 0.87428696]
 [ 0.55481585 0.24481165]
 [ 0.99553336 0.79511137]
 [ 0.55025277 0.67267026]
 [ 0.39052024 0.65924857]
 [ 0.66868207 0.25186664]
 [ 0.64066313 0.74589812]
 [ 0.20587731 0.64977807]]
 Out[64]:
 <matplotlib.collections.PathCollection at 0x78a7110> 


In [65]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.bar(ghj, it) # simple bar graph
 Out[65]:
 <Container object of 5 artists> 


In [74]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.bar(ghj, it, width =5)# you can change the thickness of a bar, by default the bar will have a thickness of 0.8 units
 Out[74]:
 <Container object of 5 artists> 


In [75]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.barh(ghj, it) # barh is a horizontal bar graph
 Out[75]:
 <Container object of 5 artists>


In [95]:

new_list = [[5., 25., 50., 20.], [4., 23., 51., 17.], [6., 22., 52., 19.]]

x = np.arange(4)

plt.bar(x + 0.00, new_list[0], color ='b', width =0.25)

plt.bar(x + 0.25, new_list[1], color ='r', width =0.25)

#plt.show()


In [100]:
 #Stacked Bar charts
 p = [5., 30., 45., 22.]
 q = [5., 25., 50., 20.]
 x =range(4)
 plt.bar(x, p, color ='b')
 plt.bar(x, q, color ='y', bottom =p) 
 Out[100]:
 <Container object of 4 artists> 


In [35]:
 # plotting more than 2 values
 A = np.array([5., 30., 45., 22.])
 B = np.array([5., 25., 50., 20.])
 C = np.array([1., 2., 1., 1.])
 X = np.arange(4)
 plt.bar(X, A, color = 'b')
 plt.bar(X, B, color = 'g', bottom = A)
 plt.bar(X, C, color = 'r', bottom = A + B) # for the third argument, I use A+B
 plt.show() 
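
The stacking pattern above (each `bottom` is the sum of the series already drawn) extends to any number of series by keeping a running total. The following is a minimal sketch under that assumption; it reuses the values of A, B and C, while the loop and the variable names `series` and `bottom` are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

series = np.array([[5., 30., 45., 22.],
                   [5., 25., 50., 20.],
                   [1., 2., 1., 1.]])    # same values as A, B and C above
X = np.arange(series.shape[1])
colors = ['b', 'g', 'r']

bottom = np.zeros(series.shape[1])      # running height of the stack at each position
for values, color in zip(series, colors):
    plt.bar(X, values, color=color, bottom=bottom)
    bottom += values                    # the next series starts where this one ends

print(bottom)                           # total height of each stacked bar
```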


In [94]:
 black_money = np.array([5., 30., 45., 22.]) 
 white_money = np.array([5., 25., 50., 20.])
 z = np.arange(4)
 plt.barh(z, black_money, color ='g')
 plt.barh(z, -white_money, color ='r')# - notation is needed for generating, back to back charts
 Out[94]:
 <Container object of 4 artists> 


Other Plots

In [114]:
 #Pie charts
 y = [5, 25, 45, 65]
 plt.pie(y)
 Out[114]:
 ([<matplotlib.patches.Wedge at 0x7a19d50>,
 <matplotlib.patches.Wedge at 0x7a252b0>,
 <matplotlib.patches.Wedge at 0x7a257b0>,
 <matplotlib.patches.Wedge at 0x7a25cb0>],
 [<matplotlib.text.Text at 0x7a25070>,
 <matplotlib.text.Text at 0x7a25550>,
 <matplotlib.text.Text at 0x7a25a50>,
 <matplotlib.text.Text at 0x7a25f50>]) 


In [115]:
 #Histograms
 d = np.random.randn(100)
 plt.hist(d, bins = 20)
 Out[115]:
 (array([ 2., 3., 2., 1., 2., 6., 5., 7., 10., 12., 9.,
 12., 11., 5., 6., 4., 1., 0., 1., 1.]),
 array([-2.9389701 , -2.64475645, -2.35054281, -2.05632916, -1.76211551,
 -1.46790186, -1.17368821, -0.87947456, -0.58526092, -0.29104727,
 0.00316638, 0.29738003, 0.59159368, 0.88580733, 1.18002097,
 1.47423462, 1.76844827, 2.06266192, 2.35687557, 2.65108921,
 2.94530286]),
 <a list of 20 Patch objects>) 


In [116]:
 d = np.random.randn(100)
 plt.boxplot(d)
 #1) The line inside the box is the median of the distribution (drawn in red by older Matplotlib versions)
 #2) The box spans the lower quartile to the upper quartile, so it contains the middle 50 percent of the data.
 # The median always lies inside the box, though not necessarily at its center.
 Out[116]:
 {'boxes': [<matplotlib.lines.Line2D at 0x7cca090>],
 'caps': [<matplotlib.lines.Line2D at 0x7c02d70>,
 <matplotlib.lines.Line2D at 0x7cc2c90>],
 'fliers': [<matplotlib.lines.Line2D at 0x7cca850>,
 <matplotlib.lines.Line2D at 0x7ccae10>],
 'medians': [<matplotlib.lines.Line2D at 0x7cca470>],
 'whiskers': [<matplotlib.lines.Line2D at 0x7c02730>,
 <matplotlib.lines.Line2D at 0x7cc24b0>]} 
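
The quantities that the box plot draws can also be computed directly. The sketch below is an illustration, not part of the original notebook: the fixed seed and the variable names are arbitrary choices made here.

```python
import numpy as np

rng = np.random.RandomState(0)        # fixed seed (an arbitrary choice) so the run is repeatable
d = rng.randn(100)

q1, median, q3 = np.percentile(d, [25, 50, 75])
iqr = q3 - q1                         # height of the box
inside_box = int(np.sum((d >= q1) & (d <= q3)))

print(median, iqr, inside_box)        # about half of the 100 samples fall inside the box
```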


In [118]:
 d = np.random.randn(100, 5) # generating multiple box plots
 plt.boxplot(d)
 Out[118]:
 {'boxes': [<matplotlib.lines.Line2D at 0x7f49d70>,
 <matplotlib.lines.Line2D at 0x7ea1c90>,
 <matplotlib.lines.Line2D at 0x7eafb90>,
 <matplotlib.lines.Line2D at 0x7ebea90>,
 <matplotlib.lines.Line2D at 0x7ece990>],
 'caps': [<matplotlib.lines.Line2D at 0x7f2b3b0>,
 <matplotlib.lines.Line2D at 0x7f49990>,
 <matplotlib.lines.Line2D at 0x7ea14d0>,
 <matplotlib.lines.Line2D at 0x7ea18b0>,
 <matplotlib.lines.Line2D at 0x7eaf3d0>,
 <matplotlib.lines.Line2D at 0x7eaf7b0>,
 <matplotlib.lines.Line2D at 0x7ebe2d0>,
 <matplotlib.lines.Line2D at 0x7ebe6b0>,
 <matplotlib.lines.Line2D at 0x7ece1d0>,
 <matplotlib.lines.Line2D at 0x7ece5b0>],
 'fliers': [<matplotlib.lines.Line2D at 0x7e98550>,
 <matplotlib.lines.Line2D at 0x7e98930>,
 <matplotlib.lines.Line2D at 0x7ea8470>,
 <matplotlib.lines.Line2D at 0x7ea8a10>,
 <matplotlib.lines.Line2D at 0x7eb6370>,
 <matplotlib.lines.Line2D at 0x7eb6730>,
 <matplotlib.lines.Line2D at 0x7ec6270>,
 <matplotlib.lines.Line2D at 0x7ec6810>,
 <matplotlib.lines.Line2D at 0x8030170>,
 <matplotlib.lines.Line2D at 0x8030710>],
 'medians': [<matplotlib.lines.Line2D at 0x7e98170>,
 <matplotlib.lines.Line2D at 0x7ea8090>,
 <matplotlib.lines.Line2D at 0x7eaff70>,
 <matplotlib.lines.Line2D at 0x7ebee70>,
 <matplotlib.lines.Line2D at 0x7eced70>],
 'whiskers': [<matplotlib.lines.Line2D at 0x7f2bb50>,
 <matplotlib.lines.Line2D at 0x7f491b0>,
 <matplotlib.lines.Line2D at 0x7e98cf0>,
 <matplotlib.lines.Line2D at 0x7ea10f0>,
 <matplotlib.lines.Line2D at 0x7ea8bf0>,
 <matplotlib.lines.Line2D at 0x7ea8fd0>,
 <matplotlib.lines.Line2D at 0x7eb6cd0>,
 <matplotlib.lines.Line2D at 0x7eb6ed0>,
 <matplotlib.lines.Line2D at 0x7ec6bd0>,
 <matplotlib.lines.Line2D at 0x7ec6dd0>]} 

Part 2:

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt

In [22]:

p =np.random.standard_normal((50,2))

q =np.random.standard_normal((50,2))

q += np.array((1,1)) # shift the center of the distribution to (1,1)

plt.scatter(p[:,0], p[:,1], color ='.25')

plt.scatter(q[:,0], q[:,1], color = '.75')

Out[22]:

<matplotlib.collections.PathCollection at 0x71dab90>


In [34]:
 dd =np.random.standard_normal((50,2))
 plt.scatter(dd[:,0], dd[:,1], color ='1.0', edgecolor ='0.0') # edge color controls the color of the edge
 Out[34]:
 <matplotlib.collections.PathCollection at 0x7336670> 

Custom colors for bar charts, pie charts and box plots:

In [9]:
 vals = np.random.randint(1, 100, size =50) # integers from 1 to 99; random_integers is deprecated
 color_set = ['.00', '.25', '.50','.75']
 color_lists = [color_set[(len(color_set)* val) // 100] for val in vals]
 c = plt.bar(np.arange(50), vals, color = color_lists)


In [8]:
 hi = np.random.randint(1, 9, size =10) # ten integers from 1 to 8; random_integers is deprecated
 color_set =['.00', '.25', '.50', '.75']
 plt.pie(hi, colors = color_set) # the colors argument accepts a sequence of colors
 plt.show()
 #If there are fewer colors than values, pyplot.pie() simply cycles through the color list. In this
 #example, we gave a list of four colors to color a pie chart that consists of ten values. Thus, each color is used two or three times
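
To see which wedge gets which color, the cycling can be mimicked with the standard library. This is only a sketch of the behaviour described in the comment, not Matplotlib's internal code; `n_wedges` and `assigned` are names introduced here.

```python
from itertools import cycle, islice

color_set = ['.00', '.25', '.50', '.75']
n_wedges = 10                      # number of values passed to plt.pie() in the cell above

# Walk through the color list repeatedly until every wedge has a color.
assigned = list(islice(cycle(color_set), n_wedges))

for wedge, color in enumerate(assigned):
    print(wedge, color)
```

With four colors and ten wedges, the first two colors end up on three wedges each and the last two on two wedges each.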


In [27]:
values = np.random.randn(100)
w = plt.boxplot(values)
for att, lines in w.items(): # items() replaces the Python 2-only iteritems()
    for l in lines:
        l.set_color('k') # draw every part of the box plot in black

Color Maps

In [34]:
 # how to color scatter plots
 #Colormaps are defined in the matplotlib.cm module. This module provides
 #functions to create and use colormaps. It also provides a large choice of predefined color maps.
 import matplotlib.cm as cm
 N = 256
 angle = np.linspace(0, 8 * 2 * np.pi, N)
 radius = np.linspace(.5, 1., N)
 X = radius * np.cos(angle)
 Y = radius * np.sin(angle)
 plt.scatter(X,Y, c=angle, cmap = cm.hsv)
 Out[34]:
 <matplotlib.collections.PathCollection at 0x714d9f0>


In [44]:
 #Color in bar graphs
 import matplotlib.cm as cm
 import matplotlib.colors as col # provides Normalize, used below
 vals = np.random.randint(1, 100, size =50) # integers from 1 to 99
 cmap = cm.ScalarMappable(col.Normalize(0,99), cm.binary)
 plt.bar(np.arange(len(vals)),vals, color =cmap.to_rgba(vals))
 Out[44]:
 <Container object of 50 artists> 
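
The mapping used in the cell above works in two steps: `Normalize` rescales a value to the range 0 to 1, and the colormap turns that number into an RGBA color. A small sketch, assuming the same 0 to 99 value range and the `binary` colormap; the names `norm` and `mapper` are introduced here.

```python
import matplotlib.cm as cm
import matplotlib.colors as col

norm = col.Normalize(vmin=0, vmax=99)          # maps 0..99 linearly onto 0..1
mapper = cm.ScalarMappable(norm=norm, cmap=cm.binary)

low = mapper.to_rgba(0)                        # RGBA tuple at the white end of 'binary'
high = mapper.to_rgba(99)                      # RGBA tuple at the black end of 'binary'

print(float(norm(0)), float(norm(99)))
print(low, high)
```

So small values produce light bars and large values produce dark bars in the chart above.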


Line Styles

In [4]:
def pq(I, mu, sigma):
    # normal probability density with mean mu and standard deviation sigma
    a = 1. / (sigma * np.sqrt(2. * np.pi))
    b = -1. / (2. * sigma ** 2)
    return a * np.exp(b * (I - mu) ** 2)

I =np.linspace(-6,6, 1024)
plt.plot(I, pq(I, 0., 1.), color = 'k', linestyle ='solid')
plt.plot(I, pq(I, 0., .5), color = 'k', linestyle ='dashed')
plt.plot(I, pq(I, 0., .25), color = 'k', linestyle ='dashdot')
Out[4]:
[<matplotlib.lines.Line2D at 0x562ffb0>]

In [12]:
 N = 15
 A = np.random.random(N)
 B= np.random.random(N)
 X = np.arange(N)
 plt.bar(X, A, color ='.75')
 plt.bar(X, A+B , bottom = A, color ='w', edgecolor ='k', linestyle ='dashed') # white bars with dashed black edges, stacked on top of A
 plt.show() 


In [20]:
def gf(X, mu, sigma):
    a = 1. / (sigma * np.sqrt(2. * np.pi))
    b = -1. / (2. * sigma ** 2)
    return a * np.exp(b * (X - mu) ** 2) # return statement missing in the original; same density as pq above

X = np.linspace(-6, 6, 1024)
for i in range(64):
    samples = np.random.standard_normal(50)
    mu,sigma = np.mean(samples), np.std(samples)
    plt.plot(X, gf(X, mu, sigma), color = '.75', linewidth = .5) # one thin grey curve per sample set
plt.plot(X, gf(X, 0., 1.), color ='.00', linewidth = 3.) # the true distribution as a thick black curve
Out[20]:
[<matplotlib.lines.Line2D at 0x59fbab0>]

Fill surfaces with pattern

In [27]:
N = 15
A = np.random.random(N)
B= np.random.random(N)
X = np.arange(N)
plt.bar(X, A, color ='w', hatch ='x')
# some other hatch patterns are: / \ | - + x o O . *
Out[27]:
<Container object of 15 artists>

Marker styles

In [29]:

cd C:\Users\tk\Desktop\Matplot

C:\Users\tk\Desktop\Matplot


In [14]:
X= np.linspace(-6,6,1024)
Ya = np.sinc(X) # assumed definition, missing from the original (Ya is used below)
Yb = np.sinc(X) +1
plt.plot(X, Ya, marker ='o', color ='.75')
plt.plot(X, Yb, marker ='^', color='.00', markevery= 32) # this one marks every 32nd element
Out[14]:
[<matplotlib.lines.Line2D at 0x7063150>]

Own Marker Shapes- come back to this later

In [31]:

# Marker Size

A = np.random.standard_normal((50,2))

B = np.random.standard_normal((50,2))

B += np.array((1, 1))

plt.scatter(A[:,0], A[:,1], color ='k', s =25.0)

plt.scatter(B[:,0], B[:,1], color ='g', s = 100.0) # size of the marker is specified using 's' attribute

Out[31]:

<matplotlib.collections.PathCollection at 0x7d015f0>


In [20]:
 import matplotlib as mpl
 mpl.rc('lines', linewidth =3)
 mpl.rc('xtick', color ='w') # color of x axis numbers
 mpl.rc('ytick', color = 'w') # color of y axis numbers
 mpl.rc('axes', facecolor ='g', edgecolor ='y') # color of axes 
 mpl.rc('figure', facecolor ='.00',edgecolor ='w') # color of figure
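 from cycler import cycler # cycler is installed together with Matplotlib
 mpl.rc('axes', prop_cycle = cycler(color = ['y','r'])) # color of plots; prop_cycle replaces the removed color_cycle setting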
 x = np.linspace(0, 7, 1024)
 plt.plot(x, np.sin(x))
 plt.plot(x, np.cos(x))
 Out[20]:
 [<matplotlib.lines.Line2D at 0x7b0fb70>] 

Part 3:

Figure annotation: working with multiple figures, and controlling axis ranges, aspect ratio and axes.

Annotation

In [1]:
 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
 In [28]:
 X =np.linspace(-6,6, 1024)
 Y =np.sinc(X)
 plt.title('A simple marker exercise')# a title notation
 plt.xlabel('array variables') # adding xlabel
 plt.ylabel(' random variables') # adding ylabel
 plt.text(-5, 0.4, 'Matplotlib') # -5 is the x value and 0.4 is y value
 plt.plot(X,Y, color ='r', marker ='o', markersize =9, markevery = 30, markerfacecolor='w', linewidth = 3.0, markeredgecolor = 'b')
 Out[28]:
 [<matplotlib.lines.Line2D at 0x84b6430>] 

In [39]:
def pq(I, mu, sigma):
    a = 1. / (sigma * np.sqrt(2. * np.pi))
    b = -1. / (2. * sigma ** 2)
    return a * np.exp(b * (I - mu) ** 2) # return statement missing in the original

I =np.linspace(-6,6, 1024)
plt.plot(I, pq(I, 0., 1.), color = 'k', linestyle ='solid')
plt.plot(I, pq(I, 0., .5), color = 'k', linestyle ='dashed')
plt.plot(I, pq(I, 0., .25), color = 'k', linestyle ='dashdot')
# I have created a dictionary of styles
design = {
    'facecolor' : 'y', # color used for the text box
    'edgecolor' : 'g',
    'boxstyle' : 'round'
}
plt.text(-4, 1.5, 'Matplot Lib', bbox = design)
plt.plot(X, Y, c='k') # X and Y come from the previous cell
plt.show()
#'boxstyle': This sets the style of the box, which can either be 'round' or 'square'
#'pad': If 'boxstyle' is set to 'square', it defines the amount of padding between the text and the box's sides

Alignment Control

The vertical alignment options are as follows:

'center': This is relative to the center of the textbox

'top': This is relative to the upper side of the textbox

'bottom': This is relative to the lower side of the textbox

'baseline': This is relative to the text's baseline

The horizontal alignment options are as follows:

'center': This is relative to the center of the textbox

'left': This is relative to the left side of the textbox

'right': This is relative to the right-hand side of the textbox

In [41]:

cd C:\Users\tk\Desktop

C:\Users\tk\Desktop

In [44]:

from IPython.display import Image

Image(filename='text alignment.png') # displays a saved diagram of the alignment options

Out[44]:
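
Since the alignment diagram is a local image file, the options can also be tried out directly in code. The following is a minimal sketch; the anchor point (0.5, 0.5) and the label texts are arbitrary choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axhline(0.5, color='.75')    # reference lines through the anchor point
ax.axvline(0.5, color='.75')

# ha and va say which part of the text box is placed on the anchor point (0.5, 0.5)
t1 = ax.text(0.5, 0.5, 'left/bottom', ha='left', va='bottom')
t2 = ax.text(0.5, 0.5, 'right/top', ha='right', va='top')

fig.savefig('alignment_demo.png')   # hypothetical output file name
```

The first label appears above and to the right of the anchor, because its lower-left corner sits on the anchor; the second appears below and to the left.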


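In [76]:
X = np.linspace(-4, 4, 1024)
Y = .25 * (X + 4.) * (X + 1.) * (X - 2.) # assumed definition, missing from the original; this curve passes close to the arrow target (0.75, -2.7)
plt.annotate('Big Data',
    ha ='center', va ='bottom',
    xytext =(-1.5, 3.0), xy =(0.75, -2.7), # xytext is the label position, xy is the point the arrow points at
    arrowprops ={'facecolor': 'green', 'shrink':0.05, 'edgecolor': 'black'}) #arrow properties
plt.plot(X, Y)
Out[76]:
[<matplotlib.lines.Line2D at 0x9d1def0>]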

In [74]:

from IPython.display import Image

Image(filename='arrows.png')

Out[74]:


Legend properties:

'loc': This is the location of the legend. The default value is 'best', which will place it automatically. Other valid values are

'shadow': This can be either True or False, and it renders the legend with a shadow effect.

'fancybox': This can be either True or False and renders the legend with a rounded box.

'title': This renders the legend with the title passed as a parameter.

'ncol': This forces the passed value to be the number of columns for the legend

In [101]:

x =np.linspace(0, 6,1024)

y1 =np.sin(x)

y2 =np.cos(x)

plt.xlabel('Sin Wave')

plt.ylabel('Cos Wave')

plt.plot(x, y1, c='b', lw =3.0, label ='Sin(x)') # labels are specified

plt.plot(x, y2, c ='r', lw =3.0, ls ='--', label ='Cos(x)')

plt.legend(loc ='best', shadow = True, fancybox = False, title ='Waves', ncol =1) # displays the labels

plt.grid(True, lw = 2, ls ='--', c='.75') # adds grid lines to the figure

plt.show()


Shapes

In [4]:

#Paths for several kinds of shapes are available in the matplotlib.patches module

import matplotlib.patches as patches # needed before the patches.* calls below

dis = patches.Circle((0,0), radius = 1.0, color ='.75' )

plt.gca().add_patch(dis) # adds the shape to the current axes

dis = patches.Rectangle((2.5, -.5), 2.0, 1.0, color ='.75') #patches.Rectangle((x, y) of the lower-left corner, width, height)

plt.gca().add_patch(dis)

dis = patches.Ellipse((0, -2.0), 2.0, 1.0, angle =45, color ='.00')

plt.gca().add_patch(dis)

dis = patches.FancyBboxPatch((2.5, -2.5), 2.0, 1.0, boxstyle ='roundtooth', color ='g')

plt.gca().add_patch(dis)

plt.grid(True)

plt.axis('scaled') # displays the images within the prescribed axis

plt.show()

#FancyBox: This is like a rectangle but takes an additional boxstyle parameter

#(either 'larrow', 'rarrow', 'round', 'round4', 'roundtooth', 'sawtooth', or 'square')


In [22]:

import matplotlib.patches as patches

theta = np.linspace(0, 2 * np.pi, 8) # generates an array

vertical = np.vstack((np.cos(theta), np.sin(theta))).transpose() # vertical stack clubs the two arrays.

#print vertical, print and see how the array looks

plt.gca().add_patch(patches.Polygon(vertical, color ='y'))

plt.axis('scaled')

plt.grid(True)

#The matplotlib.patches.Polygon()constructor takes a list of coordinates as the inputs, that is, the vertices of the polygon
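
Note that `np.linspace(0, 2 * np.pi, 8)` includes both end points, so the last vertex repeats the first and the shape drawn has seven sides. The sketch below shows one way to generate the vertices of a regular polygon with a chosen number of sides; `regular_polygon` is a hypothetical helper written for this illustration, not a Matplotlib function.

```python
import numpy as np

def regular_polygon(n_sides, radius=1.0):
    """Return an (n_sides, 2) array of vertices on a circle (hypothetical helper)."""
    # endpoint=False leaves out 2*pi, which would duplicate the first vertex
    theta = np.linspace(0, 2 * np.pi, n_sides, endpoint=False)
    return radius * np.column_stack((np.cos(theta), np.sin(theta)))

hexagon = regular_polygon(6)
print(hexagon)
```

The resulting array can be passed to `patches.Polygon` in the same way as `vertical` above.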


In [34]:
 # a polygon can be inscribed in a circle
 theta = np.linspace(0, 2 * np.pi, 6) # generates an array
 vertical = np.vstack((np.cos(theta), np.sin(theta))).transpose() # vertical stack clubs the two arrays. 
 #print vertical, print and see how the array looks
 plt.gca().add_patch(plt.Circle((0,0), radius =1.0, color ='b'))
 plt.gca().add_patch(plt.Polygon(vertical, fill =None, lw =4.0, ls ='dashed', edgecolor ='w'))
 plt.axis('scaled')
 plt.grid(True)
 plt.show() 


In [54]:
 #In matplotlib, ticks are small marks on both the axes of a figure
 import matplotlib.ticker as ticker
 X = np.linspace(-12, 12, 1024)
 Y = .25 * (X + 4.) * (X + 1.) * (X - 2.)
 pl =plt.axes() #the object that manages the axes of a figure
 pl.xaxis.set_major_locator(ticker.MultipleLocator(5))
 pl.xaxis.set_minor_locator(ticker.MultipleLocator(1))
 plt.plot(X, Y, c = 'y')
 plt.grid(True, which ='major') # which can take three values: minor, major and both
 plt.show() 


In [59]:
name_list = ('Omar', 'Serguey', 'Max', 'Zhou', 'Abidin')
value_list = np.random.randint(0, 99, size = len(name_list))
pos_list = np.arange(len(name_list))
ax = plt.axes()
ax.xaxis.set_major_locator(ticker.FixedLocator((pos_list))) # one tick per bar
ax.xaxis.set_major_formatter(ticker.FixedFormatter((name_list))) # label the ticks with the names
plt.bar(pos_list, value_list, color = '.75', align = 'center')
plt.show()

Part 4:

This part covers some more complex figures.

Working with figures

In [4]:
 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
 In [5]:
 T = np.linspace(-np.pi, np.pi, 1024) #
 fig, (ax0, ax1) = plt.subplots(ncols =2)
 ax0.plot(np.sin(2 * T), np.cos(0.5 * T), c = 'k')
 ax1.plot(np.cos(3 * T), np.sin(T), c = 'k')
 plt.show() 


Setting aspect ratio

In [7]:
 T = np.linspace(0, 2 * np.pi, 1024)
 plt.plot(2. * np.cos(T), np.sin(T), c = 'k', lw = 3.)
 plt.gca().set_aspect('equal') # gca() returns the current axes; remove this line of code and see how the figure looks
 plt.show() 


In [12]:
 X = np.linspace(-6, 6, 1024)
 Y1, Y2 = np.sinc(X), np.cos(X)
 plt.figure(figsize=(10.24, 2.56)) #sets size of the figure
 plt.plot(X, Y1, c='r', lw = 3.)
 plt.plot(X, Y2, c='.75', lw = 3.)
 plt.show() 


In [8]:
 X = np.linspace(-6, 6, 1024)
 plt.ylim(-.5, 1.5)
 plt.plot(X, np.sinc(X), c = 'k')
 plt.show() 


In [16]:
 X = np.linspace(-6, 6, 1024)
 Y = np.sinc(X)
 X_sub = np.linspace(-3, 3, 1024) # x coordinates of the inset plot
 Y_sub = np.sinc(X_sub) # y coordinates of the inset plot
 plt.plot(X, Y, c = 'b')
 sub_axes = plt.axes([.6, .6, .25, .25]) # [left, bottom, width, height] of the inset frame, as fractions of the figure
 sub_axes.plot(X_sub, Y_sub, c = 'r')
 plt.show()

Log Scale

In [20]:

X = np.linspace(1, 10, 1024)

plt.yscale('log') # set the y scale to log; use plt.xscale('log') for the x axis

plt.plot(X, X, c = 'k', lw = 2., label = r'$f(x)=x$')

plt.plot(X, 10 ** X, c = '.75', ls = '--', lw = 2., label = r'$f(x)=10^x$')

plt.plot(X, np.log(X), c = '.75', lw = 2., label = r'$f(x)=\log(x)$')

plt.legend()

#The logarithm base is 10 by default, but it can be changed with the optional parameter base (basex and basey in older Matplotlib versions).
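
As an illustration of changing the base, here is a minimal sketch that assumes Matplotlib 3.3 or newer, where the keyword is `base`. The choice of base 2 and the plotted function are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(1, 10, 1024)

fig, ax = plt.subplots()
ax.set_yscale('log', base=2)     # assumes Matplotlib 3.3+; older releases used basey=2
ax.plot(X, 2 ** X, c='k')        # 2**x appears as a straight line on a base-2 log axis

fig.savefig('log_base2.png')     # hypothetical output file name
```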


Polar Coordinates

In [23]:
 T = np.linspace(0 , 2 * np.pi, 1024)
 plt.axes(polar = True) # show polar coordinates
 plt.plot(T, 1. + .25 * np.sin(16 * T), c= 'k')
 plt.show() 


In [25]:
 import matplotlib.patches as patches # import patch module from matplotlib
 ax = plt.axes(polar = True)
 theta = np.linspace(0, 2 * np.pi, 8, endpoint = False)
 radius = .25 + .75 * np.random.random(size = len(theta))
 points = np.vstack((theta, radius)).transpose()
 plt.gca().add_patch(patches.Polygon(points, color = '.75'))
 plt.show() 


In [2]:
 x = np.linspace(-6,6,1024)
 y= np.sin(x)
 plt.plot(x,y)
 plt.savefig('bigdata.png', transparent = True) # savefig writes the figure to a file
 # This creates a file named bigdata.png. Its size in pixels is the figure size in inches times the dpi, e.g. 8 x 6 inches at 100 dpi gives 800 x 600 pixels

In [3]:
 theta =np.linspace(0, 2 *np.pi, 8)
 points =np.vstack((np.cos(theta), np.sin(theta))).T
 plt.figure(figsize =(6.0, 6.0))
 plt.gca().add_patch(plt.Polygon(points, color ='r'))
 plt.axis('scaled')
 plt.grid(True)
 plt.savefig('pl.png', dpi =300) # try 'pl.pdf' or 'pl.svg'
 #dpi is dots per inch. With figsize (6.0, 6.0): 6 x 300 by 6 x 300 = 1800 x 1800 pixels
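
The relation between figure size, dpi and pixel size can be checked by saving a figure and reading it back. The sketch below saves into memory rather than to a file; the variable names and the plotted line are chosen only for this illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6.0, 6.0))     # size in inches
plt.plot([0, 1], [0, 1])

buf = io.BytesIO()                       # save into memory instead of a file
fig.savefig(buf, format='png', dpi=300)
buf.seek(0)
image = plt.imread(buf, format='png')    # array of shape (height, width, channels)

# expected pixel size = inches * dpi
expected = tuple(int(round(v * 300)) for v in fig.get_size_inches())
print(expected, image.shape[:2])
```

Passing `bbox_inches='tight'` to `savefig` would crop the image, so the simple inches times dpi rule only holds without it.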


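Summary

One of the easiest mistakes to make when learning Python is trying to learn too many libraries at once. When you try to learn everything in one go, you spend a lot of time switching between different concepts, get frustrated, and eventually move on to something else.

So stick to this sequence:

1. Understand the Python basics

2. Learn Numpy

3. Learn Pandas

4. Learn Matplotlib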

