[Machine Learning] Machine Learning Notes 09: Image Recognition with SVM


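Preface

The previous post covered the basic concepts of SVM and the general workflow. If anything there is unclear, please refer to:

[Machine Learning] Machine Learning Notes 08: SVM Algorithm Principles and Implementation

This post shows how to use SVM for image recognition.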

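Image recognition

Face recognition is a practical technology, yet it often feels mysterious. scikit-learn ships a face recognition example; the code is at:

http://scikit-learn.org/0.13/auto_examples/applications/face_recognition.html#example-applications-face-recognition-py

First, a brief introduction to what PCA and SVM do. PCA stands for principal component analysis. It extracts the main factors of variation from multivariate data, which simplifies a complex problem. The purpose of computing principal components is to project high-dimensional data onto a lower-dimensional space.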

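PCA dimensionality reduction

PCA is mainly used for dimensionality reduction. Each sample is represented as a multidimensional feature vector, and some elements of that vector carry no discriminative information. For example, if an element equals 1 in every sample, or stays very close to 1, it contributes very little as a feature for telling samples apart. The goal is therefore to keep the dimensions that vary a lot, i.e. those with large variance, and to discard the ones that barely vary, so that only the informative features remain and the computation becomes cheaper. Strictly speaking, PCA does this along new axes (linear combinations of the original elements) rather than by picking original elements.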
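The idea above can be sketched with NumPy on made-up data. This is only an illustration, not the code scikit-learn uses internally: the helper name `pca_project` and the three synthetic columns are assumptions introduced here. One column is constant in every sample, so the variance along the last principal direction comes out as (numerically) zero and that direction can be dropped without losing information.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the n_components directions of largest variance.

    Hypothetical helper for illustration; returns the projected data and the
    variance along every principal direction (largest first).
    """
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered from the largest to the smallest singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    components = Vt[:n_components]
    return X_centered @ components.T, explained_variance


rng = np.random.RandomState(0)
n = 200
X_demo = np.column_stack([
    rng.normal(0.0, 5.0, n),  # varies a lot
    rng.normal(0.0, 1.0, n),  # varies moderately
    np.ones(n),               # identical in every sample: no discriminative value
])

X_reduced, variances = pca_project(X_demo, n_components=2)
print("reduced shape:", X_reduced.shape)
print("variance per principal direction:", np.round(variances, 3))
```

Keeping two components here preserves all of the variance, because the discarded direction corresponds to the constant column.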

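SVM stands for support vector machine; earlier posts covered it. With a kernel, an SVM implicitly applies a nonlinear mapping p that takes the sample space into a high-dimensional, possibly infinite-dimensional, feature space, so that a problem that is not linearly separable in the original sample space becomes linearly separable in the feature space.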
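A small sketch of this effect, assuming scikit-learn is installed. The two-ring toy data from `make_circles`, the variable names, and the value `gamma=1.0` are choices made for this illustration only and do not come from the face recognition example. No straight line separates the two rings in the plane, so a linear kernel does poorly, while an RBF kernel separates them.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X_rings, y_rings = make_circles(n_samples=400, factor=0.3, noise=0.05,
                                random_state=0)

# Same classifier, two kernels. gamma=1.0 is an arbitrary illustrative value.
linear_svm = SVC(kernel='linear').fit(X_rings, y_rings)
rbf_svm = SVC(kernel='rbf', gamma=1.0).fit(X_rings, y_rings)

# Accuracy on the training data itself, enough to show separability.
linear_acc = linear_svm.score(X_rings, y_rings)
rbf_acc = rbf_svm.score(X_rings, y_rings)
print("linear kernel training accuracy: %.2f" % linear_acc)
print("RBF kernel training accuracy:    %.2f" % rbf_acc)
```

The RBF kernel never computes the mapping explicitly; it only evaluates inner products in the feature space, which is why an infinite-dimensional feature space is not a practical problem.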

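Experimental data

Now for the dataset used in the experiment: Labeled Faces in the Wild (LFW), roughly 200 MB in size. It contains more than 13,000 images of about 5,700 people, of whom about 1,700 have two or more photos. Website: http://vis-www.cs.umass.edu/lfw/index.html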

Implementation

1. Import modules

from __future__ import print_function

from time import time
import logging
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
# Display progress and error messages
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

###############################################################################
# Download the data (if not already on disk) and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Inspect the image array to get its shape (needed for plotting)
n_samples, h, w = lfw_people.images.shape

# For machine learning we use the flattened pixel data directly
# (this model ignores relative pixel position information)
X = lfw_people.data
n_features = X.shape[1]

# The label to predict is the identity of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)


###############################################################################
# Split into a training set (75%) and a test set (25%)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25)


###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as an unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))


###############################################################################
# Train an SVM classification model

print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)


###############################################################################
# Quantitative evaluation of the model quality on the test set

print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))


###############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())


# Plot the predictions on a portion of the test set

def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# Plot the gallery of the most significant eigenfaces

eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
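As a variation on the script above, the PCA step and the SVM can be chained in a scikit-learn `Pipeline`, so that the grid search refits PCA inside every cross-validation fold. The sketch below is an assumption-laden stand-in rather than the original example: it uses the small `load_digits` dataset bundled with scikit-learn in place of LFW (to avoid the download), and the component count, the parameter grid `demo_grid`, and `cv=3` are illustrative values, not tuned ones.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images bundled with scikit-learn (not LFW).
digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0,
    stratify=digits.target)

# PCA followed by an RBF SVM, treated as one estimator.
pipeline = make_pipeline(
    PCA(n_components=30, svd_solver='randomized', whiten=True,
        random_state=0),
    SVC(kernel='rbf', class_weight='balanced'),
)

# make_pipeline names each step after its lowercased class name, hence 'svc__'.
demo_grid = {'svc__C': [1, 10, 100],
             'svc__gamma': [0.001, 0.01, 0.1]}
search = GridSearchCV(pipeline, demo_grid, cv=3).fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)
print("best parameters:", search.best_params_)
print("test accuracy: %.3f" % test_accuracy)
```

With this layout the validation part of each fold is never seen by PCA during fitting, which gives a slightly more honest estimate when comparing parameter settings.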

Experimental results

[Figure: experimental results]
