K-Means is a distance-based clustering algorithm: it iteratively computes K cluster centers and assigns each input point to one of the K clusters.
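To make the iteration concrete, here is a minimal plain-Scala sketch of the classic assign-then-update loop (Lloyd's algorithm). It is an illustration only, not how Spark implements K-Means: the names `LloydSketch`, `kMeans`, `nearest`, and `mean` are hypothetical, the initial centers are passed in by hand rather than chosen by an initialization scheme such as `k-means||`, and the sample points are made up.

```scala
object LloydSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center nearest to p.
  def nearest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point = {
    val dim = ps.head.length
    Array.tabulate(dim)(d => ps.map(_(d)).sum / ps.size)
  }

  // Alternate assignment and update steps until the centers stop moving
  // or maxIter iterations have run.
  def kMeans(points: Seq[Point], init: Seq[Point], maxIter: Int): Seq[Point] = {
    var centers = init
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      // Assignment step: group the points by their nearest center.
      val groups = points.groupBy(p => nearest(p, centers))
      // Update step: move each center to the mean of its group;
      // a center with no assigned points stays where it is.
      val updated = centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
      changed = updated.zip(centers).exists { case (u, c) => sqDist(u, c) > 1e-12 }
      centers = updated
      iter += 1
    }
    centers
  }

  def main(args: Array[String]): Unit = {
    // Made-up 2-D sample data forming two obvious groups.
    val points = Seq(
      Array(0.0, 0.0), Array(0.1, 0.1), Array(0.2, 0.2),
      Array(9.0, 9.0), Array(9.1, 9.1), Array(9.2, 9.2))
    // Hand-picked initial centers (an assumption of this sketch).
    val init = Seq(points(0), points(3))
    kMeans(points, init, maxIter = 100).foreach(c => println(c.mkString("\t")))
  }
}
```

With these inputs the two centers settle near (0.1, 0.1) and (9.1, 9.1) after the second pass. The Spark MLlib program that follows delegates this loop to the library and runs it over a distributed dataset.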
package com.immooc.spark

import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object KMeansTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KMeansTest").setMaster("local[2]")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // Read the sample data: one point per line, coordinates separated by single spaces
    // (plain text, not LIBSVM format).
    val data = sc.textFile("file:///Users/walle/Documents/D3/sparkmlib/kmeans_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Configure and train the KMeans model.
    val initMode = "k-means||"
    val numClusters = 4
    val numIterations = 100
    val model = new KMeans()
      .setInitializationMode(initMode)
      .setK(numClusters)
      .setMaxIterations(numIterations)
      .run(parsedData)

    // Print the first two coordinates of each cluster center.
    val centers = model.clusterCenters
    println("centers")
    centers.foreach(c => println(s"${c(0)}\t${c(1)}"))

    // Compute the clustering error (Within Set Sum of Squared Errors).
    val WSSSE = model.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    sc.stop()
  }
}
1. Output
centers
9.05	9.05
0.05	0.05
9.2	9.2
0.2	0.2
Within Set Sum of Squared Errors = 0.03000000000004321
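The last line of the output is the value returned by `model.computeCost`: the sum, over all points, of the squared distance from each point to its nearest cluster center. The sketch below reproduces that definition in plain Scala so the number is easier to interpret. The object name `WssseSketch`, the helper `wssse`, and the four sample points and two centers are hypothetical and are not taken from `kmeans_data.txt`.

```scala
object WssseSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Within Set Sum of Squared Errors: for every point, take the squared
  // distance to its nearest center, then add these up.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => centers.map(c => sqDist(p, c)).min).sum

  def main(args: Array[String]): Unit = {
    // Made-up points and centers, chosen so the result is easy to check by hand.
    val points = Seq(Array(0.0, 0.0), Array(0.1, 0.1), Array(9.0, 9.0), Array(9.1, 9.1))
    val centers = Seq(Array(0.05, 0.05), Array(9.05, 9.05))
    println("WSSSE = " + wssse(points, centers))
  }
}
```

Each of the four points here lies 0.05 from its center in both coordinates, so it contributes 2 × 0.05² = 0.005 and the total is about 0.02, up to floating-point rounding. A lower value means the points sit closer to their centers, but the value also falls as K grows, so it is only meaningful when comparing runs with the same K or when looking for the point where adding clusters stops helping.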