Hadoop 集群基准测试

栏目: 编程工具 · 发布时间: 7年前

内容简介：生产环境中，如何对 Hadoop 集群进行 Benchmark Test？如何进行服务所需的机器选型？如何快速对比出不同集群的性能？本文将通过 Hadoop 自带的 Benchmark 测试程序：TestDFSIO 和 TeraSort，简单介绍如何进行 Hadoop 的读写 & 计算性能的压测。

生产环境中，如何对 Hadoop 集群进行 Benchmark Test？如何进行服务所需的机器选型？如何快速对比出不同集群的性能？

本文将通过 Hadoop 自带的 Benchmark 测试程序：TestDFSIO 和 TeraSort，简单介绍如何进行 Hadoop 的读写 & 计算性能的压测。

回顾上篇文章：认识多队列网卡中断绑定

（本文使用 2.6.0 的 hadoop 版本进行测试，基准测试被打包在测试程序 JAR 文件中，通过无参调用 bin/hadoop jar ./share/hadoop/mapreduce/xxx.jar 可以得到其列表）

使用 TestDFSIO

进行集群的 I/O 性能测试处

TestDFSIO :

org.apache.hadoop.fs.TestDFSIO

TestDFSIO 程序原理：

使用多个 Map Task 模拟多路的并发读写。通过自己的 Mapper class 用来读写数据，生成统计信息；通过自己的 Reduce Class 来收集并汇总各个 Map Task 的统计信息，主要涉及到三个文件: AccumulatingReducer.java, IOMapperBase.java, TestDFSIO.java。

TestDFSIO 大致运行过程：

根据 Map Task 的数量将相应个数的 Control 控制文件写入 HDFS，这些控制文件仅包含一行内容：<数据文件名，数据文件大小> ;
启动 MapReduceJob，IOMapperBase Class 中的 Map 方法将 Control 文件作为输入文件，读取内容，将数据文件名和大小作为参数传递给自定义的 doIO 函数，进行实际的数据读写工作。而后将数据大小和 doIO 执行的时间传递给自定义的 collectStatus 函数，进行统计数据的输出工作 ;
doIO 的实现：TestDFSIO 重载并实现 doIO 函数，将指定大小的数据写入 HDFS 文件系统;
collectStatus 的实现：TestDFSIO 重载并实现 collectStatus 函数，将任务数量，以及数据大小，完成时间等相关数据作为 Map Class 的结果输出;
统计数据用不同的前缀标识，例如 l: (stand for long), s: (stand for string) etc;
执行唯一的一个 Reduce 任务，收集各个 Map Class 的统计数据，使用 AccumulatingReducer 进行汇总统计;
最后当 MapReduceJob 完成以后，调用 analyzeResult 函数读取最终的统计数据并输出到控制台和本地的 Log 文件中;

那么 MR 任务测试集群读写性能是否会因为数据传输影响到结果判断呢？

可以看整个过程中，实际通过 MR 框架进行读写 Shuffle 的只是 Control 文件，数据量非常小，所以 MR 框架本身的数据传输对测试的影响很小，可以忽略不计，测试结果基本是取决于 HDFS 的读写性能的。

了解到原理后，我们将运行 TestDFSIO 进行测试

测试集群版本：hadoop-2.6.0-mdh3.11

测试集群的机器情况：5 个 slave(dn/nm) 节点，每个节点机器为 32 核，128g 内存，12*4THdd 磁盘的物理机。

测试数据：5 个文件，每个文件大小为 1TB。

环境要求：集群保证完全空闲，无其他干扰任务。

1. 写测试：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-mdh3.11-jre8-SNAPSHOT.jar TestDFSIO -write -nrFiles 5 -size 1TB
# 查看测试结果

cat TestDFSIO_results.log

----- TestDFSIO ----- : write
           Date & time: Mon Jun 04 16:44:25 CST 2018
       Number of files: 5
Total MBytes processed: 5242880.0
     Throughput mb/sec: 213.10459447844454
Average IO rate mb/sec: 213.11135864257812
 IO rate std deviation: 1.1965074234796487
    Test exec time sec: 4972.91

2. 读测试：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-mdh3.11-jre8-SNAPSHOT.jar TestDFSIO -read -nrFiles 5 -size 1TB
# 查看测试结果

cat TestDFSIO_results.log

----- TestDFSIO ----- : read
           Date & time: Mon Jun 04 18:48:48 CST 2018
       Number of files: 5
Total MBytes processed: 5242880.0
     Throughput mb/sec: 164.327389903222
Average IO rate mb/sec: 164.33087158203125
 IO rate std deviation: 0.7560928117328837
    Test exec time sec: 6436.246

以上测试数据解释：

Throughput mb/sec和 Average IO rate mb/sec 是两个最重要的性能衡量指标：Throughput mb/sec 衡量每个 map task 的平均吞吐量，Average IO rate mb/sec 衡量每个文件的平均 IO 速度。

IO rate std deviation：标准差，高标准差表示数据散布在一个大的值域中，这可能意味着群集中某个节点存在性能相关的问题，这可能和硬件或软件有关。

使用 TeraSort

进行集群的计算性能测试

TeraSort: org.apache.hadoop.examples.terasort.TeraSort

TeraSort 程序原理：

对输入文件按 Key 进行全局排序。TeraSort 针对的是大批量的数据，在实现过程中为了保证 Reduce 阶段各个 Reduce Job 的负载平衡，以保证全局运算的速度，TeraSort 对数据进行了预采样分析。

TeraSort 大致运行过程：

从 job 框架上看，为了保证 Reduce 阶段的负载平衡，使用 jobConf.setPartitionerClass 自定义了 Partitioner Class 用来对数据进行分区，在 map 和 reduce 阶段对数据不做额外处理。Job 流程如下：

对数据进行分段采样：例如将输入文件最多分割为 10 段，每段读取最多 100,000 行数据作为样本，统计各个 Key 值出现的频率并对 Key 值使用内建的 QuickSort 进行快速排序（这一步是 JobClient 在单个节点上执行的，采样的运算量不能太大）;
将样本统计结果中位于样本统计平均分段处的 Key 值（例如 n/10 处 n=[1..10]）做为分区的依据以 DistributedCache 的方式写入文件，这样在 MapReduce 阶段的各个节点都能够 Access 这个文件。如果全局数据的 Key 值分布与样本类似的话，这也就代表了全局数据的平均分区的位置;
在 MapReduceJob 执行过程中，自定义的 Partitioner 会读取这个样本统计文件，根据分区边界 Key 值创建一个两级的索引树用来快速定位特定 Key 值对应的分区（这个两级索引树是根据 TeraSort 规定的输入数据的特点定制的，对普通数据不一定具有普遍适用性，比如 Hadoop 内置的 TotalPartitioner 就采用了更通用的二分查找法来定位分区）;

总结：

TeraSort 使用了 Hadoop 默认的 IdentityMapper 和 IdentityReducer。IdentityMapper 和 IdentityReducer 对它们的输入不做任何处理，将输入 k,v 直接输出；也就是说是完全是为了走框架的流程而空跑。这正是 Hadoop 的 TeraSort 的巧妙所在，它没有为排序而实现自己的 mapper 和 reducer，而是完全利用 Hadoop 的 Map Reduce 框架内的机制实现了排序。而也正因为如此，我们可以在集群上利用 TeraSort 来测试 Hadoop。

了解到原理后，我们将运行 TeraSort 进行测试

测试集群版本：hadoop-2.6.0-mdh3.11

测试集群的机器情况：

5 个 slave(dn/nm) 节点，每个节点机器为 32 核，128g 内存，12*4THdd 磁盘的物理机。

测试数据：

hadoop 自带的生成数据工具 TeraGen，输入文件是由一行行 100 字节的记录组成，每行记录包括一个 10 字节的 Key；以 Key 来对记录排序。

环境要求：

集群保证完全空闲，无其他干扰任务。

测试数据生成

按照 SortBenchmark 要求的输入数据规则（需要 gensort 工具生成输入数据）：输入文件是由一行行 100 字节的记录组成，每行记录包括一个 10 字节的 Key；以 Key 来对记录排序。（具体可参考 http://www.ordinal.com/gensort.html）

Hadoop 的 TeraSort 实现的生成数据工具 TeraGen，算法与 gensort 一致，我们将使用 TeraGen 生成测试数据：

（测试数据量为 1T，由于 100 字节一行，则设定行数为 10000000000）

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar teragen 10000000000 /terasort/input1TB
   File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=248548
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=173
        HDFS: Number of bytes written=1000000000000
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=2
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=32792925
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=10930975
        Total vcore-seconds taken by all map tasks=10930975
        Total megabyte-seconds taken by all map tasks=8394988800
    Map-Reduce Framework
        Map input records=10000000000
        Map output records=10000000000
        Input split bytes=173
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=193112
        CPU time spent (ms)=14325820
        Physical memory (bytes) snapshot=916639744
        Virtual memory (bytes) snapshot=12308406272
        Total committed heap usage (bytes)=712507392
    HeapUsageGroup
        HeapUsageCounter=30947608
    org.apache.hadoop.examples.terasort.TeraGen$Counters
        CHECKSUM=3028416809717741100
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=1000000000000   
 # 查看生成的数据
   bin/hadoop dfs -ls /terasort/input1TB
   Found 3 items
   -rw-r--r--   3 hdfs_admin supergroup            0 2018-06-05 11:49 /terasort/input1TB/_SUCCESS
   -rw-r--r--   3 hdfs_admin supergroup 500000000000 2018-06-05 11:45 /terasort/input1TB/part-m-00000
   -rw-r--r--   3 hdfs_admin supergroup 500000000000 2018-06-05 11:49 /terasort/input1TB/part-m-00001

运行 TeraSort 测试程序

测试数据生成好后，我们将运行 TeraSort 测试程序：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar terasort /terasort/input1TB /terasort/output1TB
   18/06/06 03:50:08 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=5189229479006
        FILE: Number of bytes written=6238290771828
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1000000856980
        HDFS: Number of bytes written=1000000000000
        HDFS: Number of read operations=22359
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Killed map tasks=1
        Launched map tasks=7453
        Launched reduce tasks=1
        Data-local map tasks=4424
        Rack-local map tasks=3029
        Total time spent by all maps in occupied slots (ms)=356530188
        Total time spent by all reduces in occupied slots (ms)=224698152
        Total time spent by all map tasks (ms)=118843396
        Total time spent by all reduce tasks (ms)=56174538
        Total vcore-seconds taken by all map tasks=118843396
        Total vcore-seconds taken by all reduce tasks=56174538
        Total megabyte-seconds taken by all map tasks=91271728128
        Total megabyte-seconds taken by all reduce tasks=57522726912
    Map-Reduce Framework
        Map input records=10000000000
        Map output records=10000000000
        Map output bytes=1020000000000
        Map output materialized bytes=1040000044712
        Input split bytes=856980
        Combine input records=0
        Combine output records=0
        Reduce input groups=10000000000
        Reduce shuffle bytes=1040000044712
        Reduce input records=10000000000
        Reduce output records=10000000000
        Spilled Records=59896435961
        Shuffled Maps =7452
        Failed Shuffles=0
        Merged Map outputs=7452
        GC time elapsed (ms)=14193819
        CPU time spent (ms)=179564830
        Physical memory (bytes) snapshot=3104994074624
        Virtual memory (bytes) snapshot=46362045841408
        Total committed heap usage (bytes)=2586227769344
    HeapUsageGroup
        HeapUsageCounter=896956972576
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1000000000000
    File Output Format Counters
        Bytes Written=1000000000000
   18/06/06 03:50:08 INFO terasort.TeraSort: done   
  # 查看输出
   bin/hadoop dfs -ls /terasort/output1TB
   Found 3 items
   -rw-r--r--   1 hdfs_admin supergroup             0 2018-06-06 03:50 /terasort/output1TB/_SUCCESS
   -rw-r--r--  10 hdfs_admin supergroup             0 2018-06-05 11:52 /terasort/output1TB/_partition.lst
   -rw-r--r--   1 hdfs_admin supergroup 1000000000000 2018-06-06 03:50 /terasort/output1TB/part-r-00000

通过 Job Counters 等指标我们可以看出整个 TeraSort 的运行情况，可以通过这些数据对比出当前框架的计算性能。

结果的校验：TeraValidate

TeraSort 自带校验程序 TeraValidate，用来检验排序输出结果是否是有序的：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar teravalidate /terasort/output1TB /terasort/validate1TB

如果有错误，log 记录会放在输出目录里。

总结

Hadoop 自带的 Benchmark 测试程序看起来微不足道，如果我们能够多多挖掘，便可发挥更大的价值；既可以用来对集群上线前的测试校验，又可以用来进行集群调优测试，通过举一反三可以用到更多地地方。

参考文献

《Hadoop 权威指南》

Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.

Hadoop 集群基准测试

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

小群效应

徐志斌 / 中信出版集团 / 2017-11 / 58.00元

互联网经济时代，新零售、网红经济、知识经济多受益于社群。用户的获取、留存及订单转化直接决定了一个社群的存亡。无论是“做”群还是“用”群，每个人都需要迭代常识：了解用户行为习惯，了解社群运行规律。《社交红利》《即时引爆》作者徐志斌历时两年，挖掘腾讯、百度、豆瓣的一手后台数据，从上百个产品中深度解读社群行为，通过大量生动案例总结出利用社交网络和海量用户进行沟通的方法论。本书将告诉你： ......一起来看看《小群效应》这本书的介绍吧!

码农工具