
Monitoring Hadoop and HBase Cluster Performance with Ganglia (Installation and Configuration)

[Author: Hadoop in Practice] [Keywords: installation, configuration, nodes] [2013-11-2]

Questions this guide answers:
1. How do you install Ganglia to monitor a Hadoop and HBase cluster?
2. How do you verify that it works?

1. Install ganglia-webfrontend and ganglia-monitor on the master node

sudo apt-get install ganglia-webfrontend ganglia-monitor
On the master node, install both ganglia-webfrontend and ganglia-monitor. On every other monitored node, only ganglia-monitor is required.
Link Ganglia's web files into Apache's default document root:

sudo ln -s /usr/share/ganglia-webfrontend /var/www/ganglia

2. Install ganglia-monitor on the remaining nodes
On every other monitored node, only ganglia-monitor is needed:

sudo apt-get install ganglia-monitor

3. Ganglia configuration
gmond.conf
/etc/ganglia/gmond.conf must be configured on every node, and the configuration is identical everywhere:

sudo vim /etc/ganglia/gmond.conf

The modified /etc/ganglia/gmond.conf:

globals {
  daemonize = yes                 # run in the background as a daemon
  setuid = yes
  user = ganglia                  # user that Ganglia runs as
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 10     # interval between metadata sends
}

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "hadoop-cluster"         # cluster name
  owner = "ganglia"               # user that Ganglia runs as
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #mcast_join = 239.2.11.71     # multicast commented out
  host = master                 # send to the machine running gmetad
  port = 8649                   # listening port
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  #mcast_join = 239.2.11.71     # multicast commented out
  port = 8649
  #bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
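To sanity-check what a node is reporting, you can pull the XML that gmond serves on its tcp_accept_channel (port 8649, e.g. via `telnet master 8649`) and count the hosts and metrics in it. The Python sketch below parses an abridged sample of that XML format rather than a live socket; the host names, IPs, and metric values in the sample are illustrative, not taken from the article's cluster.

```python
import xml.etree.ElementTree as ET

def hosts_from_gmond_xml(xml_text):
    """Return {host_name: metric_count} from the XML that a gmond
    tcp_accept_channel serves."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = len(host.findall("METRIC"))
    return result

# Abridged sample of the XML shape gmond emits (real output also carries
# a DOCTYPE and many attributes per element).
sample = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hadoop-cluster" OWNER="ganglia" LATLONG="unspecified" URL="unspecified">
    <HOST NAME="master" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.12"/>
      <METRIC NAME="cpu_num" VAL="4"/>
    </HOST>
    <HOST NAME="slave" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="0.05"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

print(hosts_from_gmond_xml(sample))  # {'master': 2, 'slave': 1}
```

If a host you expect is missing from this XML, its gmond is either not running or not reaching the udp_recv_channel on master.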

gmetad.conf
On the master node you also need to configure /etc/ganglia/gmetad.conf; the name hadoop-cluster in it must match the name set in gmond.conf above.
sudo vim /etc/ganglia/gmetad.conf

Change it to the following:

data_source "hadoop-cluster" 10 master:8649 slave:8649
setuid_username "nobody"
rrd_rootdir "/var/lib/ganglia/rrds"
gridname "hadoop-cluster"

Note: master:8649 and slave:8649 are the hosts and ports to poll; the hadoop-cluster string in data_source must match the name in gmond.conf.
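The data_source line packs three things together: a cluster name, an optional polling interval, and a list of gmond endpoints. As a quick sketch of those semantics (the 15-second default interval and the default port 8649 are documented in the full gmetad.conf reproduced later in this article), a hypothetical parser might look like this:

```python
def parse_data_source(line):
    """Split a gmetad data_source line into (name, polling_interval, hosts).
    The quoted name must match the cluster name in gmond.conf; the interval
    defaults to 15 seconds when omitted; host ports default to 8649."""
    rest = line.split("data_source", 1)[1].strip()
    name = rest.split('"')[1]                    # quoted cluster name
    tokens = rest.split('"', 2)[2].split()       # everything after the name
    interval = 15
    if tokens and tokens[0].isdigit():
        interval, tokens = int(tokens[0]), tokens[1:]
    hosts = [h if ":" in h else h + ":8649" for h in tokens]
    return name, interval, hosts

print(parse_data_source('data_source "hadoop-cluster" 10 master:8649 slave:8649'))
# ('hadoop-cluster', 10, ['master:8649', 'slave:8649'])
```

gmetad tries the listed endpoints in order, so listing both master and slave gives it a fallback source for the same cluster.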

4. Hadoop configuration
On every node running Hadoop, configure hadoop-metrics2.properties as follows:

#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

# leave the original file-sink configuration commented out

#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10

# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8

#datanode.sink.file.filename=datanode-metrics.out

# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

#tasktracker.sink.file.filename=tasktracker-metrics.out

#maptask.sink.file.filename=maptask-metrics.out

#reducetask.sink.file.filename=reducetask-metrics.out

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=master:8649
resourcemanager.sink.ganglia.servers=master:8649

datanode.sink.ganglia.servers=master:8649
nodemanager.sink.ganglia.servers=master:8649

maptask.sink.ganglia.servers=master:8649
reducetask.sink.ganglia.servers=master:8649
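The properties above follow the metrics2 key syntax noted in the file header: `[prefix].sink.[instance].[option]`, where the prefix is a daemon name (namenode, datanode, ...) and `*` applies to all daemons. As a rough sketch of how a specific prefix takes precedence over the `*` wildcard (a simplification of metrics2's actual configuration resolution), consider:

```python
def ganglia_servers(props, prefix):
    """Resolve <prefix>.sink.ganglia.servers from hadoop-metrics2.properties
    text, preferring a daemon-specific key over the '*' wildcard."""
    entries = {}
    for line in props.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries.get(prefix + ".sink.ganglia.servers",
                       entries.get("*.sink.ganglia.servers"))

conf = """
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=master:8649
datanode.sink.ganglia.servers=master:8649
"""
print(ganglia_servers(conf, "namenode"))  # master:8649
```

In the article's setup every daemon sends to master:8649, so the per-daemon lines are interchangeable with a single `*.sink.ganglia.servers` entry; listing them per daemon just makes it easy to point daemons at different gmonds later.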

5. HBase configuration
On every HBase node, configure hadoop-metrics2-hbase.properties as follows:

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-one.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649

6. Start the Hadoop and HBase clusters

start-dfs.sh
start-yarn.sh
start-hbase.sh

7. Start Ganglia
Restart Hadoop and HBase first. Then start the gmond service on every node; the master node also needs the gmetad service.
Since Ganglia was installed with apt-get, both can be started via service:

sudo service ganglia-monitor start   (run on every node)

sudo service gmetad start            (run on the node with ganglia-webfrontend)

8. Verification
Open http://master/ganglia in a browser. If "Hosts up" equals the number of nodes in the cluster (9 in the author's setup), the installation succeeded.
If it did not succeed, a few commands are useful for debugging:
Start gmetad in debug mode: gmetad -d 9
Dump the XML served by gmond (the feed gmetad polls): telnet master 8649
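Before reaching for telnet, it can help to script a quick reachability check of the monitoring ports: 8649 for each gmond's tcp_accept_channel, and 8651 for gmetad's xml_port on the master. A minimal Python sketch (the node names "master" and "slave" are this cluster's hostnames; adjust for yours):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, e.g. a gmond
    (8649) or gmetad (8651) accepting connections there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or hostname not resolvable
        return False

# Check each node's gmond before blaming the Hadoop metrics configuration.
for node in ("master", "slave"):
    print(node, port_open(node, 8649))
```

A node whose gmond port is closed will show up as a down host in the web frontend even when Hadoop itself is healthy.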

9. Screenshots

(Two screenshots of the Ganglia web frontend were attached here: 444444.png and 333333333.png, uploaded 2014-6-15.)

Full gmetad.conf on the master node:

# This is an example of a Ganglia Meta Daemon configuration file
#                http://ganglia.sourceforge.net/
#
#
#-------------------------------------------------------------------------------
# Setting the debug_level to 1 will keep daemon in the foreground and
# show only error messages. Setting this value higher than 1 will make
# gmetad output debugging information and stay in the foreground.
# default: 0
# debug_level 10
#
#-------------------------------------------------------------------------------
# What to monitor. The most important section of this file.
#
# The data_source tag specifies either a cluster or a grid to
# monitor. If we detect the source is a cluster, we will maintain a complete
# set of RRD databases for it, which can be used to create historical
# graphs of the metrics. If the source is a grid (it comes from another gmetad),
# we will only maintain summary RRDs for it.
#
# Format:
# data_source "my cluster" [polling interval] address1:port addresses2:port ...
#
# The keyword 'data_source' must immediately be followed by a unique
# string which identifies the source, then an optional polling interval in
# seconds. The source will be polled at this interval on average.
# If the polling interval is omitted, 15sec is assumed.
#
# If you choose to set the polling interval to something other than the default,
# note that the web frontend determines a host as down if its TN value is less
# than 4 * TMAX (20sec by default).  Therefore, if you set the polling interval
# to something around or greater than 80sec, this will cause the frontend to
# incorrectly display hosts as down even though they are not.
#
# A list of machines which service the data source follows, in the
# format ip:port, or name:port. If a port is not specified then 8649
# (the default gmond port) is assumed.
# default: There is no default value
#
# data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
# data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
# data_source "another source" 1.3.4.7:8655  1.3.4.8

data_source "hadoop-cluster" 10 master:8649 slave:8649
setuid_username "nobody"
rrd_rootdir "/var/lib/ganglia/rrds"
gridname "hadoop-cluster"

#
# Round-Robin Archives
# You can specify custom Round-Robin archives here (defaults are listed below)
#
# Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
# RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
#      "RRA:AVERAGE:0.5:5760:374"
# New Default RRA
# Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day
# Two weeks of data points at 1 minute resolution (average)
#RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"

#
#-------------------------------------------------------------------------------
# Scalability mode. If on, we summarize over downstream grids, and respect
# authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
# in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
# we are the "authority" on data source feeds. This approach does not scale to
# large groups of clusters, but is provided for backwards compatibility.
# default: on
# scalable off
#
#-------------------------------------------------------------------------------
# The name of this Grid. All the data sources above will be wrapped in a GRID
# tag with this name.
# default: unspecified
# gridname "MyGrid"
#
#-------------------------------------------------------------------------------
# The authority URL for this grid. Used by other gmetads to locate graphs
# for our data sources. Generally points to a ganglia/
# website on this machine.
# default: "http://hostname/ganglia/",
#   where hostname is the name of this machine, as defined by gethostname().
# authority "http://mycluster.org/newprefix/"
#
#-------------------------------------------------------------------------------
# List of machines this gmetad will share XML with. Localhost
# is always trusted.
# default: There is no default value
# trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
#
#-------------------------------------------------------------------------------
# If you want any host which connects to the gmetad XML to receive
# data, then set this value to "on"
# default: off
# all_trusted on
#
#-------------------------------------------------------------------------------
# If you don't want gmetad to setuid then set this to off
# default: on
# setuid off
#
#-------------------------------------------------------------------------------
# User gmetad will setuid to (defaults to "nobody")
# default: "nobody"
# setuid_username "nobody"
#
#-------------------------------------------------------------------------------
# Umask to apply to created rrd files and grid directory structure
# default: 0 (files are public)
# umask 022
#
#-------------------------------------------------------------------------------
# The port gmetad will answer requests for XML
# default: 8651
# xml_port 8651
#
#-------------------------------------------------------------------------------
# The port gmetad will answer queries for XML. This facility allows
# simple subtree and summation views of the XML tree.
# default: 8652
# interactive_port 8652
#
#-------------------------------------------------------------------------------
# The number of threads answering XML requests
# default: 4
# server_threads 10
#
#-------------------------------------------------------------------------------
# Where gmetad stores its round-robin databases
# default: "/var/lib/ganglia/rrds"
# rrd_rootdir "/some/other/place"
#
#-------------------------------------------------------------------------------
# In earlier versions of gmetad, hostnames were handled in a case
# sensitive manner
# If your hostname directories have been renamed to lower case,
# set this option to 0 to disable backward compatibility.
# From version 3.2, backwards compatibility will be disabled by default.
# default: 1   (for gmetad < 3.2)
# default: 0   (for gmetad >= 3.2)
case_sensitive_hostnames 0

#-------------------------------------------------------------------------------
# It is now possible to export all the metrics collected by gmetad directly to
# graphite by setting the following attributes.
#
# The hostname or IP address of the Graphite server
# default: unspecified
# carbon_server "my.graphite.box"
#
# The port on which Graphite is listening
# default: 2003
# carbon_port 2003
#
# A prefix to prepend to the metric names exported by gmetad. Graphite uses dot-
# separated paths to organize and refer to metrics.
# default: unspecified
# graphite_prefix "datacenter1.gmetad"
#
# Number of milliseconds gmetad will wait for a response from the graphite server
# default: 500
# carbon_timeout 500
#

Full gmond.conf on the master node:

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 10
}

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "hadoop-cluster"
  owner = "ganglia"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #mcast_join = 239.2.11.71
  host = master
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does not
   require a load path. However all dynamically loadable modules must include
   a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "/usr/lib/ganglia/modcpu.so"
  }
  module {
    name = "disk_module"
    path = "/usr/lib/ganglia/moddisk.so"
  }
  module {
    name = "load_module"
    path = "/usr/lib/ganglia/modload.so"
  }
  module {
    name = "mem_module"
    path = "/usr/lib/ganglia/modmem.so"
  }
  module {
    name = "net_module"
    path = "/usr/lib/ganglia/modnet.so"
  }
  module {
    name = "proc_module"
    path = "/usr/lib/ganglia/modproc.so"
  }
  module {
    name = "sys_module"
    path = "/usr/lib/ganglia/modsys.so"
  }
}

include ('/etc/ganglia/conf.d/*.conf')

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host every 1200 secs.
   This information doesn't change between reboots and is only collected once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host every 300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold could be
   set significantly higher to reduce unnecessary network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}

/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs.  This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}

/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}

Full hadoop-metrics2-hbase.properties on the master node:

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-one.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649

Full hadoop-metrics2.properties on the master node:

#
#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10

# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8

#datanode.sink.file.filename=datanode-metrics.out

# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

#tasktracker.sink.file.filename=tasktracker-metrics.out

#maptask.sink.file.filename=maptask-metrics.out

#reducetask.sink.file.filename=reducetask-metrics.out

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=master:8649
resourcemanager.sink.ganglia.servers=master:8649

datanode.sink.ganglia.servers=master:8649
nodemanager.sink.ganglia.servers=master:8649

maptask.sink.ganglia.servers=master:8649
reducetask.sink.ganglia.servers=master:8649

Full gmond.conf on a slave node:

/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 10
}

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "hadoop-cluster"
  owner = "ganglia"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #mcast_join = 239.2.11.71
  host = master
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does not
   require a load path. However all dynamically loadable modules must include
   a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "/usr/lib/ganglia/modcpu.so"
  }
  module {
    name = "disk_module"
    path = "/usr/lib/ganglia/moddisk.so"
  }
  module {
    name = "load_module"
    path = "/usr/lib/ganglia/modload.so"
  }
  module {
    name = "mem_module"
    path = "/usr/lib/ganglia/modmem.so"
  }
  module {
    name = "net_module"
    path = "/usr/lib/ganglia/modnet.so"
  }
  module {
    name = "proc_module"
    path = "/usr/lib/ganglia/modproc.so"
  }
  module {
    name = "sys_module"
    path = "/usr/lib/ganglia/modsys.so"
  }
}

include ('/etc/ganglia/conf.d/*.conf')

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host every 1200 secs.
   This information doesn't change between reboots and is only collected once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host every 300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.

607.    The time threshold is set to 90 seconds.  In honesty, this time_threshold could be

608.    set significantly higher to reduce unneccessary network chatter. */

609. collection_group {

610.   collect_every = 20

611.   time_threshold = 90

612.   /* CPU status */

613.   metric {

614.     name = "cpu_user"

615.     value_threshold = "1.0"

616.     title = "CPU User"

617.   }

618.   metric {

619.     name = "cpu_system"

620.     value_threshold = "1.0"

621.     title = "CPU System"

622.   }

623.   metric {

624.     name = "cpu_idle"

625.     value_threshold = "5.0"

626.     title = "CPU Idle"

627.   }

628.   metric {

629.     name = "cpu_nice"

630.     value_threshold = "1.0"

631.     title = "CPU Nice"

632.   }

633.   metric {

634.     name = "cpu_aidle"

635.     value_threshold = "5.0"

636.     title = "CPU aidle"

637.   }

638.   metric {

639.     name = "cpu_wio"

640.     value_threshold = "1.0"

641.     title = "CPU wio"

642.   }

643.   /* The next two metrics are optional if you want more detail...

644.      ... since they are accounted for in cpu_system.

645.   metric {

646.     name = "cpu_intr"

647.     value_threshold = "1.0"

648.     title = "CPU intr"

649.   }

650.   metric {

651.     name = "cpu_sintr"

652.     value_threshold = "1.0"

653.     title = "CPU sintr"

654.   }

655.   */

656. }

657.

658. collection_group {

659.   collect_every = 20

660.   time_threshold = 90

661.   /* Load Averages */

662.   metric {

663.     name = "load_one"

664.     value_threshold = "1.0"

665.     title = "One Minute Load Average"

666.   }

667.   metric {

668.     name = "load_five"

669.     value_threshold = "1.0"

670.     title = "Five Minute Load Average"

671.   }

672.   metric {

673.     name = "load_fifteen"

674.     value_threshold = "1.0"

675.     title = "Fifteen Minute Load Average"

676.   }

677. }

678.

679. /* This group collects the number of running and total processes */

680. collection_group {

681.   collect_every = 80

682.   time_threshold = 950

683.   metric {

684.     name = "proc_run"

685.     value_threshold = "1.0"

686.     title = "Total Running Processes"

687.   }

688.   metric {

689.     name = "proc_total"

690.     value_threshold = "1.0"

691.     title = "Total Processes"

692.   }

693. }

694.

695. /* This collection group grabs the volatile memory metrics every 40 secs and

696.    sends them at least every 180 secs.  This time_threshold can be increased

697.    significantly to reduce unneeded network traffic. */

698. collection_group {

699.   collect_every = 40

700.   time_threshold = 180

701.   metric {

702.     name = "mem_free"

703.     value_threshold = "1024.0"

704.     title = "Free Memory"

705.   }

706.   metric {

707.     name = "mem_shared"

708.     value_threshold = "1024.0"

709.     title = "Shared Memory"

710.   }

711.   metric {

712.     name = "mem_buffers"

713.     value_threshold = "1024.0"

714.     title = "Memory Buffers"

715.   }

716.   metric {

717.     name = "mem_cached"

718.     value_threshold = "1024.0"

719.     title = "Cached Memory"

720.   }

721.   metric {

722.     name = "swap_free"

723.     value_threshold = "1024.0"

724.     title = "Free Swap Space"

725.   }

726. }

727.

728. collection_group {

729.   collect_every = 40

730.   time_threshold = 300

731.   metric {

732.     name = "bytes_out"

733.     value_threshold = 4096

734.     title = "Bytes Sent"

735.   }

736.   metric {

737.     name = "bytes_in"

738.     value_threshold = 4096

739.     title = "Bytes Received"

740.   }

741.   metric {

742.     name = "pkts_in"

743.     value_threshold = 256

744.     title = "Packets Received"

745.   }

746.   metric {

747.     name = "pkts_out"

748.     value_threshold = 256

749.     title = "Packets Sent"

750.   }

751. }

752.

753. /* Different than 2.5.x default since the old config made no sense */

754. collection_group {

755.   collect_every = 1800

756.   time_threshold = 3600

757.   metric {

758.     name = "disk_total"

759.     value_threshold = 1.0

760.     title = "Total Disk Space"

761.   }

762. }

763.

764. collection_group {

765.   collect_every = 40

766.   time_threshold = 180

767.   metric {

768.     name = "disk_free"

769.     value_threshold = 1.0

770.     title = "Disk Space Available"

771.   }

772.   metric {

773.     name = "part_max_used"

774.     value_threshold = 1.0

775.     title = "Maximum Disk Space Used"

776.   }

777. }
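The interplay of collect_every, time_threshold and value_threshold in the collection_group blocks above determines how chatty gmond is: each metric is sampled every collect_every seconds, but only sent when its value has changed by at least value_threshold since the last send, or when time_threshold seconds have elapsed. A minimal Python sketch of that decision (a simplified model of gmond's behavior, not its actual source):

```python
def should_send(elapsed, last_sent_value, current_value,
                time_threshold, value_threshold=None):
    """Simplified model of gmond's per-metric send decision.

    elapsed          -- seconds since this metric was last sent
    time_threshold   -- maximum interval between sends (forced resend)
    value_threshold  -- minimum change that triggers an immediate send
    """
    # The time threshold always forces a resend once it expires.
    if elapsed >= time_threshold:
        return True
    # Otherwise only a sufficiently large change is worth the UDP traffic.
    if value_threshold is not None:
        return abs(current_value - last_sent_value) >= value_threshold
    return False

# cpu_user above: collect_every = 20, time_threshold = 90, value_threshold = 1.0
print(should_send(20, 5.0, 5.4, 90, 1.0))   # small change, not sent yet
print(should_send(20, 5.0, 9.0, 90, 1.0))   # changed by 4.0 -> sent now
print(should_send(95, 5.0, 5.0, 90, 1.0))   # 90s elapsed -> resent anyway
```

This is why the comments in the file suggest raising time_threshold on busy clusters: it trades dashboard freshness for less network chatter.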

hadoop-metrics2-hbase.properties configuration on the slave nodes:
780.

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-two.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649
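Every active line in this file follows the [prefix].[source|sink].[instance].[options] syntax noted in its header. A small hypothetical helper (illustrative only, not part of Hadoop) shows how such keys decompose:

```python
def parse_metrics2(text):
    """Split metrics2 properties into {prefix: {(kind, instance): {option: value}}}."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                     # skip blanks and comments
        key, _, value = line.partition("=")
        parts = key.strip().split(".", 3)
        if len(parts) != 4:
            continue                     # e.g. the global "*.period" shorthand
        prefix, kind, instance, option = parts
        conf.setdefault(prefix, {}) \
            .setdefault((kind, instance), {})[option] = value.strip()
    return conf

sample = """\
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649
"""
conf = parse_metrics2(sample)
print(conf["hbase"][("sink", "ganglia")]["servers"])   # master:8649
```

Here the `*` prefix applies a sink option to every daemon, while `hbase.sink.ganglia.servers` tells HBase's GangliaSink31 where to push its metrics.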

hadoop-metrics2.properties configuration on the slave nodes:
818.

#
#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10

# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8

#datanode.sink.file.filename=datanode-metrics.out

# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

#tasktracker.sink.file.filename=tasktracker-metrics.out

#maptask.sink.file.filename=maptask-metrics.out

#reducetask.sink.file.filename=reducetask-metrics.out

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=master:8649
resourcemanager.sink.ganglia.servers=master:8649

datanode.sink.ganglia.servers=master:8649
nodemanager.sink.ganglia.servers=master:8649

maptask.sink.ganglia.servers=master:8649
reducetask.sink.ganglia.servers=master:8649
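Once gmond and these sinks are running, every metric the Hadoop and HBase daemons push should appear in the XML that gmond serves on its tcp_accept_channel (port 8649 above). A quick way to check this from any node, sketched in Python (the host name master follows this article's setup, and the HOST/METRIC attribute names match what gmond 3.x emits):

```python
import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host="master", port=8649, timeout=5):
    """Read the full cluster-state XML from gmond's tcp_accept_channel."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:            # gmond closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

def list_metrics(xml_text):
    """Return (host, metric, value) triples from a gmond XML dump."""
    root = ET.fromstring(xml_text)
    return [(h.get("NAME"), m.get("NAME"), m.get("VAL"))
            for h in root.iter("HOST") for m in h.iter("METRIC")]

# Offline demonstration with a minimal gmond-style document:
sample = ('<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">'
          '<CLUSTER NAME="hadoop-cluster" OWNER="ganglia">'
          '<HOST NAME="master" IP="10.0.0.1">'
          '<METRIC NAME="load_one" VAL="0.12" TYPE="float"/>'
          '</HOST></CLUSTER></GANGLIA_XML>')
print(list_metrics(sample))   # [('master', 'load_one', '0.12')]
# Against a live cluster: list_metrics(fetch_gmond_xml("master"))
```

If the Hadoop-specific metrics (namenode, datanode, regionserver contexts) are missing from the live output, the usual culprit is a servers= line pointing at the wrong host:port rather than gmond itself.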

