Question

In Oracle, what is the role of Oracle Cluster Health Monitor (CHM)?


Answer

CHM (Cluster Health Monitor) is a tool provided by Oracle that automatically collects operating system resource usage: CPU, memory, swap, processes, I/O, and network. CHM collects data once per second. This data is very helpful when diagnosing node restarts, hangs, instance evictions, and performance problems in a cluster. Users can also use CHM to detect high system load, abnormal memory usage, and similar conditions early, and so avoid more serious problems. When the system does misbehave, CHM can quickly gather the data for the affected time window.

Compared with OSWatcher, CHM calls the OS APIs directly to reduce overhead, whereas OSWatcher invokes UNIX commands. CHM is also closer to real time: it collects data once per second, changed to once every 5 seconds in later Oracle releases. OSWatcher's advantages are that it can use the traceroute command to check connectivity across the private network, and that the retention period of its data can be set to a long time. If possible, install both tools.

In later Oracle releases, CHM is installed by default together with Grid on the AIX and Linux platforms. Common commands are as follows:

crsctl stat res ora.crf -init -p                   # view the state of ora.crf
oclumon manage -get master                         # view the current CHM master
oclumon manage -get reppath                        # view the CHM data storage path
oclumon manage -repos reploc /shared/oracle/chm    # change the CHM data storage path
oclumon manage -get repsize                        # view the CHM data retention time (seconds)
oclumon manage -repos resize 68083                 # change the CHM data retention time (seconds)
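
Because the retention time is given in seconds, it can help to convert it into a more readable form. The following is a minimal sketch; secs_to_hms is a hypothetical helper name and is not part of oclumon.

```shell
# Hypothetical helper (not an Oracle tool): convert a retention value
# in seconds into hours, minutes, and seconds.
secs_to_hms() {
    secs=$1
    printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# 68083 is the value used in the resize example above.
secs_to_hms 68083
```

For the value 68083 this prints 18h 54m 43s, that is, a little under 19 hours of data.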

In a cluster, the following command shows the state of the resource that corresponds to CHM (ora.crf):

[root@rac2 ~]# crsctl stat res -t -init |grep -1 ora.crf
      1        ONLINE  ONLINE       rac2

CHM mainly consists of two services:

1. System Monitor Service (osysmond): this service runs on every node. osysmond sends each node's resource usage to the Cluster Logger Service, which receives the information from all nodes and saves it to the CHM database.

[root@rac2 ~]# ps -ef|grep osysmond
root     29498     1  1 15:18 ?        00:01:31 /u01/app/11.2.0/grid/bin/osysmond.bin

2. Cluster Logger Service (ologgerd): in a cluster, ologgerd has one master node and one standby node. When ologgerd hits a problem on the current node and cannot start, it is started on the standby node. This service saves the data collected by osysmond to the CHM database ($GRID_HOME/crf/db).

Master node:

$ ps -ef|grep ologgerd
root 8257  1  0 Jun05 ?  00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d  /u01/app/11.2.0/grid/crf/db/rac2

Standby node:

$ ps -ef|grep ologgerd
root  8353  1  0 Jun05 ?  00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d    /u01/app/11.2.0/grid/crf/db/rac1

The -M flag marks the master process, and the node name that follows -m identifies the master node. The output above shows that node 2 (rac2) is the master.
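
The rule for reading the ologgerd command line can be sketched as a small shell function. This is an illustration only: master_from_ologgerd is a hypothetical helper, and treating the last component of the -d directory as the local node name is an assumption drawn from the sample output above, not documented behavior.

```shell
# Hypothetical helper (not an Oracle tool): given one ologgerd command line
# as printed by ps, print the name of the master node.
master_from_ologgerd() {
    line=$1
    case "$line" in
        *" -M "*)
            # Master process. Assumption: the -d directory ends with the
            # local node name, as in the sample output.
            printf '%s\n' "${line##*/}"
            ;;
        *" -m "*)
            # Standby process: the word after -m names the master.
            rest=${line#* -m }
            printf '%s\n' "${rest%% *}"
            ;;
        *)
            # Not an ologgerd line in either expected form.
            return 1
            ;;
    esac
}

master_from_ologgerd "/u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d    /u01/app/11.2.0/grid/crf/db/rac1"
```

With either of the two sample lines, the function prints rac2. On a live system, "oclumon manage -get master" remains the direct way to ask for the master.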

There are two ways to retrieve the data generated by CHM:

1. The first is to use Grid_home/bin/diagcollection.pl:

/u01/app/11.2.0/grid/bin/diagcollection.pl --collect --all --incidenttime 12/30/201515:13:00 --incidentduration 00:30

Here, "--incidenttime" is the start time of the data to collect, in the format MM/DD/YYYY24HH:MM:SS (with no space between the date and the time), and "--incidentduration" is the duration, in the format HH:MM. The generated files are written to the current directory.

2. The second way to retrieve the data generated by CHM is oclumon:

$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]

Here, -s gives the start time and -e gives the end time. For example:

$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt
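
The two methods write timestamps differently: oclumon takes "YYYY-MM-DD HH:MM:SS", while diagcollection.pl takes the date and time run together as MM/DD/YYYY followed directly by HH:MM:SS. As a minimal sketch, the conversion from the first form to the second can be done with plain shell parameter expansion. to_incidenttime is a hypothetical helper name, not part of either tool.

```shell
# Hypothetical helper (not an Oracle tool): convert "YYYY-MM-DD HH:MM:SS"
# into the --incidenttime form, MM/DD/YYYY immediately followed by HH:MM:SS.
to_incidenttime() {
    ts=$1
    d=${ts%% *}     # date part, YYYY-MM-DD
    t=${ts#* }      # time part, HH:MM:SS
    y=${d%%-*}      # YYYY
    md=${d#*-}      # MM-DD
    m=${md%-*}      # MM
    day=${md#*-}    # DD
    printf '%s/%s/%s%s\n' "$m" "$day" "$y" "$t"
}

to_incidenttime "2015-12-30 15:13:00"
```

For the input shown, the function prints 12/30/201515:13:00, which matches the value passed to --incidenttime in the diagcollection.pl example.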

As the root user, you can disable the CHM service:

crsctl stop res ora.crf -init
crsctl modify res ora.crf -attr "AUTO_START=never" -init
crsctl modify res ora.crf -attr "ENABLED=0" -init

Use the following command to view the properties of ora.crf:

crsctl stat res ora.crf -init -p