RAC Health Checks

This article presents various commands used to perform Oracle RAC health checks. An Oracle RAC setup is made up of many components, and these commands help you check the status of each individual component. The list is not exhaustive; use it as a quick reference for checking status.

To verify shared storage:

cluvfy comp ssa -n all

Log in to the server as the grid user (the Clusterware software owner) and execute the environment profile so that the Grid Infrastructure binaries are in your PATH.
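For example, you can use the standard oraenv script to set the environment (the SID and path below are illustrative; substitute the ASM instance name and Grid home of your own cluster):

$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/grid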

1) RAC Node Apps Health Checks:
[grid@rac11gdb01 ~]$ srvctl status nodeapps
VIP rac11gdb01-vip is enabled
VIP rac11gdb01-vip is running on node: rac11gdb01
VIP rac11gdb02-vip is enabled
VIP rac11gdb02-vip is running on node: rac11gdb02
Network is enabled
Network is running on node: rac11gdb01
Network is running on node: rac11gdb02
GSD is disabled
GSD is not running on node: rac11gdb01
GSD is not running on node: rac11gdb02
ONS is enabled
ONS daemon is running on node: rac11gdb01
ONS daemon is running on node: rac11gdb02
[grid@rac11gdb01 ~]$

2) ASM Status:
[grid@rac11gdb01 ~]$ srvctl status asm
ASM is running on rac11gdb02,rac11gdb01
[grid@rac11gdb01 ~]$ srvctl status asm -n rac11gdb01
ASM is running on rac11gdb01
[grid@rac11gdb01 ~]$ srvctl status asm -n rac11gdb02
ASM is running on rac11gdb02
[grid@rac11gdb01 ~]$

3) Database Status:
[grid@rac11gdb01 ~]$ srvctl status database -d racdb
Instance racdb1 is running on node rac11gdb01
Instance racdb2 is running on node rac11gdb02
[grid@rac11gdb01 ~]$ srvctl status instance -d racdb -i racdb1
Instance racdb1 is running on node rac11gdb01
[grid@rac11gdb01 ~]$ srvctl status instance -d racdb -i racdb2
Instance racdb2 is running on node rac11gdb02
[grid@rac11gdb01 ~]$

4) CRS Status:
[grid@rac11gdb01 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@rac11gdb01 ~]$

5) Cluster Status:
[grid@rac11gdb01 ~]$ crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@rac11gdb01 ~]$

6) RAC High Availability Services Status:
[grid@rac11gdb01 ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online
[grid@rac11gdb01 ~]$

7) Database Services Status:
[grid@rac11gdb01 ~]$ srvctl status service -d racdb
Service racdb_service is running on instance(s) racdb1,racdb2
[grid@rac11gdb01 ~]$

8) Listener Status:
[grid@rac11gdb01 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): rac11gdb02,rac11gdb01
[grid@rac11gdb01 ~]$

9) SCAN VIP Status:
[grid@rac11gdb01 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node rac11gdb02
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node rac11gdb01
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node rac11gdb01
[grid@rac11gdb01 ~]$

10) SCAN Listener Status:
[grid@rac11gdb01 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node rac11gdb02
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node rac11gdb01
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node rac11gdb01
[grid@rac11gdb01 ~]$

11) Server Status:
[grid@rac11gdb01 ~]$ srvctl status server -n rac11gdb01 -a
Server name: rac11gdb01
Server state: ONLINE
Server active pools: Generic ora.racdb ora.racdb_racdb_service
Server state details:
[grid@rac11gdb01 ~]$ srvctl status server -n rac11gdb02 -a
Server name: rac11gdb02
Server state: ONLINE
Server active pools: Generic ora.racdb ora.racdb_racdb_service
Server state details:
[grid@rac11gdb01 ~]$

12) CVU Status:
[grid@rac11gdb01 ~]$ srvctl status cvu
CVU is enabled and running on node rac11gdb01
[grid@rac11gdb01 ~]$

13) GNS Status: (as root user)
[root@rac11gdb01 ~]# /o001/home/11.2.0.2/grid/bin/srvctl status gns
PRCS-1065 : GNS is not configured.
[root@rac11gdb01 ~]#

14) Serverpool Details:
[grid@rac11gdb01 ~]$ srvctl status srvpool
Server pool name: Free
Active servers count: 0
Server pool name: Generic
Active servers count: 2
[grid@rac11gdb01 ~]$

15) Cluster Interconnect Details:
[grid@rac11gdb01 ~]$ oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.50.0 global cluster_interconnect

16) OCR Checks:
[grid@rac11gdb01 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3224
Available space (kbytes) : 258896
ID : 1449084471
Device/File Name : +OCR_VOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
[grid@rac11gdb01 ~]$
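Note that the last line shows the logical corruption check is bypassed for a non-privileged user; to include that check, run the same command as root from the Grid home:

# ocrcheck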

17) OCR Backups:
[grid@rac11gdb01 ~]$ ocrconfig -showbackup
rac11gdb01 2013/09/23 11:46:41 /o001/home/11.2.0.2/grid/cdata/orrcdbdv-clstr/backup00.ocr
rac11gdb01 2013/09/23 04:19:26 /o001/home/11.2.0.2/grid/cdata/orrcdbdv-clstr/backup01.ocr
rac11gdb01 2013/09/23 00:19:25 /o001/home/11.2.0.2/grid/cdata/orrcdbdv-clstr/backup02.ocr
rac11gdb01 2013/09/22 08:19:24 /o001/home/11.2.0.2/grid/cdata/orrcdbdv-clstr/day.ocr
rac11gdb01 2013/09/10 08:18:54 /o001/home/11.2.0.2/grid/cdata/orrcdbdv-clstr/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[grid@rac11gdb01 ~]$
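PROT-25 above only indicates that no manual (on-demand) OCR backups exist; the automatic backups are still listed. To take one, for example before cluster maintenance, run as root:

# ocrconfig -manualbackup
# ocrconfig -showbackup manual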

18) Voting Disk Status:
[grid@rac11gdb01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name        Disk group
--  -----    -----------------                ---------        ----------
 1. ONLINE   9bc0828570854fa5bff3221500a1fc63 (ORCL:CRSVOL1)   [OCR_VOTE]
Located 1 voting disk(s).
[grid@rac11gdb01 ~]$

19) Node apps config details:
[grid@rac11gdb01 ~]$ srvctl config nodeapps -a -g -s
Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static
VIP exists: /rac11gdb01-vip/192.168.1.73/192.168.1.0/255.255.255.0/eth0, hosting node rac11gdb01
VIP exists: /rac11gdb02-vip/192.168.1.74/192.168.1.0/255.255.255.0/eth0, hosting node rac11gdb02
GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016
[grid@rac11gdb01 ~]$

20) Diskgroups Status:
[grid@rac11gdb01 ~]$ crs_stat -t | grep -i dg
ora....DATA.dg ora....up.type ONLINE ONLINE rac11gdb01
ora.FLASH.dg   ora....up.type ONLINE ONLINE rac11gdb01
ora....VOTE.dg ora....up.type ONLINE ONLINE rac11gdb01
[grid@rac11gdb01 ~]$


Cluster Health Check in Oracle 11gR2 RAC

Cluster health checkup
(Execute as the grid user or as root from the GRID_HOME/bin location)

— To find cluster name

$CRS_HOME/bin/cemutlo -n
racdbprdscan

— To find connected nodes/hosts

$ olsnodes
rac01
rac02

— Post installation verification:

$ cluvfy stage -post crsinst -n rac01,rac02

— Diskgroup status

$ srvctl status diskgroup -g DATA
Disk Group DATA is running on rac01,rac02
$
$ srvctl status diskgroup -g FRA
Disk Group FRA is running on rac01,rac02
$

Note: Here the DATA disk group is assumed to hold the datafiles, and the FRA disk group is used for backups and as the archived-log location.

— Cluster-wide cluster commands

With Oracle 11gR2 you can start, stop and verify the cluster status of all nodes from a single node. Before 11gR2, you had to log in to each node individually to start, stop and verify cluster health. Below are some of the cluster-wide cluster commands:

$ ./crsctl check cluster -all [verify cluster status on all nodes]
$ ./crsctl stop cluster -all [stop cluster on all nodes]
$ ./crsctl start cluster -all [start cluster on all nodes]
$ ./crsctl check cluster -n <node_name> [verify the cluster status on a particular remote node]

— To verify CRS services status

$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

$ crsctl get css diagwait
CRS-4678: Successful get diagwait 0 for Cluster Synchronization Services.

$ crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

$ crsctl query css votedisk

##  STATE    File Universal Id                File Name        Disk group
--  -----    -----------------                ---------        ----------
 1. ONLINE   4971c1d262ef4f2fbfe925bddf51dc8f (/dev/rhdisk5)   [OCRVD]
 2. ONLINE   4f8e50d644e54f1fbfff007b22c2fafa (/dev/rhdisk4)   [OCRVD]
 3. ONLINE   5c668deccdae4f4dbf1d2057c6143bf8 (/dev/rhdisk6)   [OCRVD]
Located 3 voting disk(s).

— Find interconnect IPs and Interface details

$ oifcfg getif
en0 10.11.12.0 global public
en1 192.168.1.0 global cluster_interconnect

— OS Checks

$ /usr/sbin/no -a | fgrep ephemeral

tcp_ephemeral_high = 65500
tcp_ephemeral_low = 9000
udp_ephemeral_high = 65500
udp_ephemeral_low = 9000

$ lslpp -L
Fileset Level State Type Description (Uninstaller)


DirectorCommonAgent 6.3.0.3 C F All required files of Director
Common Agent, including JRE,
LWI
DirectorPlatformAgent 6.3.0.1 C F Director Platform Agent for
IBM Systems Director on AIX
ICU4C.rte 6.1.8.0 C F International Components for
Unicode
Java5.sdk 5.0.0.500 C F Java SDK 32-bit
Java5_64.sdk 5.0.0.500 C F Java SDK 64-bit
Java6.sdk 6.0.0.375 C F Java SDK 32-bit
Tivoli_Management_Agent.client.rte
3.7.1.0 C F Management Framework Endpoint
Runtime
X11.adt.bitmaps 6.1.0.0 C F AIXwindows Application
Development Toolkit Bitmap
Files
…………
…………

Oracle Clusterware Troubleshooting - tools & utilities:

An Oracle DBA should know how to manage and troubleshoot the cluster stack, so the DBA must be aware of all the internal and external tools and utilities Oracle provides to maintain and diagnose cluster issues. Understanding and weighing the pros and cons of each individual tool/utility is essential. Choosing the right tool/utility at the right moment not only saves troubleshooting time but can also avoid a prolonged service interruption.

Here are some of the very important and mostly used tools and utilities:

Cluster Verification Utility (CVU) - used to collect pre- and post-installation configuration details for the cluster at various levels and for various components. With 11gR2 it also provides the ability to verify cluster health. Some useful commands are shown below:

$ ./cluvfy comp healthcheck -collect cluster -bestpractice -html
$ ./cluvfy comp healthcheck -collect cluster|database

Real-time RAC DB monitoring (oratop) - an external Oracle utility, currently available on the Linux platform, that provides top-like output in which you can monitor RAC databases or single-instance databases in real time. The display shows real-time statistics such as top DB wait events, top Oracle processes, blocking session information, etc. Download oratop.zip from support.oracle.com and configure it.
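A minimal sketch of an invocation, connecting locally as SYSDBA (the interval flag is taken from the tool's README and may differ between oratop versions):

$ ./oratop -i 10 / as sysdba [refresh the display every 10 seconds]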

RAC configuration audit tool (RACcheck) - another Oracle-provided external tool, developed by the RAC support team, that audits various aspects of the cluster configuration. Download the tool (raccheck.zip) from support.oracle.com and configure it on one of the cluster nodes. The tool performs cluster-wide configuration auditing of CRS, ASM, RDBMS and generic database parameter settings, and can also be used to assess the readiness of the system for an upgrade. However, you need to keep upgrading the tool to get the latest recommendations.
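In its simplest form you unzip the tool on one node and run it interactively as the RDBMS/Grid software owner from the extraction directory (a sketch; the exact prompts and options vary by raccheck version):

$ unzip raccheck.zip
$ ./raccheck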

Cluster Diagnostic Collection Tool (diagcollection.sh) - the cluster stack maintains many log files, so it can be time consuming and cumbersome to visit every log to understand the nature of a problem or diagnose an issue. The diagcollection.sh tool reads the various cluster log files and gathers the information required to diagnose critical cluster problems. With this tool you can gather statistics and information at various levels: cluster, RDBMS, core analysis, database, etc. The tool packages everything into compressed archives and removes the individual files. The following archives are collected as part of a diagcollection run (a sample invocation follows the list):

ocrData_hostname_date.tar.gz -- contains ocrdump, ocrcheck output etc.
coreData_hostname_date.tar.gz -- contains CRS core files
osData_hostname_date.tar.gz -- OS logs
crsData_hostname_date.tar.gz -- logs from the CRS (Grid Infrastructure) home
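A minimal sketch of a collection run, executed as root on the node being diagnosed (verify the option names against the script help for your Grid version; GRID_HOME stands for your Grid Infrastructure home):

# $GRID_HOME/bin/diagcollection.sh --collect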

In addition, there are many other important and useful tools:

Cluster Health Monitor (CHM) for diagnosing node eviction issues, database hang analysis, OSWatcher, etc. are available for use under different circumstances.
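For example, CHM data can be queried with the oclumon utility shipped in the Grid home (the time window below is illustrative):

$ $GRID_HOME/bin/oclumon dumpnodeview -allnodes -last "00:10:00" [node metrics from all nodes for the last 10 minutes]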

Outputs when a Two Node RAC is running fine:

1) Cluster checks:

$ crsctl check cluster -all


rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online


rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online


$

2) Cluster services

$ crsctl stat res -t

NAME TARGET STATE SERVER STATE_DETAILS

Local Resources

ora.DATA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.FRA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.OCRVD.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.registry.acfs
ONLINE ONLINE rac01

ONLINE ONLINE rac02

Cluster Resources

ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac02
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE rac01
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE rac01
ora.cvu
1 ONLINE ONLINE rac01
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.prod.db
1 ONLINE ONLINE rac01 Open
2 ONLINE ONLINE rac02 Open
ora.prod.hr_service.svc
1 ONLINE ONLINE rac01
2 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac01
ora.scan1.vip
1 ONLINE ONLINE rac02
ora.scan2.vip
1 ONLINE ONLINE rac01
ora.scan3.vip
1 ONLINE ONLINE rac01
$

3) Cluster daemon services

$ ps -ef|grep d.bin
grid 3211388 1 0 Mar 25 – 2:12 /u01/app/11.2.0/grid/bin/mdnsd.bin
grid 3735586 1 0 Mar 25 – 363:22 /u01/app/11.2.0/grid/bin/oraagent.bin
root 3801286 1 0 Mar 25 – 70:05 /u01/app/11.2.0/grid/bin/cssdmonitor
root 3866662 1 0 Mar 25 – 0:00 /bin/sh /u01/app/11.2.0/grid/bin/ocssd
grid 3997730 3866662 0 Mar 25 – 368:01 /u01/app/11.2.0/grid/bin/ocssd.bin
grid 4259984 1 0 Mar 25 – 337:21 /u01/app/11.2.0/grid/bin/oraagent.bin
grid 4587612 1 0 Mar 25 – 16:09 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
grid 4980868 1 0 Mar 25 – 140:32 /u01/app/11.2.0/grid/bin/evmd.bin
grid 5046470 1 0 Mar 25 – 241:22 /u01/app/11.2.0/grid/bin/gipcd.bin
root 5308634 1 0 Mar 25 – 403:18 /u01/app/11.2.0/grid/bin/crsd.bin reboot
grid 5832886 4980868 0 Mar 25 – 2:26 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid 6684726 1 0 Mar 25 – 26:44 /u01/app/11.2.0/grid/bin/scriptagent.bin
root 8650912 1 0 Mar 27 – 736:13 /u01/app/11.2.0/grid/bin/osysmond.bin
root 3539286 1 0 Mar 25 – 73:45 /u01/app/11.2.0/grid/bin/cssdagent
root 5177672 1 0 Mar 25 – 451:41 /u01/app/11.2.0/grid/bin/orarootagent.bin
root 5570900 1 0 Mar 25 – 160:06 /u01/app/11.2.0/grid/bin/octssd.bin reboot
grid 5898748 1 0 Mar 25 – 218:13 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
root 6357408 1 0 Mar 25 – 309:27 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root 7012806 1 2 Mar 25 – 3467:51 /u01/app/11.2.0/grid/bin/orarootagent.bin
root 7274816 1 0 Mar 25 – 1012:49 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac01
grid 7340538 1 0 Mar 25 – 40:56 /u01/app/11.2.0/grid/bin/gpnpd.bin
grid 7864614 1 0 Mar 25 – 16:07 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
oracle 10093038 1 2 Mar 27 – 464:30 /u01/app/11.2.0/grid/bin/oraagent.bin
grid 13763032 19923018 0 15:12:18 pts/1 0:00 grep d.bin
$

4) SCAN listener status

$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node rac02
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node rac01
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node rac01
$
$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node rac02
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node rac01
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node rac01
$
$ srvctl config scan
SCAN name: racprdscan, Network: 1/10.11.12.0/255.255.255.0/en0
SCAN VIP name: scan1, IP: /racprdscan/10.11.12.44
SCAN VIP name: scan2, IP: /racprdscan/10.11.12.45
SCAN VIP name: scan3, IP: /racprdscan/10.11.12.46
$

5) OCR Integrity verification:

$ cluvfy comp ocr

Verifying OCR integrity
Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations

ASM Running check passed. ASM is running on all specified nodes

Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful

Disk group for ocr location "+OCRVD" available on all the nodes

NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.

OCR integrity check passed
Verification of OCR integrity was successful.
$

6) Verification of attached shared storage

$ cluvfy comp ssa -n all

Verifying shared storage accessibility
Checking shared storage accessibility...

Disk Sharing Nodes (2 in count)
------------------------------------ ------------------------
/dev/rhdisk3 rac01 rac02
/dev/rhdisk5 rac01 rac02
/dev/rhdisk6 rac01 rac02
/dev/rhdisk7 rac01 rac02
/dev/rhdisk8 rac01 rac02
/dev/rhdisk9 rac01 rac02
/dev/rhdisk10 rac01 rac02
/dev/rhdisk11 rac01 rac02
/dev/rhdisk12 rac01 rac02

Shared storage check was successful on nodes "rac01,rac02"
Verification of shared storage accessibility was successful.
$

Cluster Log locations:

Locating the Oracle Clusterware Component Log Files

$ORACLE_HOME/log/hostname/racg

Oracle RAC uses a unified log directory structure to store all the Oracle Clusterware component log files. This consolidated structure simplifies diagnostic information collection and assists during data retrieval and problem analysis.

The log files for the CRS daemon, crsd, can be found in the following directory:
CRS_home/log/hostname/crsd/

The log files for the CSS daemon, cssd, can be found in the following directory:
CRS_home/log/hostname/cssd/

The log files for the EVM daemon, evmd, can be found in the following directory:
CRS_home/log/hostname/evmd/

The log files for the Oracle Cluster Registry (OCR) can be found in the following directory:
CRS_home/log/hostname/client/

The log files for the Oracle RAC high availability component can be found in the following directories:
CRS_home/log/hostname/racg/
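As a quick health scan of these locations (a minimal sketch, assuming GRID_HOME points to the Grid Infrastructure home), the Clusterware alert log is usually the first file to check, followed by the individual daemon logs:

$ cd $GRID_HOME/log/`hostname -s`
$ tail -100 alert`hostname -s`.log [Clusterware alert log for this node]
$ grep -i error crsd/crsd.log | tail -20 [recent errors reported by the CRS daemon]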