Oracle Restart is a new feature introduced with Clusterware 11.2. In simple terms, it allows you to register resources such as ASM disk groups, ASM and RDBMS instances in Clusterware in a way very similar to what we did in RAC. The aim is to make custom startup scripts obsolete by starting all resources and their dependencies on a database server through CRS, or Grid Infrastructure as it's called now.
A quick note as an introduction: CRS and Clusterware are used interchangeably in this article.
As you may already know, Oracle ASM and Clusterware are now bundled into the grid infrastructure installation zipfile. Even if you intend to run single instance Oracle based on ASM, you need to install grid infrastructure into a separate Oracle home in addition to the RDBMS software. The ASM administration option has been removed from dbca and moved into asmca, which actually looks quite nice and not so dated as dbca did and still does. But hey - I don't use the assistants if I can avoid it and don't worry too much about their 1990s look and feel (but OUI now looks a lot better in 11.2!). The installation of CRS, ASM and RDBMS into three different homes used to be best practice in 10.2; the installation of CRS and ASM into the same home is a deviation from the 10g theme.
So for single instance Oracle using ASM we need to install Grid Infrastructure first. The installer wants to put the binaries into /u01/app/oracle/product/11.2/grid with ORACLE_BASE set to /u01/app/oracle.
The installation is quite simple; usually a next-next-next approach leads to the desired result. The execution of root.sh takes a little longer as it creates the ASM instance as well. At the end of the installation, you'll have Oracle Restart as well as ASM plus your diskgroups registered. Here's the first difference to what we know from Oracle 10g: there is no need to run "$ORACLE_HOME/bin/localconfig add" as root to enable a trimmed down version of CSS to be started, it's already done for you. And there is more visibility as well. The RAC admin will know most of these commands already, and generally there seems to be less of a difference between single instance and RAC administration when it comes to the tools employed. The exception of course is that this time we execute the commands on a "single instance" system. For example, after the installation finished, these were the resources registered on my system, called devbox001.
[oracle@devbox001 ~]$ crsctl status resource -t
NAME TARGET STATE SERVER STATE_DETAILS
Local Resources
ora.LISTENER.lsnr
ONLINE ONLINE devbox001
ora.DATA.dg
ONLINE ONLINE devbox001
ora.REDO.dg
ONLINE ONLINE devbox001
ora.FRA.dg
ONLINE ONLINE devbox001
ora.asm
ONLINE ONLINE devbox001 Started
Cluster Resources
ora.cssd
1 ONLINE ONLINE devbox001
ora.diskmon
1 ONLINE ONLINE devbox001
ora.dev1.db
1 ONLINE ONLINE devbox001 Open
ora.dev2.db
1 ONLINE ONLINE devbox001 Open
ora.dev3.db
1 ONLINE ONLINE devbox001 Open
The old crs_stat command is deprecated in 11.2, but still works:
[oracle@rhel5 rdbms]$ crs_stat -t
Name Type Target State Host
ora.DATA.dg ora....up.type ONLINE ONLINE rhel5
ora....ER.lsnr ora....er.type ONLINE ONLINE rhel5
ora.asm ora.asm.type ONLINE ONLINE rhel5
ora.cssd ora.cssd.type ONLINE ONLINE rhel5
ora.diskmon ora....on.type ONLINE ONLINE rhel5
The first surprise here was the diskgroup resource, "DATA". This is a useful addition: in previous releases up to 11.1 you had to manually set the ASM initialisation parameter asm_diskgroups to specify all the disk groups ASM should automatically mount. Now in 11.2 this parameter is actually unset:
[oracle@devbox001 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Mon Sep 28 19:53:14 2009
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Automatic Storage Management option
SQL> show parameter diskgr
NAME TYPE VALUE
asm_diskgroups string
A quick query against v$asm_diskgroup_stat reveals that the diskgroup DATA is mounted nevertheless.
SQL> select name, state from v$asm_diskgroup_stat;
NAME STATE
DATA MOUNTED
From now on you can't stop ASM without stopping the diskgroup first (well, you could use the force option).
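To illustrate, assuming the DATA disk group from above (the commands are standard 11.2 srvctl syntax, but take this as a sketch rather than output copied from my system):
srvctl stop diskgroup -g DATA
srvctl stop asm
Or, using the force option to have the dependent disk groups stopped for you:
srvctl stop asm -f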
Also note that you don't set a direct dependency between ASM and the database (instance) as in 10g/11.1 anymore. You now reference the diskgroup(s):
[oracle@devbox001 ~]$ srvctl add database -h
Adds a database configuration to be managed by Oracle Restart.
Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-m <domain_name>] [-p <spfile>]
[-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}] [-s <start_options>]
[-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-a "<diskgroup_list>"]
-d <db_unique_name> Unique name for the database
-o <oracle_home> ORACLE_HOME path
-m <domain_name> Domain for database. Must be set if database has DB_DOMAIN set.
-p <spfile> Server parameter file path
-r <role> Role of the database (primary, physical_standby, logical_standby,
snapshot_standby)
-s <start_options> Startup options for the database. Examples of startup options are open,
mount, or nomount.
-t <stop_options> Stop options for the database. Examples of shutdown options are normal,
transactional, immediate, or abort.
-n <db_name> Database name (DB_NAME), if different from the unique name given by the
-d option
-y <dbpolicy> Management policy for the database (AUTOMATIC or MANUAL)
-a "<diskgroup_list>" Comma separated list of disk groups
-h Print usage
If you wondered which of these to set - this is an example from dbca:
$> srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/dbhome_1 -p '+DATA/orcl/spfileorcl.ora' \
-n orcl -a DATA
Administering
From now on, refrain from using sqlplus for administering the instance. srvctl is used for all your needs, such as starting (srvctl start database -d dbname) and stopping (srvctl stop database -d dbname) the database. The -o option allows you to pass startup/shutdown options such as abort, immediate, mount, nomount etc.
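For illustration, using the dev1 database from the resource listing above (standard 11.2 srvctl syntax; dev1 is simply the unique name on my box):
srvctl start database -d dev1
srvctl start database -d dev1 -o mount
srvctl stop database -d dev1 -o immediate
srvctl status database -d dev1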
The handling of services has also seen improvement. Instead of calling dbms_service or setting service_names (in single instance only!), we can now use srvctl add service -d <db_unique_name> -s <service_name> instead. Again, RAC and single instance Oracle are a little more unified. Here are the options:
Usage: srvctl add service -d <db_unique_name> -s <service_name> [-l [PRIMARY][,PHYSICAL_STANDBY]
[,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}] [-q {true|false}] [-j {SHORT|LONG}]
[-B {NONE|SERVICE_TIME|THROUGHPUT}] [-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}] [-z <failover_retries>]
[-w <failover_delay>]
-d <db_unique_name> Unique name for the database
-s <service_name> Service name
-l <role> Role of the service (primary, physical_standby, logical_standby,
snapshot_standby)
-y <policy> Management policy for the service (AUTOMATIC or MANUAL)
-e <failover_type> Failover type (NONE, SESSION, or SELECT)
-m <failover_method> Failover method (NONE or BASIC)
-w <failover_delay> Failover delay
-z <failover_retries> Failover retries
-j <clb_goal> Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
-B <rlb_goal> Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
-q <notifications> AQ HA notifications (TRUE or FALSE)
-h Print usage
Now how cool is that! We can have services depending on database role! No more messing around with startup triggers to set services based on a database role. No more "Oracle startup or shutdown in progress" when connecting to the wrong host after a switchover/failover operation. Nice.
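A minimal sketch of what that could look like, again using dev1 and a made-up service name (the flags are the ones from the usage output above):
srvctl add service -d dev1 -s dev1_app -l PRIMARY -y AUTOMATIC -e SELECT -m BASIC -z 5 -w 10
srvctl start service -d dev1 -s dev1_app
With -l PRIMARY the service is only started when the database runs in the primary role, so a standby opened after a switchover won't offer it to clients.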
Some of the options such as the connection load balancing goals don't really make sense in single instance Oracle though...
If you are unsure about the configuration of a component, use srvctl config -a to display information. For a database, that's srvctl config database -d <db_unique_name> -a. The same works for ASM as well.
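For example (again with dev1 as the unique name on my system):
srvctl config database -d dev1 -a
srvctl config asm -a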
Troubleshooting and logs
For the following sections it is useful to remember that Oracle Restart is called OHAS internally - it makes finding logs a lot easier.
ASM logs and traces are stored in the diag_dest directory; there is no difference to an 11.1 ASM instance here. I'd recommend that you get used to using adrci, personally I find it a lot easier than navigating the file system. And finally there is a unified way to access the alert log! If you are working for a managed service provider then this is a godsend!
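If you haven't used adrci yet, a short example of the kind of session I mean - the ASM home string will differ on your system:
adrci> show homes
adrci> set home diag/asm/+asm/+ASM
adrci> show alert -tail 50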
The logging for CRS is not yet integrated into the new diagnostic framework (why not?), so you'll have to dig out the important information yourself. Important log file locations are listed below; a quick example of tailing the main one follows the list:
$ORACLE_HOME/log/<hostname>/ohasd/ohasd.log
- initialisation of the ohasd process
- reading of resources
- setting of ACLs per resource
- spawning of the agent processes (oraagent.bin, cssdagent, orarootagent.bin)
$ORACLE_HOME/log/<hostname>/agent/ohasd/oraagent_oracle/oraagent_oracle.log
$ORACLE_HOME/log/<hostname>/agent/ohasd/orarootagent_oracle/orarootagent_oracle.log
$ORACLE_HOME/log/<hostname>/agent/ohasd/oracssdagent_oracle/oracssdagent_oracle.log
$ORACLE_HOME/log/<hostname>/diskmon/client.log
$ORACLE_HOME/log/<hostname>/diskmon/diskmon.log
$ORACLE_HOME/log/<hostname>/cssd/ocssd.log
$ORACLE_HOME/log/<hostname>/cssd/cssdOUT.log
$ORACLE_HOME/log/<hostname>/client/crsctl.log
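For example, to keep an eye on the main one (with ORACLE_HOME pointing at the grid home from the installation above; adjust for your environment):
tail -f $ORACLE_HOME/log/`hostname -s`/ohasd/ohasd.log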
The information the agents read at startup is stored in the OLR, which seems to be a bit like the OCR. You find the location of the OLR in /etc/oracle/olr.loc, but there is still a pointer to the OCR in /etc/oracle/ocr.loc. Usually they are in $CRS_HOME/cdata/<hostname>/. It seems though that the OCR is not used at all, as it hadn't been touched since I installed the grid infrastructure home.
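To see where the OLR lives on your system and to give it a quick sanity check:
cat /etc/oracle/olr.loc
ocrcheck -local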
It seems that there is one backup of the OLR taken, as shown by ocrconfig -local -showbackup. Omitting the "-local" option will result in an error message. I have no idea if that is intentional or not. Taking manual backups fails as well, but ocrconfig doesn't even report the error:
[oracle@rhel5 rhel5]$ ocrconfig -local -showbackup
rhel5 2009/09/17 08:29:25 /u01/app/oracle/product/11.2.0/grid/cdata/rhel5/backup_20090917_082925.olr
[oracle@rhel5 rhel5]$ ocrconfig -local -manualbackup
PROTL-23: Message 23 not found; product=srvm; facility=PROTL
[oracle@rhel5 rhel5]$ oerr protl 23
[oracle@rhel5 rhel5]$
Maybe this is like in 10.1 when crsctl wasn't all that useful and we can expect new functionality to be added in due course.
This is the end of part I; the next part will deal with setting up and using FAN events for automatic client failover.
Update - bug 9084067
Following the oracle-l mailing list I have come across a ulimit problem, described in the following Metalink note:
11gR2 Oracle Restart Does not Use ULIMIT Setting Appropriately [ID 983715.1]
In essence, when the instance is started through srvctl upon node reboot, huge pages aren't used and the database alert log displays a warning saying that it is running on a system with a low open file descriptor limit. All the prerequisites for ulimits were implemented for the oracle account, and starting the database through sqlplus as oracle is not a problem. srvctl however hands off control to the root account to start the database, and for this account the limits are different. Check the note for a fix to the problem with ohasd.
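A quick check after a reboot to see whether you are affected (this is just what I would look at, not taken from the note): verify that huge pages are actually in use and what limits the instance owner's shell reports:
grep HugePages /proc/meminfo
ulimit -n -l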