How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation [ID 942166.1] (its last update is later than my post, which might be related 🙂 )
Basically:
Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes except the last one.
Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command also zeroes out the OCR and the voting disk.
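For a two-node cluster like the one in this post (solarac1 and solarac2) that works out to roughly the following; this is only a sketch, with $GRID_HOME standing for your grid home:
# as root, on every node except the last one
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
# as root, on the last node only; -lastnode also wipes the OCR and voting disk
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode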
For the last 3 days I have been busy installing Oracle RAC on Solaris 10 x64 on VMware. I am planning to write detailed documentation, but beforehand I want to write about an issue I managed to solve during the installation.
During the grid infrastructure installation everything went fine until I ran the root.sh script for the cluster configuration. The script failed with the error stack below (I truncated the part that worked):
/u01/app/11.2.0/grid/root.sh
….
….
….
ASM created and started successfully.
DiskGroup DATA created successfully.
Errors in file :
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
PROT-1: Failed to initialize ocrconfig
Command return code of 255 (65280) from command: /u01/grid/11.2.0/bin/ocrconfig -upgrade grid oinstall
Failed to create Oracle Cluster Registry configuration, rc 255
CRS-2500: Cannot stop resource ‘ora.crsd’ as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource “ora.crsd -init” failed
Failed to stop CRSD
CRS-2673: Attempting to stop ‘ora.asm’ on ‘solarac2’
CRS-2677: Stop of ‘ora.asm’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘solarac2’
CRS-2677: Stop of ‘ora.ctssd’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.cssdmonitor’ on ‘solarac2’
CRS-2677: Stop of ‘ora.cssdmonitor’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘solarac2’
CRS-2677: Stop of ‘ora.cssd’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘solarac2’
CRS-2677: Stop of ‘ora.gpnpd’ on ‘solarac2’ succeeded
CRS-2679: Attempting to clean ‘ora.gpnpd’ on ‘solarac2’
CRS-2681: Clean of ‘ora.gpnpd’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘solarac2’
CRS-2677: Stop of ‘ora.gipcd’ on ‘solarac2’ succeeded
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘solarac2’
CRS-2677: Stop of ‘ora.mdnsd’ on ‘solarac2’ succeeded
Initial cluster configuration failed. See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details
I tried to run root.sh again, which I shouldn't have done because the documentation says not to. (I have to confess that I did not read the installation document carefully.)
This time the error stack was different:
/u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script…
………
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2009-12-06 22:57:05: Parsing the host name
2009-12-06 22:57:05: Checking for super user privileges
2009-12-06 22:57:05: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.
As you can see, it didn't let me re-run it. I needed to find a way to deconfigure the failed configuration, and after a quick search in the official documentation I found the way here.
According to the doc, all I needed to do was run the command below and then re-run root.sh:
$GRID_HOME/crs/install/rootcrs.pl -deconfig
Here is what happened when I ran the deconfigure:
2009-12-07 00:35:17: Parsing the host name
2009-12-07 00:35:17: Checking for super user privileges
2009-12-07 00:35:17: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
Oracle Clusterware stack is not active on this node
Restart the clusterware stack (use /u01/grid/11.2.0/bin/crsctl start crs) and retry
Failed to verify resources
It still wasn't working. So I tried the -force option, and it seemed to deconfigure successfully (maybe 🙂 ):
/u01/grid/11.2.0/crs/install/rootcrs.pl -deconfig -force
2009-12-07 00:39:13: Parsing the host name
2009-12-07 00:39:13: Checking for super user privileges
2009-12-07 00:39:13: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
It said it had deconfigured successfully, but when I ran root.sh again I got this:
Disk Group DATA already exists. Cannot be created again
Configuration of ASM failed, see logs for details
Did not succssfully configure and start ASM
CRS-2500: Cannot stop resource ‘ora.crsd’ as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource “ora.crsd -init” failed
Failed to stop CRSD
CRS-2500: Cannot stop resource ‘ora.asm’ as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.asm -init
Stop of resource “ora.asm -init” failed
Failed to stop ASM
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘solarac1’
CRS-2677: Stop of ‘ora.ctssd’ on ‘solarac1’ succeeded
CRS-2673: Attempting to stop ‘ora.cssdmonitor’ on ‘solarac1’
CRS-2677: Stop of ‘ora.cssdmonitor’ on ‘solarac1’ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘solarac1’
CRS-2677: Stop of ‘ora.cssd’ on ‘solarac1’ succeeded
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘solarac1’
CRS-2677: Stop of ‘ora.gpnpd’ on ‘solarac1’ succeeded
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘solarac1’
CRS-2677: Stop of ‘ora.gipcd’ on ‘solarac1’ succeeded
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘solarac1’
CRS-2677: Stop of ‘ora.mdnsd’ on ‘solarac1’ succeeded
Initial cluster configuration failed. See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details
The mentioned logfile says:
2009-12-07 00:43:26: Executing as grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26: Running as user grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26: Invoking “/u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM” as user “grid”
2009-12-07 00:43:30: Configuration of ASM failed, see logs for details
Basically, it configures ASM with the asmca command. The asmca utility does not have a drop-diskgroup option, which makes it unusable in this situation. (There is a deleteASM option, but it does not help here because it needs a working ASM instance, which was not possible after the failed root.sh.)
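For completeness: had the local ASM instance still been up, the leftover DATA disk group could also have been dropped from SQL*Plus instead of wiping the disks. A minimal sketch, assuming the grid home from the logs above and an ORACLE_SID of +ASM1 (that SID is my assumption for the first node):
# only an option when the local +ASM instance is running, which was not my case
export ORACLE_HOME=/u01/grid/11.2.0
export ORACLE_SID=+ASM1
$ORACLE_HOME/bin/sqlplus / as sysasm <<'EOF'
DROP DISKGROUP DATA INCLUDING CONTENTS;
EXIT;
EOF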
I didn't want to delete the whole CRS installation, so I needed a way to remove the disk group information from the ASM disks.
All I needed was the dd command to remove the disk header information from the devices.
I had 4 disks presented for that disk group, so I ran dd against all of them. (I am not sure; maybe only the first device was needed. I need to check Julian Dyke's invaluable presentation on ASM internals.)
dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1024K count=100
dd: bad numeric argument: “1024K”
bash-3.00# dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
dd if=/dev/zero of=/dev/rdsk/c1t1d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
dd if=/dev/zero of=/dev/rdsk/c1t3d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
dd if=/dev/zero of=/dev/rdsk/c1t4d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
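As an optional sanity check (not something I did in my original session), the start of each device can be read back to confirm it is now zeroed; a small sketch using standard Solaris tools and the device names from above:
# after the dd above, the header region should read back as all zeros
dd if=/dev/rdsk/c1t1d0s1 bs=1k count=4 2>/dev/null | od -c | head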
After this deletion I re-ran the deconfigure script and then re-ran root.sh. Everything worked fine without any problem at all (a short recap of the sequence is below). The story will continue with "How to Install 11gR2 RAC on Solaris 10 on VMware" (give me a bit more time to finish it).
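To sum up, here is roughly the sequence that got me out of the failed state, written as a sketch; the paths and device names are the ones from my environment above, so adjust them for yours:
# 1. wipe the ASM disk headers so the DATA disk group can be created again
for d in c1t1d0s1 c1t2d0s1 c1t3d0s1 c1t4d0s1; do
  dd if=/dev/zero of=/dev/rdsk/${d} bs=1k count=1000000
done
# 2. as root, deconfigure the failed clusterware setup on the node
/u01/grid/11.2.0/crs/install/rootcrs.pl -deconfig -force
# 3. as root, re-run root.sh from the grid home
/u01/app/11.2.0/grid/root.sh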
Footnote: A similar issue is reported on Metalink for Linux (ML 955550.1).