ASM单实例由Oracle Restart引发的系列故障分析(Final Version)

ASM单实例由Oracle Restart引发的系列故障分析(Final Version)ASM单实例由Oracle Restart引发的系列故障分析(Final Version) 置顶 aaron8219 2013-08-26 03:35:50 5816 收藏 分类专栏: Oracle…

ASM单实例由Oracle Restart引发的系列故障分析(Final Version)

ASM单实例由Oracle Restart引发的系列故障分析(Final Version)

置顶 aaron8219 2013-08-26 03:35:50   5816   收藏
分类专栏: Oracle ASM
版权
今天重新打开上次安装完的一个ASM单实例环境,突然报错

SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file “+DATA/asmsingle/spfileasmsingle.ora”
ORA-17503: ksfdopn:2 Failed to open file +DATA/asmsingle/spfileasmsingle.ora
ORA-29701: unable to connect to Cluster Synchronization Service

以为是监听问题造成的无法读取asm磁盘组,于是运行netca删除监听和服务名,准备重新创建。

当我删除监听后,准备再重新创建一个时,报了一个错误,如下图:

 

需要重启Oracle Restart后再配置netca,否则不能注册到Oracle Restart

google了一下,Oracle Restart是11gR2用来管理单实例组件的一个组件,在装完Grid Infrastructure以后自动安装的

但是我的环境现在crsctl命令也不能用,看来是GI环境配置有点问题

在安装GI的时候,执行完root.sh脚本后,通常要执行一下roothas.pl,如:

[root@dg1 ~]# /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl

To configure Grid Infrastructure for a Cluster execute the following command:
/u01/app/11.2.0/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Improper Oracle Grid Infrastructure configuration found on this host
Deconfigure the existing cluster configuration before starting
to configure a new Grid Infrastructure
run “/u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig”
to configure existing failed configuration and then rerun root.sh

根据执行roothas.pl失败的指示,先用-deconfig参数来删除原来GI的配置:

[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Oracle Restart stack is not active on this node
Restart the SIHA stack (use /u01/app/11.2.0/grid/bin/crsctl start has) and retry
Failed to write the checkpoint:”” with status:FAIL.Error code is 256
Failed to verify HA resources

注意上面红色部分字体,提示Oracle Restart并没有在本节点启动,也无法删除原有GI配置,难道没办法删除了吗?

[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CLSU-00100: Operating System function: opendir failed with error data: 2
CLSU-00101: Operating System error message: No such file or directory
CLSU-00103: error location: scrsearch1
CLSU-00104: additional error information: cant open scr home dir scls_scr_getval
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Successfully deconfigured Oracle Restart stack

发现提示不一样了,已经顺利地重新配置了Oracle Restart,然后重新执行roothas.pl来启动Oracle Restart:

[root@dg1 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user “grid”, privgrp “oinstall”..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user “root”, privgrp “root”..
Operation successful.
CRS-4664: Node dg1 successfully pinned.
Adding Clusterware entries to inittab

dg1     2013/08/25 23:30:16     /u01/app/11.2.0/grid/cdata/dg1/backup_20130825_233016.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server

成功地重新配置完成了一个Stand-Alone,此时crsctl命令已经可以使用了,再执行root.sh中提示的那条命令就会报错了,提示已经配置了CRS:

[root@dg1 ~]# /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
CRS is already configured on this node for crshome=/u01/app/11.2.0/grid
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.

[grid@dg1 ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online

[grid@dg1 ~]$ crsctl status res -t
——————————————————————————–
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
——————————————————————————–
Local Resources
——————————————————————————–
ora.ons
               OFFLINE OFFLINE      dg1                                          
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.cssd
      1        OFFLINE OFFLINE                                                  
ora.diskmon
      1        OFFLINE OFFLINE                                                  
ora.evmd
      1        ONLINE  ONLINE       dg1

Oracle Restart问题暂时解决了,但是随后又发现一个问题:

[grid@dg1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :        152
         Available space (kbytes) :     261968
         ID                       : 1711295372
         Device/File Name         : /u01/app/11.2.0/grid/cdata/localhost/local.ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user

按理说OCR磁盘的路径应该在+DATA/asmsingle上的,怎么变成/u01/…这个本地路径了,而且grid用于居然执行不了下面这条命令:

[grid@dg1 ~]$ crsctl query css votedisk
Parse error:
  “css” is an invalid argument

此时突然醒悟,怎么这里主机名是dg1呢?明明之前配置过/etc/hosts里的主机名是asm-single的,查看一下:

[root@dg1 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1               localhost6.localdomain6 localhost6
192.168.1.99    asm-single

用hostname重新改一下:

[root@dg1 ~]# hostname asm-single

[root@dg1 ~]#su – grid

[grid@dg1 ~]# exit

[root@asm-single ~]#

修改完以后切换下用户就显示正确的主机名了,再查看一下GI资源情况:

[grid@asm-single ~]$ crsctl status res -t
——————————————————————————–
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
——————————————————————————–
Local Resources
——————————————————————————–
ora.ons
               OFFLINE OFFLINE      dg1                                          
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.cssd
      1        OFFLINE OFFLINE                                                  
ora.diskmon
      1        OFFLINE OFFLINE                                                  
ora.evmd
      1        ONLINE  ONLINE       dg1

仍然是dg1作为主机名,看来要重新执行roothas.pl再配置一次了

[root@asm-single ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Failed to write the checkpoint:”” with status:FAIL.Error code is 256
Can”t open /etc/oracle/scls_scr/asm-single/grid/ohasdstr for write: No such file or directory at /u01/app/11.2.0/grid/crs/install/s_crsconfig_lib.pm line 1332.

该命令无法执行,提示找不到asm-single路径,查看了一下,确实没有asm-single这一路径:

[root@asm-single ~]# cd /etc/oracle/scls_scr

[root@asm-single scls_scr]# ll
total 4
drwxr-x— 4 root oinstall 4096 Aug 25 23:29 dg1
[root@asm-single scls_scr]#

既然已经认定主机是dg1,那么就先将错就错一下吧,把主机名改回dg1,直接修改/etc/hosts中的内容

然后用NETCA重新配置一下监听和NET SERVICE NAME,并把监听和数据库实例添加到Oracle Restart配置中:

[grid@dg1 ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1
PRCD-1025 : Failed to create database asmsingle
PRKH-1014 : Current user “grid” is not the oracle owner user “oracle” of oracle home “/u01/app/oracle/product/11.2.0/dbhome_1”

应该是grid用户没有操作ORACLE_HOME的权限,用oracle用户试一下:

[oracle@dg1 ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1

没有报错,此时再用grid用户查看一下Oracle Restart资源配置情况:

[grid@dg1 ~]$ crsctl status res -t
——————————————————————————–
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
——————————————————————————–
Local Resources
——————————————————————————–
ora.LISTENER.lsnr
               ONLINE  ONLINE       dg1                                          
ora.ons
               OFFLINE OFFLINE      dg1                                          
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.asmsingle.db
      1        OFFLINE OFFLINE                                                  
ora.cssd
      1        OFFLINE OFFLINE                                                  
ora.diskmon
      1        OFFLINE OFFLINE                                                  
ora.evmd
      1        ONLINE  ONLINE       dg1

可以看到,数据库实例已经添加进去了,同时监听也自动进来了,但是ASM和磁盘组都还没有,试试启动CRS

[grid@dg1 ~]$ crsctl start crs
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Start failed, or completed with errors.

这是RAC中使用的命令,这里单实例ASM不能用它来启动全部资源,查看一下数据库的配置:

[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:
[grid@dg1 ~]$

此处可以看到,与ASM有关的都没有相应内容,先启动ASM实例,配置磁盘组DG

[grid@dg1 ~]$ export ORACLE_SID=+ASM
[grid@dg1 ~]$ sqlplus “/as sysdba”

SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:27:04 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service

启动错误,无法连接CSS,查看一下CSS进程:

SQL> !
[grid@dg1 ~]$ crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[grid@dg1 ~]$ ps -ef|grep cssd
grid     13704 13680  0 01:28 pts/5    00:00:00 grep cssd
[grid@dg1 ~]$ ps -ef|grep has
root     11620     1  0 Aug25 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid     11652     1  0 Aug25 ?        00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     13706 13680  0 01:28 pts/5    00:00:00 grep has
[grid@dg1 ~]$ ps -ef|grep d.bin
grid     11652     1  0 Aug25 ?        00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     11834     1  0 Aug25 ?        00:00:06 /u01/app/11.2.0/grid/bin/oraagent.bin
grid     11850     1  0 Aug25 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmd.bin
grid     11887 11850  0 Aug25 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid     13156     1  0 00:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid     13714 13680  0 01:28 pts/5    00:00:00 grep d.bin

[grid@dg1 ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online

has进程正常启动的,但是没有找到cssd的demon进程

[grid@dg1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
————————————————————
ora….ER.lsnr ora….er.type ONLINE    ONLINE    dg1        
ora….ngle.db ora….se.type OFFLINE   OFFLINE              
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE              
ora.diskmon    ora….on.type OFFLINE   OFFLINE              
ora.evmd       ora.evm.type   ONLINE    ONLINE    dg1        
ora.ons        ora.ons.type   OFFLINE   OFFLINE      

这里发现cssd和diskmon都是OFFLINE的,这2个服务是依赖于HAS维护的,分别查看这2个进程的参数:        

[grid@dg1 ~]$ crs_stat -p ora.cssd
NAME=ora.cssd
TYPE=ora.cssd.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=30
DESCRIPTION=”Resource type for CSSD”
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=600
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m

[grid@dg1 ~]$ crs_stat -p ora.diskmon
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=3
DESCRIPTION=”Resource type for Diskmon”
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s

发现他们并不会随HAS启动而自己启动,于是手动启动它们:

[grid@dg1 ~]$ crsctl start res ora.cssd
CRS-2672: Attempting to start “ora.cssd” on “dg1”
CRS-2672: Attempting to start “ora.diskmon” on “dg1”
CRS-2676: Start of “ora.diskmon” on “dg1” succeeded
CRS-2676: Start of “ora.cssd” on “dg1” succeeded

这里发现启动了cssd后,diskmon也被启动了,其实他们是被绑在一起的2个服务,随便先启动哪个,另一个也会跟着启动,查看一下HAS资源情况

[grid@dg1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
————————————————————
ora….ER.lsnr ora….er.type ONLINE    ONLINE    dg1        
ora….ngle.db ora….se.type OFFLINE   OFFLINE              
ora.cssd       ora.cssd.type  ONLINE    ONLINE    dg1        
ora.diskmon    ora….on.type OFFLINE   OFFLINE              
ora.evmd       ora.evm.type   ONLINE    ONLINE    dg1        
ora.ons        ora.ons.type   OFFLINE   OFFLINE              

但事实好像有点出入,尽管现实diskmon也是succeeded,但这里仍然是OFFLINE状态,就算单独再启动一次diskmon,也是一样

[grid@dg1 ~]$ crsctl start res ora.diskmon
CRS-2672: Attempting to start “ora.diskmon” on “dg1”
CRS-2676: Start of “ora.diskmon” on “dg1” succeeded
[grid@dg1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
————————————————————
ora….ER.lsnr ora….er.type ONLINE    ONLINE    dg1        
ora….ngle.db ora….se.type OFFLINE   OFFLINE              
ora.cssd       ora.cssd.type  ONLINE    ONLINE    dg1        
ora.diskmon    ora….on.type OFFLINE   OFFLINE              
ora.evmd       ora.evm.type   ONLINE    ONLINE    dg1        
ora.ons        ora.ons.type   OFFLINE   OFFLINE

先不管diskmon服务了,用grid启动ASM实例:

[grid@dg1 ~]$ sqlplus “/as sysdba”

SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:13 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ORA-01031: insufficient privileges
SQL> exit
Disconnected

注意:这里必须要用sysasm来连接ASM实例,在管理ASM时,他的权限比sysdba还要大

[grid@dg1 ~]$ export ORACLE_SID=+ASM
[grid@dg1 ~]$ sqlplus “/as sysasm”

SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:57 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ORA-15110: no diskgroups mounted

先查看一下磁盘组,那些磁盘没有挂载,这里并没有显示:

SQL> select name,state,type from v$asm_diskgroup;

no rows selected

因为环境是自己搭建的,所以知道有2个磁盘组,分别是DATA和FRA,直接进行挂载:

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup “DATA” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “DATA”

SQL> alter diskgroup FRA mount;
alter diskgroup FRA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup “FRA” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “FRA”

DATA和FRA是之前为单实例创建的2个磁盘组,用的是EXTERNAL模式,DATA给了4个盘,FRA给了3个盘,每个盘都是3G

现在挂载失败,提示发现数量不足,EXTERNAL模式应该没理由盘不够啊,查看一下asm_diskstring参数:

SQL> show parameter string

NAME                                 TYPE        VALUE
———————————— ———– ——————————
asm_diskstring                       string

原来asm_diskstring参数值为空,难怪说磁盘数量不足,原来是找不到磁盘,一个盘都没有,数量当然不足了

SQL> alter system set asm_diskstring=”/dev/asm*”;                              

System altered.

SQL> show parameter string

NAME                                 TYPE        VALUE
———————————— ———– ——————————
asm_diskstring                       string      /dev/asm*
SQL> alter system set asm_diskstring=”/dev/asm*” scope=both;
alter system set asm_diskstring=”/dev/asm*” scope=both
*
ERROR at line 1:
ORA-32001: write to SPFILE requested but no SPFILE is in use

这里本来是想把变更加到spfile里去的,但是这里并不是用spfile启动的,所以报错了,先不管,再试试挂载一下磁盘组

SQL> alter diskgroup DATA mount;

Diskgroup altered.

SQL> alter diskgroup FRA mount;

Diskgroup altered.

此处其实还有另外一条命令可以用:

SQL> alter system set asm_diskgroups=data,fra;(用此法加入磁盘组后是否直接是挂载状态还有待测试)

再查看一下数据库的配置情况:

[grid@dg1 ~]$ srvctl config database -t asmsingle
PRKO-2002 : Invalid command line option: -t
[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:

磁盘组依旧没有添加进内容,查看参数asm_diskgroup已经有这2个磁盘组了

SQL> show parameter asm_diskgroup

NAME                                 TYPE        VALUE
———————————— ———– ——————————
asm_diskgroups                       string      DATA, FRA

 

SQL> select name,state,type from v$asm_diskgroup;

NAME                           STATE       TYPE
—————————— ———– ——
DATA                           MOUNTED     EXTERN
FRA                            MOUNTED     EXTERN

SQL> !
[grid@dg1 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     12288    10350                0           10350              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576      9216     8984                0            8984              0             N  FRA/

磁盘组已经正常挂载了,由于是用pfile启动的,想把结果保存到spfile中去:

SQL> show parameter spfile

NAME                                 TYPE        VALUE
———————————— ———– ——————————
spfile                               string
SQL> create spfile from pfile
  2  ;
create spfile from pfile
*
ERROR at line 1:
ORA-29786: SIHA attribute GET failed with error [Attribute “SPFILE” sts[200]
lsts[0]]

可能是由于ASM资源并没有注册到HAS

[grid@dg1 ~]$ srvctl config asm
PRCR-1001 : Resource ora.asm does not exist
[grid@dg1 ~]$ srvctl add asm
[grid@dg1 ~]$ srvctl config asm
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile:
ASM diskgroup discovery string: ++no-value-at-resource-creation–never-updated-through-ASM++

这里不知道为何没有值,按理说应该已经能够识别到asm_diskstring了

ASM的alert日志在路径/u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log

查看了一下,有如下报错信息:

Mon Aug 26 02:25:32 2013
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_rbal_14046.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute “ASM_DISKSTRING” sts[200] lsts[0]]

[grid@dg1 ~]$ srvctl status asm
ASM is not running.
[grid@dg1 ~]$ srvctl start asm

[grid@dg1 ~]$ srvctl status asm
ASM is running on dg1

[grid@dg1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
————————————————————
ora….ER.lsnr ora….er.type ONLINE    ONLINE    dg1        
ora.asm        ora.asm.type   ONLINE    ONLINE    dg1        
ora….ngle.db ora….se.type OFFLINE   OFFLINE              
ora.cssd       ora.cssd.type  ONLINE    ONLINE    dg1        
ora.diskmon    ora….on.type OFFLINE   OFFLINE              
ora.evmd       ora.evm.type   ONLINE    ONLINE    dg1        
ora.ons        ora.ons.type   OFFLINE   OFFLINE

SQL> create spfile from pfile;
create spfile from pfile
*
ERROR at line 1:
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file
“/u01/app/11.2.0/grid/dbs/init+ASM.ora”

查看了一下路径,确实没有init+ASM.ora这个pfile,所以用pfile来创建spfile这个方法看来不行了

SQL> create spfile from memory;

File created.

但是可以从内存创建spfile,但是要注意,这条命令执行以后,会在$GRID_HOME/dbs创建spfile,全名是”spfile+ASM.ora“

对于单实例而言,这个问题还不大,如果是RAC环境,则必须要在spfile文件名后面指定在ASM中的路径,如:”+DATA/asmsingle/spfile+ASM.ora“

那么这里,我再重新创建一个spfile,来修改默认的spfile存放路径,这里涉及到GPNP profile内容的更新问题,原则是,会按照最新的保存位置来更新spfile

SQL> shutdown immediate

ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile

NAME                                 TYPE        VALUE
———————————— ———– ——————————
spfile                               string      /u01/app/11.2.0/grid/dbs/spfil
                                                 e+ASM.ora

我们可以来看一下ASM中文件的存放路径:

ASMCMD> ls
DATA/
FRA/
ASMCMD> cd data
ASMCMD> ls
ASM/
ASMSINGLE/
ASMCMD> cd asmsingle
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileasmsingle.ora

ASMCMD> cd parameterfile

ASMCMD> ls
spfile.266.824164223

注意,如果是RAC环境,刚才的创建spfile的语句就要改成:SQL> create spfile=”+DATA/asmsingle/” from memory;

否则会造成RAC中其他节点无法访问spfile而破坏RAC环境,这个spfile.266.824164223的文件是系统自动生成的,具体作用还未研究过

注意这里RAC用的spfile是上面那个spfileasmsingle.ora,这点千万不要搞错了

做完以上全部操作以后,再启动数据库:

[oracle@dg1 ~]$ sqlplus “/as sysdba”

SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 03:26:14 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area  313159680 bytes
Fixed Size                  2227944 bytes
Variable Size             117440792 bytes
Database Buffers          188743680 bytes
Redo Buffers                4747264 bytes
Database mounted.
Database opened.

[grid@dg1 ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups: DATA,FRA
Services:

[grid@dg1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
————————————————————
ora.DATA.dg    ora….up.type ONLINE    ONLINE    dg1        
ora.FRA.dg     ora….up.type ONLINE    ONLINE    dg1        
ora….ER.lsnr ora….er.type ONLINE    ONLINE    dg1        
ora.asm        ora.asm.type   ONLINE    ONLINE    dg1        
ora….ngle.db ora….se.type ONLINE    ONLINE    dg1        
ora.cssd       ora.cssd.type  ONLINE    ONLINE    dg1        
ora.diskmon    ora….on.type OFFLINE   OFFLINE             
ora.evmd       ora.evm.type   ONLINE    ONLINE    dg1        
ora.ons        ora.ons.type   OFFLINE   OFFLINE  

OK,故障解决

PS:关于本测试还有2个疑问

1.测试重新在ASM上创建一个spfile,看默认是不是用这个spfile启动,即GPNP profile有没有生效(已解决)

ASMCMD> spget
/u01/app/11.2.0/grid/dbs/spfile+ASM.ora
ASMCMD> pwd
+data/asmsingle –注意这里是数据库实例,是我搞错了,应该进入的路径是+data/asm
ASMCMD> ls -l
Type           Redund  Striped  Time             Sys  Name
                                                 Y    CONTROLFILE/
                                                 Y    DATAFILE/
                                                 Y    ONLINELOG/
                                                 Y    PARAMETERFILE/
                                                 Y    TEMPFILE/
                                                 N    spfileasmsingle.ora => +DATA/ASMSINGLE/PARAMETERFILE/spfile.266.824164223

spfileasmsingle.ora->spfile.266.824164223(OMF格式)是一个映射关系,都是数据库实例的spfile

ASMCMD> spset +DATA/asmsingle/spfileasmsingle.ora –之前错把数据库实例的spfile当做了asm实例的spfile,这里spset了一个错误的路径
ASMCMD> spget
+DATA/asmsingle/spfileasmsingle.ora
ASMCMD> exit
[grid@dg1 ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asmsingle/spfileasmsingle.ora
ASM diskgroup discovery string: ++no-value-at-resource-creation–never-updated-through-ASM++
ASM is enabled.

(注:上面的spfile其实是数据库实例的,而并不是asm实例的,所以这里ASM diskgroup discovery string提示没有值)

SQL> show parameter diskgroup

NAME                                 TYPE        VALUE
———————————— ———– ——————————
asm_diskgroups                       string      DATA, FRA
SQL> show parameter diskstring

NAME                                 TYPE        VALUE
———————————— ———– ——————————
asm_diskstring                       string     /dev/asm*  –其实是有diskgroup string的

用spset设置spfile新的路径后是即时生效的,但前提是设置的这个spfile必须本来就是存在的,否则,只有通过下面的办法先创建一个ASM实例的spfile:

SQL>create spfile=”+DATA/asm/asmparameterfile/asmspfile.ora” from pfile=”$ORACLE_HOME/dbs/init+ASM.ora”;

SQL> create spfile=”+DATA/asm/asmparameterfile/asmspfile.ora” from pfile; –此处可以省略pfile的路径,默认就是去dbs路径下去找init+ASM.ora文件的

File created.

SQL> !
[grid@dg1 ~]$ asmcmd
ASMCMD> cd data/asm
ASMCMD> ls
ASMPARAMETERFILE/
ASMCMD> cd asmparameterfile
ASMCMD> ls
REGISTRY.253.824517131
asmspfile.ora
ASMCMD> spget
+DATA/asm/asmparameterfile/asmspfile.ora

可以看到,创建完后会直接更新GPNP profile,现在ASM实例的spfile已经是新指定的+DATA路径了
这里REGISTRY.253.824517131是系统自动生成的ASM实例的spfile文件,是OMF格式的,下面的asmspfile.ora是刚才那条命令创建的
此时如果以不带参数文件的startup启动ASM实例,则使用的是+DATA这一ASM磁盘上的spfile,而不再是之前本地磁盘上的spfile了

SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile

NAME                                 TYPE        VALUE
———————————— ———– ——————————
spfile                               string      +DATA/asm/asmparameterfile/asm
                                                 spfile.ora
注意:这里已经用了刚才新创建的spfile指定的路径了,之前的理论得到验证

ASMCMD> cd +data/asm/asmparameterfile
ASMCMD> pwd
+data/asm/asmparameterfile
ASMCMD> ls -l
Type              Redund  Striped  Time             Sys  Name
ASMPARAMETERFILE  UNPROT  COARSE   AUG 27 00:00:00  Y    REGISTRY.253.824517131
                                                    N    asmspfile.ora => +DATA/ASM/ASMPARAMETERFILE/REGISTRY.253.824517131
当我们在ASM磁盘组上创建spfile后,不仅是更新GPNP profile后直接生效,还会把新建的spfile和系统原有的spifle做一个映射,asmspfile.ora->REGISTRY.253.824517131的映射,这个和数据库实例的spifle是类似的

最后在来验证一下:
ASMCMD> exit
[grid@dg1 ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asm/asmparameterfile/asmspfile.ora
ASM diskgroup discovery string: /dev/asm*
ASM is enabled.
这时候发现ASM diskgroup discovery string里的值已经可以正确识别了,因为配置了正确的ASM实例的spfile

2.为何状态都正常的情况下,ora.diskmon还是OFFLINE状态(已解决)

关于ora.diskmon这个resource是专门为EXADATA环境准备的,对于11.2.0.3的非EXADATA环境,默认是被DISABLE的,我也是查了一篇帖子才知道,

它和CSS有依赖关系,虽然CSS启动的时候diskmon也显示succeeded,但是默认不会启动,让我困惑了好一会

这个是关于ora.diskmon在非EXADATA环境会DISABLE的链接:http://blog.csdn.net/mrluoe/article/details/7525457

后记:当从pfile创建spfile后,启动asm实例时,又引发了另一个错误:ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance

为ASM指定的参数过期或废弃了

SQL> set line 200 pages 999
SQL> col name for a30
SQL> col value for a50
SQL> select t.NAME,t.value from v$parameter t where t.ISDEPRECATED=”TRUE”;

NAME                           VALUE
—————————— ————————————————–
lock_name_space
remote_os_authent              FALSE
background_dump_dest           /u01/app/grid/diag/asm/+asm/+ASM/trace
user_dump_dest                 /u01/app/grid/diag/asm/+asm/+ASM/trace
sql_trace                      FALSE

发现有5个过期参数,asm实例的报警日志里也记录了2个参数,内容如下:

Deprecated system parameters with specified values:
  background_dump_dest    
  user_dump_dest          
End of deprecated system parameter listing

都共同指向了background_dump_dest和user_dump_dest,那我们只要把这两个参数注释掉,就可以解决问题了

先修改pfile,把其中的关于这两个参数的行注释掉,然后重新用pfile再创建一次spfile,再用spfile启动:

SQL> startup force pfile=”/u01/app/11.2.0/grid/dbs/init+ASM.ora” –用注释掉那2个参数的本地pfile启动
ASM instance started

Total System Global Area  283930624 bytes

Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> create spfile=”+DATA/asm/asmparameterfile/asmspfile.ora” from pfile; –重新用修改好的pfile创建一次spfile

File created.

SQL> startup force –重新启动后,不报ORA-32004错误了
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

至此,一路碰到的ASM单实例故障全部解决,完!
————————————————
版权声明:本文为CSDN博主「aaron8219」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/aaron8219/java/article/details/10313397

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
转载请注明出处: https://daima100.com/7444.html

(0)
上一篇 2023-03-20 11:30
下一篇 2023-03-20

相关推荐

发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注