Oracle使用RMAN Advisor恢复current redo丢失故障

本文主要介绍通过RMAN Advisor恢复current redo丢失的故障。

环境说明:

    DB:Oracle 11.2.0.4.0
    OS:Red Hat Enterprise Linux Server release 7.5 (Maipo)

    准备环境:

    查看归档

      SQL> archive log list;
      Database log mode No Archive Mode
      Automatic archival Disabled
      Archive destination USE_DB_RECOVERY_FILE_DEST
      Oldest online log sequence 12
      Current log sequence 14

      启动归档

        [root@cjc-db-01 ~]# mkdir arch
        [root@cjc-db-01 ~]# chown oracle:oinstall arch
        sqlplus as sysdba
        alter system set log_archive_dest_1='location=/arch';
        alter system set log_archive_format = "cjc_%t_%s_%r.arc" scope=spfile;
        shutdown immediate;
        startup mount;
        alter database archivelog;
        alter database open;
        archive log list;
        alter system switch logfile;

        创建测试数据

          SQL> conn cjc/***
          SQL>
          create table t1(id number,time varchar2(100));
          insert into t1 values (1, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
          insert into t1 values (2, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
          insert into t1 values (3, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
          commit;
          col time for a25
          select * from t1;
          ID TIME
          ---------- -------------------------
          1 2024-03-28 09:24:31
          2 2024-03-28 09:24:39
          3 2024-03-28 09:24:45

          执行rman全备份

            rman target
            run
            {
            allocate channel c1 type disk;
            allocate channel c2 type disk;
            backup incremental level = 0 format '/back/rman/rman_level0_%d_%T_%U.bak' database;
            sql 'alter system archive log current';
            backup archivelog all format '/back/rman/rman_arch_%d_%T_%U.bak';
            backup current controlfile format '/back/rman/rman_con_%d_%T_%U.bak';
            release channel c1;
            release channel c2;
            }

            备份后插入新数据

              insert into t1 values (4, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
              insert into t1 values (5, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
              commit;

              查看redo信息 

                SQL> col member for a50
                select a.GROUP#,a.STATUS,b.member from v$log a,v$logfile b where a.group#=b.group#;SQL>
                GROUP# STATUS MEMBER
                ---------- ---------------- --------------------------------------------------
                3 CURRENT oracle/app/oracle/oradata/cjc/redo03.log
                2 INACTIVE oracle/app/oracle/oradata/cjc/redo02.log
                1 INACTIVE oracle/app/oracle/oradata/cjc/redo01.log

                重命名current redo logfile,模拟故障

                  [oracle@cjc-db-01 cjc]$ cd oracle/app/oracle/oradata/cjc
                  [oracle@cjc-db-01 cjc]$ mv redo03.log redo03.log.1

                  继续插入数据,仍可以正常提交事物

                    insert into t1 values (6, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
                    insert into t1 values (7, to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'));
                    commit;

                    手动杀掉pmon进程,模拟异常宕机

                      [oracle@cjc-db-01 cjc]$ ps -ef|grep pmon
                      oracle 17623 1 0 12:27 ? 00:00:00 ora_pmon_cjc
                      oracle 18287 11168 0 12:35 pts/3 00:00:00 grep --color=auto pmon
                      [oracle@cjc-db-01 cjc]$ kill -9 17623

                      重新启动数据库,启动失败

                        SQL> startup
                        ORACLE instance started.


                        Total System Global Area 2071076864 bytes
                        Fixed Size 2254784 bytes
                        Variable Size 536873024 bytes
                        Database Buffers 1526726656 bytes
                        Redo Buffers 5222400 bytes
                        Database mounted.
                        ORA-00313: open failed for members of log group 3 of thread 1
                        ORA-00312: online log 3 thread 1: '/oracle/app/oracle/oradata/cjc/redo03.log'
                        ORA-27037: unable to obtain file status
                        Linux-x86_64 Error: 2: No such file or directory
                        Additional information: 3

                        查看丢失的是current redo logfile

                          SQL> col member for a50
                          select a.GROUP#,a.STATUS,b.member from v$log a,v$logfile b where a.group#=b.group#;SQL>
                          GROUP# STATUS MEMBER
                          ---------- ---------------- --------------------------------------------------
                          1 INACTIVE oracle/app/oracle/oradata/cjc/redo01.log
                          3 CURRENT oracle/app/oracle/oradata/cjc/redo03.log
                          2 INACTIVE /oracle/app/oracle/oradata/cjc/redo02.log

                          GROUP# 3丢失,理论上最多能恢复到 1187045,也就是GROUP# 2的NEXT_CHANGE#

                            SQL> select GROUP#,SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE# from v$log order by 1;


                            GROUP# SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
                            ---------- ---------- ------------- ------------
                            1 7 1186324 1187037
                            2 8 1187037 1187045
                            3 9 1187045 2.8147E+14

                            使用RMAN Advisor工具进行恢复

                            查看故障,检测到Redo log group 3丢失

                              RMAN> list failure;


                              using target database control file instead of recovery catalog
                              List of Database Failures
                              =========================


                              Failure ID Priority Status Time Detected Summary
                              ---------- -------- --------- ------------- -------
                              1483 CRITICAL OPEN 28-MAR-24 Redo log group 3 is unavailable
                              1486 HIGH OPEN 28-MAR-24 Redo log file /oracle/app/oracle/oradata/cjc/redo03.log is missing

                              查看建议,生成Repair script

                                RMAN> advise failure;


                                List of Database Failures
                                =========================


                                Failure ID Priority Status Time Detected Summary
                                ---------- -------- --------- ------------- -------
                                1483 CRITICAL OPEN 28-MAR-24 Redo log group 3 is unavailable
                                1486 HIGH OPEN 28-MAR-24 Redo log file /oracle/app/oracle/oradata/cjc/redo03.log is missing


                                analyzing automatic repair options; this may take some time
                                allocated channel: ORA_DISK_1
                                channel ORA_DISK_1: SID=20 device type=DISK
                                analyzing automatic repair options complete


                                Mandatory Manual Actions
                                ========================
                                no manual actions available


                                Optional Manual Actions
                                =======================
                                1. If file /oracle/app/oracle/oradata/cjc/redo03.log was unintentionally renamed or moved, restore it


                                Automated Repair Options
                                ========================
                                Option Repair Description
                                ------ ------------------
                                1 Perform incomplete database recovery to SCN 1187045
                                Strategy: The repair includes point-in-time recovery with some data loss
                                Repair script: /oracle/app/oracle/diag/rdbms/cjc/cjc/hm/reco_3624337172.hm

                                查看Repair script,需要执行不完全恢复,scn 1187045,和之前查的一致

                                  [oracle@cjc-db-01 cjc]$ cat /oracle/app/oracle/diag/rdbms/cjc/cjc/hm/reco_3624337172.hm
                                  # database point-in-time recovery
                                  restore database until scn 1187045;
                                  recover database until scn 1187045;
                                  alter database open resetlogs;

                                  根据建议执行恢复

                                  可以手动恢复,也可以执行repair failure进行恢复

                                    RMAN> restore database until scn 1187045;
                                    RMAN> recover database until scn 1187045;
                                    RMAN> alter database open resetlogs;

                                    查询数据

                                      SQL> col time for a50
                                      SQL> select * from cjc.t1;


                                      ID TIME
                                      ---------- -----------------
                                      1 2024-03-28 12:30:14
                                      2 2024-03-28 12:30:14
                                      3 2024-03-28 12:30:14

                                      丢失了全备后新增的4条数据。

                                      ###chenjuchao 20240328###