Recovery Scenario Example

On January 30, at 11:36 A.M., after applying some maintenance on LPAR SYSA, CMEM stopped working. By 12:10 P.M. the problem was resolved and the CMEM monitor was brought back up.

Because the site receives many incoming FTP datasets during the day, it was necessary to recover the events that are usually caught by CMEM (using DSNEVENT rules) but were lost during the CMEM outage.

Note: BMC recommends that you initiate the recovery process as close as possible to the time of the outage, although it is possible to do so at a later stage.

Batch work and FTP transfers were slightly delayed around the time the outage began (11:36 A.M.) because they mostly run in a discretionary Service Class and the LPAR was highly utilized. The site estimated that jobs were denied CPU for no more than 2 minutes, and therefore set the delay duration to 2 minutes.

To make sure that all the lost DSNEVENTs were recovered, the site did the following:

  1. Issued a Switch SMF command in SYSA.

    The switch ensured that the SMF records from the outage period were written to a sequential dataset (probably a new generation of a GDG).

  2. Submitted sample job CTMEVRT with the following input, making sure that both the INCLTIME SYSIN parameter and the DASMF DD allocations for the SMF generations covered the delay duration (the 2 minutes prior to the CMEM outage, 11:34 – 11:36) and the outage duration (11:36 – 12:10):

    //DASMF    DD   DISP=SHR,DSN=SMF.SYSA(0)

    //         DD   DISP=SHR,DSN=SMF.SYSA(-1)

    //         DD   DISP=SHR,DSN=SMF.SYSA(-2)

    //SYSIN    DD   *

    INCLTIME=2016/01/30,11:34-2016/01/30,12:10

    SYSID=SYSA

    Note: Supplying more SMF records than needed (for example, in this case, records collected from 10:00 to 13:00) causes no problem, since events are selected only for the period specified in the INCLTIME statement.
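The arithmetic behind the INCLTIME value in this scenario can be sketched as follows. This is only an illustration in Python, not part of the product: the function name `incltime_statement` is hypothetical, and the output format is simply modeled on the INCLTIME line shown in the sample input above.

```python
from datetime import datetime, timedelta


def incltime_statement(outage_start, outage_end, delay_minutes):
    """Build an INCLTIME line covering the delay duration plus the outage.

    Hypothetical helper: the format is copied from the sample input in this
    scenario (yyyy/mm/dd,hh:mm-yyyy/mm/dd,hh:mm).
    """
    # Start the window earlier by the estimated delay duration.
    window_start = outage_start - timedelta(minutes=delay_minutes)
    fmt = "%Y/%m/%d,%H:%M"
    return f"INCLTIME={window_start.strftime(fmt)}-{outage_end.strftime(fmt)}"


# Values from the scenario: outage 11:36 - 12:10 on January 30, 2016,
# with an estimated delay duration of 2 minutes.
stmt = incltime_statement(
    datetime(2016, 1, 30, 11, 36),
    datetime(2016, 1, 30, 12, 10),
    delay_minutes=2,
)
print(stmt)
```

With the scenario's values, the window starts at 11:34 (11:36 minus the 2-minute delay) and ends at 12:10, matching the statement used by the site.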

After CTMEVRT executed successfully, the site reviewed the events list dataset CTMEVRT.EVENTS (DAEVENTS DD), which contains the list of extracted DSNEVENTs:

*Date      Time     SIDC Data Set Name       JobNameDisp RuleName        Actions

2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +

2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1FTPUSERC    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL

2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +

2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2FTPUSERC    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL

(Note: This is an incomplete listing of DAEVENTS; some columns have been removed for better readability.)

By cross-referencing the extracted DSNEVENTs with the IOA Log, the site determined that the event from 11:34 A.M. (the cataloging of data set FTP.FILE.INCOMING1) had already been fully handled by CMEM. The site therefore excluded it from further processing by commenting out its two lines with a leading asterisk (*).

This is how CTMEVRT.EVENTS looked after the change:

*Date      Time     SIDC Data Set Name       JobNameDisp RuleName          Actions

*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +

*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1FTPUSERC    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL

2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +

2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2FTPUSERC    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
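For a long events list, the manual edit described above could be scripted. The following Python sketch is an illustration only and is not supplied with the product: the function name `comment_out_handled` is hypothetical, and it assumes only what the listings here show, namely that each event line begins with a date and a time, and that a line beginning with an asterisk is treated as a comment. The match on data set name is a simple substring test, so the result should still be reviewed by hand.

```python
def comment_out_handled(lines, handled):
    """Prefix '*' to event lines that CMEM already handled.

    `handled` is a set of (date, time, data set name) tuples identified by
    cross-referencing the IOA Log. Hypothetical helper for illustration.
    """
    result = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines that are already comments (or the header).
        if len(fields) < 2 or line.startswith("*"):
            result.append(line)
            continue
        matched = any(
            fields[0] == date and fields[1] == time and dsn in line
            for date, time, dsn in handled
        )
        result.append("*" + line if matched else line)
    return result


events = [
    "*Date      Time     SIDC Data Set Name       JobNameDisp RuleName        Actions",
    "2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +",
    "2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2FTPUSERC    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +",
]
edited = comment_out_handled(
    events, {("2016/01/30", "11:34:14", "FTP.FILE.INCOMING1")}
)
for line in edited:
    print(line)
```

Only the 11:34:14 event is commented out; the header and the 11:39:22 event are left unchanged.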

Once the site was satisfied that the edited DAEVENTS file was correct, it submitted the sample job CTMEVEX. The CTMEVEX utility converted the DO actions in CTMEVRT.EVENTS (DAEVENTS DD) to commands and, because the site had set the SIMULATE parameter to NO, executed them using IOACND and CTMJOB.

By reviewing DAPRINT for IOACND and SYSPRINT for CTMJOB, the site verified that all the DO actions (DO FORCEJOB or DO COND) had executed successfully. With that, the DSNEVENTs that CMEM had missed during its outage were recovered and their actions executed.
