Recovery Scenario Example

On January 30, at 11:36 A.M., after maintenance was applied on LPAR SYSA, Control-O stopped working. By 12:10 P.M. the problem was resolved and Control-O was brought back up.

Because the site receives many incoming FTP datasets during the day, it had to recover the events that Control-O normally catches (using DSNEVENT rules) but that were lost during the Control-O outage.

Note: BMC recommends that you initiate the recovery process as close as possible to the outage, although it is possible to do so at a later stage.

Batch work and FTP transfers were slightly delayed around the time the outage began (11:36 A.M.), because they mostly run in a discretionary Service Class and the LPAR was highly utilized. The site estimated that jobs were denied CPU for no more than 2 minutes, and therefore set the delay duration to 2 minutes.

To make sure they recovered all the lost DSNEVENTs, the site did the following:

  1. Issued a Switch SMF command on SYSA.

    The switch ensured that the SMF records from the outage period were written to a sequential dataset (probably a new generation of a GDG).

  2. Submitted sample job CTMEVRT with the following input, making sure that both the INCLTIME SYSIN parameter and the DASMF DD allocations for the SMF generations cover the delay duration (the 2 minutes prior to the Control-O outage, 11:34 – 11:36) and the outage duration (11:36 – 12:10):

    //DASMF    DD   DISP=SHR,DSN=SMF.SYSA(0)
    //         DD   DISP=SHR,DSN=SMF.SYSA(-1)
    //         DD   DISP=SHR,DSN=SMF.SYSA(-2)
    //SYSIN    DD   *
    INCLTIME=2016/01/30,11:34-2016/01/30,12:10
    SYSID=SYSA

    Note: Supplying more SMF records than needed is not a problem (for example, in this case, supplying SMF records collected from 10:00 to 13:00), because only events within the period specified in the INCLTIME statement are selected.
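The window arithmetic in step 2 can be sketched as follows. This is a minimal illustration in Python, not part of the CTMEVRT sample job: the function name incltime_window and its parameters are hypothetical, and the output format simply mirrors the INCLTIME statement shown above.

```python
from datetime import datetime, timedelta

def incltime_window(outage_start, outage_end, delay_minutes):
    """Build an INCLTIME statement covering the delay duration plus the outage.

    Hypothetical helper: the statement layout is copied from the example
    above (YYYY/MM/DD,HH:MM-YYYY/MM/DD,HH:MM).
    """
    # Start the window earlier by the estimated delay (events whose
    # processing may have been held up just before the outage began).
    window_start = outage_start - timedelta(minutes=delay_minutes)
    fmt = "%Y/%m/%d,%H:%M"
    return f"INCLTIME={window_start.strftime(fmt)}-{outage_end.strftime(fmt)}"

# Values from this scenario: outage 11:36 to 12:10, delay of 2 minutes.
statement = incltime_window(
    datetime(2016, 1, 30, 11, 36),
    datetime(2016, 1, 30, 12, 10),
    delay_minutes=2,
)
print(statement)
```

With the scenario's values, the window starts at 11:34 and ends at 12:10, matching the SYSIN input above.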

After CTMEVRT executed successfully, the site reviewed the events list dataset CTMEVRT.EVENTS (DAEVENTS DD), which contains the list of extracted DSNEVENTs:

*Date      Time     SID  C Data Set Name      JobName Disp RuleName       Actions
2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +
2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    0001-PROD:FTP*?DO FORCEJOB TABLE FTP JOB FTPC UFLOW N DATE ODAT LIBRARY PROD.JCL  (?) Conditional - IF/WHILE/TERM precedes
2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    0001-PROD:FTP**DO RES      RESOURCE1           0004+                       (*) Unsupported action - not executed

(Note: This is an incomplete listing of DAEVENTS; some columns have been removed for better readability.)
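For sites that want to review such a listing programmatically, the following Python sketch splits one event line into its columns and picks out the marker character that sits between the rule name and the action ("?" for conditional, "*" for unsupported). It assumes the abridged column layout shown above; the real DAEVENTS file has more columns, so the field positions, the Event type, and the parse_event function are all illustrative assumptions rather than a documented format.

```python
import re
from typing import NamedTuple

class Event(NamedTuple):
    date: str
    time: str
    sid: str
    c_flag: str
    dsn: str
    jobname: str
    disp: str
    rule: str
    marker: str   # " " = plain, "?" = conditional, "*" = unsupported
    action: str

# Rule name, then a one-character marker, then the DO action.
# The lazy match lets the rule name itself end in "*" (as in 0001-PROD:FTP*).
_TAIL = re.compile(r"^(\S+?)([ ?*])(DO .*)$")

def parse_event(line):
    """Parse one event line of the abridged listing (assumed layout)."""
    fields = line.split(None, 7)
    if len(fields) < 8:
        raise ValueError("not an event line: " + line)
    match = _TAIL.match(fields[7])
    if match is None:
        raise ValueError("no DO action found: " + line)
    rule, marker, action = match.groups()
    # Drop the trailing "(?) ..." or "(*) ..." explanation, if present.
    action = re.split(r"\s+\([?*]\)\s", action, maxsplit=1)[0].rstrip()
    return Event(*fields[:7], rule, marker, action)

conditional = parse_event(
    "2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    "
    "0001-PROD:FTP*?DO FORCEJOB TABLE FTP JOB FTPC UFLOW N DATE ODAT "
    "LIBRARY PROD.JCL  (?) Conditional - IF/WHILE/TERM precedes"
)
print(conditional.dsn, repr(conditional.marker), conditional.action)
```

Lines flagged with "?" or "*" still need a human decision, as described in the paragraphs that follow; the sketch only makes them easier to find.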

By cross-referencing the extracted DSNEVENTs with the IOA Log, the site realized that the event from 11:34 A.M. (the cataloging of data set FTP.FILE.INCOMING1) was actually already fully handled by Control-O. Therefore, it was excluded from further processing by commenting it out.

After analyzing the IF statement in the rule identified by 0001-PROD:FTP* in the RuleName column of CTMEVRT.EVENTS, the site concluded that the conditional action should also be excluded from the run.

The last line, marked (*) Unsupported action - not executed, informs the site that a DO RESOURCE action was missed and that the site must handle this action outside the CTMEVRT-CTMEVEX process. The line was removed.

This is how CTMEVRT.EVENTS looked after the change:

*Date      Time      SID  C Data Set Name      JobName Disp RuleName       Actions
*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +
*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
 2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +
 2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
*2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPC UFLOW N DATE ODAT LIBRARY PROD.JCL  (?) Conditional - IF/WHILE/TERM precedes
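The manual edits described above (commenting out events that were already handled, commenting out the conditional action, and removing the unsupported action) could be sketched in Python as follows. The edit_events function, its parameters, and the reliance on the "(?)" and "(*)" explanation text are assumptions made for illustration; the decisions themselves still come from the site's own review of the IOA Log and the rule definition.

```python
def edit_events(lines, handled_dsns, exclude_conditional=True):
    """Return the listing with excluded events commented out.

    lines: lines of the abridged events listing (assumed layout).
    handled_dsns: data set names the site found to be fully handled already.
    exclude_conditional: site decision on "(?) Conditional" actions.
    """
    edited = []
    for line in lines:
        if line.startswith("*"):
            # Header or existing comment line: keep unchanged.
            edited.append(line)
            continue
        if "(*) Unsupported" in line:
            # Must be handled outside this process, so drop the line.
            continue
        fields = line.split()
        dsn = fields[4] if len(fields) > 4 else ""
        conditional = "(?) Conditional" in line
        if dsn in handled_dsns or (conditional and exclude_conditional):
            edited.append("*" + line)   # comment out
        else:
            edited.append(" " + line)   # keep for execution
    return edited

listing = [
    "*Date      Time     SID  C Data Set Name      JobName Disp RuleName       Actions",
    "2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +",
    "2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C    0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +",
    "2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    0001-PROD:FTP*?DO FORCEJOB TABLE FTP JOB FTPC UFLOW N DATE ODAT LIBRARY PROD.JCL  (?) Conditional - IF/WHILE/TERM precedes",
    "2016/01/30 11:40:08 MVS3 Y FTP.FILE.INCOMING3 FTPUSER C    0001-PROD:FTP**DO RES      RESOURCE1           0004+                       (*) Unsupported action - not executed",
]
edited = edit_events(listing, handled_dsns={"FTP.FILE.INCOMING1"})
for line in edited:
    print(line)
```

Keeping excluded events as comment lines, rather than deleting them, leaves a record of what was extracted and why it was not re-executed.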

Once the site was satisfied with the edited DAEVENTS file, it submitted the sample job CTMEVEX. The CTMEVEX utility converted the DO actions in CTMEVRT.EVENTS (DAEVENTS DD) to commands and, because the site had set the SIMULATE parameter to NO, executed them using IOACND and CTMJOB.

By reviewing DAPRINT for IOACND and SYSPRINT for CTMJOB the site ensured that all the DO actions (DO FORCEJOB or DO COND) were executed successfully. On successful completion, the DSNEVENTs that Control-O had missed during its outage had been recovered and their actions executed.
