On January 30 at 11:36 A.M., after maintenance was applied on LPAR SYSA, CMEM stopped working. By 12:10 P.M., the problem was resolved and the CMEM monitor was brought back up.
Because the site receives many incoming FTP datasets during the day, it needed to recover the events that CMEM normally catches (using DSNEVENT rules) but that were lost during the CMEM outage.
Note: BMC recommends that you initiate the recovery process as close as possible to the outage, although it is possible to do so at a later stage.
Batch work and FTP transfers were slightly delayed around the time the outage began (11:36 A.M.), because they mostly run in a discretionary Service Class and the LPAR was highly utilized. The site estimated that jobs were denied CPU for no more than 2 minutes, and therefore set the delay duration to 2 minutes.
To make sure that all the lost DSNEVENTs were recovered, the site did the following:
The switch ensured that the SMF records from the outage period were written to a sequential dataset (probably a new generation of a GDG).
//DASMF    DD DISP=SHR,DSN=SMF.SYSA(0)
//         DD DISP=SHR,DSN=SMF.SYSA(-1)
//         DD DISP=SHR,DSN=SMF.SYSA(-2)
//SYSIN    DD *
INCLTIME=2016/01/30,11:34-2016/01/30,12:10
SYSID=SYSA
Note: Supplying more SMF records than necessary (for example, in this case, SMF records collected from 10:00 to 13:00) is not a problem, because only events within the period specified by the INCLTIME statement are selected.
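The INCLTIME window in this example appears to start at the outage time minus the estimated delay (11:36 minus 2 minutes gives 11:34) and to end when CMEM was brought back up (12:10). The following Python sketch reproduces that arithmetic. The function name incltime_statement is hypothetical, and the assumption that the delay simply moves the start of the window earlier is inferred from the values in this example, not taken from the CTMEVRT documentation.

```python
from datetime import datetime, timedelta


def incltime_statement(outage_start, outage_end, delay_minutes):
    """Build an INCLTIME statement for the outage window.

    Assumption: the estimated delay moves the start of the window
    earlier by delay_minutes; the end of the window is unchanged.
    """
    window_start = outage_start - timedelta(minutes=delay_minutes)
    fmt = "%Y/%m/%d,%H:%M"
    return f"INCLTIME={window_start.strftime(fmt)}-{outage_end.strftime(fmt)}"


# Values taken from the example: outage from 11:36 to 12:10, 2-minute delay.
statement = incltime_statement(
    datetime(2016, 1, 30, 11, 36),
    datetime(2016, 1, 30, 12, 10),
    delay_minutes=2,
)
print(statement)  # INCLTIME=2016/01/30,11:34-2016/01/30,12:10
```

Generating the statement this way keeps the delay estimate explicit, so a different estimate only changes one argument.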
After CTMEVRT executed successfully, the site reviewed the events list dataset CTMEVRT.EVENTS (DAEVENTS DD), which contained the following list of extracted DSNEVENTs:
*Date Time SID C Data Set Name JobName Disp RuleName Actions
2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C 0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +
2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C 0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C 0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C 0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
(Note: This is an incomplete listing of DAEVENTS; some columns have been removed for better readability.)
By cross-referencing the extracted DSNEVENTs with the IOA Log, the site realized that the event from 11:34 A.M. (the cataloging of data set FTP.FILE.INCOMING1) was actually already fully handled by CMEM. Therefore, it was excluded from further processing by commenting it out.
This is how CTMEVRT.EVENTS looked after the change:
*Date Time SID C Data Set Name JobName Disp RuleName Actions
*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C 0001-PROD:FTP* DO COND FILE_INCOMING1 ODAT +
*2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 FTPUSER C 0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C 0001-PROD:FTP* DO COND FILE_INCOMING2 ODAT +
2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 FTPUSER C 0001-PROD:FTP* DO FORCEJOB TABLE FTP JOB FTPA UFLOW N DATE ODAT LIBRARY PROD.JCL
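For a longer events list, the same edit could be scripted rather than made by hand. The following Python sketch prefixes matching event lines with an asterisk, as in the listing above. The function name comment_out_events is hypothetical, the sample lines are abridged from the listing, and the position of the data set name (fifth blank-delimited field) is an assumption based on the abridged listing shown here, not on the full DAEVENTS record layout.

```python
def comment_out_events(lines, event_date, event_time, dataset_name):
    """Prefix matching event lines with '*' so they read as comments.

    Assumption: date, time, and data set name are the 1st, 2nd, and 5th
    blank-delimited fields, as in the abridged listing.
    """
    edited = []
    for line in lines:
        fields = line.split()
        is_match = (
            not line.startswith("*")          # skip header and existing comments
            and len(fields) >= 5
            and fields[0] == event_date
            and fields[1] == event_time
            and fields[4].startswith(dataset_name)
        )
        edited.append("*" + line if is_match else line)
    return edited


# Abridged sample lines based on the listing above.
events = [
    "*Date Time SID C Data Set Name Actions",
    "2016/01/30 11:34:14 MVS3 N FTP.FILE.INCOMING1 DO COND FILE_INCOMING1 ODAT +",
    "2016/01/30 11:39:22 MVS3 Y FTP.FILE.INCOMING2 DO COND FILE_INCOMING2 ODAT +",
]
edited = comment_out_events(events, "2016/01/30", "11:34:14", "FTP.FILE.INCOMING1")
for line in edited:
    print(line)
```

Matching on date, time, and data set name together limits the change to the one event that the IOA Log showed as already handled. Review the edited file before submitting it for execution.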
After the site decided that the edited DAEVENTS file was correct, it submitted the sample job CTMEVEX. The CTMEVEX utility converted the DO actions in CTMEVRT.EVENTS (DAEVENTS DD) to commands and, because the SIMULATE parameter was set to NO, executed them using IOACND and CTMJOB.
By reviewing DAPRINT for IOACND and SYSPRINT for CTMJOB, the site confirmed that all the DO actions (DO FORCEJOB or DO COND) were executed successfully. On successful completion, the DSNEVENTs that CMEM had missed during its outage had been recovered and their actions executed.