Restarts under Control-M/Restart
Overview
In the last chapter, you used a DO FORCEJOB statement in an ON/DO block to force a "remedial" job following a job failure. However, rather than run a remedial job following job failure, it is more likely that you will want to correct the problem and then restart the job that failed.
In this chapter, you will learn to use Control-M/Restart to perform job restarts when they become necessary. Before you do, however, you should be clear about the difference between a job rerun and a job restart.
Job rerun is the re-execution of a scheduled job, starting from the beginning. For example, if a job fails, the entire job can be rerun. At best, rerunning a job wastes processing time on job steps that already completed successfully. At worst, if those successful steps performed updates before the job failed, rerunning the job repeats those updates and can produce problematic results unless certain precautions are taken.
Job restart is the re-execution of a job beginning at a particular step. In general, the results of successful job steps before the failure are utilized, and re-execution continues from the end of the last successful step.
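The difference can be pictured with a small Python sketch. It is a toy model written for illustration, not Control-M code: the step names S1 to S5 and the helper names rerun and restart are hypothetical, and the model assumes that the third step is the one that failed.

```python
# Toy model only, not Control-M code: which steps run on a rerun versus a
# restart of a five-step job whose third step (S3) failed.
from typing import List

STEPS: List[str] = ["S1", "S2", "S3", "S4", "S5"]


def rerun(steps: List[str]) -> List[str]:
    """Rerun: execute the whole job again, starting from the first step."""
    return list(steps)


def restart(steps: List[str], failed_step: str) -> List[str]:
    """Restart: begin at the failed step; the results of the earlier,
    successful steps are reused rather than recomputed."""
    return steps[steps.index(failed_step):]


if __name__ == "__main__":
    print("rerun runs:  ", rerun(STEPS))
    print("restart runs:", restart(STEPS, "S3"))
    # Steps that a rerun repeats even though they already completed successfully.
    print("repeated by rerun:", [s for s in STEPS if s not in restart(STEPS, "S3")])
```

In this model, the steps that a rerun repeats needlessly (S1 and S2) are exactly the steps whose updates could be applied twice.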
Control-M/Restart automates restart by identifying the step at which to initiate a job restart, and by performing necessary tasks to ensure that job restart is error-free.
Restarting a job under Control-M with Control-M/Restart requires two separate processes:
- Defining the restart parameters appropriately in the Control-M job scheduling definition, so that restart can be performed if it becomes necessary.
- Activating the restart process when restart becomes necessary.
In this chapter, you will define restart parameters in the job scheduling definition, and then, following job failure and correction of the problem, observe and take part in the restart process.
Preparations
For this chapter, you will create a new job and its job scheduling definition. Name the job IDJOB5, and use the same library and table that you used in the preceding chapter.
To create the JCL for IDJOB5, begin with a copy of the TESTUTIL JCL, such as the JCL you used for IDJOB3. Recall that for job IDJOB3 this JCL had one step, which you defined to end with a condition code of C0008. For IDJOB5, you should make the following changes:
- Change the step so that it ends with a condition code of C0000 (so the step ends OK).
- Copy the changed step and its accompanying DD statements four times, so that you have five steps in the job. Name those steps S1, S2, S3, S4, and S5, respectively.
- Change the third step (S3) so that it ends with a condition code of C0008.
Your job should now have five steps, four of which end OK and one of which, the third, ends with a condition code of C0008.
You can now continue with the first part of these exercises: the creation of the job scheduling definition.
Defining Restart in the Job Scheduling Definition
- Enter the IOA Online Facility and open a job scheduling definition for job IDJOB5.
- Ensure that the following values are part of the job scheduling definition:
  - In the MEMNAME field, specify IDJOB5. Fill in the appropriate JCL library name in the MEMLIB field.
  - In the GROUP field, specify IDGRP3.
  - In the DESC field, specify RESTART JOB UNDER CTM/RESTART.
  - In the DAYS field, specify ALL, and specify Y in all the MONTHS fields. These are the only Basic Scheduling parameters you should define.
  - Do not define any Runtime Scheduling parameters.
  - In the OUT fields, define the condition
IDJOB5-ENDED-OK ODAT +
You are now ready to define the ON and DO statements. These will include your restart parameters.
- In the ON block, specify ANYSTEP as the program step (PGMST) value. Specify >C0004 as the CODES value.
Generally, a step is considered to have ended OK if it returns a condition code of C0004 or lower. Therefore, with PGMST set to ANYSTEP, a CODES value of >C0004 instructs Control-M to perform the accompanying DO statements if any step ends with a higher condition code, that is, if the job ends NOTOK.
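The CODES test can be sketched in Python as follows. This is a hypothetical helper, not part of Control-M: it models only the C-code forms used in this chapter, and the exact-match and `<` forms are included as assumptions that should be checked against the CODES parameter description in the Control-M for z/OS User Guide. Abend codes are not handled.

```python
import re

# Hypothetical helper: models only the C-code forms used in this chapter
# (Cnnnn, >Cnnnn, <Cnnnn). Abend codes and other CODES forms are not handled.
CODES_VALUE = re.compile(r"^([<>]?)C(\d{4})$")
STEP_CODE = re.compile(r"^C(\d{4})$")


def codes_match(codes_value: str, step_code: str) -> bool:
    """Return True if a step's condition code satisfies a CODES value."""
    spec = CODES_VALUE.match(codes_value)
    actual = STEP_CODE.match(step_code)
    if spec is None or actual is None:
        raise ValueError(f"unsupported code: {codes_value!r} / {step_code!r}")
    op, threshold = spec.group(1), int(spec.group(2))
    cc = int(actual.group(1))
    if op == ">":
        return cc > threshold
    if op == "<":
        return cc < threshold
    return cc == threshold  # no operator: exact match


if __name__ == "__main__":
    for code in ("C0000", "C0004", "C0008"):
        print(code, "triggers DO statements:", codes_match(">C0004", code))
```

Against >C0004, only C0008 triggers the DO statements; C0004 itself still counts as ended OK, which is why the comparison is strictly greater than.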
In the preceding chapter, and again in this step, you specified ANYSTEP as the PGMST value. These ON step values deserve a closer look.
ON Steps
The ON step fields identify the steps whose results Control-M checks against the specified CODES values.
Two types of step values can be specified:
- Program Step (PGMST) value
- Procedure Step (PROCST) value
You can specify either or both types of values, but you must specify at least one value if you use an ON block.
Valid step values can be any of the following:
- Literal value (for example, S3, if this is a step name)
- Keyword value that represents a step. Valid keyword values are:
  - ANYSTEP, which is valid only as a PGMST value. DO statements are performed if the CODES criteria are satisfied for any program step.
  - +EVERY. DO statements are performed only if the CODES criteria are satisfied for all steps, program and/or procedure, depending on the definition.
- Step Range Name, which is valid only for a PGMST value. You can define a range of steps in the STEP RANGE statement, immediately above the ON statement, and you must assign a name to this step range. You can then specify this step range name, preceded by an asterisk, as the step value in the ON statement. The asterisk prefix instructs Control-M to check the program step range defined in the STEP RANGE field, rather than look for an actual program step by that name. DO statements are performed if the CODES criteria are satisfied for any program step in the step range.
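How these step values select the steps to check can be sketched in Python. This is an illustrative model under stated assumptions, not Control-M's evaluation logic: procedure steps (PROCST) are ignored, +EVERY is applied to program steps only, a step range is modeled as a hypothetical table of name to (first step, last step), and the CODES test is passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    name: str  # program step name, for example "S3"
    cc: int    # numeric condition code, for example 8 for C0008


def on_block_fires(pgmst: str,
                   results: List[StepResult],
                   codes: Callable[[int], bool],
                   step_ranges: Dict[str, Tuple[str, str]]) -> bool:
    """Return True if the ON block's DO statements should be performed.

    pgmst is a literal step name, ANYSTEP, +EVERY, or *NAME, where NAME is
    a STEP RANGE defined in step_ranges as (first step, last step).
    codes is the CODES test, for example lambda cc: cc > 4 for >C0004.
    """
    if pgmst == "ANYSTEP":
        return any(codes(r.cc) for r in results)
    if pgmst == "+EVERY":
        return bool(results) and all(codes(r.cc) for r in results)
    if pgmst.startswith("*"):
        first, last = step_ranges[pgmst[1:]]
        names = [r.name for r in results]
        in_range = results[names.index(first):names.index(last) + 1]
        return any(codes(r.cc) for r in in_range)
    # Anything else is treated as a literal program step name.
    return any(codes(r.cc) for r in results if r.name == pgmst)


if __name__ == "__main__":
    run = [StepResult("S1", 0), StepResult("S2", 0), StepResult("S3", 8),
           StepResult("S4", 0), StepResult("S5", 0)]
    notok = lambda cc: cc > 4  # CODES >C0004
    ranges = {"EARLY": ("S1", "S2")}  # hypothetical STEP RANGE named EARLY
    for value in ("ANYSTEP", "+EVERY", "*EARLY", "S3"):
        print(value, on_block_fires(value, run, notok, ranges))
```

For the IDJOB5 results, ANYSTEP and the literal S3 fire, while +EVERY does not (four steps ended OK) and a range covering only S1 and S2 does not.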
Define the following DO COND statement as the first DO statement in the ON block:
IDJOB5-END-NOTOK ODAT +
You can now to define a Shout statement to the operator. However, because this is only an exercise and you do not want the shout to actually go to the operator, you will instead send the shout to yourself.
-
Define a DO SHOUT statement as the next DO statement in the ON block. The target location, or TO value, defaults to your userID. Specify the message:
IDJOB5 RUN FAILED. CORRECT PROBLEM AND CONFIRM RESTART.
You are now ready to define your restart parameters. Two statements are generally used in combination, to define a restart:
-
DO IFRERUN
-
DO RERUN
The DO IFRERUN statement indicates that a restart is desired and defines the parameters to be used for the restart, such as the FROM step and the TO step. It tells Control-M that if the job is rerun (hence the name DO IFRERUN), the rerun should be performed as a restart, in accordance with the values specified in the DO IFRERUN statement.
The DO RERUN parameter merely instructs Control-M to run the job again. If you wanted a rerun without a restart, you would specify only the DO RERUN parameter, and the job would be rerun from the beginning.
The DO IFRERUN statement precedes the DO RERUN statement.
- Enter IFRERUN in the blank DO statement. The DO IFRERUN statement is opened.
This statement contains FROM, TO, and CONFIRM fields. By default, the CONFIRM field is set to N (No).
- Set the CONFIRM value to Y (Yes).
The CONFIRM field of the DO IFRERUN statement is similar in meaning to the CONFIRM runtime scheduling parameter that you used in Chapter 1, "Introduction to Control-M." However, it applies only to restarted jobs, whereas the CONFIRM runtime scheduling parameter applies to all job runs.
You can now take a look at the FROM and TO fields.
The FROM field indicates the step from which the restart should begin, and the TO field indicates the step to which the restart should continue.
Both the FROM and the TO fields allow specification of a program step, to the left of the period, and/or a procedure step, to the right of the period. You can specify either or both values.
- A TO step value can only be a literal value (for example, S3, if this is a step name) or blank. If no TO value is specified, the restarted job continues to the end.
- The FROM field allows specification of either a literal value or a keyword that represents a step. Some of the valid keyword values for the FROM step are listed in the following table:
Table 9 DO IFRERUN: Selected FROM Keyword Values

| Keyword | Description |
|---|---|
| $FIRST | First step of the job. |
| $ABEND | Step of the job that ended NOTOK due to a system abend, a user abend, condition code C2000 (PL/1 abend), or JFAIL (job failed on JCL error). $ABEND is a subset of $EXERR, described below. |
| $FIRST.$ABEND | First step of the abended procedure. |
| $EXERR | Job step that ended with any error, including an abend, or with a condition code that is redefined as ENDED NOTOK by the ON and DO statements. |
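How the FROM keywords map to a concrete step can be sketched in Python. This is a simplified, hypothetical model, not Control-M/Restart's algorithm: each step carries two flags taken from the table descriptions, $FIRST.$ABEND is left out because it needs procedure-step information, and when several steps qualify the model assumes the first one is chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepRecord:
    name: str                  # program step name
    abended: bool = False      # system/user abend, C2000, or JFAIL
    ended_notok: bool = False  # ended NOTOK for any reason, including codes redefined by ON/DO


def resolve_from(value: str, steps: List[StepRecord]) -> Optional[str]:
    """Map a DO IFRERUN FROM value to a step name (None if nothing matches).

    Only $FIRST, $ABEND, $EXERR, and literal names are modeled here;
    $FIRST.$ABEND needs procedure-step information and is left out.
    """
    if value == "$FIRST":
        return steps[0].name if steps else None
    if value == "$ABEND":
        return next((s.name for s in steps if s.abended), None)
    if value == "$EXERR":
        # $ABEND is a subset of $EXERR, so an abended step also qualifies.
        return next((s.name for s in steps if s.ended_notok or s.abended), None)
    if any(s.name == value for s in steps):
        return value
    raise ValueError(f"unknown step or keyword: {value!r}")


if __name__ == "__main__":
    # First run of IDJOB5: S3 returned C0008, which >C0004 marks as NOTOK.
    run1 = [StepRecord("S1"), StepRecord("S2"),
            StepRecord("S3", ended_notok=True),
            StepRecord("S4"), StepRecord("S5")]
    for value in ("$FIRST", "$ABEND", "$EXERR", "S4"):
        print(value, "->", resolve_from(value, run1))
```

For IDJOB5, $ABEND finds nothing, because C0008 is not an abend, whereas $EXERR resolves to S3, because the ON/DO definition marks that code as NOTOK. This is why the exercise uses $EXERR.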
- Specify $EXERR in the FROM field.
- Enter RERUN in the blank DO statement. This statement has no subparameter values.
Before exiting the job scheduling definition, return and define one more OUT condition.
- In the OUT condition line, add the following condition as the second condition on that line:
IDJOB5-END-NOTOK ODAT -
If the job fails and then ends OK after the restart, this OUT condition deletes the IDJOB5-END-NOTOK condition that the DO COND statement added when the job failed. If the job does not fail, there is no IDJOB5-END-NOTOK condition to delete; no deletion occurs, and processing continues.
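The interplay of the DO COND and OUT conditions can be traced with a short Python sketch. It is an assumption-laden model, not Control-M code: conditions are held in a set of (name, date) pairs, ODAT is represented by a fixed date string, and the model assumes that OUT conditions are processed only when the job ends OK, while the ON block's DO COND is processed when it ends NOTOK.

```python
from typing import List, Set, Tuple

Condition = Tuple[str, str]  # (condition name, ODAT resolved to a date)

# Values copied from the IDJOB5 job scheduling definition.
OUT = [("IDJOB5-ENDED-OK", "+"), ("IDJOB5-END-NOTOK", "-")]
DO_COND = [("IDJOB5-END-NOTOK", "+")]


def apply_conditions(conditions: Set[Condition],
                     specs: List[Tuple[str, str]], odat: str) -> None:
    """Add (+) or delete (-) each condition for the given date."""
    for name, sign in specs:
        if sign == "+":
            conditions.add((name, odat))
        else:
            # Deleting a condition that does not exist is simply a no-op.
            conditions.discard((name, odat))


def end_of_run(conditions: Set[Condition], ended_ok: bool, odat: str) -> None:
    """Assumed behavior: OUT runs when the job ends OK; DO COND runs on failure."""
    apply_conditions(conditions, OUT if ended_ok else DO_COND, odat)


if __name__ == "__main__":
    conds: Set[Condition] = set()
    end_of_run(conds, ended_ok=False, odat="020201")  # run 1 fails at S3
    print("after failure:", sorted(conds))
    end_of_run(conds, ended_ok=True, odat="020201")   # restart ends OK
    print("after restart:", sorted(conds))
```

Using a set with discard mirrors the behavior described above: deleting IDJOB5-END-NOTOK when it was never added is harmless.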
Your job scheduling definition should now appear as shown in Figure 34, in Job Scheduling Definition Figures.
- Exit the Job Scheduling Definition screen to the Job List screen.
- Exit the Job List screen and save the changes by entering Y in the SAVE field of the Exit Option window. The Table List screen is displayed.
- Reenter the Job List screen for the table, and order job IDJOB5. The job is ordered, and the Job Order Messages screen is displayed.
- Exit the Job Order Messages screen and display the Active Environment screen by entering =3 in the COMMAND field.
- Ensure that filter IDGS is displayed. If it does not appear in the Filter field, enter SHOW IDGS in the COMMAND field.
- Refresh the display as often as needed. A message similar to the following is shouted to your terminal:
CTM- IDJOB5 RUN FAILED. CORRECT PROBLEM AND RESTART 02.02 12:38 CN(INTERNAL)
The job is submitted and executed, and finally ends with the status Ended- Not "OK" Due to CC - Rerun Needed.
The Active Environment screen appears as follows (the same screen is reproduced in Job Scheduling Definition Figures). If the jobs from the exercises in the preceding chapter have not been deleted by site maintenance, they also appear on the screen.
Filter: IDGS ------- Control-M Active Environment ------ UP <D> - (3)
COMMAND ===> SCROLL ==> CRSR
O Name Owner Odate Jobname JobID Typ ----------- Status ------------
IDJOB5 ID 020201 M21 /29162 JOB Ended- Not "OK" Due to CC -
Rerun Needed
========= >>>>>>>>>>>>> Bottom of Jobs List <<<<<<<<<<<<< ========
The CC in the status refers to a condition code. You can now review the job log and identify the condition code problem.
- Call up the log for the job by entering L in the OPTION field. The Control-M Log screen is displayed for the job (see The Job Log in Job Scheduling Definition Figures).
Each event in the life cycle of the job appears as a message issued by Control-M. Note the following messages:
- SEL214I indicates that a rerun is needed, reflecting the DO RERUN statement that you deliberately defined in the job scheduling definition.
- SEL216W identifies a problem: an unexplained condition code of 0008 in step S3, reflecting what you deliberately defined in the JCL.
- SEL219I indicates that the job ended "NOT OK".
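When reviewing longer logs, it can help to collect the message codes programmatically from text copied off the Control-M Log screen. The following Python sketch is hypothetical: it assumes the column layout shown in this chapter (DATE TIME ODATE USERID CODE, followed by the message text, with long messages wrapped onto continuation lines), and it does not use any Control-M interface.

```python
import re
from typing import List, Tuple

# Assumed line layout, taken from the Control-M Log screen in this chapter:
# DATE TIME ODATE USERID CODE text, where long messages wrap onto
# continuation lines that do not begin with a date.
ENTRY = re.compile(
    r"^(\d{6})\s+(\d{6})\s+(\d{6})\s+(\S+)\s+([A-Z0-9]{7})\s*(.*)$")


def parse_log(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (message code, full message text) pairs.

    Pass only the message lines; the screen's trailer lines are not
    filtered out and would be appended to the last message.
    """
    entries: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m:
            entries.append([m.group(5), m.group(6).strip()])
        elif entries:
            # Continuation line: append it to the previous message.
            entries[-1][1] += " " + line
    return [(code, text) for code, text in entries]


if __name__ == "__main__":
    screen = """\
020201 123845 020201 ID SEL216W JOB IDJOB5 M21 /08134 OID=0008Y
    UNEXPLAINED COND CODE 0008 STEP S3 /
020201 123845 020201 ID SEL214I JOB IDJOB5 M21 /08134 OID=0008Y RERUN
    NEEDED
"""
    for code, text in parse_log(screen.splitlines()):
        print(code, text)
```

Joining continuation lines keeps each message whole, so a search for a code such as SEL216W returns the complete explanation, including the failing step.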
Job Scheduling Definition Figures
Figure 34 Job Scheduling Definition
JOB: IDJOB5 LIB CTM.TEST.SCHEDULE TABLE: IDGS1
COMMAND ===> SCROLL===> CRSR
+-----------------------------------------------------------------------------+
MEMNAME IDJOB5 MEMLIB CTM.TEST.JCL
OWNER ID TASKTYPE JOB PREVENT-NCT2 DFLT N
APPL GROUP IDGS3
DESC
OVERLIB STAT CAL
SCHENV SYSTEM ID NJE NODE
SET VAR
CTB STEP AT NAME TYPE
DOCMEM IDJOB5 DOCLIB
===========================================================================
DAYS ALL DCAL
AND/OR
WDAYS WCAL
MONTHS 1- Y 2- Y 3- Y 4- Y 5- Y 6- Y 7- Y 8- Y 9- Y 10- Y 11- Y 12- Y
DATES
CONFCAL SHIFT RETRO N MAXWAIT 00 D-CAT
MINIMUM PDS
DEFINITION ACTIVE FROM UNTIL
===========================================================================
IN
CONTROL
RESOURCE
PIPE
FROM TIME + DAYS UNTIL TIME + DAYS
DUE OUT TIME + DAYS PRIORITY SAC CONFIRM Y
TIME ZONE:
===========================================================================
OUT IDJOB5-ENDED-OK ODAT + IDJOB5-END-NOTOK ODAT -
AUTO-ARCHIVE Y SYSDB Y MAXDAYS MAXRUNS
RETENTION: # OF DAYS TO KEEP # OF GENERATIONS TO KEEP
SYSOUT OP (C,D,F,N,R) FROM
MAXRERUN RERUNMEM INTERVAL FROM
STEP RANGE FR (PGM.PROC) . TO .
ON PGMST ANYSTEP PROCST CODES >C0004 A/O
DO COND IDJOB5-END-NOTOK ODAT +
DO SHOUT TO ID URGENCY R
= IDJOB5 RUN FAILED. CORRECT PROBLEM AND RESTART
DO IFRERUN FROM $EXERR . TO . CONFIRM Y
DO RERUN
DO
ON PGMST PROCST CODES A/O
DO
SHOUT WHEN TIME + DAYS TO URGN
MS
======= >>>>>>>>>>>>>>>>>>> END OF SCHEDULING PARAMETERS <<<<<<<<<<<<<<<< =====
COMMANDS: EDIT, DOC, PLAN, JOBSTAT 14.20.57
The Active Environment screen:
Filter: IDGS ------- Control-M Active Environment ------ UP <D> - (3)
COMMAND ===> SCROLL ==> CRSR
O Name Owner Odate Jobname JobID Typ ----------- Status ------------
IDJOB5 ID 020201 M21 /29162 JOB Ended- Not "OK" Due to CC -
Rerun Needed
========= >>>>>>>>>>>>> Bottom of Jobs List <<<<<<<<<<<<< ========
The Job Log:
--------------------- LOG MESSAGES FOR JOB(S) IDJOB5 -----------------(3.LOG)
COMMAND ===> SCROLL===> CRSR
SHOW LIMIT ON ==> USERID GROUP MEM/MIS DATE 020201 - 020201
DATE TIME ODATE USERID CODE ------ M E S S A G E --------------------
020201 123835 020201 ID SEL203I JOB IDJOB5 OID=0008Y ELIGIBLE FOR RUN
020201 123835 020201 ID SUB133I JOB IDJOB5 M21 /08134 OID=0008Y
SUBMITTED FROM LIBRARY (P) CTM.TEST.JCL
020201 123844 020201 ID SPY28GI JOB IDJOB5 M21 /08134 OID=0008Y TAPE
DRIVE UNITS USED=00 00
020201 123844 020201 ID SPY281I JOB IDJOB5 M21 /08134 OID=0008Y START
01033.1238 STOP 01033.1238 CPU 0MIN
00.92SEC SRB 0MIN 00.05SEC 0.13 2AOS35
020201 123845 020201 ID SPY254I JOB IDJOB5 M21 /08134 OID=0008Y SCANNED
020201 123845 020201 ID SEL216W JOB IDJOB5 M21 /08134 OID=0008Y
UNEXPLAINED COND CODE 0008 STEP S3 /
020201 123845 020201 ID SEL214I JOB IDJOB5 M21 /08134 OID=0008Y RERUN
NEEDED
020201 123845 020201 ID SEL215W JOB IDJOB5 M21 /08134 OID=0008Y NO
(MORE) RERUNS
020201 123845 020201 ID SEL219I JOB IDJOB5 M21 /08134 OID=0008Y ENDED
"NOT OK"
======== >>>>>>>>>>>>>>>> NO MORE LOG MESSAGES <<<<<<<<<<<<<<<< =======
CMDS: SHOW, GROUP, CATEGORY, SHPF 12.44.23
Editing JCL from the Active Environment
There is no point in restarting a job that will fail again because the problem has not been corrected. Now that you know the cause of the failure, you can correct it and then restart the job.
You can correct the JCL of this job by using the JCL option in the Active Environment screen.
- Exit the Control-M Log screen. The Active Environment screen is displayed.
Note the option J (JCL) at the bottom of the screen. If the list of commands is displayed instead of the list of options, enter the OPT command to toggle to the list of options.
- Enter option J for the job. The JCL is displayed in ISPF edit mode.
- Change the condition code of C0008 in step S3 to C0000, and exit the JCL edit session. The Active Environment screen is displayed. You can now restart the job.
Restarting the Job
- Enter option R (Rerun) for the job. Option R performs job rerun.
However, as discussed earlier, when restart instructions are defined in a DO IFRERUN statement, the rerun is performed as a restart.
When a rerun is requested, a window is opened. The window differs for regular reruns and for reruns with restart. Because you defined a DO IFRERUN statement in the job scheduling definition, you see the Confirm Restart window:
Figure 35 Confirm Restart Window
Filter: IDGS ------- Control-M Active Environment ------ UP <D> - (3)
COMMAND ===> SCROLL ==> CRSR
O Name Owner Odate Jobname JobID Typ ----------- Status ------------
R IDJOB5 ID 020201 M21 /29162 JOB Ended- Not "OK" Due to CC -
+---------------------------------(3.R)+
========= >>>>>>>>>>>>> | Job IDJOB5 Is to be Rerun | < ========
| Please Confirm (Y/N) |
| With Restart Y (?/Y/N) |
| ---------------------------------- |
| From Step/Proc S3 . |
| To Step/Proc . |
| Recapture Abend Codes (Y/N) |
| Recapture Cond Codes (Y/N) |
| Step Adjustment (Y/N) |
| Restart Parm Member Name IDJOB5 |
+--------------------------------------+
Opt: ? Why L Log H Hold Z Zoom R Rerun A Activate O Force OK V View Sysout
N Net D Del F Free S Stat G Group U Undelete J JCL Edit C Confirm 15.46.06
In the top half of the window, you see that:
- The first line informs you which job (IDJOB5) is to be rerun.
- The next line asks for confirmation; you will shortly specify Y (Yes).
- The next line tells you that the rerun has been defined to include a restart. It defaults to Y, but you can specify N (No) if you prefer a full rerun.
The bottom half of the window deals with restart information. In this exercise, you will look only at the first two lines, which tell you from which step and to which step the restart will be performed.
- The FROM value is S3. This makes sense because S3 is the step that failed, and steps S1 and S2 ended successfully.
- The TO step is blank, which means that once restart begins, it continues until the end of the job. Consider the following:
  - If you do not want the steps after the restart step to run again, you can specify the restart step, S3, as the TO step.
  - If you defined the JCL so that steps after the failed step do not run, and you want them to run following the restart, leave the TO value blank.
At this point, all you need to do is enter Y in the Please Confirm field, and the job will restart. However, do not do so yet.
There might be instances in which you want the job to restart from a different step than the one determined by Control-M/Restart as the logical restart step. It is possible to change the FROM and TO steps in the Confirm Restart window. To facilitate this change, you can display the list of steps in the job.
Notice that ? is a valid value for the With Restart field. Entering ? displays the list of steps.
- Enter ? in the With Restart field. The Restart Step List window is opened over the Restart Window.
Figure 36 Restart Step List Window
Filter: IDGS ------- Control-M Active Environment ------ UP <D> - (3)
COMMAND ===> SCROLL ==> CRSR
O Name Owner Odate Jobname JobID Typ ----------- Status ------------
R IDJOB5 ID 020201 M21 /29162 JOB Ended- Not "OK" Due to CC -
+---------------------------------(3.R)+
========= >>>>>>>>>>>>> | Job IDJOB5 Is to be Rerun | < ========
| Please Confirm (Y/N) |
| +----------- Control-R Step List ------------+
| | Command ==> |
| | O Num Pgm-stp Proc-stp Pgm= Comp |
| | 001 S1 IOATEST C0000 |
| | 002 S2 IOATEST C0000 |
| | 003 S3 IOATEST C0008 |
| | 004 S4 IOATEST C0000 |
| | 005 S5 IOATEST C0000 |
+- | |
| |
| |
| |
| |
| Opt: F From T To O Only |
+--------------------------------------------+
Opt: ? Why L Log H Hold Z Zoom R Rerun A Activate O Force OK V View Sysout
N Net D Del F Free S Stat G Group U Undelete J JCL Edit C Confirm 15.46.06
The Control-M/Restart Step List window lists all the steps in the job in sequence, assigning each step a sequence number.
At the bottom of the window are three options that can be specified in the O (Option) field for the appropriate step:
- Option F specifies a From step.
- Option T specifies a To step.
- Option O indicates that only the specified step should be rerun.
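The effect of the F, T, and O options, and of a blank TO value, can be sketched in Python. This is a hypothetical model for illustration: the selections are given as a step-name-to-option dictionary, and a blank FROM is assumed to mean the first step, which the Step List window itself does not state.

```python
from typing import Dict, List, Optional, Tuple


def range_from_options(options: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Turn F/T/O selections (step name -> option letter) into (FROM, TO)."""
    from_step: Optional[str] = None
    to_step: Optional[str] = None
    for step, opt in options.items():
        if opt == "F":
            from_step = step
        elif opt == "T":
            to_step = step
        elif opt == "O":  # Only: this step and nothing else
            from_step = to_step = step
        else:
            raise ValueError(f"unknown option {opt!r} for step {step}")
    return from_step, to_step


def steps_to_run(steps: List[str], from_step: Optional[str],
                 to_step: Optional[str]) -> List[str]:
    """Steps executed by the restart; a blank (None) TO runs to the end.

    A blank FROM is assumed here to mean the first step of the job.
    """
    start = steps.index(from_step) if from_step else 0
    end = steps.index(to_step) + 1 if to_step else len(steps)
    if end <= start:
        raise ValueError("TO step precedes FROM step")
    return steps[start:end]


if __name__ == "__main__":
    job = ["S1", "S2", "S3", "S4", "S5"]
    print(steps_to_run(job, "S3", None))                        # this exercise
    print(steps_to_run(job, *range_from_options({"S3": "O"})))  # Only S3
    print(steps_to_run(job, *range_from_options({"S2": "F", "S4": "T"})))
```

The first call corresponds to this exercise (FROM S3, TO blank) and selects S3 through S5, consistent with the RESTARTING FROM STEP S3 . TO STEP S5 message that appears in the log after the restart.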
You can specify option F for step S3, but this is not necessary, because that value is already indicated in the Restart Window. Simply exit the Restart Step List window.
- Press PF03/PF15 to exit the Restart Step List window. The Restart Step List window is closed, and the Restart Window is displayed.
- Enter Y in the Please Confirm field. The Restart window is closed, and the rerun with restart begins.
Notice the progression of status changes for the job in the Active Environment screen. When the rerun with restart is complete, the job appears as follows:
Filter: IDGS ------- Control-M Active Environment ------ UP <D> - (3)
COMMAND ===> SCROLL ==> CRSR
O Name Owner Odate Jobname JobID Typ ----------- Status ------------
IDJOB5 ID 201200 M21 /29191 JOB Ended "OK" (Restarted) (Run 2)
Prior Run: Ended- Not "OK" Due
to CC - Rerun was Needed
========= >>>>>>>>>>>>> Bottom of Jobs List <<<<<<<<<<<<< ========
Notice that there are two status descriptions for the job, one for each run:
- The current status, Ended "OK" (Restarted), applies to Run 2. The job was successfully restarted.
- The original status, with the problematic CC, is now associated with the prior run.
You can now look at the message log for the restarted job.
- Call up the log of the job by entering L in the OPTION field. The Control-M Log screen is displayed for the job.
--------------------- LOG MESSAGES FOR JOB(S) IDJOB5 -----------------(3.LOG)
COMMAND ===> SCROLL===> CRSR
SHOW LIMIT ON ==> USERID GROUP MEM/MIS DATE 020201 - 020201
DATE TIME ODATE USERID CODE ------ M E S S A G E --------------------
020201 135255 020201 ID CTM65AI JOB IDJOB5 OID=0008Y ODATE 020201 RERUN
PERFORMED BY ID
020201 135256 020201 ID SEL220I JOB IDJOB5 OID=0008Y WILL BE RERUN
020201 135256 020201 ID SEL203I JOB IDJOB5 OID=0008Y ELIGIBLE FOR RUN
020201 135257 020201 ID SUB133I JOB IDJOB5 M21 /08223 OID=0008Y
SUBMITTED FROM LIBRARY (P) CTMP.V610.JCL
020201 135311 020201 ID CTR082I JOB IDJOB5 M21 /08223 OID=0008Y
RESTARTING FROM STEP S3 . TO STEP S5 .
020201 135311 020201 ID CTR066I JOB IDJOB5 M21 /08223 OID=0008Y NUMBER
OF SKIPPED STEPS 2 WITH A TOTAL ELAPSED
TIME 00.00 CPU TIME 0MIN 00.36SEC
020201 135311 020201 ID SPY28GI JOB IDJOB5 M21 /08223 OID=0008Y TAPE
DRIVE UNITS USED=00 00
020201 135311 020201 ID SPY281I JOB IDJOB5 M21 /08223 OID=0008Y START
01033.1352 STOP 01033.1353 CPU 0MIN
01.37SEC SRB 0MIN 00.07SEC 0.21 6AOS35
020201 135311 020201 ID SPY254I JOB IDJOB5 M21 /08223 OID=0008Y SCANNED
020201 135311 020201 ID SEL208I JOB IDJOB5 M21 /08223 OID=0008Y ENDED
"OK"
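CMDS: SHOW, GROUP, CATEGORY, SHPF 13.54.12
Messages CTR082I and CTR066I show that the job was restarted from step S3 and that the two preceding steps were skipped. Message SEL208I indicates that the job ended "OK."
- Exit the IOA Online Facility.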
This completes the steps in this chapter of the Control-M for z/OS Getting Started Guide.
Review
In this chapter, you:
- defined a restart in your job scheduling definition using the DO IFRERUN and DO RERUN parameters
- learned the valid restart step keyword values and specified that the job should restart from the step identified by $EXERR
- opened the JCL of the failed job from the Active Environment screen by entering the J (JCL) option, and corrected the JCL
- requested a rerun with restart (option R) for the failed job in the Active Environment screen, and in the process displayed the Confirm Restart window and the Restart Step List window and confirmed the restart
- checked the log of the job following the failed run, and again following the restart
Recommended Reading
Before continuing with the next chapter, it is recommended that you read the following:
- In the Control-M/Restart User Guide
- In the Control-M for z/OS User Guide:
  - In Chapter 2, the description of the Control-M/Restart information related to the Confirm Restart window, the Rerun/Restart window, and the Restart Step List window.
  - In Chapter 3, the detailed descriptions of parameters DO IFRERUN and DO RERUN.