Introduction to Control-M/Restart
Overview
Control-M/Restart is an automated job restart system, and it also performs many functions not related to restart. It is primarily designed to work with jobs that were run under Control-M, and many of its features utilize Control-M capabilities. However, it can also work on jobs that did not run under Control-M. In this case, Control-M/Restart works in standalone mode, and it does not have the full range of capabilities that are available when Control-M/Restart works under Control-M.
This chapter presents a brief introduction to Control-M/Restart features and functionality.
Rerun and Restart
To understand what Control-M/Restart does, it is necessary to distinguish between job rerun and job restart.
Job rerun is the re-execution of a scheduled job from the beginning. For example, if a job fails, the entire job can be rerun.
At best, rerunning a job wastes processing time on job steps that already completed successfully. At worst, unless certain precautions are taken, a rerun repeats updates that successful job steps performed before the job failed, which can produce incorrect results.
Job restart is the re-execution of a job from a particular step. In general, the results of successful job steps from before the failure are utilized, and re-execution continues from the end of the last successful step. Many complex decisions are made and several necessary tasks are performed during this process. These are described briefly in this chapter.
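The distinction can be illustrated with a minimal Python sketch. The step names and the `steps_to_execute` helper are hypothetical and are not part of Control-M/Restart; the sketch only shows which steps are executed again in each case.

```python
from typing import List, Optional


def steps_to_execute(job_steps: List[str], restart_from: Optional[str] = None) -> List[str]:
    """Return the steps that are executed again.

    restart_from=None models a rerun (every step runs again);
    otherwise execution resumes at the named step (a restart).
    """
    if restart_from is None:
        return list(job_steps)
    start = job_steps.index(restart_from)  # raises ValueError for an unknown step
    return job_steps[start:]


steps = ["STEP1", "STEP2", "STEP3", "STEP4"]
rerun = steps_to_execute(steps)                           # all four steps
restart = steps_to_execute(steps, restart_from="STEP3")   # STEP3 and STEP4 only
```

In the restart case, the results of STEP1 and STEP2 from the failed run are reused rather than recomputed.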
Control-M/Restart Capabilities
Control-M/Restart provides the following major capabilities:
-
Restart
Performing job restart is the main function of Control-M/Restart. When job restart is necessary, Control-M/Restart automates restart by identifying at which step to initiate a job restart and by performing necessary tasks (described later) to ensure that job restart is error-free.
This capability is available both for restarts under Control-M and standalone restarts.
-
Data set cleanup and Prevent-NCT2 processing
Another important function of Control-M/Restart is data set cleanup, which is described in Data set cleanup prior to the original run.
Before restarts and reruns, Control-M/Restart automatically performs data set cleanup. It does not have to be requested.
Data set cleanup can also be performed by request prior to the original job run:
-
When the data set cleanup request is connected to the original job run request, it is termed Prevent-NCT2 processing.
The term is derived from the error message generated following an attempt to catalog a data set that was already cataloged. The error message reports NOT CATLGD with reason code 2. As part of data set cleanup, Control-M/Restart prevents these types of errors.
-
When the data set cleanup request is independent of, and not accompanied by, a job run request, no special term is applied and it is called data set cleanup.
-
Maintaining Previous Runs in the History Jobs File
Jobs that have already executed and are ready for removal from the Control-M Active Jobs file can be saved in the Control-M History Jobs file (instead of being deleted). Parameters in the Control-M job scheduling definition determine if and when a job is placed in the History Jobs file and how long the job is maintained there.
This capability is available only for jobs submitted under Control-M.
Main Components
The following components are critical to Control-M/Restart when it operates under Control-M.
Control-M Job Scheduling Definition
Most Control-M/Restart functions for a job are defined using parameters in the job scheduling definition. These parameters can be defined so that Control-M/Restart processing is completely automatic, requiring no manual intervention.
However, if manual intervention is needed, for example, a manual confirmation before a restart under Control-M/Restart, these parameters can be defined accordingly.
Table 1 lists Control-M/Restart functions, and the parameters in the Control-M job scheduling definition that are used to define them.
Table 1 Control-M/Restart Functions and Control-M Job Scheduling Definitions Used to Define Them
Control-M/Restart Function | Control-M Job Scheduling Definition |
---|---|
Job Restart | DO IFRERUN |
Archive SYSDATA (defined later in this section) that is necessary for job restart. | AUTO-ARCHIVE |
Perform Prevent-NCT2 (data set cleanup) processing prior to, but as part of, the original job run. | PREVENT-NCT2 |
Determine if and how long a job is retained in the History Jobs file. | RETENTION - # OF DAYS TO KEEP or RETENTION - # OF GENERATIONS TO KEEP |
These parameters are defined using the Control-M Job Scheduling Definition screen (Screen 2). The parameters are described in detail in the job scheduling parameters chapter of the Control-M for z/OS User Guide.
Control-M Monitor
The heart of the Control-M Production Control System is the Control-M monitor. The monitor is usually activated as a started task.
The Control-M monitor selects jobs for execution, submits the jobs, tracks the jobs, analyzes job execution results, and so on. The monitor executes user instructions (defined in the job scheduling definition) that describe when and how a job is executed.
Jobs requiring Control-M/Restart processing enter the normal processing flow of Control-M under the management of the Control-M monitor. Additional logic has been added to the Control-M monitor to facilitate handling of Control-M/Restart functions.
CONTROLR Step
The CONTROLR step is a special processing step that is automatically generated by Control-M/Restart and inserted into the JCL of the job. The CONTROLR step provides the necessary instructions for the appropriate Control-M/Restart processing of the job.
When job restart or data set cleanup processing is requested, the CONTROLR step is inserted as the first step of the JCL.
Manual adjustment of the CONTROLR step is permitted.
For details of the CONTROLR step, see The CONTROLR Step and Control Parameters.
Control-M/Restart Parameter Members
In the IOA PARM library, the CTRPARM member is used to define many default Control-M/Restart parameters. Several of these parameters impact the way in which Control-M/Restart and the CONTROLR step handle processing.
The Control-M/Restart PARM library contains members that are also used to define Control-M/Restart processing defaults. The $DEFAULT member in this library contains definitions that apply to all jobs processed by Control-M/Restart. The $EXCLUDE member identifies data sets to be excluded from Control-M/Restart processing. The $KEEP member identifies data sets that must not be deleted by Control-M/Restart. And local members in this library define processing defaults that apply to a particular job.
Control-M Active Environment Screen
As with any job running under Control-M, the Control-M Active Environment screen (Screen 3) enables the user to see the status of, and manually intervene in the processing of, restarted jobs.
When Control-M/Restart processing has been defined so as to require manual intervention, this intervention is generally performed in the Active Environment screen. For example, if a manual confirmation is required before restart, the confirmation can be entered using the Confirm Restart window in the Active Environment screen.
The Active Environment screen is the gateway to several windows and screens relevant to Control-M/Restart. Below is a list of the windows and screens available from the Active Environment screen. They are described in detail in the online facilities chapter of the Control-M for z/OS User Guide.
Table 2 Screens and Windows Available from the Active Environment Screen
Window or Screen | Description |
---|---|
Confirm Restart window | Used to confirm job restart when the DO IFRERUN statement requires manual confirmation |
Rerun Restart window | Used to activate the restart when automatic rerun (DO RERUN) for the job is not specified |
Restart Step List window | Displays the list of steps from the previous run of the job. The steps can then be selected for use in the Confirm Restart or Rerun Restart window. |
Job Order Execution History screen | Displays the execution history of the job. From this screen, the Sysout Viewing screen (that displays the archived SYSDATA of the job) can be accessed. |
Sysout Viewing screen | Displays the archived SYSDATA of the job |
History Environment Screen | This screen, a special format of the Active Environment screen, displays jobs in the History Jobs file. |
Control-M/Restart Online Utilities
Table 3 describes the Control-M/Restart utilities in the IOA Online Utility facility that are available under ISPF (they are also available as TSO CLISTs).
Table 3 Control-M/Restart Online Utilities
Utility | Description |
---|---|
R1 | Control-M/Restart Simulation. Simulates restart under Control-M/Restart or Prevent-NCT2 processing. |
R2 | Dataset Cleanup. Performs data set cleanup and adjustments without running the job. |
R3 | Job Dataset List. Prepares the list of permanent data sets used in a job. The list is generated in the Control-M Statistics file. |
R4 | Control-M/Restart Standalone. Performs restarts under Control-M/Restart, or Prevent-NCT2 processing, for jobs not run under Control-M. |
Reporting Facility
Several Control-M/Restart reports produced by IOA KeyStroke Language (KSL) scripts are provided. KSL is a general purpose language that mimics keystrokes entered in IOA applications. It is described in detail in the KeyStroke Language (KSL) User Guide.
Table 4 describes the KSL reports that are provided. Sample outputs for these reports are provided in the KeyStroke Language (KSL) User Guide.
Table 4 Control-M/Restart KSL Reports
Report | Description |
---|---|
Manual Restart Confirmation Report | Details restart jobs that were manually released for execution using the Control-M/Restart CONFIRM option within a specified period. |
Restart Detail Report | Lists restarted jobs executed over a particular period. The report displays the restart job, the restart step, use of the CONFIRM option, and so on. |
Last Night Restart History Report | Details the complete execution history of all jobs that were restarted during the previous night. Job start time, end time, restart step and termination condition codes for both successful and unsuccessful restarts are displayed. |
Restart Time Savings Report | Lists job restarts by Control-M/Restart during the specified period. For each listed job restart, the report provides summary information about the execution time saved as a result of using a Control-M/Restart restart instead of a rerun (number of steps skipped, elapsed time saved, and the CPU time saved). It also provides general information about the job. |
Last Night Sysout Scan Summary Report | Provides an execution history for jobs with archived sysouts that ran the previous night. Either the first archived sysout or all archived sysouts can be displayed for the specified jobs. |
Control-M/Restart under Control-M
Two separate processes are required for restart under Control-M:
-
defining the restart parameters in the job scheduling definition appropriately, so that restart can be performed if it becomes necessary
-
activating the restart process when restart becomes necessary
These are described below.
Defining Restart Parameters in the Job Scheduling Definition
The Control-M job scheduling definition contains post-processing parameters that tell Control-M what to do following a job run. The ON/DO statements enable specification of particular actions to be performed in particular situations. The job scheduling definition can therefore contain different instructions for what to do in different situations (if the job ends OK, if the job ends NOTOK, if the job abends, and so on).
Restart instructions are generally defined in these ON/DO statements. It is important to note that these parameters are defined in advance of any need to perform a restart. Possible situations requiring restart are anticipated at the time the job scheduling definition is being defined. The job scheduling definitions can, however, be modified at any time.
The ON statement indicates the situation in which the defined restart actions are taken. For example, it may indicate that the defined restart actions are performed in case of an abend.
The DO statements indicate the actions to perform. A DO IFRERUN statement defines restart criteria that apply if the particular ON criteria are satisfied. The DO IFRERUN statement can indicate the step at which the restart must begin, the step at which it must end (if desired), and whether manual confirmation is necessary. For the restart to be automatic, a DO RERUN statement must also be defined. The combination of DO IFRERUN and DO RERUN parameters defines an automatic restart.
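The interaction between DO IFRERUN, DO RERUN, and manual confirmation can be summarized in a small Python sketch. The `OnBlock` class, its field names, and the returned labels are hypothetical modeling choices made for illustration; they are not Control-M syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnBlock:
    """Hypothetical model of the DO actions attached to one ON statement."""
    do_ifrerun: bool = False          # a DO IFRERUN statement is defined
    do_rerun: bool = False            # a DO RERUN statement is defined
    confirm: bool = False             # DO IFRERUN asks for manual confirmation
    from_step: Optional[str] = None   # step at which the restart must begin


def restart_handling(block: OnBlock) -> str:
    """Classify how a restart proceeds once the ON criteria are satisfied."""
    if not block.do_ifrerun:
        return "no restart defined"
    if not block.do_rerun:
        # Restart criteria exist, but the user must activate the restart.
        return "restart activated manually"
    if block.confirm:
        return "automatic restart, awaiting confirmation"
    return "automatic restart"
```

For example, `restart_handling(OnBlock(do_ifrerun=True, do_rerun=True, from_step="STEP3"))` yields `"automatic restart"`.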
Basic Control-M/Restart Process Overview
Once a job is submitted under Control-M, a restart may become necessary. The basic restart process is outlined below. Variations to this process are described in The CONTROLR Step and Control Parameters.
The job is selected for restart
If a job fails and its job scheduling definition indicates that the job is to be restarted following such a failure, the job can be automatically restarted.
Jobs are placed in WAIT SCHEDULE status in the Active Environment screen until all conditions required for execution are fulfilled (time limits, prerequisite conditions, Quantitative resources, Control resources, and so on). When all conditions for the execution of a job have been fulfilled, the JCL of the job to be restarted is prepared for submission.
Any job under Control-M can be restarted, even if the job scheduling definition does not contain restart parameters. In this case, restart is manually requested from the Active Environment screen.
The JCL of the job is prepared for submission
The following steps are performed in the preparation of the JCL of the job for submission:
-
The JCL of the job is retrieved from the appropriate JCL Library.
-
Control-M AutoEdit variables are resolved.
The JCL of the job retrieved from the user library may contain Control-M AutoEdit variables. These AutoEdit variables can be replaced with different values based on how and where the previous runs of the job terminated, using SET VAR and DO SET parameters of the job scheduling definition. If the criteria for replacement of an AutoEdit variable have been met, based on the results of the previous runs of the job, the AutoEdit variables are replaced by the predefined values specified by the user.
-
The CONTROLR step is inserted into the JCL of the job.
Many of the Control-M/Restart facilities that make the job restart process automatic and error-free are activated during execution of this step. This restart information is derived from the restart specifications provided by the user in the job scheduling definition, and from the CTRPARM member.
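The preparation steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: AutoEdit variables are assumed to appear as `%%NAME` tokens, only plain substitution is modeled, and the inserted line is a placeholder, because the real CONTROLR step is generated by Control-M/Restart. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Placeholder only: the actual CONTROLR step is generated by Control-M/Restart.
CONTROLR_PLACEHOLDER = "//CONTROLR EXEC <generated by Control-M/Restart>"


def resolve_autoedit(line: str, values: Dict[str, str]) -> str:
    """Replace %%NAME tokens that have a value; leave unknown tokens untouched."""
    return re.sub(r"%%([A-Z0-9_$#@]+)",
                  lambda m: values.get(m.group(1), m.group(0)),
                  line)


def prepare_jcl(jcl: List[str], values: Dict[str, str]) -> List[str]:
    """Resolve variables, then insert the placeholder before the first step."""
    resolved = [resolve_autoedit(line, values) for line in jcl]
    for i, line in enumerate(resolved):
        # The first non-comment EXEC statement marks the first step of the job.
        if re.match(r"//(?!\*)\S*\s+EXEC\s", line):
            return resolved[:i] + [CONTROLR_PLACEHOLDER] + resolved[i:]
    return resolved  # no step found: nothing to insert


jcl = [
    "//MYJOB JOB (ACCT),'DEMO'",
    "//STEP1 EXEC PGM=%%PGM1",
    "//STEP2 EXEC PGM=IEFBR14",
]
prepared = prepare_jcl(jcl, {"PGM1": "IEFBR14"})
```

After preparation, the placeholder sits between the JOB statement and STEP1, matching the rule that the CONTROLR step is inserted as the first step of the JCL.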
The JCL of the job is submitted for execution
The JCL of the job is submitted for execution.
The restarted job is tracked and controlled by Control-M
Jobs restarted by Control-M/Restart enter the normal flow of Control-M processing under the management of the Control-M monitor. Therefore, all Control-M tracking and control capabilities apply equally to restarted jobs and to originally scheduled production jobs.
Error handling
When Control-M/Restart detects a restart error situation, for example, if a mandatory input data set is missing, it generates a restart error. Control-M/Restart then continues to check and report on all error situations (other missing input data sets in the job, and so on). This provides a report of all errors after the first Control-M/Restart run.
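The "check everything, then report" behavior described above can be sketched in Python. The function name, the message text, and the data set names are hypothetical; the sketch only shows that checking continues after the first error so that all problems are reported in one pass.

```python
from typing import Iterable, List, Set


def check_input_data_sets(required: Iterable[str], available: Set[str]) -> List[str]:
    """Return one message per missing input data set.

    The loop does not stop at the first missing data set, so a single
    run reports every problem.
    """
    errors = []
    for dsn in required:
        if dsn not in available:
            errors.append(f"missing input data set: {dsn}")
    return errors


errors = check_input_data_sets(
    required=["PROD.INPUT.A", "PROD.INPUT.B", "PROD.INPUT.C"],
    available={"PROD.INPUT.B"},
)
```

Here `errors` contains two messages, one for `PROD.INPUT.A` and one for `PROD.INPUT.C`, rather than only the first.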
Control-M/Restart Components and Concepts
The following components and concepts are also important to restarts under Control-M/Restart.
ORDERID
Each job order under Control-M is assigned a unique order ID. As a result, it is possible for multiple job orders to exist for the same job name in the Control-M Active Jobs file. One job order may terminate OK while the other may fail and require a restart. Each job order is considered a unique, totally independent entity, and Control-M/Restart processing is always performed on a specific job order.
SYSDATA
SYSDATA is the term used to designate the following job sysout data sets:
-
job log (console messages)
-
expanded JCL
-
system output messages
SYSDATA data sets are usually produced for each execution of a job or started task; however, not all of these data sets are necessarily present in all cases.
SYSDATA is archived if job restart is to be performed. SYSDATA is important to job restart for the following reasons:
-
Control-M/Restart allows the same job to be automatically restarted multiple times. The restart function of Control-M/Restart requires the complete picture of the execution history of a job. Archiving the SYSDATA of jobs processed in the Control-M environment provides that complete picture of the execution history of a job.
-
Control-M/Restart facilities that are activated within the CONTROLR step require the SYSDATA of all previous runs of the job. These facilities (described below) are
-
restart step adjustment
-
file catalog and GDG adjustment
-
Condition Code and Abend Code Recapture
-
Even if a job finished executing OK, it can be manually rerun or restarted at a user-specified job step. In this case, a complete history of previous executions of the job is required by Control-M/Restart facilities.
SYSDATA archiving is requested by appropriately filling in the AUTO-ARCHIVE parameter and its subparameters in the Control-M job scheduling definition. It is performed by Control-M during job post-processing; the SYSDATA is compressed and written to the specified data set.
In certain situations, SYSDATA archiving is not desirable and is not requested (for example, cyclic started tasks).
The user can view SYSDATA of previous runs of a requested job order online. For more information, see "Job Order Execution History screen" and "Sysout Viewing screen" in Control-M Active Environment Screen.
Data Set Cleanup and Prevent-NCT2 Processing
Before executing a restart job, catalog and VTOC maintenance are often required in order to prevent file-related errors during the processing of the restarted job.
When a job tries to create a data set that already exists or that has a name that is already cataloged, the job may fail with a DUPLICATE DATASET ON VOLUME error, or a NOT CATLGD 2 error may occur, in which case the production workflow can continue using an incorrect version of the data set. In either case, the impact on the production workflow can be severe. This problem is especially likely in non-restart reruns. Therefore, data set cleanup is necessary.
The data set cleanup process automatically performs all required catalog adjustment. It accesses the SYSDATA of previous runs of the job order to analyze file creation and deletion and catalog information. Since a job may fail multiple times, analysis of the SYSDATA begins with the archived SYSDATA of the most recent non-restarted run.
As part of data set cleanup, Control-M/Restart
-
deletes and uncatalogs the old data sets
This prevents DUPLICATE DATASET ON VOLUME and NOT CATLGD 2 errors.
-
performs Generation Data Group (GDG) Adjustment (described below)
The user can, however, exclude files from data set cleanup if desired, in either of the following ways:
-
by specifying the names of the data sets to be excluded in appropriate control statements that are placed in a user-defined library member
For more information, see EXCLUDE DSN parameter.
-
by specifying the DDname in the appropriate parameter member
For more information, see Format of the $EXCLUDE member.
Control-M/Restart automatically performs data set cleanup prior to any restart.
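The selection of data sets for cleanup can be sketched in Python. This is an illustration under an assumption that is not stated in this chapter: that the data sets to delete and uncatalog are those created by the steps that are about to be executed again. The function and parameter names are hypothetical, and exclusions are modeled as exact data set names.

```python
from typing import Dict, List, Set


def data_sets_to_clean(created_by_step: Dict[str, List[str]],
                       steps_to_rerun: List[str],
                       excluded: Set[str]) -> List[str]:
    """List the data sets to delete and uncatalog before the steps run again.

    created_by_step stands for the creation information that would be
    derived from the archived SYSDATA of previous runs.
    """
    to_clean: List[str] = []
    for step in steps_to_rerun:
        for dsn in created_by_step.get(step, []):
            if dsn not in excluded and dsn not in to_clean:
                to_clean.append(dsn)
    return to_clean


created = {
    "STEP1": ["PROD.WORK.A"],
    "STEP2": ["PROD.WORK.B", "PROD.KEEP.ME"],
    "STEP3": ["PROD.WORK.C"],
}
cleanup = data_sets_to_clean(created, ["STEP2", "STEP3"], excluded={"PROD.KEEP.ME"})
```

In this example the restart begins at STEP2, so `PROD.WORK.A` is left alone, and `PROD.KEEP.ME` is skipped because it was excluded.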
Data set cleanup can also be performed prior to the original run of a job. This can be important because data sets accessed by the job can be affected by DUPLICATE DATASET ON VOLUME or NOT CATLGD 2 errors that were generated by an entirely different job. As mentioned earlier in this chapter:
-
when data set cleanup is performed as part of the original job request, it is called Prevent-NCT2 processing
-
otherwise (that is, when performed independently of the original job request), the term "data set cleanup" is used
For details, see Data Set Cleanup prior to the Original Run.
Automatic GDG Adjustment
Generation data group (GDG) bias numbers, that is, relative generation numbers, must be adjusted so that the relative references to them within the restarted job refer to the correct generation of the data set.
For example, adjusting GDG bias numbers enables a job that creates data set A.B(+1) in STEP1 and reads A.B(+1) in STEP2 to be successfully restarted in STEP2 without manually changing the JCL relative generation number from +1 to 0.
Because it works completely automatically, the GDG Adjustment facility allows the user to restart jobs without being concerned about the technical details of GDG maintenance.
Control-M/Restart can handle jobs that dynamically allocate GDG data sets, but it does not perform adjustments for such data sets. It may therefore be necessary to exclude dynamically allocated GDG files from Control-M/Restart processing when these files are referenced both through JCL and by dynamic allocation.
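The arithmetic behind the A.B(+1) example can be sketched in Python. This is a simplified model, not the product's algorithm: it assumes that the only relevant input is the highest relative generation of the same GDG that was created by steps which are not executed again. The function names are hypothetical.

```python
def adjust_gdg_bias(bias: int, highest_created_in_skipped_steps: int) -> int:
    """Shift a relative generation number for use in the restarted job.

    Generations created by already-completed (skipped) steps exist in
    the catalog when the restart runs, so references move down by the
    number of generations those steps created.
    """
    return bias - highest_created_in_skipped_steps


def format_relative(bias: int) -> str:
    """Render a relative generation number as it appears in JCL."""
    return "(0)" if bias == 0 else f"({bias:+d})"


# STEP1 created A.B(+1) in the failed run; the restart begins at STEP2.
adjusted = format_relative(adjust_gdg_bias(1, 1))        # STEP2 now reads A.B(0)
previous = format_relative(adjust_gdg_bias(0, 1))        # the old current generation is now A.B(-1)
```

When no skipped step created a new generation, the adjustment is zero and the reference is unchanged.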
Recoverable and Non-recoverable Job Steps
Restart of a job must begin at a job step that ensures re-creation of all deleted data sets required as input to the steps to be processed in the restart job. Such a step is called a recoverable job step.
Non-recoverable job steps can result from any of the following situations:
-
Data sets that are not yet kept or cataloged at the point the job failed (that is, temporary or NEW/PASS data sets) are deleted by the operating system. If these deleted data sets are required as input to job steps to be processed in the job restart, the restart cannot be successfully performed.
-
If a DD statement contains a VOL=REF parameter that backward references a tape data set that is not the first file on the tape, the step is not recoverable. In this case, the earliest recoverable step is the step that contains the original volume reference for the tape.
-
The step was manually marked as non-restartable. This is discussed in Non-restartable Step.
Automatic Restart Step Adjustment
The user normally specifies the step at which the restart must begin, either in a DO IFRERUN statement in the job scheduling definition, or in the Restart window used to manually issue a restart request.
If, however, the restart job step chosen by the user is not recoverable, the Restart Step Adjustment facility automatically replaces the user-specified restart step with the closest recoverable job step prior to the requested restart step, and issues an appropriate message to notify the user of the change.
The facility thereby enables the user to choose the restart step on the basis of application considerations without worrying if the step is actually recoverable.
By default, the Restart Step Adjustment facility is operational and performs step adjustment as needed. However, step adjustment can be disabled in either of the following ways:
-
by specifying the appropriate parameter in the Control-M/Restart PARM library
For more information, see [NO]STEPADJUST parameters.
-
by specifying N (No) in the STEP ADJUSTMENT field in the Rerun/Restart or Confirm Restart window
If step adjustment is needed but step adjustment was disabled, job restart is terminated with a non-zero return code.
Non-restartable Step
The user can label any step as a non-restartable step. Restart cannot start at a step that is defined as a non-restartable step, even if the step would otherwise be recoverable. When the Restart Step Adjustment facility arrives at a non-restartable step, it continues rolling back to a preceding step.
Defining steps as non-restartable steps can prevent the restart from being performed. For example, if the step adjustment reaches the first job step but that step is defined as a non-restartable step, restart cannot be performed.
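The rollback logic of step adjustment, including non-restartable steps, can be sketched in Python. The sketch treats recoverability as a given set of step names, which is a simplification; the function, parameter, and exception names are hypothetical.

```python
from typing import List, Set


class RestartError(Exception):
    """Raised when no valid restart step can be determined."""


def adjust_restart_step(steps: List[str], requested: str,
                        recoverable: Set[str], non_restartable: Set[str],
                        step_adjust: bool = True) -> str:
    """Return the step at which the restart actually begins."""
    def usable(step: str) -> bool:
        return step in recoverable and step not in non_restartable

    if usable(requested):
        return requested
    if not step_adjust:
        # Adjustment is needed but was disabled: the restart is terminated.
        raise RestartError("step adjustment needed but disabled")
    idx = steps.index(requested)
    # Roll back to the closest usable step before the requested one.
    for step in reversed(steps[:idx]):
        if usable(step):
            return step
    raise RestartError("no restartable step found")


steps = ["S1", "S2", "S3", "S4"]
chosen = adjust_restart_step(steps, "S4",
                             recoverable={"S1", "S2"},
                             non_restartable={"S2"})
```

In this example S4 and S3 are not recoverable and S2 is marked non-restartable, so the restart rolls back to S1. If S1 were also non-restartable, `RestartError` would be raised, which corresponds to the case where restart cannot be performed.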
A step can be defined as a non-restartable step in either of two ways:
-
A special DD statement can be placed in the JCL of the job. This impacts restarts of that job only. For more information, see Indicating non-restartable steps: CTRNORST DD.
-
An appropriate parameter definition can be placed in the Control-M/Restart PARM library. Depending on which member in the library is used, the parameter can apply to all jobs or only to the relevant job. For more information, see NONRESTARTABLE_STEP parameter and NONRESTARTABLE_PGM parameter.
Condition Code Recapture and ABEND Code Recapture
Sometimes the decision whether to execute a particular step is dependent upon the execution results (resulting condition code or ABEND code) of a previous step. The COND JCL parameter and IF/THEN/ELSE JCL statements are commonly used to establish this dependency.
For example, assume that the following statement is specified:
//STEPF EXEC ...,COND=(04,EQ,STEPB)
STEPF is executed only if STEPB does not terminate with a condition code of 04.
If the backward-referenced step is not executed in the restart run because it was executed in the previous run, the condition code or ABEND code information from the backward-referenced step would not normally be available for the COND or IF/THEN/ELSE JCL statements.
The Condition Code / ABEND Code Recapture facility analyzes the SYSDATA of the previous runs of a job order. It determines the condition codes and ABEND codes of backward-referenced steps and makes the recaptured values available during the restarted run.
The Condition Code / ABEND Code Recapture facility can also reset those values. Steps that previously ended with a not-OK condition code or abended can be treated as having ended OK (not abended) with a condition code of zero.
These codes can then be used by the COND parameter and IF/THEN/ELSE JCL statements.
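The effect of recapture on the COND example above can be sketched in Python. The sketch models the standard JCL rule that a step is bypassed when the test "code operator return-code" is true, and that a test against a step that did not run is ignored. The function name and the dictionary of return codes are hypothetical.

```python
import operator
from typing import Dict

_OPS = {"GT": operator.gt, "GE": operator.ge, "EQ": operator.eq,
        "NE": operator.ne, "LT": operator.lt, "LE": operator.le}


def step_is_bypassed(code: int, op: str, ref_step: str,
                     return_codes: Dict[str, int]) -> bool:
    """Evaluate COND=(code,op,ref_step) against the known return codes."""
    if ref_step not in return_codes:
        return False  # referenced step did not run: the test is ignored
    return _OPS[op](code, return_codes[ref_step])


# STEPB ended with condition code 04 in the original run and is not
# executed again in the restart run.
without_recapture = step_is_bypassed(4, "EQ", "STEPB", {})
with_recapture = step_is_bypassed(4, "EQ", "STEPB", {"STEPB": 4})
```

Without the recaptured code, the test on STEPB is ignored and STEPF would run in the restart; with the recaptured code available, STEPF is bypassed, as it would have been in an uninterrupted run.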
If the ALLRUNS parameter in the CTRPARM member in the IOA PARM library is set to YES, the recaptured codes are also available for Control-M to use when evaluating the previous runs or restarts of a job during post processing.
For example, if one step finished successfully in an original run and another step finished successfully after a restart, an ON block containing both criteria is satisfied by the successful step in each of the runs.
By default, condition code recapture and ABEND code recapture operate automatically. They can also be explicitly set using the following methods:
-
By specifying the RECAPTCC/RECAPTABEND parameters in the appropriate Control-M/Restart PARM library.
-
By specifying Y (Yes) in the RECAPTURE CONDITION CODES and/or RECAPTURE ABEND CODES fields of the Rerun/Restart or Confirm Restart window.
On the other hand, you can reset or suppress these facilities in the following ways:
-
By specifying the appropriate parameters in the Control-M/Restart PARM library.
For more information, see [RESET/NO]RECAPTCC/[RESET/NO]RECAPTABEND Parameters.
-
By specifying N (No) or R (Reset) in the RECAPTURE CONDITION CODES and/or RECAPTURE ABEND CODES fields of the Rerun/Restart or Confirm Restart window.
Standalone Control-M/Restart
If a job that did not run under Control-M (for example, an unscheduled job that does not have a job scheduling definition) requires restart, the restart can be requested from the Control-M/Restart Standalone panel. This panel corresponds to the R4 Control-M/Restart utility.
To perform Standalone restart under Control-M/Restart, access the R4 utility (or activate CLIST CTRCCTR in the TSO Command Processor).
The Control-M/Restart Standalone panel is described in detail in Operating Control-M/Restart in standalone mode.
Data Set Cleanup prior to the Original Run
As discussed in Data Set Cleanup and Prevent-NCT2 Processing, data set cleanup is automatically performed as part of restart and non-restarted rerun processing, but can also be performed prior to the original job run, as follows:
-
Automatic Prevent-NCT2 processing can be defined for all jobs by setting the NCAT2 parameter in the CTRPARM member in the IOA PARM library to YES. Data set cleanup is then performed prior to each original job run. This is applicable only to jobs that are run under Control-M.
-
Automatic Prevent-NCT2 processing can be defined for specific jobs by specifying Y (Yes) for the PREVENT-NCT2 parameter in the corresponding Control-M job scheduling definitions. Data set cleanup is then performed prior to the original runs of those jobs. The PREVENT-NCT2 parameter is described in detail in the Control-M for z/OS User Guide.
-
The Data Set Cleanup Online Utility (R2) is used to request data set cleanup without an accompanying job run. A CONTROLR step is inserted in the job stream and the edited job JCL is submitted. The CONTROLR step performs the necessary data set adjustment (including step adjustment, if needed) and then stops. No further job steps are executed.
The R2 utility is available only for jobs that have a Control-M job scheduling definition. It is described in detail in Online Facilities.
-
For jobs without a Control-M job scheduling definition, Prevent-NCT2 processing (data set cleanup prior to the original run) can be requested by selecting Prevent-NCT2 as the type of processing in the Control-M/Restart Standalone panel (the R4 online utility). The utility is described in Operating Control-M/Restart in standalone mode.
Maintaining Previous Runs in the History Jobs File
Under Control-M, active jobs are maintained in the Active Jobs file. Once these jobs have ended and are likely no longer needed, they are generally deleted from the Active Jobs file during maintenance. However, if Control-M/Restart is used at the site, these job runs can be placed in the History Jobs file before being deleted from the Active Jobs file, in case they are needed again. Jobs in the History Jobs file can be restored to the Active Jobs file.
Whether a job is placed in the History Jobs file, and for how long it remains, depends on either of two RETENTION parameters in the job scheduling definition:
-
The RETENTION - # OF DAYS TO KEEP parameter indicates the maximum number of days the job remains in the History Jobs file before being deleted.
-
The RETENTION - # OF GENERATIONS TO KEEP parameter indicates the maximum number of generations of the job to keep in the History Jobs file. Once that number of generations is reached, older job runs are deleted for each new job run added to the file.
Retention of jobs in the History Jobs file is available only for jobs that are run under Control-M.
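The two retention rules can be sketched in Python. The function and parameter names are hypothetical. The sketch assumes that a job for which neither parameter is set is not kept, and that a run exactly on the cutoff date is still retained; both are illustrative choices rather than documented behavior.

```python
from datetime import date, timedelta
from typing import List, Optional


def prune_history(runs: List[date], today: date,
                  days_to_keep: Optional[int] = None,
                  generations_to_keep: Optional[int] = None) -> List[date]:
    """Return the runs that remain in the history, oldest first.

    runs must be ordered from oldest to newest. Only one of the two
    retention parameters is expected to be set.
    """
    if days_to_keep is not None:
        cutoff = today - timedelta(days=days_to_keep)
        return [run for run in runs if run >= cutoff]
    if generations_to_keep is not None:
        return runs[-generations_to_keep:] if generations_to_keep > 0 else []
    return []  # assumption: no retention parameter means the job is not kept


runs = [date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 9)]
by_days = prune_history(runs, date(2024, 1, 10), days_to_keep=5)
by_generations = prune_history(runs, date(2024, 1, 10), generations_to_keep=2)
```

With these inputs both rules happen to keep the two most recent runs; they diverge when runs are unevenly spaced in time.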