Service-Level Agreement Management Job
Service-Level Agreement (SLA) Management jobs enable you to identify a chain of jobs that comprise a critical service and must complete by a certain time. The SLA Management job is always defined as the last job in the chain of jobs.
To manage SLA Management jobs, you must install the SLA Management add-on (previously, Control-M Batch Impact Manager) on your Control-M environment.
The following example shows how to define an SLA Management job. It contains two nested JSON objects:
- The first nested JSON defines a Command job that prints Hello and then adds an event, Hello-TO-SLA_Job_for_SLA-GOOD.
- The second nested JSON defines an SLA Management job for a critical service, SLA-GOOD. This job waits for the event added by the Command job (Hello-TO-SLA_Job_for_SLA-GOOD) and then deletes it.
{
  "SLARobotTestFolder_Good": {
    "Type": "SimpleFolder",
    "ControlmServer": "LocalControlM",
    "Hello": {
      "Type": "Job:Command",
      "CreatedBy": "emuser",
      "RunAs": "controlm",
      "Command": "echo \"Hello.\"",
      "eventsToAdd": {
        "Type": "AddEvents",
        "Events": [
          { "Event": "Hello-TO-SLA_Job_for_SLA-GOOD" }
        ]
      }
    },
    "SLA": {
      "Type": "Job:SLAManagement",
      "ServiceName": "SLA-GOOD",
      "ServicePriority": "1",
      "CreatedBy": "emuser",
      "RunAs": "DUMMYUSR",
      "JobRunsDeviationsTolerance": "2",
      "CompleteIn": {
        "Time": "00:01"
      },
      "eventsToWaitFor": {
        "Type": "WaitForEvents",
        "Events": [
          { "Event": "Hello-TO-SLA_Job_for_SLA-GOOD" }
        ]
      },
      "eventsToDelete": {
        "Type": "DeleteEvents",
        "Events": [
          { "Event": "Hello-TO-SLA_Job_for_SLA-GOOD" }
        ]
      }
    }
  }
}
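The event chain in this example can be cross-checked before deployment. The following Python sketch rebuilds a trimmed copy of the folder as a dictionary and reports any event that a job waits for but that no job in the same folder adds. The helpers `events_of_type` and `unmatched_waits` are hypothetical names for illustration and are not part of Control-M; only the JSON keys and `Type` values come from the example.

```python
# Trimmed copy of the example folder; keys and values are taken from the JSON above.
folder = {
    "Type": "SimpleFolder",
    "ControlmServer": "LocalControlM",
    "Hello": {
        "Type": "Job:Command",
        "Command": 'echo "Hello."',
        "eventsToAdd": {
            "Type": "AddEvents",
            "Events": [{"Event": "Hello-TO-SLA_Job_for_SLA-GOOD"}],
        },
    },
    "SLA": {
        "Type": "Job:SLAManagement",
        "ServiceName": "SLA-GOOD",
        "CompleteIn": {"Time": "00:01"},
        "eventsToWaitFor": {
            "Type": "WaitForEvents",
            "Events": [{"Event": "Hello-TO-SLA_Job_for_SLA-GOOD"}],
        },
    },
}


def events_of_type(job, type_name):
    """Collect event names from every nested object whose Type equals type_name."""
    names = set()
    for value in job.values():
        if isinstance(value, dict) and value.get("Type") == type_name:
            names.update(entry["Event"] for entry in value.get("Events", []))
    return names


def unmatched_waits(folder):
    """Hypothetical check: events that a job waits for but no job in the folder adds."""
    jobs = [value for value in folder.values() if isinstance(value, dict)]
    added, waited = set(), set()
    for job in jobs:
        added |= events_of_type(job, "AddEvents")
        waited |= events_of_type(job, "WaitForEvents")
    return waited - added


print("unmatched events:", sorted(unmatched_waits(folder)))
```

The check matches on the `Type` value rather than on key names such as `eventsToAdd`, on the assumption that the key names are free-form labels. Events added by jobs in other folders would show up as unmatched here, so the result is a hint rather than an error.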
The following table describes the SLA Management job parameters.
Parameter | Description |
---|---|
ServiceName | Defines a logical name, from a user or business perspective, for the critical service, in alphanumeric characters. BMC recommends that the service name be unique. Valid Values: 1–64 characters |
ServicePriority | Defines the priority level of this service, from a user or business perspective, with 1 being the highest priority and 5 the lowest. Valid Values: 1–5. Default: 3 |
JobRunsDeviationsTolerance | Determines the extent of tolerated deviation from the average completion time for a job in the service, expressed as a number of standard deviations based on percentile ranges. If the run time falls within the tolerance, the job is considered on time; otherwise, it has run too long or ended too early. The JobRunsDeviationsTolerance parameter and the AverageRunTimeTolerance parameter are mutually exclusive. You must use only one of these two parameters. |
AverageRunTimeTolerance | Defines the extent of tolerated deviation from the average completion time for a job in the service, expressed as a percentage of the average time or as the number of minutes that the job can be early or late. If the run time falls within the tolerance, the job is considered on time; otherwise, it has run too long or ended too early. The AverageRunTimeTolerance parameter and the JobRunsDeviationsTolerance parameter are mutually exclusive. You must use only one of these two parameters. |
CompleteBy | Defines the time (in HH:MM) and the number of days by which the critical service must complete to be considered on time. For example, a critical service can be required to complete by 11:51 PM, three days after it begins. The CompleteBy parameter and the CompleteIn parameter are mutually exclusive. You must use only one of these two parameters. |
CompleteIn | Defines the number of hours and minutes in which the critical service must complete to be considered on time, as in the "CompleteIn" object of the SLA job in the example above. The CompleteIn parameter and the CompleteBy parameter are mutually exclusive. You must use only one of these two parameters. |
ServiceActions | Defines automatic interventions (actions such as rerunning a job or extending the service due time) in response to specific occurrences, which are defined with conditional If statements, such as a job finishing too quickly or a service finishing late. For more information, see Service Actions. |
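The constraints in this table (the priority range, and the two pairs of mutually exclusive parameters) are easy to check before a definition is submitted. The following Python function is a minimal sketch of such a check, not an official validator. The function name `validate_sla_job` is hypothetical, and the sketch assumes that ServiceName is required and that its 1–64 range is a length limit.

```python
def validate_sla_job(job):
    """Return a list of problems found in an SLA Management job definition.

    The rules are read off the parameter table; this is an illustrative
    sketch, not Control-M's own validation.
    """
    problems = []

    # Assumption: ServiceName is required, and 1-64 is a limit on its length.
    name = job.get("ServiceName", "")
    if not 1 <= len(name) <= 64:
        problems.append("ServiceName must be 1-64 characters long")

    # ServicePriority is written as a string in the example; the default is 3.
    priority = int(job.get("ServicePriority", "3"))
    if not 1 <= priority <= 5:
        problems.append("ServicePriority must be between 1 and 5")

    # Each pair below is documented as mutually exclusive.
    exclusive_pairs = [
        ("JobRunsDeviationsTolerance", "AverageRunTimeTolerance"),
        ("CompleteBy", "CompleteIn"),
    ]
    for first, second in exclusive_pairs:
        if first in job and second in job:
            problems.append(f"{first} and {second} are mutually exclusive")

    return problems


good = {
    "Type": "Job:SLAManagement",
    "ServiceName": "SLA-GOOD",
    "ServicePriority": "1",
    "JobRunsDeviationsTolerance": "2",
    "CompleteIn": {"Time": "00:01"},
}
print(validate_sla_job(good))  # prints []

# CompleteBy is left empty on purpose: only its presence matters for this check.
conflicting = dict(good, CompleteBy={})
print(validate_sla_job(conflicting))
```

The sketch does not enforce the alphanumeric wording for ServiceName, because the example service name, SLA-GOOD, contains a hyphen.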
Service Actions
The following example shows how to define the ServiceActions parameter of an SLA Management job:
"ServiceActions": {
  "If:SLA:ServiceIsLate_0": {
    "Type": "If:SLA:ServiceIsLate",
    "Action:SLA:Notify_0": {
      "Type": "Action:SLA:Notify",
      "Severity": "Regular",
      "Message": "this is a message"
    },
    "Action:SLA:Mail_1": {
      "Type": "Action:SLA:Mail",
      "Email": "email@okmail.com",
      "Subject": "this is a subject",
      "Message": "this is a message"
    }
  },
  "If:SLA:JobFailureOnServicePath_1": {
    "Type": "If:SLA:JobFailureOnServicePath",
    "Action:SLA:Order_0": {
      "Type": "Action:SLA:Order",
      "Server": "LocalControlM",
      "Folder": "folder",
      "Job": "job",
      "Date": "OrderDate",
      "Library": "library"
    }
  },
  "If:SLA:ServiceEndedNotOK_5": {
    "Type": "If:SLA:ServiceEndedNotOK",
    "Action:SLA:Set_0": {
      "Type": "Action:SLA:Set",
      "Variable": "varname",
      "Value": "varvalue"
    },
    "Action:SLA:Increase_2": {
      "Type": "Action:SLA:Increase",
      "Time": "04:03"
    }
  },
  "If:SLA:ServiceLatePastDeadline_6": {
    "Type": "If:SLA:ServiceLatePastDeadline",
    "Action:SLA:Event:Add_0": {
      "Type": "Action:SLA:Event:Add",
      "Server": "LocalControlM",
      "Name": "addddd",
      "Date": "AnyDate"
    }
  }
}
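To see at a glance which actions fire under which condition, a ServiceActions object can be summarized by its `Type` values. The following Python sketch does this for a trimmed copy of the example. The function `summarize_service_actions` is a hypothetical helper, and the sketch assumes that the numeric suffixes in key names such as `If:SLA:ServiceIsLate_0` only keep keys unique, so it reads the `Type` fields instead of parsing the keys.

```python
def summarize_service_actions(service_actions):
    """Map each If statement's Type to the Types of the actions nested under it."""
    summary = {}
    for if_block in service_actions.values():
        if not isinstance(if_block, dict):
            continue
        action_types = [
            child["Type"]
            for child in if_block.values()
            if isinstance(child, dict)
            and child.get("Type", "").startswith("Action:SLA:")
        ]
        # Merge blocks that share an If Type instead of overwriting them.
        summary.setdefault(if_block.get("Type"), []).extend(action_types)
    return summary


# Trimmed copy of the ServiceActions example above.
service_actions = {
    "If:SLA:ServiceIsLate_0": {
        "Type": "If:SLA:ServiceIsLate",
        "Action:SLA:Notify_0": {
            "Type": "Action:SLA:Notify",
            "Severity": "Regular",
            "Message": "this is a message",
        },
        "Action:SLA:Mail_1": {
            "Type": "Action:SLA:Mail",
            "Email": "email@okmail.com",
            "Subject": "this is a subject",
            "Message": "this is a message",
        },
    },
    "If:SLA:ServiceEndedNotOK_5": {
        "Type": "If:SLA:ServiceEndedNotOK",
        "Action:SLA:Increase_2": {"Type": "Action:SLA:Increase", "Time": "04:03"},
    },
}

for if_type, actions in summarize_service_actions(service_actions).items():
    print(if_type, "->", ", ".join(actions))
```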
If Statements
The following table describes If statements that can be used to define situations where an action must be taken.
If Statement | Description |
---|---|
If:SLA:ServiceIsLate | Determines that the service will be late according to SLA Management calculations. |
If:SLA:JobFailureOnServicePath | Determines that one or more of the jobs in the service failed and caused a delay in the service. An SLA Management service is considered OK even if one of its jobs fails, provided that another job, with an Or relationship to the failed job, runs successfully. |
If:SLA:JobRanTooLong | Determines that one of the jobs in the critical service is late. Lateness is calculated according to the average run time and Job Runtime Tolerance settings. A service is considered on time even if one of its jobs is late, provided that the service itself is not late. |
If:SLA:JobFinishedTooQuickly | Determines that one of the jobs in the critical service is early. The end time is calculated according to the average run time and Job Runtime Tolerance settings. A service is considered on time even if one of its jobs is early. |
If:SLA:ServiceEndedOK | Determines that the service ended OK. |
If:SLA:ServiceEndedNotOK | Determines that the service ended late, after the deadline. |
If:SLA:ServiceLatePastDeadline | Determines that the service is late and has passed its deadline. |
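The JobRanTooLong and JobFinishedTooQuickly conditions both compare a job's run time with its average run time and a tolerance. The following Python sketch illustrates that comparison for a tolerance given either as a percentage of the average or as a number of minutes. It is a simplified illustration and not SLA Management's actual calculation; the function `classify_run` and its `units` argument are hypothetical.

```python
def classify_run(run_minutes, average_minutes, tolerance, units="Percentage"):
    """Classify a run as 'too quick', 'on time', or 'too long'.

    units is a hypothetical switch: 'Percentage' reads tolerance as a
    percentage of the average run time, 'Minutes' reads it as an absolute
    number of minutes that the job can be early or late.
    """
    if units == "Percentage":
        margin = average_minutes * tolerance / 100
    elif units == "Minutes":
        margin = tolerance
    else:
        raise ValueError(f"unknown units: {units}")

    if run_minutes > average_minutes + margin:
        return "too long"
    if run_minutes < average_minutes - margin:
        return "too quick"
    return "on time"


# Average run time of 60 minutes with a 10% tolerance: the window is 54-66 minutes.
print(classify_run(70, 60, 10))             # too long
print(classify_run(40, 60, 25))             # too quick (window is 45-75 minutes)
print(classify_run(55, 60, 10, "Minutes"))  # on time (window is 50-70 minutes)
```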
Service Actions Job Parameters
The following table describes the Service Actions job parameters.
Parameter | Description | Sub-parameters |
---|---|---|
Action:SLA:Notify | Sends a notification to the Alerts Window. | |
Action:SLA:Mail | Sends an email to a specific email recipient. | |
Action:SLA:Remedy | Opens a ticket in the Remedy Help Desk. | |
Action:SLA:Order | Runs a job, regardless of its scheduling criteria. | |
Action:SLA:SetToOK | Sets the job completion status to OK, regardless of its actual completion status. | |
Action:SLA:SetToOK:ProblematicJob | Sets the completion status to OK for a job that is not running on time and will impact the service. | No parameters |
Action:SLA:Rerun | Reruns the job, regardless of its scheduling criteria. | |
Action:SLA:Rerun:ProblematicJob | Reruns a job that is not running on time and will impact the service. | No parameters |
Action:SLA:Kill | Kills a job while it is still executing. | |
Action:SLA:Kill:ProblematicJob | Kills a problematic job, which is a job that is not running on time in the service, while it is still executing. | No parameters |
Action:SLA:Set | Assigns a value to a variable for use in a job rerun. | |
Action:SLA:SIM | Sends an early warning notification about the critical service to BMC Service Impact Manager. | |
Action:SLA:Increase | Allows the job or critical service to continue running by extending the deadline, in hours and minutes, that the job or service can execute for and still be considered on time. | |
Action:SLA:Event:Add | Adds an event. | |
Action:SLA:Event:Delete | Deletes an event. | |
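The action names in this table can serve as a simple allow-list when definitions are generated by a script. The following Python sketch checks a single action object against the table. The names `KNOWN_ACTIONS`, `NO_PARAMETER_ACTIONS`, and `check_action` are hypothetical, and the sketch checks only the action type and the "No parameters" rule, because the sub-parameters of the other actions are not listed here.

```python
# Action types as listed in the table above.
KNOWN_ACTIONS = {
    "Action:SLA:Notify",
    "Action:SLA:Mail",
    "Action:SLA:Remedy",
    "Action:SLA:Order",
    "Action:SLA:SetToOK",
    "Action:SLA:SetToOK:ProblematicJob",
    "Action:SLA:Rerun",
    "Action:SLA:Rerun:ProblematicJob",
    "Action:SLA:Kill",
    "Action:SLA:Kill:ProblematicJob",
    "Action:SLA:Set",
    "Action:SLA:SIM",
    "Action:SLA:Increase",
    "Action:SLA:Event:Add",
    "Action:SLA:Event:Delete",
}

# The three ProblematicJob actions are the ones listed with "No parameters".
NO_PARAMETER_ACTIONS = {a for a in KNOWN_ACTIONS if a.endswith(":ProblematicJob")}


def check_action(action):
    """Return a list of problems for one action object (a dict with a Type key)."""
    problems = []
    action_type = action.get("Type")
    if action_type not in KNOWN_ACTIONS:
        problems.append(f"unknown action type: {action_type}")
    elif action_type in NO_PARAMETER_ACTIONS and set(action) - {"Type"}:
        problems.append(f"{action_type} takes no parameters")
    return problems


print(check_action({"Type": "Action:SLA:Increase", "Time": "04:03"}))  # prints []
print(check_action({"Type": "Action:SLA:Rerun:ProblematicJob", "Job": "job"}))
print(check_action({"Type": "Action:SLA:Restart"}))
```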