Service-Level Agreement Management Job

Service-Level Agreement (SLA) Management jobs enable you to identify a chain of jobs that comprise a critical service and must complete by a certain time. The SLA Management job is always defined as the last job in the chain of jobs.

To manage SLA Management jobs, you must install the SLA Management add-on (previously, Control-M Batch Impact Manager) in your Control-M environment.

The following examples show how to define an SLA Management job:

  • The first nested JSON defines a Command job that prints Hello and then adds an event, Hello-TO-SLA_Job_for_SLA-GOOD.

  • The second nested JSON defines an SLA Management job for a critical service, SLA-GOOD. This job waits for the event added by the Command job (Hello-TO-SLA_Job_for_SLA-GOOD) and then deletes it.

    {
       "SLARobotTestFolder_Good":
       {
          "Type": "SimpleFolder",
          "ControlmServer": "LocalControlM",
          "Hello":
          {
             "Type": "Job:Command",
             "CreatedBy": "emuser",
             "RunAs": "controlm",
             "Command": "echo \"Hello.\"",
             "eventsToAdd":
             {
                "Type": "AddEvents",
                "Events": [
                {
                   "Event": "Hello-TO-SLA_Job_for_SLA-GOOD"
                } ]
             }
          },
          "SLA":
          {
             "Type": "Job:SLAManagement",
             "ServiceName": "SLA-GOOD",
             "ServicePriority": "1",
             "CreatedBy": "emuser",
             "RunAs": "DUMMYUSR",
             "JobRunsDeviationsTolerance": "2",
             "CompleteIn":
             {
                "Time": "00:01"
             },
             "eventsToWaitFor":
             {
                "Type": "WaitForEvents",
                "Events": [
                {
                   "Event": "Hello-TO-SLA_Job_for_SLA-GOOD"
                } ]
             },
             "eventsToDelete":
             {
                "Type": "DeleteEvents",
                "Events": [
                {
                   "Event": "Hello-TO-SLA_Job_for_SLA-GOOD"
                } ]
             }
          }
       }
    }
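If you generate job definitions from a script, the same folder can be assembled as a Python dictionary and serialized with the standard json module. This is a minimal sketch only: the helper name build_sla_folder is hypothetical, and it relies on Python dictionaries preserving insertion order (Python 3.7 and later) so that the SLA Management job is written out as the last job in the chain.

```python
import json


def build_sla_folder(folder: str, service: str) -> dict:
    """Hypothetical helper: a Command job that feeds an SLA Management job via an event."""
    event = f"Hello-TO-SLA_Job_for_{service}"  # event that links the two jobs
    hello = {
        "Type": "Job:Command",
        "CreatedBy": "emuser",
        "RunAs": "controlm",
        "Command": 'echo "Hello."',
        "eventsToAdd": {"Type": "AddEvents", "Events": [{"Event": event}]},
    }
    sla = {
        "Type": "Job:SLAManagement",
        "ServiceName": service,
        "ServicePriority": "1",
        "CreatedBy": "emuser",
        "RunAs": "DUMMYUSR",
        "JobRunsDeviationsTolerance": "2",
        "CompleteIn": {"Time": "00:01"},
        "eventsToWaitFor": {"Type": "WaitForEvents", "Events": [{"Event": event}]},
        "eventsToDelete": {"Type": "DeleteEvents", "Events": [{"Event": event}]},
    }
    # Dicts keep insertion order, so the SLA job is serialized after the Command job.
    return {
        folder: {
            "Type": "SimpleFolder",
            "ControlmServer": "LocalControlM",
            "Hello": hello,
            "SLA": sla,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_sla_folder("SLARobotTestFolder_Good", "SLA-GOOD"), indent=3))
```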

The following table describes the SLA Management job parameters.

Parameter

Description

ServiceName

Defines a logical name, from a user or business perspective, for the critical service, in alphanumeric characters.

BMC recommends that the service name be unique.

Valid Values: 1–64 characters

ServicePriority

Defines the priority level of this service, from a user or business perspective, with 1 being the highest priority and 5 the lowest.

Valid Values: 1–5

Default: 3
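The limits above can be checked before a job is deployed. The following is a minimal sketch; the function name check_service_fields is hypothetical, and it checks only the documented length and priority ranges. It does not enforce alphanumeric characters, because the service name in the example above (SLA-GOOD) contains a hyphen.

```python
def check_service_fields(job: dict) -> list:
    """Hypothetical pre-deployment check of ServiceName and ServicePriority."""
    problems = []
    name = job.get("ServiceName", "")
    if not 1 <= len(name) <= 64:
        problems.append("ServiceName must be 1-64 characters long")
    # JSON job definitions carry the priority as a string; 3 is the documented default.
    priority = job.get("ServicePriority", "3")
    if priority not in {"1", "2", "3", "4", "5"}:
        problems.append("ServicePriority must be 1-5")
    return problems
```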

JobRunsDeviationsTolerance

Determines the extent of tolerated deviation from the average completion time for a job in the service, expressed as a number of standard deviations based on percentile ranges.

If the run time falls within the set tolerance, the job is considered on time; otherwise, it has run too long or ended too early.

Valid Values:

  • 2: 95.5% (highest confidence in the completion time).

  • 3: 99.73%

  • 4: 99.99% (lowest confidence).

The JobRunsDeviationsTolerance parameter and the AverageRunTimeTolerance parameter are mutually exclusive. Specify only one of these two parameters.
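The listed percentages match the share of a normal distribution that lies within 2, 3, or 4 standard deviations of the mean (about 95.45%, 99.73%, and 99.99%). The sketch below shows that arithmetic and a simple on-time check. The check itself is an assumption for illustration (run time compared with the mean and population standard deviation of past run times); it is not necessarily how Control-M computes its statistics, and the function names are hypothetical.

```python
import math
import statistics


def coverage(k: int) -> float:
    """Share of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def within_deviation_tolerance(run_minutes: float, history: list, k: int) -> bool:
    """Hypothetical check: is run_minutes within k standard deviations of past runs?"""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return abs(run_minutes - mean) <= k * stdev


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, f"{coverage(k):.2%}")
```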

AverageRunTimeTolerance

Defines the extent of tolerated deviation from the average completion time for a job in the service, expressed as a percentage of the average time or as the number of minutes that the job can be early or late.

If the run time falls within the set tolerance, the job is considered on time; otherwise, it has run too long or ended too early.

The following examples demonstrate how to set this parameter:

  • Based on a percentage of the average run time:

    "AverageRunTimeTolerance":
    {
       "Units": "Percentage",
       "AverageRunTime": "94"
    }

  • Based on a number of minutes:

    "AverageRunTimeTolerance":
    {
       "Units": "Minutes",
       "AverageRunTime": "10"
    }

The AverageRunTimeTolerance parameter and the JobRunsDeviationsTolerance parameter are mutually exclusive. You must use only one of these two parameters.
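To see what a given setting means in practice, the sketch below turns Units and AverageRunTime into an on-time window around a job's average run time and classifies a run as on time, too long, or too quick (the situations that If:SLA:JobRanTooLong and If:SLA:JobFinishedTooQuickly respond to). It is a hypothetical illustration: the function names are invented, and the symmetric window is an assumption based on the description above. Values arrive as strings in JSON, so convert them with float() first.

```python
def tolerance_window(average_minutes: float, units: str, value: float) -> tuple:
    """Hypothetical: (earliest, latest) run time, in minutes, still considered on time."""
    if units == "Percentage":
        delta = average_minutes * value / 100
    elif units == "Minutes":
        delta = value
    else:
        raise ValueError(f"unknown Units: {units}")
    return (max(0.0, average_minutes - delta), average_minutes + delta)


def classify_run(run_minutes: float, average_minutes: float, units: str, value: float) -> str:
    """Compares one run against the window computed above."""
    low, high = tolerance_window(average_minutes, units, value)
    if run_minutes > high:
        return "ran too long"
    if run_minutes < low:
        return "finished too quickly"
    return "on time"
```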

CompleteBy

Defines the time (in HH:MM format) and the number of days by which the critical service must complete to be considered on time.

Defaults:

  • Time: 12:00

  • Days: 0 (the same day).

In the following example, the critical service must complete by 11:51 PM, three days after it begins.

"CompleteBy":
{
   "Time": "23:51",
   "Days": "3"
}

The CompleteBy parameter and the CompleteIn parameter are mutually exclusive. You must use only one of these two parameters.
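The sketch below shows how a CompleteBy setting translates into a concrete due date and time, using the documented defaults when Time or Days is omitted. It assumes that Days is counted from the date on which the service begins, as in the example above; the function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def complete_by_deadline(start_date: date, complete_by: dict) -> datetime:
    """Hypothetical: due time = start date + Days, at Time (HH:MM)."""
    hours, minutes = map(int, complete_by.get("Time", "12:00").split(":"))
    days = int(complete_by.get("Days", "0"))
    return datetime.combine(start_date + timedelta(days=days), time(hours, minutes))
```

For example, complete_by_deadline(date(2024, 1, 1), {"Time": "23:51", "Days": "3"}) returns 11:51 PM on January 4, 2024.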

CompleteIn

Defines the time, in hours and minutes (HH:MM), within which the critical service must complete to be considered on time, as in the following example:

"CompleteIn":
{
   "Time": "15:21"
}

The CompleteIn parameter and the CompleteBy parameter are mutually exclusive. You must use only one of these two parameters.
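The following sketch computes a CompleteIn due time from a service start time and flags the two pairs of mutually exclusive parameters described in this table. It assumes the HH:MM value is a duration measured from when the service starts; both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Pairs of parameters that must not appear together in one SLA Management job.
EXCLUSIVE_PAIRS = (
    ("CompleteBy", "CompleteIn"),
    ("JobRunsDeviationsTolerance", "AverageRunTimeTolerance"),
)


def complete_in_deadline(start: datetime, complete_in: dict) -> datetime:
    """Hypothetical: due time = service start + Time (HH:MM)."""
    hours, minutes = map(int, complete_in["Time"].split(":"))
    return start + timedelta(hours=hours, minutes=minutes)


def check_exclusive(job: dict) -> list:
    """Returns a message for each mutually exclusive pair found in the job."""
    return [f"{a} and {b} are mutually exclusive" for a, b in EXCLUSIVE_PAIRS if a in job and b in job]
```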

ServiceActions

Defines automatic interventions (actions such as rerunning a job or extending the service due time) in response to specific occurrences, such as a job finishing too quickly or a service finishing late. Each occurrence is defined with a conditional If statement.

For more information, see Service Actions.

Service Actions

The following example shows how to define the ServiceActions parameter of an SLA Management job:

"ServiceActions":
{
   "If:SLA:ServiceIsLate_0":
   {
      "Type": "If:SLA:ServiceIsLate",
      "Action:SLA:Notify_0":
      {
         "Type": "Action:SLA:Notify",
         "Severity": "Regular",
         "Message": "this is a message"
      },
      "Action:SLA:Mail_1":
      {
         "Type": "Action:SLA:Mail",
         "Email": "email@okmail.com",
         "Subject": "this is a subject",
         "Message": "this is a message"
      }
   },
   "If:SLA:JobFailureOnServicePath_1":
   {
      "Type": "If:SLA:JobFailureOnServicePath",
      "Action:SLA:Order_0":
      {
         "Type": "Action:SLA:Order",
         "Server": "LocalControlM",
         "Folder": "folder",
         "Job": "job",
         "Date": "OrderDate",
         "Library": "library"
      }
   },
   "If:SLA:ServiceEndedNotOK_5":
   {
      "Type": "If:SLA:ServiceEndedNotOK",
      "Action:SLA:Set_0":
      {
         "Type": "Action:SLA:Set",
         "Variable": "varname",
         "Value": "varvalue"
      },
      "Action:SLA:Increase_2":
      {
         "Type": "Action:SLA:Increase",
         "Time": "04:03"
      }
   },
   "If:SLA:ServiceLatePastDeadline_6":
   {
      "Type": "If:SLA:ServiceLatePastDeadline",
      "Action:SLA:Event:Add_0":
      {
         "Type": "Action:SLA:Event:Add",
         "Server": "LocalControlM",
         "Name": "addddd",
         "Date": "AnyDate"
      }
   }
}
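Because each If statement is an object with a Type plus one named entry per action, the structure is easy to generate. In the example above, the numeric suffixes (_0, _1, _5, and so on) appear only to keep the keys unique; this sketch assumes that any unique suffix is acceptable and simply numbers entries in order. The helper names if_block and service_actions are hypothetical.

```python
import json


def if_block(if_type: str, *actions: dict) -> dict:
    """Hypothetical helper: wraps actions in an If statement, keyed <ActionType>_<n>."""
    block = {"Type": f"If:SLA:{if_type}"}
    for n, action in enumerate(actions):
        block[f"{action['Type']}_{n}"] = action
    return block


def service_actions(*ifs: tuple) -> dict:
    """Builds a ServiceActions object from (if_type, [actions]) pairs."""
    return {f"If:SLA:{t}_{n}": if_block(t, *acts) for n, (t, acts) in enumerate(ifs)}


actions = service_actions(
    ("ServiceIsLate", [{"Type": "Action:SLA:Notify", "Severity": "Urgent", "Message": "%%SERVICE_NAME is late"}]),
    ("ServiceEndedNotOK", [{"Type": "Action:SLA:Increase", "Time": "01:00"}]),
)
print(json.dumps({"ServiceActions": actions}, indent=3))
```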

If Statements

The following table describes If statements that can be used to define situations where an action must be taken.

If Statement

Description

If:SLA:ServiceIsLate

Determines that the service will be late according to SLA Management calculations.

If:SLA:JobFailureOnServicePath

Determines that one or more of the jobs in the service failed and caused a delay in the service.

An SLA Management service is considered OK even if one of its jobs fails, provided that another job, with an Or relationship to the failed job, runs successfully.

If:SLA:JobRanTooLong

Determines that one of the jobs in the critical service is late. Lateness is calculated according to the average run time and Job Runtime Tolerance settings.

A service is considered on time even if one of its jobs is late, provided that the service itself is not late.

If:SLA:JobFinishedTooQuickly

Determines that one of the jobs in the critical service is early. The end time is calculated according to the average run time and Job Runtime Tolerance settings.

A service is considered on time even if one of its jobs is early.

If:SLA:ServiceEndedOK

Determines that the service ended OK.

If:SLA:ServiceEndedNotOK

Determines that the service ended late, after the deadline.

If:SLA:ServiceLatePastDeadline

Determines that the service is late and has passed its deadline.

Service Actions Job Parameters

The following table describes the Service Actions job parameters.

Parameter

Description

Sub-parameters

Action:SLA:Notify

Sends a notification to the Alerts Window.

  • Severity: (Optional) Severity level: Regular (default), Urgent, or VeryUrgent.

  • Message: Notification text.

    You can include any of the following variables in your message:

    • %%PROBLEMATIC_JOBS

    • %%SERVICE_DUE_TIME

    • %%SERVICE_EXPECTED_END_TIME

    • %%SERVICE_NAME

    • %%SERVICE_PRIORITY
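Control-M resolves these variables when the action runs. To proofread a message template locally, a preview function can substitute sample values and reject names outside the documented list. This is a hypothetical sketch; the function name and the sample values are invented, and the same variable list applies to the Mail, Remedy, and SIM actions below.

```python
import re

# Variables documented for Notify, Mail, Remedy, and SIM messages.
MESSAGE_VARIABLES = {
    "PROBLEMATIC_JOBS",
    "SERVICE_DUE_TIME",
    "SERVICE_EXPECTED_END_TIME",
    "SERVICE_NAME",
    "SERVICE_PRIORITY",
}


def preview_message(template: str, values: dict) -> str:
    """Hypothetical local preview: replaces each %%NAME with a sample value, if one is given."""
    def substitute(match):
        name = match.group(1)
        if name not in MESSAGE_VARIABLES:
            raise KeyError(f"%%{name} is not a supported message variable")
        # Leave the placeholder untouched when no sample value is supplied.
        return str(values.get(name, match.group(0)))

    return re.sub(r"%%([A-Z_]+)", substitute, template)
```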

Action:SLA:Mail

Sends an email to a specific email recipient.

  • Email: Email address.

  • Subject: Subject line.

  • Message: (Optional) Message body text.

    You can include any of the following variables in your message:

    • %%PROBLEMATIC_JOBS

    • %%SERVICE_DUE_TIME

    • %%SERVICE_EXPECTED_END_TIME

    • %%SERVICE_NAME

    • %%SERVICE_PRIORITY

Action:SLA:Remedy

Opens a ticket in the Remedy Help Desk.

  • Urgency: (Optional) Urgency level: Low (default), Medium, High, or Urgent.

  • Summary: Summary line.

  • Message: Message body text.

    You can include any of the following variables in your message:

    • %%PROBLEMATIC_JOBS

    • %%SERVICE_DUE_TIME

    • %%SERVICE_EXPECTED_END_TIME

    • %%SERVICE_NAME

    • %%SERVICE_PRIORITY
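The Notify, Mail, and Remedy actions differ mainly in their required and optional sub-parameters and in the allowed Severity and Urgency values. The builders below are a hypothetical sketch that applies the documented defaults and value lists; they produce action objects in the shape used in the ServiceActions example.

```python
NOTIFY_SEVERITIES = ("Regular", "Urgent", "VeryUrgent")
REMEDY_URGENCIES = ("Low", "Medium", "High", "Urgent")


def notify_action(message: str, severity: str = "Regular") -> dict:
    """Hypothetical builder for Action:SLA:Notify (Severity defaults to Regular)."""
    if severity not in NOTIFY_SEVERITIES:
        raise ValueError(f"Severity must be one of {NOTIFY_SEVERITIES}")
    return {"Type": "Action:SLA:Notify", "Severity": severity, "Message": message}


def mail_action(email: str, subject: str, message: str = None) -> dict:
    """Hypothetical builder for Action:SLA:Mail (Message is optional)."""
    action = {"Type": "Action:SLA:Mail", "Email": email, "Subject": subject}
    if message is not None:
        action["Message"] = message
    return action


def remedy_action(summary: str, message: str, urgency: str = "Low") -> dict:
    """Hypothetical builder for Action:SLA:Remedy (Urgency defaults to Low)."""
    if urgency not in REMEDY_URGENCIES:
        raise ValueError(f"Urgency must be one of {REMEDY_URGENCIES}")
    return {"Type": "Action:SLA:Remedy", "Urgency": urgency, "Summary": summary, "Message": message}
```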

Action:SLA:Order

Runs a job, regardless of its scheduling criteria.

  • Server: Control-M/Server.

  • Folder: Folder name that contains the job.

  • Job: Job name.

  • Date: (Optional) When to run, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.

  • Library: (z/OS job only) Name of the z/OS library that contains the job.
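The Date sub-parameter accepts the same values for the Order, SetToOK, Rerun, Kill, and Event actions. The sketch below validates a Date value and builds an Order action. It assumes that "mm/dd" means two-digit, zero-padded month and day; it does not check the number of days in each month, and the function names are hypothetical.

```python
import re

DATE_KEYWORDS = {"NextOrderDate", "PrevOrderDate", "NoDate", "OrderDate", "AnyDate"}
# Assumed format: zero-padded mm/dd, month 01-12, day 01-31.
MMDD = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])")


def is_valid_action_date(value: str) -> bool:
    """Checks a Date sub-parameter: a keyword or a specific date in mm/dd format."""
    return value in DATE_KEYWORDS or MMDD.fullmatch(value) is not None


def order_action(server: str, folder: str, job: str, date: str = "OrderDate", library: str = None) -> dict:
    """Hypothetical builder for Action:SLA:Order."""
    if not is_valid_action_date(date):
        raise ValueError(f"invalid Date: {date}")
    action = {"Type": "Action:SLA:Order", "Server": server, "Folder": folder, "Job": job, "Date": date}
    if library is not None:  # Library applies to z/OS jobs only
        action["Library"] = library
    return action
```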

Action:SLA:SetToOK

Sets the job completion status to OK, regardless of its actual completion status.

  • Server: Control-M/Server.

  • Folder: Folder name that contains the job.

  • Job: Job name.

  • Date: (Optional) Schedule for setting to OK, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.

Action:SLA:SetToOK:ProblematicJob

Sets the completion status to OK for a job that is not running on time and will impact the service.

No parameters

Action:SLA:Rerun

Reruns the job, regardless of its scheduling criteria.

  • Server: Control-M/Server.

  • Folder: Folder name that contains the job.

  • Job: Job name.

  • Date: (Optional) When to rerun, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.

Action:SLA:Rerun:ProblematicJob

Reruns a job that is not running on time and will impact the service.

No parameters

Action:SLA:Kill

Kills a job while it is still executing.

  • Server: Control-M/Server.

  • Folder: Folder name that contains the job.

  • Job: Job name.

  • Date: (Optional) When to kill the job, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.

Action:SLA:Kill:ProblematicJob

Kills a problematic job, which is a job that is not running on time in the service, while it is still executing.

No parameters.

Action:SLA:Set

Assigns a value to a variable for use in a job rerun.

  • Variable: Name of the variable.

  • Value: Value to assign to the variable.

Action:SLA:SIM

Sends an early warning notification about the critical service to BMC Service Impact Manager.

  • ConnectTo: Target ProactiveNet Server/Cell, defined as hostname[:port].

    Default Port: 1828

  • Message: Notification text.

    Valid Values: 1–211 characters.

    You can include any of the following variables in your message:

    • %%PROBLEMATIC_JOBS

    • %%SERVICE_DUE_TIME

    • %%SERVICE_EXPECTED_END_TIME

    • %%SERVICE_NAME

    • %%SERVICE_PRIORITY
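The sketch below splits a ConnectTo value into host and port, falling back to the documented default port 1828, and checks the 1–211 character limit on Message before building a SIM action. The function names are hypothetical, and the simple split does not handle bracketed IPv6 addresses.

```python
DEFAULT_SIM_PORT = 1828


def parse_connect_to(value: str, default_port: int = DEFAULT_SIM_PORT) -> tuple:
    """Splits hostname[:port]; int() raises ValueError on a non-numeric port."""
    host, sep, port = value.rpartition(":")
    if not sep:  # no ":port" suffix, so use the default port
        return value, default_port
    return host, int(port)


def sim_action(connect_to: str, message: str) -> dict:
    """Hypothetical builder for Action:SLA:SIM."""
    if not 1 <= len(message) <= 211:
        raise ValueError("Message must be 1-211 characters")
    parse_connect_to(connect_to)  # validate the target before building the action
    return {"Type": "Action:SLA:SIM", "ConnectTo": connect_to, "Message": message}
```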

Action:SLA:Increase

Allows the job or critical service to continue running by extending, in hours and minutes, the deadline by which the job or service must complete to be considered on time.

  • Time: Amount of time to add to the service, in HH:MM format.
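As a quick arithmetic check, the sketch below adds an Increase Time value to a current due time. For example, "04:03" (the value used in the ServiceActions example) moves an 11:00 PM due time to 3:03 AM the next day. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def increase_deadline(due: datetime, time_hhmm: str) -> datetime:
    """Hypothetical: extends a due time by an HH:MM amount."""
    hours, minutes = map(int, time_hhmm.split(":"))
    return due + timedelta(hours=hours, minutes=minutes)
```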

Action:SLA:Event:Add

Adds an event.

  • Server: Control-M/Server.

  • Name: Event name.

  • Date: (Optional) Date of the event to add, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.

Action:SLA:Event:Delete

Deletes an event.

  • Server: Control-M/Server.

  • Name: Event name.

  • Date: (Optional) Date of the event to delete, one of the following:

    NextOrderDate, PrevOrderDate, NoDate, OrderDate (default), AnyDate, or a specific date in mm/dd format.