PATROL® Express: A Technical Overview

Contents

Introduction

Purpose of this paper

What is PATROL Express?

Terms related to PATROL Express

How PATROL Express Remotely Monitors Infrastructure

The different components of PATROL Express

How parts of PATROL Express interact

Security

RSM monitoring protocols

Parameter sets

Configuring Web transaction element

PATROL Express Service Measures Reports and Status Views

Service Measures reports

Critical alarm formulas

Current Status views

Alerts

Integrating PATROL Express with Advanced PATROL Solutions

Enterprise integration support

PATROL data into PATROL Express

PATROL Express data into PATROL

Administrative Support

Account list

Access control

Parameter Set Editor

Logo and style sheet support

Usage report

Deployment Options

RSM failover

RSM requirements

Service Integration Portal (SIP) architecture

Minimum system requirements for SIP

Summary

BMC Software

Introduction

Purpose of this paper

The goal of this paper is to provide more in-depth information about the following:

What is PATROL Express?

PATROL Express is an infrastructure monitoring solution that can help improve service to customers while driving down operational costs. It reduces costs by remotely monitoring the availability of network devices, systems and application infrastructure. Its agentless technology enables it to be deployed rapidly because it does not require software to be installed on the elements being monitored. This Web-based solution provides monitoring, notification of outages, and reporting across servers, networks and application infrastructures.

PATROL Express also monitors the performance and availability of Web transactions. It measures both transactions and infrastructure against user-defined service level objectives. Monitoring both infrastructure and transactions – and providing immediate problem notification and escalation – enables IT to quickly determine how infrastructure problems affect the end-user experience.

Terms related to PATROL Express

This section defines terms that help the reader understand this document.

Element

PATROL Express remotely monitors elements. Two types of elements exist – infrastructure and Web transaction.

An infrastructure element can refer to any device that is IP addressable. Examples include operating system servers, Web servers, network routers and switches.

A Web transaction element monitors the performance and availability of HTTP and HTTPS Web transactions. Examples of Web transaction elements include buying a book online or conducting a Web search.

Service

A service is a logical group of elements. These groups allow the user to better organize a PATROL Express account for easier navigation. In PATROL Express, an element can belong to more than one service. Common types of services include elements of the same type, such as all Oracle databases, or systems that work together to provide a common function, such as a Web site.

Account

PATROL Express supports multiple-user accounts. Each account has a separate user name and password. Users determine the criteria for organizing the services within an account. As many services as necessary may be set up to monitor an account, but each account must always contain at least one service.

Availability

In PATROL Express, availability is expressed as the percentage of time the element is available during a selected time period. An element is considered unavailable when it is in critical alarm state. (For more information about critical alarm state, refer to the Critical alarm formulas section.)
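The definition above can be illustrated with a small calculation. This is a minimal sketch, not the product's actual formula: the function name `availability_percent` and the representation of critical alarm periods as (start, end) pairs are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, critical_intervals):
    """Return the percentage of the period spent outside critical alarm.

    critical_intervals is a list of (start, end) datetime pairs that are
    assumed not to overlap one another (hypothetical input format).
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in critical_intervals:
        # Clip each interval to the reporting period before summing.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period - down) / period


# One 36-minute critical alarm during a 24-hour reporting period.
day_start = datetime(2003, 7, 1)
day_end = day_start + timedelta(days=1)
outage = (datetime(2003, 7, 1, 10, 0), datetime(2003, 7, 1, 10, 36))
daily_availability = availability_percent(day_start, day_end, [outage])
```

In this example the element is unavailable for 36 of 1,440 minutes, so the result is 97.5 percent.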

Mean Time to Repair (MTTR)

MTTR is the average length of time it takes to fix a problem that caused an alarm state on a monitored element.
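As a rough illustration of the averaging involved, the sketch below computes MTTR from a list of repair durations. The function name and the choice to return `None` when no alarms occurred are assumptions, not documented product behavior.

```python
from datetime import timedelta


def mean_time_to_repair(repair_durations):
    """Average a list of timedelta repair durations.

    Returns None when there were no alarms in the period (an assumed
    convention for this sketch).
    """
    if not repair_durations:
        return None
    return sum(repair_durations, timedelta()) / len(repair_durations)


# Three alarms that took 10, 20 and 60 minutes to fix.
durations = [timedelta(minutes=10), timedelta(minutes=20), timedelta(minutes=60)]
mttr = mean_time_to_repair(durations)
```

Here the three repairs total 90 minutes, giving an MTTR of 30 minutes.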

Parameter set

A parameter set is a logical grouping of parameters. Each parameter set represents an independent component of an infrastructure element. For example, the operating system parameter sets contain various general system performance and resources parameters, such as CPU usage, virtual memory and physical memory.

How PATROL Express Remotely Monitors Infrastructure

The different components of PATROL Express

PATROL Express consists of the following two main components.

Service Integration Portal (SIP)

The SIP is a Web-based portal application that resides at the IT organization’s or service provider’s data center. The SIP is where end users configure elements, organize elements into services, view reports and set up notifications.

Remote Service Monitor (RSM)

The purpose of the RSM is to collect performance data and relay it to the SIP. It is downloaded from the SIP and installed in the user’s network. Once installed, it remotely monitors the elements that are configured at the SIP. Because the RSM is agentless, it collects data over a set of industry-accepted protocols rather than through software installed on the monitored elements.

Dedicated versus shared RSMs

A dedicated RSM monitors the elements of only one PATROL Express account. A shared RSM monitors the elements of multiple accounts. An end user can create a dedicated RSM, while a PATROL Express administrator can create a shared RSM.

As seen in the following graphics, a user account can use any of the following:

Users can perform the following:

The diagram above – one of many possible deployment options for PATROL Express – shows how data is sent between RSMs and the SIP.

The diagram above shows how one account can use multiple RSMs to monitor various elements.

How parts of PATROL Express interact

Where the RSM is located

The RSM resides in the customer’s environment and must have IP addressability to the elements that it monitors. The RSM must also be able to resolve the PATROL Express SIP IP address.

The RSM runs as a service and supports a number of remote monitoring protocols. Each RSM can monitor hundreds of elements. Typically, the RSM will retrieve parameters from an element once a minute, which is known as the monitoring interval. The RSM also has a manager application that resides in the Microsoft Windows system tray, enabling the user to easily stop and start the RSM as well as obtain diagnostic and communication information.

How the RSM communicates with the SIP

The RSM communicates with the SIP using HTTP and HTTPS. The data is compressed and encrypted prior to being sent. The RSM initiates all communication between the SIP and the RSM. These communications fall into one of the three following categories:

Security

PATROL Express was designed with security in mind. As previously mentioned, all traffic between the SIP and the RSM, including logon credentials, is compressed and encrypted using HTTPS. In addition, the following precautions have been taken:

RSM monitoring protocols

The following table lists the type of data PATROL Express collects and the protocols used by the RSM to collect that data.

What it monitors (type of data collected): Protocol

  • MS Windows operating systems, MS Exchange, IIS, MS SQL Server and other Windows-based solutions: PerfMon
  • Specific criteria in the Windows application, system and security event logs: WMI (alternative for monitoring Windows event logs)
  • Unix operating systems: rstatd, Secure Shell, SNMP
  • Network devices, operating systems and applications: SNMP
  • PATROL Knowledge Modules® (KMs): PATROL protocol
  • Oracle database information, Sybase: SQL*Net
  • Response time and availability of HTTP/HTTPS requests: HTTP/HTTPS
  • Domain Name System (DNS) lookup time: DNS
  • Network connectivity of an element: Ping
  • Text log files: File system

    Parameter sets

    Below are some of the different categories of PATROL Express parameter sets. For a complete list of PATROL Express parameter sets and the parameters they monitor, please refer to the PATROL Express User Guide. You can view or download a PDF version of this document at: www.bmc.com/patrolexpress/support.

    Operating systems

    PATROL Express monitors basic operating system parameters, including memory, CPU and disk utilization. PATROL Express can be configured to monitor specific processes as well as how much memory and CPU a process is using. It can also monitor both Windows event logs and text log files for user-defined messages.

    Databases

    PATROL Express takes a snapshot of database performance. It verifies that the database is up and running and monitors parameters such as number of transactions, lock usage and active SQL statements.

    Network devices

    PATROL Express monitors each interface (port) on the device to ensure it is running and reports how much data has been transmitted – and at what speed. In addition, it monitors the status of network devices, checking on availability and reporting the system description of each device.

    Storage devices

    When monitoring storage devices, PATROL Express focuses primarily on availability, such as up/down status and environmental systems, including fans, power supplies and temperature; asset information, such as vendor, model, serial number and firmware; and configuration, such as device capacity and number of ports.

    Web transactions

    PATROL Express monitors the performance and availability of Web transactions using HTTP and HTTPS. PATROL Express supports all major dynamic HTML techniques, such as JSP, ASP and CGI, and popular content types, such as Microsoft Word, PDF and plain text.

    Configuring Web transaction element

    Teaching

    From the SIP, customers teach PATROL Express the transaction, or path, they want it to monitor.

    Teaching is a three-step process for the user:

    1. Log on to PATROL Express and select Add an Element.

    2. Provide PATROL Express with the starting URL of the transaction to monitor.

    3. Click through the path of Web pages to be monitored.

    Note: teaching is an online process. No software needs to be downloaded to teach PATROL Express a path of Web pages to monitor.

    During the teaching process, requests made by the user go through the SIP (for preprocessing) before going out to the Web site to be monitored. PATROL Express records all the HTTP and HTTPS requests made when the element is taught. After the request is processed at the site, the returned information passes through the SIP before appearing in the user’s browser. Later, when PATROL Express monitors the site, it will simulate the requests to appear as if a user is making them.

    It takes just seconds to teach or reteach PATROL Express a transaction, and an element can be taught an unlimited number of times. Once teaching is complete, the user selects the locations (RSMs) from which to monitor. PATROL Express then deploys the transaction and begins monitoring almost immediately.
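The record-then-replay idea described above can be sketched as follows. This is an illustrative outline only, not how PATROL Express is implemented: `StepResult`, `replay_transaction`, `transaction_available` and the example URLs are hypothetical, and the `fetch` callable stands in for a real HTTP client so the example runs without network access.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    url: str
    status: int
    seconds: float


def replay_transaction(urls: List[str], fetch: Callable[[str], int]) -> List[StepResult]:
    """Request each recorded URL in order, timing every step.

    fetch is any callable that requests a URL and returns its HTTP status
    code. Replay stops after the first step whose status is not 2xx or 3xx.
    """
    results = []
    for url in urls:
        started = time.perf_counter()
        status = fetch(url)
        elapsed = time.perf_counter() - started
        results.append(StepResult(url, status, elapsed))
        if not 200 <= status < 400:
            break
    return results


def transaction_available(results: List[StepResult], expected_steps: int) -> bool:
    """The transaction counts as available only if every step succeeded."""
    return len(results) == expected_steps and all(
        200 <= r.status < 400 for r in results
    )


# A taught path of three pages; the stand-in fetcher fails the last one.
recorded_path = [
    "https://shop.example/",
    "https://shop.example/cart",
    "https://shop.example/checkout",
]
canned_statuses = {recorded_path[0]: 200, recorded_path[1]: 200, recorded_path[2]: 500}
results = replay_transaction(recorded_path, lambda url: canned_statuses[url])
```

Passing the fetcher in as a parameter keeps the timing and availability logic separate from the HTTP client, which also makes the sketch easy to exercise with canned responses.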

    Diagnostics

    PATROL Express provides two types of diagnostics for Web pages. First, customers can run trace routes from the RSMs to diagnose network-related problems. Second, customers can request an object-level breakdown for a specific URL. This breakdown shows the download time for all Web-page components such as images and HTML.

    PATROL Express Service Measures Reports and Status Views

    Service Measures reports

    Service Measures reports show the availability and mean time to repair (MTTR) for accounts, services and elements. Both availability and MTTR are measured against defined service level objectives. Users can select options to compare multiple services and multiple elements in either Availability or MTTR reports.

    The Service Measures report above is displaying availability by account.

    Account reports

    The following are charts at the account level of the Service Measures tab.

    Service reports

    The following are charts at the service group level of the Service Measures tab.

    Infrastructure element reports

    The following are charts at the element level of Service Measures.

    Web transaction element reports

    Critical alarm formulas

    Both infrastructure and Web transaction elements have critical alarm rules, which enable customers to specify the conditions that must be met for an element to be considered in critical alarm status. Critical alarm rules offer more granular control on what conditions constitute an alarm.

    Critical alarms for Web transaction elements

    For Web transactions, customers can specify the number of locations that must detect an outage before it is considered a critical alarm. For example, customers may create a rule that requires only one location (RSM) to detect an outage before the element is considered in critical alarm. Alternatively, customers may choose to create a rule that requires three or more locations (RSMs) to detect an outage. By allowing this type of flexibility, customers can prevent sporadic network outages at single locations from affecting the overall availability. Additionally, notification rules can be set up to send alerts only when critical alarms are detected. This significantly reduces the likelihood of false alarms, which are also known as false positives.
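The location-count rule described above amounts to a simple threshold check. The sketch below is an assumption about how such a rule could be evaluated; the function name, the location names and the dictionary input format are hypothetical.

```python
def web_transaction_critical(outage_by_location, required_locations):
    """Return True when enough RSM locations report an outage.

    outage_by_location maps a location name to True if that location
    currently detects an outage (hypothetical input format).
    """
    if required_locations < 1:
        raise ValueError("required_locations must be at least 1")
    detecting = sum(1 for down in outage_by_location.values() if down)
    return detecting >= required_locations


# Only one of three locations sees a problem.
readings = {"london": True, "houston": False, "tokyo": False}
strict_rule = web_transaction_critical(readings, 1)
tolerant_rule = web_transaction_critical(readings, 3)
```

With a one-location rule the element goes into critical alarm, while a three-location rule treats the same readings as a localized network problem and does not.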

    Critical alarms for infrastructure elements

    For infrastructure elements, customers specify how long a parameter must exceed the specified threshold. This is known as the go critical duration. An element is in critical alarm only if the threshold has been violated for the specified time. This functionality helps prevent false alarms because it requires a problem to be detected for a sustained amount of time. For example, CPU utilization may spike momentarily while an application is loading and then return to its normal range; with a go critical duration set, such a spike does not trigger a critical alarm.
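The go critical duration can be illustrated with a short check over recent samples. This is a minimal sketch under stated assumptions: one sample per minute (the typical monitoring interval mentioned earlier), a hypothetical function name, and a rule that the most recent samples must all exceed the threshold.

```python
def in_critical_alarm(samples, threshold, go_critical_minutes):
    """Return True if the threshold has been exceeded for the whole duration.

    samples holds one reading per minute, oldest first (assumed interval).
    The element is critical only when the last go_critical_minutes samples
    all exceed the threshold.
    """
    if go_critical_minutes < 1:
        raise ValueError("go_critical_minutes must be at least 1")
    if len(samples) < go_critical_minutes:
        return False
    recent = samples[-go_critical_minutes:]
    return all(value > threshold for value in recent)


# CPU utilization in percent, sampled once a minute.
spike = [35, 40, 98, 42, 38]       # momentary spike while an application loads
sustained = [40, 95, 96, 97, 99]   # genuine sustained problem

spike_is_critical = in_critical_alarm(spike, threshold=90, go_critical_minutes=3)
sustained_is_critical = in_critical_alarm(sustained, threshold=90, go_critical_minutes=3)
```

The momentary spike does not raise a critical alarm, while the sustained violation does.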

    The following are examples of critical alarm rules.

    Current Status views

    Current Status views show a quick snapshot of the current services and elements. Possible states include: critical alarm, alarm, warning and normal.

    Critical alarm, alarm and warning exceptions within elements are rolled up to the account and service level. This allows the user to quickly see if problems exist for the account. For example, if one parameter for one element detects a warning condition, yet the remaining parameters for other elements do not detect any problems, the status view of the service is “warning.” Another example is if an element is in critical alarm, then the service containing that element will also be in critical alarm.
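The roll-up behavior described above can be sketched as taking the most severe state at each level. The severity ordering below (normal, warning, alarm, critical alarm) and the function and element names are assumptions for illustration.

```python
# Assumed ordering from least to most severe.
SEVERITY = ["normal", "warning", "alarm", "critical alarm"]


def roll_up(states):
    """Return the most severe state in states, or 'normal' if there are none."""
    return max(states, key=SEVERITY.index, default="normal")


def service_status(service):
    """Roll parameter states up to elements, then elements up to the service.

    service maps an element name to the list of its parameter states
    (hypothetical input format).
    """
    element_states = [roll_up(params) for params in service.values()]
    return roll_up(element_states)


web_site = {
    "web01": ["normal", "warning", "normal"],
    "db01": ["normal", "normal"],
}
status = service_status(web_site)
```

A single warning on one parameter makes the whole service show as "warning", and one element in critical alarm would make the service show as "critical alarm".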

    As seen in the following graphic, current status views for the accounts and services have “quick view” graphic bars that display the overall health of the account or service. This enables IT organizations and service providers, as well as their customers, to quickly see if there are problems, and to determine the severity of those problems.

    This status view provides a picture of how Web transactions are performing at various locations.

    Alerts

    PATROL Express includes flexible alerting rules. An alert rule consists of whom to notify, how to notify them, what elements and services they are responsible for and when to notify them.

    Users can set up notifications based on the types of alerts and for different services or elements. Some examples include:

    In addition, PATROL Express enables users to create an escalation hierarchy for notification within a department or company. For example, a company that uses PATROL Express may want a member of its IT department to be notified about a problem immediately. If the alert is still active after 30 minutes, the IT manager is then notified.
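The escalation example above can be expressed as a list of delays and recipients. This is a sketch of the idea only; the function name, the recipient labels and the chain format are hypothetical.

```python
from datetime import datetime, timedelta


def recipients_to_notify(alert_started, now, escalation_chain):
    """Return the recipients whose escalation delay has elapsed.

    escalation_chain is a list of (delay, recipient) pairs, where delay is
    how long the alert must remain active before that recipient is notified.
    """
    age = now - alert_started
    return [who for delay, who in escalation_chain if age >= delay]


# IT staff are notified immediately, the IT manager after 30 minutes.
chain = [(timedelta(0), "it-staff"), (timedelta(minutes=30), "it-manager")]
started = datetime(2003, 7, 1, 9, 0)

after_5_minutes = recipients_to_notify(started, started + timedelta(minutes=5), chain)
after_30_minutes = recipients_to_notify(started, started + timedelta(minutes=30), chain)
```

Five minutes into the alert only the IT staff have been notified; once the alert has been active for 30 minutes, the IT manager is included as well.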

    Integrating PATROL Express with Advanced PATROL Solutions

    The following are three methods for integrating PATROL Express with advanced PATROL solutions:

    Enterprise integration support

    Enterprise integration enables the administrator to configure PATROL Express to send all PATROL Express alerts to the PATROL Enterprise Manager or another enterprise management console that accepts SNMP traps. PATROL Express sends SNMP traps that conform to the SNMP version 1 format. The trap information includes customer account information as well as information about the cause of the problem.
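To make the trap contents more concrete, the sketch below models the fields of an SNMP version 1 Trap-PDU as defined in RFC 1157. It is not the trap layout PATROL Express actually emits: the enterprise OID, the variable-binding OIDs, the specific-trap number and the `build_alert_trap` helper are all placeholders invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ENTERPRISE_SPECIFIC = 6  # generic-trap value for enterprise-defined traps (RFC 1157)

# Placeholder OID prefix for illustration only, not a real PATROL Express OID.
PLACEHOLDER_ENTERPRISE = "1.3.6.1.4.1.99999"


@dataclass
class SnmpV1Trap:
    """The six fields of an SNMPv1 Trap-PDU."""
    enterprise: str        # OID identifying the type of object generating the trap
    agent_addr: str        # IP address of the sender
    generic_trap: int      # 0-6; 6 means enterprise specific
    specific_trap: int     # enterprise-defined trap number
    time_stamp: int        # sender uptime in hundredths of a second
    variable_bindings: List[Tuple[str, str]] = field(default_factory=list)


def build_alert_trap(sender_ip, uptime_ticks, account, element, cause):
    """Assemble a trap carrying account and cause information (hypothetical layout)."""
    return SnmpV1Trap(
        enterprise=PLACEHOLDER_ENTERPRISE,
        agent_addr=sender_ip,
        generic_trap=ENTERPRISE_SPECIFIC,
        specific_trap=1,
        time_stamp=uptime_ticks,
        variable_bindings=[
            (PLACEHOLDER_ENTERPRISE + ".1", account),
            (PLACEHOLDER_ENTERPRISE + ".2", element),
            (PLACEHOLDER_ENTERPRISE + ".3", cause),
        ],
    )


trap = build_alert_trap("192.0.2.10", 123456, "acme", "web01", "CPU above threshold")
```

A receiving console would typically match on the enterprise OID and specific-trap number, then read the account and cause from the variable bindings.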

    PATROL data into PATROL Express

    To integrate data from advanced PATROL solutions into PATROL Express, PATROL Express collects PATROL Knowledge Module (KM) data at defined reporting intervals. PATROL Express supports PATROL 3.3 and above and includes support for the following KMs: Windows, Unix and Oracle. Support for other KMs can be added using the Parameter Set Editor.

    PATROL Express data into PATROL

    To integrate PATROL Express data into advanced PATROL solutions, the PATROL Express KM collects data from the PATROL Express SIP. Once data is in the PATROL Express KM, it can be leveraged by the PATROL family of products. The PATROL Express KM communicates with the SIP via HTTP or HTTPS protocols, and can access that data virtually anywhere across network boundaries. Users with PATROL running in one network and PATROL Express in another can exchange information between the two solutions, as long as they have access to the SIP. In summary, this integration allows users to monitor and manage what PATROL Express is monitoring – via a single PATROL Console.

    The figure above shows PATROL Express in the management architecture of advanced PATROL solutions.

    Administrative Support

    PATROL Express has both customer accounts and administrative accounts. Typically, the customers of IT organizations and service providers have customer accounts, while system administrators and service providers have administrative accounts.

    Administrators have special authorities. Logging on to PATROL Express as an administrator provides users with access to tools to perform the following tasks:

    Account list

    Logging on as a PATROL Express administrator allows users to immediately identify which customer accounts are in alarm or warning status. From the Account List screen, administrators can identify a problem with a specific account and then log on to that account to further investigate the cause of the problem.

    Access control

    Access control support enables administrators of PATROL Express to support different types of users. Also, administrators can set access control for each account. The following are the three levels of access that can be set by administrators:

    Administrators can immediately identify which customer accounts are out of compliance with their service level objectives and which accounts are in alarm or warning status.

    Parameter Set Editor

    In addition to using the default parameter sets to monitor the infrastructure elements in an account, PATROL Express administrators can now create custom parameter sets. The Parameter Set Editor (PSE) provides an interface that guides the administrator through the required steps of adding parameter sets that publish monitoring data using PerfMon, SNMP and PATROL. Once the customized parameter set has been created it can be used by anyone in that system to monitor elements.

    Logo and style sheet support

    Usage report

    The usage report contains information that service providers can use to invoice PATROL Express customers. This report contains account usage details. Account usage consists of the number of elements and parameter sets being monitored, listed by account.
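The per-account counts described above could be derived as in the following sketch. The input format (one row per monitored parameter set) and the function name are assumptions; the actual report layout is not specified here.

```python
from collections import defaultdict


def usage_report(monitored):
    """Count elements and parameter sets per account.

    monitored is an iterable of (account, element, parameter_set) rows,
    one row per parameter set being monitored (hypothetical input format).
    """
    elements = defaultdict(set)
    parameter_sets = defaultdict(int)
    for account, element, _parameter_set in monitored:
        elements[account].add(element)
        parameter_sets[account] += 1
    return {
        account: {
            "elements": len(elements[account]),
            "parameter_sets": parameter_sets[account],
        }
        for account in sorted(elements)
    }


rows = [
    ("acme", "web01", "Windows OS"),
    ("acme", "web01", "IIS"),
    ("acme", "db01", "Oracle"),
    ("globex", "router1", "Network device"),
]
report = usage_report(rows)
```

In this example the "acme" account has two elements with three parameter sets in total, and "globex" has one element with one parameter set.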

    Deployment Options

    RSM failover

    Multiple physical machines may be clustered to create one logical RSM, so that if one machine fails, the others handle the load. For details on how to configure this, refer to the PATROL Express User Guide, which can be viewed or downloaded at www.bmc.com/patrolexpress/support.

    RSM requirements

    The following are the minimum system requirements for installing a dedicated RSM:

    Service Integration Portal (SIP) architecture

    The SIP can be deployed in a variety of configurations, depending on customers’ availability needs. For a simple, low-availability solution, the SIP can be deployed entirely on one machine. If higher availability is needed, the SIP can be deployed on multiple machines.

    The underlying architecture has three main components: a Web server, an application server and a database server. In a multi-machine configuration, each of these components can be deployed on a separate computer, and multiple application servers and Web servers can be deployed in parallel. If one machine fails, then the other machines will handle the additional work. For scalability, the database may be deployed on a Unix cluster, allowing for failover. Once customers determine their availability needs, they can determine the configuration that is best for their needs.

    Minimum system requirements for SIP

    The following are the minimum system requirements for running the SIP on a single Windows 2000 server.

    Summary

    PATROL Express status monitoring helps IT proactively identify substandard service performance and resolve it quickly. PATROL Express offers the ability to create and measure service groups easily, comparing them with service level objectives. It pinpoints problem areas and provides notification via pager and email. Ultimately, PATROL Express reduces the complexity of deploying a management solution while providing the necessary service level monitoring to keep your business running.

    BMC Software

    BMC Software, Inc. [NYSE:BMC], is a leader in enterprise management. The company focuses on Assuring Business Availability® for its customers by helping them proactively improve service, reduce costs and increase value to their business. BMC Software solutions span enterprise systems, applications, databases and service management. Through its Business Service Management strategy, the company’s solutions enable customers to have a complete view of their business and IT operations by linking IT resources to business objectives. Founded in 1980, BMC Software has offices worldwide and is a member of the S&P 500, with fiscal year 2003 revenues of approximately $1.3 billion.

    29122 07/03