PATROL® Express: A Technical Overview
Contents
Terms related to PATROL Express
How PATROL Express Remotely Monitors Infrastructure
The different components of PATROL Express
How parts of PATROL Express interact
Configuring Web transaction element
PATROL Express Service Measures Reports and Status Views
Integrating PATROL Express with Advanced PATROL Solutions
Enterprise integration support
PATROL data into PATROL Express
PATROL Express data into PATROL
Service Integration Portal (SIP) architecture
Minimum system requirements for SIP
Introduction
Purpose of this paper
The goal of this paper is to provide more in-depth information about the following:
- How PATROL Express monitors the performance and availability of infrastructure elements and Web transactions
- How the different components of PATROL Express work together
- What security precautions are taken by PATROL Express
- Which monitoring protocols are supported by PATROL Express
- How PATROL Express integrates with advanced PATROL solutions
- What different reports and views are offered by PATROL Express
- How the administrative support helps manage accounts
- What the minimum requirements are for running PATROL Express
What is PATROL Express?
PATROL Express is an infrastructure monitoring solution that can help improve service to customers while driving down operational costs. It reduces costs by remotely monitoring the availability of network devices, systems and application infrastructure. Its agentless technology enables it to be deployed rapidly because it does not require software to be installed on the elements being monitored. This Web-based solution provides monitoring, notification of outages, and reporting across servers, networks and application infrastructures.
PATROL Express also monitors the performance and availability of Web transactions. It measures both transactions and infrastructure against user-defined service level objectives. Monitoring both infrastructure and transactions – and providing immediate problem notification and escalation – enables IT to quickly determine how infrastructure problems affect the end-user experience.
Terms related to PATROL Express
This section defines terms that help the reader understand this document.
Element
PATROL Express remotely monitors elements. Two types of elements exist – infrastructure and Web transaction.
An infrastructure element can refer to any device that is IP addressable. Examples include operating system servers, Web servers, network routers and switches.
A Web transaction element monitors the performance and availability of HTTP and HTTPS Web transactions. Examples of Web transaction elements include buying a book online or conducting a Web search.
Service
A service is a logical group of elements. These groups allow the user to better organize a PATROL Express account for easier navigation. In PATROL Express, an element can belong to more than one service. Common types of services include elements of the same type, such as all Oracle databases, or systems that work together to provide a common function, such as a Web site.
Account
PATROL Express supports multiple-user accounts. Each account has a separate user name and password. Users determine the criteria for organizing the services within an account. As many services as necessary may be set up to monitor an account, but each account must always contain at least one service.
Availability
In PATROL Express, availability is expressed as the percentage of time the element is available during a selected time period. An element is considered unavailable when it is in the critical alarm state. (For more information about the critical alarm state, refer to Critical alarm formulas.)
Mean Time to Repair (MTTR)
MTTR is the average length of time it takes to fix a problem that caused an alarm state on a monitored element.
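As a rough illustration of the availability and MTTR definitions above, the following Python sketch computes both values from a list of alarm intervals. The function names and the (start, end) interval representation are assumptions made for this example and are not part of PATROL Express.

```python
from datetime import datetime, timedelta

def availability_pct(period_start, period_end, critical_intervals):
    """Percentage of the period during which the element was not in critical alarm.

    critical_intervals: list of (start, end) datetimes, assumed non-overlapping.
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in critical_intervals:
        # Clip each interval to the reporting period before summing.
        s, e = max(start, period_start), min(end, period_end)
        if e > s:
            down += (e - s).total_seconds()
    return 100.0 * (period - down) / period

def mttr(alarm_intervals):
    """Average alarm duration as a timedelta, or None if there were no alarms."""
    if not alarm_intervals:
        return None
    total = sum((end - start for start, end in alarm_intervals), timedelta())
    return total / len(alarm_intervals)

# Hypothetical data: a 30-minute and a 90-minute critical alarm in one day.
day = datetime(2003, 7, 1)
outages = [
    (day + timedelta(hours=2), day + timedelta(hours=2, minutes=30)),
    (day + timedelta(hours=10), day + timedelta(hours=11, minutes=30)),
]
print(availability_pct(day, day + timedelta(days=1), outages))
print(mttr(outages))
```

In this example the element is in critical alarm for two hours of a 24-hour period, so availability is a little under 92 percent and the MTTR is one hour.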
Parameter set
A parameter set is a logical grouping of parameters. Each parameter set represents an independent component of an infrastructure element. For example, the operating system parameter sets contain various general system performance and resources parameters, such as CPU usage, virtual memory and physical memory.
How PATROL Express Remotely Monitors Infrastructure
The different components of PATROL Express
PATROL Express consists of the following two main components.
Service Integration Portal (SIP)
The SIP is a Web-based portal application that resides at the IT organization’s or service provider’s data center. The SIP is where end users configure elements, organize elements into services, view reports and set up notifications.
Remote Service Monitor (RSM)
The purpose of the RSM is to collect performance data and relay it to the SIP. It is downloaded from the SIP and installed in the user’s network. Once installed, it remotely monitors the elements that are configured at the SIP. The RSM monitors elements without agents, using a set of industry-accepted protocols.
Dedicated versus shared RSMs
A dedicated RSM monitors the elements of only one PATROL Express account. A shared RSM monitors the elements of multiple accounts. An end user can create a dedicated RSM, while a PATROL Express administrator can create a shared RSM.
As seen in the following graphics, a user account can use any of the following:
Users can perform the following:
The diagram above – one of many possible deployment options for PATROL Express – shows how data is sent between RSMs and the SIP.
The diagram above shows how one account can use multiple RSMs to monitor various elements.
How parts of PATROL Express interact
Where the RSM is located
The RSM resides in the customer’s environment and must have IP addressability to the elements that it monitors. The RSM must also be able to resolve the PATROL Express SIP IP address.
The RSM runs as a service and supports a number of remote monitoring protocols. Each RSM can monitor hundreds of elements. Typically, the RSM will retrieve parameters from an element once a minute, which is known as the monitoring interval. The RSM also has a manager application that resides in the Microsoft Windows system tray, enabling the user to easily stop and start the RSM as well as obtain diagnostic and communication information.
How the RSM communicates with the SIP
The RSM communicates with the SIP using HTTP and HTTPS. The data is compressed and encrypted prior to being sent. The RSM initiates all communication between the SIP and the RSM. These communications fall into one of the three following categories:
- Verifying RSM-to-SIP communication – The RSM tries to connect to the SIP every minute to ensure there is a connection between the components. This is known as the heartbeat.
- Forwarding alarm or warning exceptions to the SIP for processing – When the RSM detects a threshold violation, it will immediately send this data to the SIP.
- Forwarding parameter data to the SIP for processing service reports – The RSM sends collected performance data at user-defined intervals. These intervals are set at element configuration time (5, 10, 15 or 30 minutes). This data is used for service and history reporting.
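The three categories above can be pictured as a per-minute decision made by the RSM. The following Python sketch is only an illustration under stated assumptions: the function name, the message labels ("heartbeat", "exceptions", "parameter_data") and the minute counter are hypothetical and do not describe the actual RSM-to-SIP protocol.

```python
ALLOWED_INTERVALS = (5, 10, 15, 30)  # reporting intervals listed in this paper

def due_messages(minute, report_interval, violations):
    """Return the message types an RSM would send at a given minute.

    minute: minutes elapsed since the RSM started (hypothetical clock).
    report_interval: reporting interval in minutes, chosen at element configuration.
    violations: threshold violations detected during this minute.
    """
    if report_interval not in ALLOWED_INTERVALS:
        raise ValueError("report_interval must be 5, 10, 15 or 30")
    messages = ["heartbeat"]                # connection check, every minute
    if violations:
        messages.append("exceptions")       # alarms and warnings go out immediately
    if minute > 0 and minute % report_interval == 0:
        messages.append("parameter_data")   # batched data for service reports
    return messages

print(due_messages(3, 15, ["cpu above threshold"]))
print(due_messages(10, 5, []))
```

The sketch reflects the point made above that exceptions are forwarded as soon as they are detected, while report data waits for the configured interval.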
Security
PATROL Express was designed with security in mind. As previously mentioned, all traffic between the SIP and the RSM, including logon credentials, is compressed and encrypted using HTTPS. In addition, the following precautions have been taken:
- The SIP uses Secure Sockets Layer (SSL) IDs with strong encryption
SSL IDs enable the world’s strongest SSL encryption with both domestic and export versions of Microsoft and Netscape browsers
SSL is the standard for large-scale online merchants, banks, brokerages, health-care organizations and insurance companies worldwide
- Users are required to authenticate using their user IDs and passwords to access the SIP
- Portal and element credentials are stored encrypted in the database of the SIP
RSM monitoring protocols
The following table lists the type of data PATROL Express collects and the protocols used by the RSM to collect that data.
Parameter sets
Below are some of the different categories of PATROL Express parameter sets. For a complete list of PATROL Express parameter sets and the parameters they monitor, please refer to the PATROL Express User Guide. You can view or download a PDF version of this document at: www.bmc.com/patrolexpress/support.
Operating systems
PATROL Express monitors basic operating system parameters, including memory, CPU and disk utilization. PATROL Express can be configured to monitor specific processes as well as how much memory and CPU a process is using. It can also monitor both Windows Event logs and text log files for user-defined messages.
Databases
PATROL Express takes a snapshot of database performance, which includes verifying that the database is up and running and monitoring parameters such as number of transactions, lock usage and active SQL statements.
Network devices
PATROL Express monitors each interface (port) on the device to ensure it is running and reports how much data has been transmitted – and at what speed. In addition, it monitors the status of network devices, checking on availability and reporting the system description of each device.
Storage devices
When monitoring storage devices, PATROL Express focuses primarily on availability, such as up/down status and environmental systems, including fans, power supplies and temperature; asset information, such as vendor, model, serial number and firmware; and configuration, such as device capacity and number of ports.
Web transactions
PATROL Express monitors the performance and availability of Web transactions using HTTP and HTTPS. PATROL Express supports all major dynamic HTML techniques, such as JSP, ASP and CGI, and popular content types, such as Microsoft Word, PDF and plain text.
Configuring Web transaction element
Teaching
From the SIP, customers teach PATROL Express the transaction, or path, they want it to monitor.
Teaching is a three-step process for the user:
1. Log on to PATROL Express and select Add an Element.
2. Provide PATROL Express with the starting URL of the transaction to monitor.
3. Click through the path of Web pages to be monitored.
Note: teaching is an online process. No software needs to be downloaded to teach PATROL Express a path of Web pages to monitor.
During the teaching process, requests made by the user go through the SIP (for preprocessing) before going out to the Web site to be monitored. PATROL Express records all the HTTP and HTTPS requests made when the element is taught. After the request is processed at the site, the returned information passes through the SIP before appearing in the user’s browser. Later, when PATROL Express monitors the site, it will simulate the requests to appear as if a user is making them.
It takes just seconds to teach or reteach PATROL Express a transaction. An element can be taught an unlimited number of times. Once teaching is complete, the user selects the locations (RSMs) from which to monitor, then PATROL Express deploys the transaction and begins monitoring almost immediately.
Diagnostics
PATROL Express provides two types of diagnostics for Web pages. First, customers can run trace routes from the RSMs to diagnose network-related problems. Second, customers can request an object-level breakdown for a specific URL. This breakdown shows the download time for all Web-page components such as images and HTML.
PATROL Express Service Measures Reports and Status Views
Service Measures reports
Service Measures reports show the availability and mean time to repair (MTTR) for accounts, services and elements. Both availability and MTTR are measured against defined service level objectives. Users can select options to compare multiple services and multiple elements in either Availability or MTTR reports.
The Service Measures report above is displaying availability by account.
Account reports
The following are charts at the account level of the Service Measures tab.
- Availability reports
Availability – This chart shows the average percentage of time that all elements were free of any critical alarms.
Availability vs. Goals – This chart shows the average percentage of time that all elements met or exceeded their individual availability goals.
- MTTR reports
Mean Time To Repair Critical Alarms – This chart shows the average time that critical alarms lasted. The timing stops when a critical alarm has been completely fixed, or if it drops to a lower alert state, such as a warning.
Mean Time To Repair vs. Goals – This chart shows the average percentage of time that the MTTR of all the elements met or exceeded the MTTR goals of the individual elements.
Service reports
The following are charts at the service group level of the Service Measures tab.
- Availability reports
Availability – This chart shows the average percentage of time that all elements in this service were free of any critical alarms.
Availability vs. Goals – This chart shows the average percentage of time that all elements in this service met or exceeded their individual availability goals.
- MTTR reports
Mean Time To Repair Critical Alarms – This chart shows the average time this service’s critical alarms lasted. Timing stops when a critical alarm is fixed, or when it drops to a lower alert state, such as a warning.
Mean Time To Repair vs. Goals – This chart shows the average percentage of time that all elements in this service met or exceeded their individual element MTTR goals.
Infrastructure element reports
The following are charts at the element level of Service Measures.
Web transaction element reports
- Availability – This chart shows the percentage of time this element was free of any critical alarms.
- Mean Time To Repair (MTTR) – This chart shows the average time critical alarms lasted. Timing stops when a critical alarm is fixed or when it drops to a lower alert state such as a warning.
- Path Time vs. Goals (for Web pages) – This chart shows the percentage of time this element's path was completed faster than, or equal to, the defined path-time goal.
- Page Time vs. Goals (for Web pages) – This chart shows the percentage of time the pages (steps) were downloaded faster than, or equal to, the defined page-time goal.
- Diagnostic charts
Total path time
Slowest five steps, fastest five steps
Page time
Page component (DNS, first byte, resources)
Shown as averages or for specific locations (RSMs)
Critical alarm formulas
Both infrastructure and Web transaction elements have critical alarm rules, which enable customers to specify the conditions that must be met for an element to be considered in critical alarm status. Critical alarm rules offer more granular control on what conditions constitute an alarm.
Critical alarms for Web transaction elements
For Web transactions, customers can specify the number of locations that must detect an outage before it is considered a critical alarm. For example, customers may create a rule that requires only one location (RSM) to detect an outage before the element is considered in critical alarm. Alternatively, customers may choose to create a rule that requires three or more locations (RSMs) to detect an outage. This flexibility lets customers prevent sporadic network outages at single locations from affecting the overall availability. Additionally, notification rules can be set up to send alerts only when critical alarms are detected. This significantly reduces the likelihood of false alarms, which are also known as false positives.
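The location rule described above amounts to a simple count. The following Python sketch is a minimal illustration, assuming a hypothetical mapping from RSM location names to a flag that says whether that location currently detects an outage; the function and location names are invented for this example.

```python
def is_critical(outage_by_location, min_locations):
    """True when at least min_locations RSMs currently report an outage."""
    if min_locations < 1:
        raise ValueError("min_locations must be at least 1")
    failing = sum(1 for down in outage_by_location.values() if down)
    return failing >= min_locations

# Hypothetical readings: only one of three locations sees a problem.
readings = {"rsm-east": True, "rsm-west": False, "rsm-eu": False}
print(is_critical(readings, 1))  # rule requiring one location
print(is_critical(readings, 3))  # rule requiring three locations
```

With the rule set to one location, the single failing RSM puts the element in critical alarm; with the rule set to three, the same readings do not, which is how an isolated network problem at one location is kept out of the availability figures.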
Critical alarms for infrastructure elements
For infrastructure elements, customers specify how long a parameter must exceed the specified threshold. This is known as the go critical duration. An element will only be in critical alarm if the threshold has been violated for the specified time. This functionality helps prevent false alarms because it requires a problem to be detected for a sustained amount of time. For example, CPU utilization may spike momentarily when an application is loading and then return to its normal range. The go critical duration keeps a brief spike such as this one from raising a critical alarm.
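A minimal Python sketch of the go critical duration follows. It assumes one sample per monitoring interval and treats the duration as a required run of consecutive violating samples; the function name, parameter names and sample values are hypothetical, and the real product may evaluate the duration differently.

```python
def in_critical_alarm(samples, threshold, go_critical_minutes, interval_minutes=1):
    """True if the most recent samples exceeded the threshold for the whole duration.

    samples: parameter values, oldest first, one per monitoring interval.
    """
    # Number of consecutive samples needed to cover the duration (ceiling division).
    needed = max(-(-go_critical_minutes // interval_minutes), 1)
    if len(samples) < needed:
        return False
    return all(value > threshold for value in samples[-needed:])

cpu_spike = [20, 95, 25, 30, 22]   # momentary spike while an application loads
cpu_busy = [20, 95, 96, 97]        # sustained high utilization
print(in_critical_alarm(cpu_spike, threshold=90, go_critical_minutes=3))
print(in_critical_alarm(cpu_busy, threshold=90, go_critical_minutes=3))
```

The momentary spike does not satisfy the rule because the violation is not sustained, while three consecutive minutes above the threshold does.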
The following are examples of critical alarm rules.
Current Status views
Current Status views show a quick snapshot of the current services and elements. Possible states include: critical alarm, alarm, warning and normal.
Critical alarm, alarm and warning exceptions within elements are rolled up to the account and service level. This allows the user to quickly see if problems exist for the account. For example, if one parameter for one element detects a warning condition, yet the remaining parameters for other elements do not detect any problems, the status view of the service is “warning.” Another example is if an element is in critical alarm, then the service containing that element will also be in critical alarm.
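The roll-up described above can be sketched as "the most severe state wins." In the following Python example, the severity ordering (normal, warning, alarm, critical alarm) and the data layout are assumptions made for illustration; the paper lists the four states but does not spell out how they are ranked internally.

```python
# Assumed ranking, least to most severe.
SEVERITY = {"normal": 0, "warning": 1, "alarm": 2, "critical alarm": 3}

def roll_up(states):
    """Return the most severe state in states, or 'normal' if there are none."""
    return max(states, key=SEVERITY.__getitem__, default="normal")

def service_status(service):
    """service maps an element name to the list of its parameter states."""
    element_states = [roll_up(params) for params in service.values()]
    return roll_up(element_states)

# Hypothetical service: one parameter on one element reports a warning.
web_site = {
    "web01": ["normal", "warning", "normal"],
    "db01": ["normal", "normal"],
}
print(service_status(web_site))
```

Applying the same function to the statuses of all services in an account would give the account-level status in this model.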
As seen in the following graphic, current status views for the accounts and services have “quick view” graphic bars that display the overall health of the account or service. This enables IT organizations and service providers, as well as their customers, to quickly see if there are problems, and to determine the severity of those problems.
This status view provides a picture of how Web transactions are performing at various locations.
Alerts
PATROL Express includes flexible alerting rules. An alert rule consists of whom to notify, how to notify them, what elements and services they are responsible for and when to notify them.
Users can set up notifications based on the types of alerts and for different services or elements. Some examples include:
In addition, PATROL Express enables users to create an escalation hierarchy for notification within a department or company. For example, a company that uses PATROL Express may want a member of its IT department to be notified about a problem immediately. If the alert is still active after 30 minutes, the IT manager is then notified as well.
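The escalation example above can be sketched as a list of delays and recipients. The following Python code is an illustration only: the function name, the (delay, recipient) representation and the e-mail addresses are hypothetical and do not reflect how PATROL Express stores alert rules.

```python
def recipients_to_notify(minutes_active, escalation_steps):
    """Return everyone who should have been notified for an alert active this long.

    escalation_steps: list of (delay_minutes, recipient) pairs.
    """
    return [who for delay, who in sorted(escalation_steps) if minutes_active >= delay]

# Hypothetical hierarchy: IT staff immediately, IT manager after 30 minutes.
steps = [(0, "it-staff@example.com"), (30, "it-manager@example.com")]
print(recipients_to_notify(5, steps))
print(recipients_to_notify(45, steps))
```

After 5 minutes only the IT staff member appears in the result; once the alert has been active for 30 minutes or more, the IT manager is included as well.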
Integrating PATROL Express with Advanced PATROL Solutions
The following are three methods for integrating PATROL Express with advanced PATROL solutions:
Enterprise integration support
Enterprise integration enables the administrator to configure PATROL Express to send all PATROL Express alerts to the PATROL Enterprise Manager or another enterprise management console that accepts SNMP traps. PATROL Express sends SNMP traps that conform to the SNMP version 1 format. The trap information includes customer account information as well as information about the cause of the problem.
PATROL data into PATROL Express
To integrate data from advanced PATROL solutions into PATROL Express, PATROL Express collects PATROL Knowledge Module (KM) data at defined reporting intervals. PATROL Express supports PATROL 3.3 and above and includes support for the following KMs: Windows, Unix and Oracle. Support for other KMs can be added using the Parameter Set Editor.
PATROL Express data into PATROL
To integrate PATROL Express data into advanced PATROL solutions, the PATROL Express KM collects data from the PATROL Express SIP. Once data is in the PATROL Express KM, it can be leveraged by the PATROL family of products. The PATROL Express KM communicates with the SIP via HTTP or HTTPS protocols, and can access that data virtually anywhere across network boundaries. Users with PATROL running in one network and PATROL Express in another can exchange information between the two solutions, as long as they have access to the SIP. In summary, this integration allows users to monitor and manage what PATROL Express is monitoring – via a single PATROL Console.
The figure above shows PATROL Express in the management architecture of advanced PATROL solutions.
Administrative Support
PATROL Express has both customer accounts and administrative accounts. Typically, the customers of IT organizations and service providers have customer accounts, while system administrators and service providers have administrative accounts.
Administrators have special authorities. Logging on to PATROL Express as an administrator provides users with access to tools to perform the following tasks:
Account list
Logging on as a PATROL Express administrator allows users to immediately identify which customer accounts are in alarm or warning status. From the Account List screen, administrators can identify a problem with a specific account and then log on to that account to further investigate the cause of the problem.
Access control
Access control support enables administrators of PATROL Express to support different types of users. Also, administrators can set access control for each account. The following are the three levels of access that can be set by administrators:
Administrators can immediately identify which customer accounts are out of compliance with their service level objectives and which accounts are in alarm or warning status.
Parameter Set Editor
In addition to using the default parameter sets to monitor the infrastructure elements in an account, PATROL Express administrators can now create custom parameter sets. The Parameter Set Editor (PSE) provides an interface that guides the administrator through the required steps of adding parameter sets that publish monitoring data using PerfMon, SNMP and PATROL. Once the customized parameter set has been created it can be used by anyone in that system to monitor elements.
Logo and style sheet support
- Logo – Administrators can specify an image file to display in the upper left and right corners of each page, enabling the organization or service provider to display its corporate identity in the PATROL Express system.
- Style sheet – The style sheet controls the fonts, colors and other page attributes for the PATROL Express user interface. By editing the default style sheet or specifying a customized one, an administrator can change the appearance of PATROL Express to reflect the company’s corporate image.
Usage report
The usage report contains information that service providers can use to invoice PATROL Express customers. This report contains account usage details. Account usage consists of the number of elements and parameter sets being monitored, listed by account.
Deployment Options
RSM failover
Multiple physical machines may be clustered to create one logical RSM. That way, if one machine fails, the others handle the load. For details on how to configure this, refer to the PATROL Express User Guide, which can be viewed or downloaded at www.bmc.com/patrolexpress/support.
RSM requirements
The following are the minimum system requirements for installing a dedicated RSM:
Service Integration Portal (SIP) architecture
The SIP can be deployed in a variety of configurations, depending on customers’ availability needs. For a simple, low availability solution, the SIP can be deployed entirely on one machine. If higher availability is needed, then the SIP can be deployed on multiple machines.
The underlying architecture has three main components: a Web server, an application server and a database server. In a multi-machine configuration, each of these components can be deployed on a separate computer, and multiple application servers and Web servers can be deployed in parallel. If one machine fails, then the other machines will handle the additional work. For scalability, the database may be deployed on a Unix cluster, allowing for failover. Once customers determine their availability needs, they can determine the configuration that is best for their needs.
Minimum system requirements for SIP
The following are the minimum system requirements for running the SIP on a single Windows 2000 server.
Summary
PATROL Express status monitoring helps IT proactively identify and quickly resolve substandard service performance. PATROL Express offers the ability to create and measure service groups easily, comparing them with service level objectives. It pinpoints problem areas and provides notification via pager and email. Ultimately, PATROL Express reduces the complexity of deploying a management solution while providing the necessary service level monitoring to keep your business running.
BMC Software
BMC Software, Inc. [NYSE:BMC], is a leader in enterprise management. The company focuses on Assuring Business Availability® for its customers by helping them proactively improve service, reduce costs and increase value to their business. BMC Software solutions span enterprise systems, applications, databases and service management. Through its Business Service Management strategy, the company’s solutions enable customers to have a complete view of their business and IT operations by linking IT resources to business objectives. Founded in 1980, BMC Software has offices worldwide and is a member of the S&P 500, with fiscal year 2003 revenues of approximately $1.3 billion.
29122 07/03