Once the workloads have been characterized, some analysis should be performed to determine the impact of executing multiple workloads at the same time. It is possible that some workloads will not work well together and thus, when executed at the same time, will degrade each other's performance. In such cases, it is best to keep these workloads from running at the same time. The workload that consists of the maintenance programs for the warehouse should be tracked closely because of its impact on availability. While not specifically related to the issue, data marts have their place in a warehouse strategy. One factor that should be evaluated in deciding to establish marts is the ability to segregate workloads across multiple data marts and thus mitigate the instances of competing workloads that degrade performance.
One issue to address in a warehouse environment is whether there will be uniform workloads or whether all work will be unique. Some companies may find queries that are executed on a regular basis and thus can be characterized as a workload. Others may find the dynamic nature of the environment very difficult to characterize. This will be addressed in more detail later. The performance analyst can get help with pattern matching from end users while trying to determine the workload characteristics of the company's warehouse.
Workload arrival rates and how those arrival rates can be influenced must be combined with the workload characterization. Queries against a warehouse are not driven by customers transacting business with the company, but rather by users who are searching for information to make the business run smoother and be more responsive to its customers. Therefore, the timing of these queries can be under some control. Typically, this control will work best when it is integrated into the warehouse service-level agreements. This will make control a part of the agreement between IS and the users of the system. For example, this control might be structured by setting different response time goals for different workloads or groups of users based on day of the week or hour of the day. This would not guarantee that the work would arrive at certain times, but it would encourage submission of workloads at different times.
Sometimes availability is overlooked in the evaluation of performance. If a system is unavailable, then it is not performing. Therefore, the ability of the platform to deliver the required availability is critical. Some might question the need for high availability of a warehouse compared to the availability requirements of an operational system. A warehouse may, in fact, need 24x7 availability. Consider that queries against a warehouse will have to process large volumes of data, which may take hours or perhaps days. Longer outages might be tolerated by an operational system if they are planned around user queries, but unplanned outages at the middle or end of a long running query may be unacceptable for users.
In addition, the more a company uses a warehouse to make strategic business decisions, the more the warehouse becomes just as critical as the operational systems that process the current orders. Many have argued the value of a warehouse to project future buying patterns and needs to ensure that the business remains competitive in a changing marketplace. These decisions affect whether there will be future orders to record in an operational system.
If a tool generates the queries, this will be an opportunity to determine how much control you have within the tool to generate efficient SQL. This may take some time and effort to work with the vendor of the tool and to work with some level of SQL EXPLAIN information to achieve the best SQL for your system. If all of the benefits gained from that design in the SQL queries are not used, then the long hours spent designing the schema of the warehouse to meet the users' requirements will have been wasted.
There are many choices of platforms on which to run the data warehouse. Performance is sometimes a consideration when making this choice, but not always. If performance of the warehouse is not a major consideration, then service-level agreements for the warehouse will have to be adjusted based on the ability of the platform to deliver service.
Service-level agreements for a data warehouse may need to be more fluid than those established for an operational system. An operational system may have a requirement that 90% of all transactions complete in less than one second. This is easy to track, report and understand. However, in a data warehouse system, there may be a requirement to produce an answer within one second for every 10,000 pages processed by the query. Metrics that can be obtained without traces are important because of the overhead of DB2® traces. While the user may require some education on the metric, this allows the system's administrator to have more control over the delivery of service and the resources needed to deliver on service-level agreements.
Service-level agreements need to be adjustable after implementation rather than rigid. The goal of the data warehouse is to deliver value to the business. Value is delivered in terms of more business, growth in new business areas or reduced costs for the business. When this is successful, the company executives will want more. This will generate an increase in warehouse activities that will strain the initial resources. Service-level agreements along with capacity planning information can pave the way for warehouse growth as it proves its value to the company and expands its mission.
In addition, time-sensitive information may occasionally be required for business opportunities that exist for only a short time. The ability to adjust resources to meet specific workloads can have an impact on the attainment of other service-level agreements. Because of these adjustments and the fluid nature of the SLA system, reporting of attainment should have some level of prioritization contained within the reporting structure. This allows for the recognition that some workloads for brief periods were considered less important. This reduces the impact of missing the SLA for these workloads. Furthermore, within the SLA reporting structure, the emphasis should always be on percentages of attainment, of resource utilization, and so forth because the actual resources and response times could and probably will change over time.
An example of competing workloads is when two queries are executed at the same time against two different databases, but the underlying tables reside on disk drives that are attached to the same controller on the same physical channel to DASD. Those two queries will be competing for the same resource a channel to the disk drives. Queuing will begin to occur and wait time will be inevitable. In computing systems, wait time is the enemy of performance.
Another example of a competing workload is when two queries are executed at the same time against the same table. One is a well-refined query that uses the available indexes to retrieve a relatively small result set and the other is a broad query that scans certain partitions to generate a large result set. In this case, the synchronous I/O for the random reads for the well-refined query will more than likely have to wait for the asynchronous I/O of the poorly defined query. This happens because of the number of pages retrieved by the asynchronous I/O and the time required performing that operation. Again, the enemy of performance is wait time, and in this case, the unfortunate loser is the person with the well-constructed SQL statement.
The purpose of workload characterization typing is to define the resource requirements so that workloads that compete for resources can be executed at different times.
Some might be concerned about tracking individual SQL statements and might fear that the amount of data would be overwhelming. In a warehouse environment, the number of statements executed daily is significantly smaller, and the majority of these statements run much longer than statements from an OLTP application.
It would be impossible to list all the metrics involved in workload characterization of a data warehouse. However, the object names used to produce a query's result set are important. If multiple queries access the same objects sequentially, then the size of the buffer pool for the accessed objects could determine the nature of these workloads as being competing or complementary. If the buffer pool(s) were large enough to contain a high percentage of the object's pages, then this workload could be very complementary.
However, if the pool contained a small percentage of the object's pages, then this workload could compete for the scarce buffer pool resource quickly. In this case, the important metrics are the object name, the average number of concurrent queries, the buffer pool for the named object, the size of the buffer pool and the average and maximum utilization of the buffer pool. These pieces of information would allow the user to determine if the workloads were complementary or competing.
For operational systems, the amount of data would be larger than any data warehouse. Additionally, the difference that event data would make in the analysis process would be small compared to the difference in cost to achieve that accuracy. Companies will not collect event data for operational systems because of the cost. However, the amount of information that would be collected for a data warehouse would be much smaller. Therefore, collecting event data for a data warehouse would not be out of the question.
This would not be true of all types of event data concerned with the operation of a data warehouse. Input/Output (I/O) operations occur so frequently and are of such a small duration that collection of this type of event data would be out of the question. The time required to record the event could exceed the time of the event itself. When this happens, the overhead of maintaining the event data would greatly increase the overhead of operating the data warehouse and would not be justifiable. However, the collection of detailed event data for every query executed against the data warehouse would be a viable option since the number of daily queries would be small compared to the queries executed against an operational system.
Therefore, the metrics of analysis will probably include both interval data and event data. With that recognition comes the difficult task of organizing and evaluating the data so that the interval data can be matched to the event data in a coherent fashion. For example, accounting statistics are collected on every query against the warehouse; I/O statistics are accumulated on intervals of 15 minutes; buffer pool statistics are accumulated on intervals of 15 minutes. If a normal query executes in less than 15 minutes, how can the analyst take into account the interval data when the query was not executing during the entire interval? What happens when the query lasts 15 minutes, but spans two 15-minute intervals because it started in the middle of one and finished in the middle of another? These questions highlight the issues associated with the attempt to mix interval and event data.
Event data needs to have as much detail associated with the event as is possible to collect without incurring unacceptable overhead. This data can then be used to tune the performance of specific queries in the data warehouse and to characterize the queries for workload analysis. As an example, event data associated with a particular query should have all the information concerning that query's use of DB2 buffer pools, getpage requests and I/O events to avoid trying to extract that information from an interval record.
Interval data is extremely valuable in judging the health of a data warehouse. It allows for trending, so the analyst can see where the warehouse has been and have a good idea of where the warehouse is going. Interval data can provide for tuning of a DB2 buffer pool for system-wide activity. After this, event data can be used for minor adjustments to additional information that can lead to moving objects from one pool to another. Interval data can be valuable in pattern matching. If specific events occur repeatedly and have the same impact on the system, they will be easier to notice with interval data because the analyst is working with smaller amounts of data that will report over longer time frames. This is especially true if the events only repeat once a month or once a quarter.
In a warehouse system and because of the small number of updates, these will not be as important as the pool usage. Likewise, locking conflicts are less important in a warehouse environment. Locking conflicts are greatly reduced because of the small number of updates, the bind and SQL options that allow queries to read through locks and the ability to place an object in read-only status for specific periods of time.
In a warehouse environment, the single task of system-wide tuning, with a chance for significant changes in performance, should concern itself with DB2's use of memory in its four types of pools. This tuning should provide DB2 as much memory as possible without generating excessive overhead through MVS paging. Depending on the version of DB2 used, the performance analyst will have options, such as data spaces for buffer pools and the use of ESO hyperspace. The greatest challenge to this tuning is that three of the pools have no external controls through DB2 commands, and the one that has an external interface through DB2 commands provides little assistance. The user must know exactly what to change in the buffer pools to improve performance.
The traditional approach to buffer pool tuning has been to collect detailed information for a short time, like 15 to 20 minutes, at a peak period, and then make buffer pool adjustments to get optimum performance when the workload matches that time interval. The difficulty with this approach is that DB2 seldom matches the same workload used to create this optimum environment. As a result, the performance gains from this type of tuning are not always ideal. Part of the problem with this tuning methodology is the changing workloads based on query execution that DB2 is trying to manage.
No one has yet to accept the challenge of tuning DB2, but the optimum performance from a DB2 warehouse environment can only be achieved when the buffer pools are dynamically adjusted through DB2 commands based on the workloads that are executing. This is true for any DB2 subsystem, not just one that is running a warehouse. There are three elements to tuning:
Other recommendations include keeping all system objects in BP0 and establishing a separate BP7 for all sort work objects (DSNDB07). Depending on the amount of virtual storage that is available, further separation can be obtained by placing indexes and data in different pools. Even greater separation can be achieved by separating scanned indexes from probed indexes. These recommendations apply to tablespace objects as well.
If virtual storage were truly boundless, then all pools would be sufficiently large so that no work would ever be constrained by the size of the pool. Unfortunately, we are not there yet; therefore, there is some value to adjusting pool sizes based on the objects that have a high access pattern.
Prefetch can be called for at bind time if the access path chosen is a sequential scan of a table. It can be dynamically invoked if DB2 determines the query is processing the data sequentially. It can also be invoked to sort a long list of row identifiers for prefetch when the application requires a large number of rows from a table that are not necessarily in exact sequence. These operations benefit the system by reducing wait times; however, they can also have a negative impact on the system if, and when, these operations fill the pool with data pages.
When the pool is filled, DB2 has to ensure that other functions, such as synchronous I/O operations, can be performed for running queries. These thresholds allow DB2 to execute prefetch operations as much as possible while preventing these operations from dominating the use of a buffer pool. Because DB2 maintains a log with all updates performed by an SQL statement, the system does not have to write the updated pages to DASD immediately. This allows DB2 to accumulate changes to a page in the buffer pool and then write all the changes to a page with a single I/O operation. This facility allows for a well running DB2 subsystem. When the pool becomes full, DB2 gradually begins writing pages to DASD quicker, and the benefits of a single write I/O operation for multiple updates is eroded.
Workload characterization and threshold information can be used to dynamically tune a DB2 subsystem. Workload characterization can differentiate between objects that are primarily accessed sequentially and objects that are primarily accessed randomly. By placing these objects in different buffer pools, the analyst can manipulate thresholds to either maximize or minimize prefetch activities based on the workload characterization. Also, by separating these objects, each pool can be used for maximum efficiency.
In this environment, thresholds could be set to coincide with the primary usage of the pool. However, when the objects in the pool are not following their normal access patterns, the operator could be informed of these changes and make adjustments through the Alter Bufferpool command to tailor the thresholds to match the current access patterns. When the queries that caused this not normal access pattern are finished, the operator could be informed to change the thresholds back to their original desired state.
Additionally, the use of many pools with objects separated by access patterns could put a strain on virtual storage usage by the DB2 subsystem. This could be caused by sizing the pools for efficiency at peak access times. Not all of the pools and not all of the objects in a DB2 subsystem will be accessed at peak rates all hours of the day. Therefore, to use virtual storage in a responsible manner, DB2 buffer pools would be virtual storage-friendly if they could expand and contract based on the use of objects in a particular pool.
The DB2 system administrator, in concert with the MVS systems programmer, could decide how much virtual storage to allow for DB2 buffer pools. At that point, DB2 would set the pool sizes based on some priority system to match the most politically important processing time of the day. As workload changes indicate diminished use of important objects and increased use of less important objects in the system, the operator could expand and contract pools with the Alter Bufferpool command to maintain the total amount of virtual storage allocated for DB2 buffer pools. The operator could, at the same time, redistribute that storage based on the object usage at the time. As new objects are accessed, changes could be made to create a match between the objects being accessed and the pool sizes for those objects.
The example that has been described in detail concerned the use of buffer pools. However, the same methodology could be applied to all four types of pools in a DB2 subsystem. The metrics required for this tuning effort could easily be derived from interval data. By using interval data, the cost associated with data collection would be held to a minimum, while the performance gains could potentially be tremendous.
These metrics contain information generated by the DB2 optimizer as output from the Explain process. These metrics indicate to the user how DB2 will retrieve the data to process the query. The method of access, whether it is a scan or index access, is an example of the information that could be gathered. This could be used in conjunction with the name of any index that might be used by DB2 to process the query to identify what DB2 resources will be used; by exclusion, resources which will not be used are also identified. In addition to these metrics, there are statistics that are maintained by the DB2 in the catalog concerning the objects processed to produce the query results.
Furthermore, depending on your use and collection of DB2 object statistics, metrics might be required from the actual objects that are not maintained in the catalog. This would be helpful if you have chosen not to update DB2 catalog statistics because you fear that access paths will change and cause performance degradations. Finally, SQL syntax information, such as columns used in predicates and the operators used inside those predicates, are invaluable in tuning DB2 query performance. This last set of metrics is the most difficult to collect and can be obtained from parsing the SQL text.
It is also important to understand how these metrics would be used. The access path information could be combined with object statistics to produce a list of statements that generated a large number of I/O operations to produce the result set. Each shop will have its own number to assign to that "large I/O" threshold.
Analysis of these statements might lead to changes in the predicate structure so that DB2 would consider indexes for access path selection that would have been ruled out otherwise. Analysis might also identify SQL usage patterns that were less than optimal and provide a basis for SQL training of warehouse users to avoid SQL syntax that generates excessive
I/O operations and causes poor query response time.
Collecting information concerning all queries would allow for performance analysis very similar to the business analysis performed using the warehouse itself. The value of pattern matching for predicate operators and columns used in a predicate would provide invaluable information to the performance analyst. This would allow the analyst to distinguish between broad misunderstandings of DB2 performance and SQL usage from single users who require more training to avoid poor SQL that slows down a warehouse's performance.
Unfortunately, DB2 does not provide all this information today. The access path selection information can be obtained from the mini-bind IFCID 22 for dynamic queries and from the PLAN_TABLE table for static queries that are bound with the "EXPLAIN YES" parameter coded. The object statistics can be collected from the DB2 catalog or from third- party analysis tools if the catalog is not kept current. The column names used in predicates could be obtained from the catalog after parsing the SQL text maintained in the SYSSTMT and SYSPACKSTMT table, but this would only assist in static queries and the overwhelming majority of queries in a warehouse environment are dynamic. The predicate information provides the greatest value, but unfortunately, there is no current mechanism to capture and maintain that information on dynamic queries. Therefore, the tuning of individual queries is largely left to historical analysis of thread history information and user education on good SQL coding techniques.
This information can be organized into two matrices. Both could have a list of objects, such as table and index names, and associated with each object would be the SQL statements that drive access to that object. At the statement level, the number of getpage requests, subdivided by asynchronous and synchronous requests, would be detailed. These detailed numbers would be rolled up to the object level for total object getpage requests.
The second matrix would have similar object information and a count of the number of select and update statements executed over a particular time interval. The longer the time interval is, the better the analysis will be. The update information could include a breakdown of insert, update and delete statements for more detailed analysis. From this information, index objects could be identified that generate little or no read activity and, therefore, are potential candidates for elimination.
Objects that have a high insert and delete activity could be targeted for increased free space during a reorganization to avoid frequent reorganizations. Objects that have no delete activity could be analyzed to determine the need for some archival process based on business requirements of the data. All of these activities could indirectly affect the performance of the application. Additionally, this information can have great value in reducing the DASD requirements of the warehouse. . Disk drives may be inexpensive, but they are not free.
For direct improvement of performance, these matrices would pinpoint statements that scan tables frequently and provide opportunities for tuning improvements. The Explain from DB2 might indicate that an access path using List Prefetch is being used, when the actual data shows that a List Prefetch path has been abandoned for lack of storage or because too many rows have been selected. The object/statement/getpage request matrix identifies the problem and quantifies the impact to the DB2 subsystem.
When viewing the matrix from a DB2 subsystem or data-sharing group perspective, the analyst can tune the SQL causing the greatest impact on the DB2 subsystem. By tying this information back to a specific user, the analyst can target users that might need further SQL education. This information is of greatest value when working with dynamic SQL because the only Explain of dynamic SQL is the IFCID 22 Mini-bind, which is seldom externalized through a DB2 trace.