This skill builder is going to discuss the capture and access of section actuals. We're going to introduce the concept of collecting section actual statistics during SQL execution, and we'll see how to create an activities event monitor to capture the access plan section actual statistics. One of the things we'll need to do is configure our database to enable section actuals collection. We're going to invoke a stored procedure called EXPLAIN_FROM_ACTIVITY that generates the data in the explain tables from the activities event monitor data, and we're going to use db2exfmt, our explain formatting tool, to produce a report that contains both estimated and section actual statistics.
Normally, the access plans we look at in explain reports contain just estimated cardinalities. The optimizer creates the access plan based on catalog statistics and estimates the cardinality of each operation used to process the SQL statement. If those estimates are inaccurate, for any of a variety of reasons, then the access plan could be inefficient.
Here we'll look at an example of a portion of an access plan. The access plan included an index scan. The table had 513,000 rows. The index scan applied one predicate that was expected to reduce the result to 107,000 rows. When the rows were fetched, another predicate was applied, which was expected to reduce the result set size to 21,000. The optimizer chose the method of accessing the tables based on these cardinalities.
Now, we can use an activities event monitor to capture the access plan section actuals and view the actual runtime statistics in an explain report. The optimizer still prepares the access plan prior to execution, based on the catalog statistics, so the plan still contains the estimated cardinalities. But we'll have an activities event monitor running that captures runtime metrics, including the section actual statistics. Once that information is stored in the activities event monitor tables, we can run the EXPLAIN_FROM_ACTIVITY procedure to populate our explain tables, and once that data is in the explain tables, we can run db2exfmt to create a report that includes both the estimated and the actual statistics. So here's that same section of an access plan. Now we see two numbers above each operation: the estimated number, which is what we saw on the original report, and, because this report has the section actual statistics, the number of rows actually processed by that operation at runtime.
So, the table had 513,000 rows. The index scan, which we expected to return only 107,000 matching rows, actually returned 357,000, which is dramatically larger. The fetch operation that applied the predicate to the data rows reduced the result set, but the actual result is 71,000 rows rather than 21,000. In our demonstration we will run through these statistics and try to figure out why we have this difference.
In order to get section actuals into an explain report, we have to set up the database environment. I've already done the steps of updating the database configuration: there's a parameter called section_actuals that was set to BASE.
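As a minimal sketch, assuming a database named SAMPLE (the name is a placeholder), that configuration update from the command line might look like this:

    -- database name SAMPLE is a placeholder
    UPDATE DB CFG FOR sample USING SECTION_ACTUALS BASE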
I've decided to create a new service class called batch_sc, and I created a workload called batch_work for the program db2batch, which I'm going to use to actually run the SQL statement.
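A sketch of what those WLM definitions might look like; the names batch_sc and batch_work come from the demonstration, while the GRANT is my assumption about how sessions were allowed to map to the workload:

    CREATE SERVICE CLASS batch_sc;
    CREATE WORKLOAD batch_work APPLNAME('db2batch') SERVICE CLASS batch_sc;
    -- allow sessions to map to the new workload (my assumption)
    GRANT USAGE ON WORKLOAD batch_work TO PUBLIC;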
Now, I could have triggered the collection of the activity data here at the workload level. I decided instead to use a threshold, so that only statements that exceed an estimated cost of 600 timerons will actually generate activity data. There are lots of ways you can create thresholds, and various ways to filter and limit the generation of the activity data. It is key that I include the DETAILS and SECTION options on the COLLECT ACTIVITY DATA clause, because that is what causes the section and its actuals to be captured.
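A sketch of such a threshold, using a hypothetical name batch_cost; note the DETAILS and SECTION options on the COLLECT ACTIVITY DATA clause:

    CREATE THRESHOLD batch_cost                  -- threshold name is hypothetical
      FOR SERVICE CLASS batch_sc ACTIVITIES
      ENFORCEMENT DATABASE
      WHEN ESTIMATEDSQLCOST > 600
      COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS,SECTION
      CONTINUE;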
I also need an activities event monitor, so I created a new one called wlmactivity, which ends up creating a set of tables to actually store the activity-related statistics and information.
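A minimal sketch of creating and activating such a monitor:

    CREATE EVENT MONITOR wlmactivity FOR ACTIVITIES WRITE TO TABLE;
    SET EVENT MONITOR wlmactivity STATE 1;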
So, that's all been done.
At the command line, I've connected to the database, and I'm going to run a script that executes db2batch.
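A sketch of what the invocation inside that script might look like; the database name and the statement file are placeholders:

    # database name and statement file are placeholders
    db2batch -d sample -f query1.sql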
It executed the SQL statement and produced the result.
Because of the activities event monitor, that data was stored.
Now, I need to
find out some information about the entry
in the activities event monitor data.
I'm going to run a query against the event monitor table ACTIVITY_WLMACTIVITY. It returns the application ID that created the entry, the unit of work ID, and the activity ID. I'm going to copy some of this information.
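A sketch of such a query; ordering by completion time is my assumption, to put the most recent activity first:

    SELECT appl_id, uow_id, activity_id
      FROM activity_wlmactivity
      ORDER BY time_completed DESC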
I have to identify a very specific entry when I invoke the procedure EXPLAIN_FROM_ACTIVITY, so I have that call statement prepared. The call statement has multiple parameters; the first three identify the activity: the application ID, then the unit of work ID, and then the activity ID. I insert the values I just copied into this call and save that.
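A sketch of the call, with placeholder values standing in for the identifiers copied from the query above; the last five parameters are outputs:

    CALL EXPLAIN_FROM_ACTIVITY(
      '*LOCAL.db2inst1.130827183736',  -- application ID (placeholder)
      1,                               -- unit of work ID (placeholder)
      1,                               -- activity ID (placeholder)
      'WLMACTIVITY',                   -- activities event monitor name
      'DB2INST1',                      -- explain table schema (placeholder)
      ?, ?, ?, ?, ?)                   -- outputs: explain requester and time,
                                       -- source name, schema, and version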
And now I'm going to run the stored procedure.
The stored procedure ran. It found the entry
in the activity event monitor data
and moved the information into my explain tables.
Now I also ran the explain tool
db2exfmt and saved the
output of that.
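A sketch of that step, again assuming a database named SAMPLE; the -1 option takes the defaults, which format the most recently explained statement:

    db2exfmt -d sample -1 -o explain_actuals.txt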
So now I can look at that explain report. We see the SQL statement and the predicates that were in it, and we see the estimated cost. And now, in the explain report, I see two counts for each operation: an estimated count and an actual count. Going down to ...
A key point: when we're accessing this table, HIST2, which has 513,000 rows, there was an index scan that was expected to produce 107,000 entries. It actually produced 357,000 entries, so there's something in that search that produced a much larger result than was expected. The fetch applied another predicate; it was supposed to reduce the result to 21,000, but with 357,000 entries going into it, it produced a result set of 71,000.
In this example, the reason for this difference between the estimated and actual statistics had to do with non-uniform distribution of the rows. The table had only basic statistics. The optimizer knew there were 100 branches of data, so it assumed that each branch would have one percent of the data. It turned out that the range of branches included in the query held much more data than the other branches did, and therefore the result set was quite different from the estimate.
What I was able to do to resolve that was to collect new statistics that include distribution statistics. I actually used a RUNSTATS command to gather distribution statistics on the branch ID column, including frequent values and quantiles.
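A sketch of such a RUNSTATS command, assuming a schema of DB2INST1 and a column named BRANCH_ID; the exact names and the frequent-value and quantile counts are placeholders:

    RUNSTATS ON TABLE db2inst1.hist2
      WITH DISTRIBUTION ON COLUMNS (branch_id NUM_FREQVALUES 20 NUM_QUANTILES 50)
      AND INDEXES ALL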
I ran that, then re-explained the SQL statement to see whether we were able to affect the access plan, and produced another explain report.
This is the standard explain; it just has the estimated cardinalities. You can see down here that a couple of things have changed. In this area, the optimizer, now realizing the result set is going to be much larger, decides to do a table scan. So rather than using an index for that larger result, DB2 accesses the 513,000 rows with a table scan to produce the 68,000 rows. DB2 had a better understanding of the distribution of the data, understood the predicates better, and therefore was able to produce an access plan that it thought would be more efficient at producing the result. If you would like to deepen your understanding of
DB2 LUW 10, particularly in areas of performance, we offer several courses. CL413 is our DB2 LUW Performance Tuning and Monitoring Workshop for DB2 10.1, and CL443 is the Performance Tuning and Monitoring course for both single and multiple partitioned databases.
We offer these courses in a
traditional classroom. There is a self-paced
virtual class, as well as web-based training.
You might also consider some of the certifications, such as the Certified Advanced Database Administrator for DB2 10.1 for LUW.