This skill builder is going to discuss the capture and access of section actuals. We're going to introduce the concept of collecting section actual statistics during SQL execution, and we'll see how to create an activities event monitor to capture the access plan section actual statistics. One of the things we'll need to do is configure our database to enable section actuals collection. We're going to invoke a stored procedure called EXPLAIN_FROM_ACTIVITY that generates the data in the explain tables from the activities event monitor data, and we're going to use db2exfmt, our explain formatting tool, to produce a report that contains both estimated and section actual statistics.
Normally, the access plans we look at in explain reports contain just estimated cardinalities. The optimizer creates the access plan based on catalog statistics and estimates the cardinality of each operation used to process the SQL statement. If those estimates are inaccurate, for any of a variety of reasons, then the access plan could be inefficient.
Here we'll look at an example of a portion of an access plan. The access plan included an index scan. The table had 513,000 rows. The index scan applied one predicate that was expected to reduce the result to 107,000 rows. When the rows were fetched, another predicate was applied, which was expected to reduce the result set size to 21,000. The optimizer chose the method of accessing the tables based on these cardinalities.
Now, we can use an activities event monitor to capture the access plan section actuals and view the actual runtime statistics in an explain report. The optimizer still prepares the access plan prior to execution, based on the catalog statistics, so the plan still contains the estimated cardinalities. But we'll have an activities event monitor running that captures runtime metrics, including the section actual statistics. Once that information is stored in the activities event monitor tables, we can run the EXPLAIN_FROM_ACTIVITY procedure to populate our explain tables, and once that data is in the explain tables, we can run db2exfmt to create a report that includes both the estimated and the actual statistics. So here's that same section of an access plan. Now we see two numbers above each operation: the estimated number, which is what we saw on the original report, and, because this report has the section actual statistics, the number of rows actually processed by that operation at runtime.
So, the table had 513,000 rows. The index scan, which we expected to return only 107,000 matching rows, actually returned 357,000, which is dramatically larger. The fetch operation that applied the predicate to the data rows reduced the result set, but the actual result is 71,000 rows rather than 21,000. In our demonstration we will run through these statistics and try to figure out why we have this difference.
In order to get section actuals into an explain report, we have to set up the database environment. I've already done the steps of updating the database configuration: there's a parameter called section_actuals that was set to BASE.
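As a minimal sketch, assuming a database named SAMPLE (the name is a placeholder), that configuration update from the command line might look like this:

    -- database name SAMPLE is a placeholder
    UPDATE DB CFG FOR sample USING SECTION_ACTUALS BASE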
I've decided to create a new service class called batch_sc, and I created a workload called batch_work for the program db2batch, which I'm going to use to actually run the SQL statement.
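A sketch of what those WLM definitions might look like; the names batch_sc and batch_work come from the demonstration, while the GRANT is my assumption about how sessions were allowed to map to the workload:

    CREATE SERVICE CLASS batch_sc;
    CREATE WORKLOAD batch_work APPLNAME('db2batch') SERVICE CLASS batch_sc;
    -- allow sessions to map to the new workload (my assumption)
    GRANT USAGE ON WORKLOAD batch_work TO PUBLIC;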
Now, I could have triggered the collection of the activity data here at the workload level. I decided instead to use a threshold, so that only statements that exceed an estimated cost of 600 timerons will actually generate activity data. There are lots of ways you can create thresholds, and various ways to filter and limit the generation of the activity data. It is key that I include the DETAILS and SECTION options on the COLLECT ACTIVITY DATA clause, because that is what causes the section and its actuals to be captured.
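A sketch of such a threshold, using a hypothetical name batch_cost; note the DETAILS and SECTION options on the COLLECT ACTIVITY DATA clause:

    CREATE THRESHOLD batch_cost                  -- threshold name is hypothetical
      FOR SERVICE CLASS batch_sc ACTIVITIES
      ENFORCEMENT DATABASE
      WHEN ESTIMATEDSQLCOST > 600
      COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS,SECTION
      CONTINUE;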
I also need an activities event monitor, so I created a new one called wlmactivity, which ends up creating a set of tables to actually store the activity-related statistics and information.
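A minimal sketch of creating and activating such a monitor:

    CREATE EVENT MONITOR wlmactivity FOR ACTIVITIES WRITE TO TABLE;
    SET EVENT MONITOR wlmactivity STATE 1;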
So, that's all been done.
At the command line, I've connected to the database, and I'm going to run a script that executes db2batch.
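A sketch of what the invocation inside that script might look like; the database name and the statement file are placeholders:

    # database name and statement file are placeholders
    db2batch -d sample -f query1.sql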
It executed the SQL statement and produced the result.
Because of the activities event monitor, that data was stored.
Now, I need to
find out some information about the entry
in the activities event monitor data.
I'm going to run a query against the event monitor table ACTIVITY_WLMACTIVITY. It returns the application ID that created the entry, the unit of work ID, and the activity ID. I'm going to copy some of this information.
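A sketch of such a query; ordering by completion time is my assumption, to put the most recent activity first:

    SELECT appl_id, uow_id, activity_id
      FROM activity_wlmactivity
      ORDER BY time_completed DESC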
I have to identify a very specific entry when I invoke the procedure EXPLAIN_FROM_ACTIVITY, so I have that call statement prepared. The call statement has multiple parameters; the first three identify the activity: the application ID, then the unit of work ID, and then the activity ID. I insert the values I just copied into this call and save that.
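A sketch of the call, with placeholder values standing in for the identifiers copied from the query above; the last five parameters are outputs:

    CALL EXPLAIN_FROM_ACTIVITY(
      '*LOCAL.db2inst1.130827183736',  -- application ID (placeholder)
      1,                               -- unit of work ID (placeholder)
      1,                               -- activity ID (placeholder)
      'WLMACTIVITY',                   -- activities event monitor name
      'DB2INST1',                      -- explain table schema (placeholder)
      ?, ?, ?, ?, ?)                   -- outputs: explain requester and time,
                                       -- source name, schema, and version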
And now I'm going to run the stored procedure.
The stored procedure ran. It found the entry
in the activity event monitor data
and moved the information into my explain tables.
Now I also ran the explain tool
db2exfmt and saved the
output of that.
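A sketch of that step, again assuming a database named SAMPLE; the -1 option takes the defaults, which format the most recently explained statement:

    db2exfmt -d sample -1 -o explain_actuals.txt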
So now I can look at that explain report. We see the SQL statement and the predicates that were in it, and we see the estimated cost. And now, in the explain report, I see two counts for each operation: an estimated count and an actual count. Going down to ...
A key point: when we're accessing this table, HIST2, which has 513,000 rows, there was an index scan that was expected to produce 107,000 entries. It actually produced 357,000 entries, so there's something in that search that produced a much larger result than was expected. The fetch applied another predicate; it was supposed to reduce the result to 21,000, but with 357,000 entries going into it, it produced a result set of 71,000.
In this example, the reason for this difference between the estimated and actual statistics had to do with non-uniform distribution of the rows. The table had only basic statistics. The optimizer knew there were 100 branches of data, so it assumed that each branch would have one percent of the data. It turned out that the range of branches included in the query held much more data than the other branches did, and therefore the result set was quite different from the estimate.
What I was able to do to resolve that was to collect new statistics that include distribution statistics. I actually used a RUNSTATS command to gather distribution statistics on the branch ID column, including frequent values and quantiles.
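A sketch of such a RUNSTATS command, assuming a schema of DB2INST1 and a column named BRANCH_ID; the exact names and the frequent-value and quantile counts are placeholders:

    RUNSTATS ON TABLE db2inst1.hist2
      WITH DISTRIBUTION ON COLUMNS (branch_id NUM_FREQVALUES 20 NUM_QUANTILES 50)
      AND INDEXES ALL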
I ran that, then re-explained the SQL statement to see whether we were able to affect the access plan, and produced another explain report.
This is the standard explain; it just has the estimated cardinalities. You can see down here that a couple of things have changed. In this area, the optimizer, now realizing the result set is going to be much larger, decides to do a table scan. So rather than using an index for that larger result, DB2 accesses the 513,000 rows with a table scan to produce the 68,000 rows. DB2 had a better understanding of the distribution of the data, understood the predicates better, and therefore was able to produce an access plan that it thought would be more efficient at producing the result. If you would like to deepen your understanding of
DB2 LUW 10, particularly in areas of performance, we offer several courses. CL413 is our DB2 LUW Performance Tuning and Monitoring Workshop for DB2 10.1, and CL443 is the Performance Tuning and Monitoring course for both single and multiple partitioned databases.
We offer these courses in a
traditional classroom. There is a self-paced
virtual class, as well as web-based training.
You might also consider some of the certifications, such as the Certified Advanced Database Administrator for DB2 10.1 for LUW.