In Part III, I covered details on the Normal Ranges, but left out the screenshots (and discussion) on running the normal_ranges sub-query.
The screenshots below are the output of running the simple select (above) against the normal_ranges WITH clause sub-query [split into two sets of output, one for stat_source = ASH and another for stat_source = SQLSTAT, for my convenience in taking the screenshots].
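For context, here is a minimal sketch of the kind of simple select I mean, with a stub standing in for the actual normal_ranges WITH clause sub-query from Part III; the column names (stat_source, metric_name, lower_bound, average_value, upper_bound, standard_deviation) are illustrative rather than the exact ones in the DOPA code:

```sql
WITH normal_ranges AS (
  -- Stub standing in for the real normal_ranges sub-query from Part III;
  -- the actual code derives these values from the AWR/ASH data.
  SELECT 'ASH' AS stat_source, 'example metric' AS metric_name,
         0 AS lower_bound, 0 AS average_value,
         0 AS upper_bound, 0 AS standard_deviation
  FROM   dual
)
SELECT stat_source, metric_name,
       lower_bound, average_value, upper_bound, standard_deviation
FROM   normal_ranges
WHERE  stat_source = 'ASH'   -- run once for ASH, once for SQLSTAT
ORDER  BY metric_name;
```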
There are nearly 50 features engineered from the active session history and nearly 30 from the SQL statistics at this time [as you may recall, I plan to go back and re-jig some of the feature engineering].
A few things to note on the numeric columns – the values are computed for each metric across all snapshots (and ASH samples) from the normal interval date/time range specified for this run of the code (a rough sketch of the kind of aggregation involved follows the list):
· Lower Bound: this is the lower bound of the normal range.
· Average Value: this is the average value of the metric.
· Upper Bound: this is the upper bound of the normal range.
· Standard Deviation: this is the square root of the variance, which is basically another measure of the degree of dispersion in the data. When the standard deviation is low, the metric values are close to the average; when it is high, the values for that metric are much more widely dispersed.
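To make the list above concrete, here is a minimal sketch of how such aggregates could be computed. metric_values is a hypothetical, already-unpivoted source of metric_name/metric_value rows, and the average ± 2 standard deviations formula for the bounds is an assumed convention for illustration, not necessarily the exact formula used in my code:

```sql
SELECT metric_name,
       AVG(metric_value) - 2 * STDDEV(metric_value) AS lower_bound,
       AVG(metric_value)                            AS average_value,
       AVG(metric_value) + 2 * STDDEV(metric_value) AS upper_bound,
       STDDEV(metric_value)                         AS standard_deviation
FROM   metric_values
WHERE  sample_time BETWEEN :normal_begin AND :normal_end   -- the normal interval
GROUP  BY metric_name;
```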
So, again, these normal ranges are used to flag anomalies in metric values that come from the problem interval. As previously stated, the feature selection method I use [i.e. reducing the full set of features to the relevant ones] is a form of one-hot encoding: assigning a 1 if the value is above the normal range and a 0 otherwise. After that, it’s a simple query to report the flagged metrics and their values, ordered by the metrics with the most anomalies and by how far from normal the values were [to be covered in a future post].
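A minimal sketch of that flag-and-report step, assuming hypothetical problem_metrics and normal_ranges sources (one row per metric), might look like this:

```sql
SELECT p.metric_name,
       p.metric_value,
       n.upper_bound,
       -- one-hot style flag: 1 when the problem-interval value exceeds normal
       CASE WHEN p.metric_value > n.upper_bound THEN 1 ELSE 0 END AS flagged
FROM   problem_metrics p
JOIN   normal_ranges   n
  ON   n.stat_source = p.stat_source
 AND   n.metric_name = p.metric_name
ORDER  BY flagged DESC,
          p.metric_value - n.upper_bound DESC;
```

Ordering by how far the value sits above the upper bound is what surfaces the worst offenders first.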
Hopefully you can see that by unpivoting the columns into rows and using other feature engineering techniques, I can easily scale my analysis to many metrics. ASH and SQLSTAT do not offer as many metrics as the tens of thousands of instance-wide metrics embedded in many other AWR views, but there are still about 80 metrics instrumented thus far, and it would be very difficult to do column-by-column comparisons using the pivoted, multi-column-table version of the AWR views. The intent of the DOPA approach is to make root cause analysis of a problem easier by showing you only the metrics whose values are outside of the normal range, thus providing that laser focus on the performance issue at hand.
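As a generic illustration of the unpivoting idea (not my actual code), the Oracle UNPIVOT operator can turn a few of the columns of a multi-column AWR view such as dba_hist_sqlstat into metric name/value rows; the metric labels chosen here are illustrative:

```sql
SELECT snap_id, metric_name, metric_value
FROM   (SELECT snap_id, executions_delta, buffer_gets_delta, disk_reads_delta
        FROM   dba_hist_sqlstat)
UNPIVOT (metric_value FOR metric_name IN
          (executions_delta  AS 'executions',
           buffer_gets_delta AS 'buffer gets',
           disk_reads_delta  AS 'disk reads'));
```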