Tuning SQL Using Feature Engineering and Feature Selection - Part IV

“I am Sir Oracle, and when I ope my lips, let no dog bark!” William Shakespeare, The Merchant of Venice, Act 1 Scene 1.  I like this quote because it is catchy, but it is really an expression of haughtiness; in context it describes people who, when they talk, imagine others (even dogs) silently listening to their wisdom.  I am not Sir Oracle, but I would like to share some details on running the code that helps with tuning SQL using feature engineering and feature selection.  As mentioned previously, I use this approach, which I call Dynamic Oracle Performance Analytics (DOPA for short), to scale anomaly detection analysis to tens of thousands of performance metrics, and I am now applying the technique to SQL-level analysis.







In this post I will go through some examples of building the model (i.e. running the SQL statement).  I will also go into more detail on the subset parameters in relation to the major data sets, including how to set the SQL list, the sessions in the bad interval, the sessions in the good interval, etc.  Following the basic pattern for the DOPA process (repeated below), these are steps 3 and 4 [tweak the model; run the model and review the results].

In Tuning SQL Using Feature Engineering and Feature Selection Part I, Part II, Part III and Part III A, I covered steps 1 and 2 following the basic pattern of the DOPA Process [repeated below for convenience].

The Basic Pattern for the DOPA Process

At its core the DOPA process is simple; it compares the metric values encountered in a problem interval for the SQL to the values encountered in an interval when the SQL processing was normal.  The metric values from the problem interval that are outside normal ranges are the anomalies, and they are shown in priority-sorted order in the output you get when running the DOPA process SQL statement [yes, it’s a single SQL statement].

1.      Instrument the metrics of interest:  This is the feature engineering (unpivoting, calculating values, transforming dates into numeric metrics (e.g. number of seconds since the last sample/event)).  There is nothing for you to do here unless you want to instrument different/more metrics.  I have done this for tens of thousands of metrics in the original DOPA process covered in my book.  For SQL-level metrics, I source the metrics from DBA_HIST_SQLSTAT and DBA_HIST_ACTIVE_SESS_HISTORY (a small unpivot sketch appears just after this list).  This step was covered in Part I and in Part II.

2.      Compare good run to a bad run: Here we compare the metrics encountered in a problem interval to the metrics encountered in an interval when the SQL processing was normal (I use a with clause sub-query to calculate the normal ranges).  The DOPA SQL statement does the comparison, but your input is needed here (after all you’ll be running this on a problem in a system you help look after).   This blog post will cover this step in the process in more detail.

3.      Tweak the model: This is done via input conditions to the DOPA SQL statement.  For now, accept the default conditions that are used to tweak the model.  I’ll cover this in more detail in a future post when I review sample executions in detail.

4.      Run the model and review the results: AKA run the SQL statement.  This can be somewhat of an iterative process, because you may need to change the way you subset the data, and you have opportunities to tweak the model.  Sometimes many anomalies are flagged, so re-running with different conditions may make the analysis of the anomalies easier.  I’ll cover this in more detail, as well, in a future post when I review running the model in detail.
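
For a sense of what the feature engineering in step 1 looks like, here is a minimal sketch that unpivots just two of the many delta columns in DBA_HIST_SQLSTAT into name/value metric rows; the metric labels are illustrative, and the actual DOPA code instruments far more columns and also engineers metrics from DBA_HIST_ACTIVE_SESS_HISTORY:

-- minimal unpivot sketch; only two delta columns shown, labels are illustrative
SELECT snap_id, sql_id, metric_name, metric_value
  FROM dba_hist_sqlstat
 UNPIVOT (metric_value FOR metric_name IN (
            buffer_gets_delta AS 'buffer gets (delta)',
            disk_reads_delta  AS 'disk reads (delta)'))
 WHERE sql_id = '988a1hx9zc5rr';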

 

I built the DOPA process to run as a single SQL statement.  Using a series of with-clause sub-queries, the SQL statement gathers the instrumented metrics for the SQL statements, sessions, and intervals of interest, calculates the normal ranges, then flags the metrics that are key influencers in the performance problem at hand and outputs the results.
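
To make the shape of that statement a little more concrete, here is a minimal sketch of the with-clause pattern.  It is not the actual DOPA code: unpivoted_metrics stands in for the feature-engineered metric rows, the shortened bind names stand in for the MM_DD_YYYY_HH24_MI binds used later in this post, and the mean ± 2 standard deviations range is just one simple way to express a “normal range.”

-- minimal sketch only, not the actual DOPA statement
WITH normal_ranges AS (
  SELECT metric_name,
         AVG(metric_value) - 2 * STDDEV(metric_value) AS lower_bound,
         AVG(metric_value) + 2 * STDDEV(metric_value) AS upper_bound
    FROM unpivoted_metrics            -- stand-in for the engineered metric rows
   WHERE sample_time BETWEEN :normal_start AND :normal_end
   GROUP BY metric_name
)
SELECT m.metric_name, m.metric_value, n.lower_bound, n.upper_bound,
       CASE WHEN m.metric_value NOT BETWEEN n.lower_bound AND n.upper_bound
            THEN 1 ELSE 0 END AS flagged
  FROM unpivoted_metrics m
  JOIN normal_ranges    n ON n.metric_name = m.metric_name
 WHERE m.sample_time BETWEEN :problem_start AND :problem_end
 ORDER BY flagged DESC, m.metric_name;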

Example Test Workload

Ordinarily, I would run the SQL statement for Tuning SQL Using Feature Engineering and Feature Selection against a production performance issue.  But in this case, I’ve been using as an example test case a workload generated from running a SQL statement that I tweaked from the “Use the Index, Luke!” post on testing and scalability.  You can skip this section if you’re not interested in the details of the workload-generating SQL.

-- driver table of "sections" (0..298)
create table sections nologging as SELECT level - 1 n  FROM DUAL CONNECT BY level < 300;
execute DBMS_STATS.GATHER_TABLE_STATS(null, 'SECTIONS', CASCADE => true);
 
-- row generator (1..899999)
create table gen nologging as SELECT level x FROM DUAL CONNECT BY level < 900000;
execute DBMS_STATS.GATHER_TABLE_STATS(null, 'GEN', CASCADE => true);
 
-- test table: higher section numbers get more rows; id2 is a pseudo-random value (roughly 1..100)
create table scale_data nologging as 
SELECT n as section, x as id1, CEIL(DBMS_RANDOM.VALUE(0, 100)) as id2 FROM sections, gen  WHERE x <= n * 3000;
CREATE INDEX scale_slow ON scale_data (section, id1, id2);
execute DBMS_STATS.GATHER_TABLE_STATS(null, 'SCALE_DATA', CASCADE => true);
 
select table_name, num_rows, (blocks * 8 * 1024) / (1024*1024) size_meg from user_Tables;
 
TABLE_NAME                         NUM_ROWS    SIZE_MEG
------------------------------ ------------ -----------
SECTIONS                                299      .03125
GEN                                 899,999     10.9375
SCALE_DATA                      133,653,000 2442.515625

 

 

In “Luke’s” post he creates a table function, test_scalability.run [reference his post for details], which I use to run a basic query against scale_data in a loop.  Thus, the test code for the problem interval with the slow index is:

SELECT *
  FROM TABLE(test_scalability.run(
       'SELECT * ' 
      || 'FROM scale_data '
      ||'WHERE section=:1 '
      ||  'AND id2=CEIL(DBMS_RANDOM.value(1,100))', 10));

 

Basically, I’m going to generate 3 workloads as follows:

1.      Slow index

2.      Fast index

3.      No index

Workload 1 -- Slow index: 

Oracle generates SQL ID '988a1hx9zc5rr' for this code with a plan hash value 1093133531, which you can see uses the slow index.

SELECT * from table( dbms_xplan.display_awr( '988a1hx9zc5rr' , 1093133531, NULL, 'ADVANCED') );
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID 988a1hx9zc5rr
--------------------
select count(*) from (SELECT * FROM scale_data WHERE section=:1 AND
id2=CEIL(DBMS_RANDOM.value(1,100)))
 
Plan hash value: 1093133531
 
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |  1443 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |     7 |            |          |
|   2 |   INDEX RANGE SCAN| SCALE_SLOW |  4470 | 31290 |  1443   (4)| 00:00:01 |
--------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$F5BB74E1
   2 - SEL$F5BB74E1 / SCALE_DATA@SEL$2
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F5BB74E1")
      MERGE(@"SEL$2" >"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      INDEX(@"SEL$F5BB74E1" "SCALE_DATA"@"SEL$2" ("SCALE_DATA"."SECTION"
              "SCALE_DATA"."ID1" "SCALE_DATA"."ID2"))
      END_OUTLINE_DATA
  */
 
Peeked Binds (identified by position):
--------------------------------------
 
   1 - :1 (NUMBER): 0
 
Query Block Registry:
---------------------
 
  <q o="18" f="y" h="y"><n><![CDATA[SEL$F5BB74E1]]></n><p><![CDATA[SEL$1]]
        ></p><i><o><t>VW</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[SC
        ALE_DATA]]></t><s><![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[from$_subquery$_001]]
        ></t><s><![CDATA[SEL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[SCALE_DATA]]></t><s><
        ![CDATA[SEL$2]]></s></h></f></q>
 
 
 
56 rows selected.

 

Workload 2 -- Fast index: 

Drop the slow index and create a fast index to generate the workload for the normal interval, then re-run.

-- "LUKE" says the following index is better:
DROP INDEX scale_slow;
CREATE INDEX scale_fast ON scale_data (section, id2, id1);
execute DBMS_STATS.GATHER_TABLE_STATS(null, 'SCALE_DATA' , CASCADE => true);

 

Oracle generates SQL ID '988a1hx9zc5rr' for this code with a plan hash value 2609620195, which you can see has a lower cost and uses the fast index.

SELECT * from table( dbms_xplan.display_awr( '988a1hx9zc5rr' , 2609620195 , NULL, 'ADVANCED') );
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID 988a1hx9zc5rr
--------------------
select count(*) from (SELECT * FROM scale_data WHERE section=:1 AND
id2=CEIL(DBMS_RANDOM.value(1,100)))
 
Plan hash value: 2609620195
 
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |    11 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |     7 |            |          |
|   2 |   INDEX RANGE SCAN| SCALE_FAST |  2486 | 17402 |    11   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$F5BB74E1
   2 - SEL$F5BB74E1 / SCALE_DATA@SEL$2
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F5BB74E1")
      MERGE(@"SEL$2" >"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      INDEX(@"SEL$F5BB74E1" "SCALE_DATA"@"SEL$2" ("SCALE_DATA"."SECTION"
              "SCALE_DATA"."ID2" "SCALE_DATA"."ID1"))
      END_OUTLINE_DATA
  */
 
Peeked Binds (identified by position):
--------------------------------------
 
   1 - :1 (NUMBER): 0
 
Query Block Registry:
---------------------
 
  <q o="18" f="y" h="y"><n><![CDATA[SEL$F5BB74E1]]></n><p><![CDATA[SEL$1]]
        ></p><i><o><t>VW</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[SC
        ALE_DATA]]></t><s><![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[from$_subquery$_001]]
        ></t><s><![CDATA[SEL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[SCALE_DATA]]></t><s><
        ![CDATA[SEL$2]]></s></h></f></q>
 
 
 
56 rows selected.

 

Workload 3 – No index: 

Drop the fast index to generate the workload for another problem interval, then re-run.

-- let's see how it does with no index 
DROP INDEX scale_fast;
execute DBMS_STATS.GATHER_TABLE_STATS(null, 'SCALE_DATA' , CASCADE => true);
-- then do another test:[but not until 10 PM EST]
SELECT *
  FROM TABLE(test_scalability.run(
       'SELECT * ' 
      || 'FROM scale_data '
      ||'WHERE section=:1 '
      ||  'AND id2=CEIL(DBMS_RANDOM.value(1,100))', 10)); 

 

Oracle generates SQL ID '988a1hx9zc5rr' for this code with a plan hash value 4175758320, which you can see uses no index.

SELECT * from table( dbms_xplan.display_awr( '988a1hx9zc5rr' , 4175758320 , NULL, 'ADVANCED') );
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID 988a1hx9zc5rr
--------------------
select count(*) from (SELECT * FROM scale_data WHERE section=:1 AND
id2=CEIL(DBMS_RANDOM.value(1,100)))
 
Plan hash value: 4175758320
 
-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |       | 85651 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |     7 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| SCALE_DATA |  2478 | 17346 | 85651   (2)| 00:00:04 |
-----------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$F5BB74E1
   2 - SEL$F5BB74E1 / SCALE_DATA@SEL$2
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F5BB74E1")
      MERGE(@"SEL$2" >"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      FULL(@"SEL$F5BB74E1" "SCALE_DATA"@"SEL$2")
      END_OUTLINE_DATA
  */
 
Peeked Binds (identified by position):
--------------------------------------
 
   1 - :1 (NUMBER): 0
 
Query Block Registry:
---------------------
 
  <q o="18" f="y" h="y"><n><![CDATA[SEL$F5BB74E1]]></n><p><![CDATA[SEL$1]]></p><i><
        o><t>VW</t><v><![CDATA[SEL$2]]></v></o></i><f><h><t><![CDATA[SCALE_DATA]]></t><s>
        <![CDATA[SEL$2]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[from$_subquery$_001]]></t><s><
        ![CDATA[SEL$1]]></s></h></f></q>
  <q o="2"><n><![CDATA[SEL$2]]></n><f><h><t><![CDATA[SCALE_DATA]]></t><s><![CDATA[S
        EL$2]]></s></h></f></q>
 
 
 
55 rows selected.

 

The 3 Workloads Summary

ASH data summary for the 3 workloads running this query (sql_id = '988a1hx9zc5rr'):

-- ======================================================================================================================
-- use this to gather test intervals --  sql_id 988a1hx9zc5rr 
-- ======================================================================================================================
set linesize 200
prompt dba_hist_active_sess_history 
prompt ==============================================================
SELECT /*+LEADING(x h) USE_NL(h)*/
count(distinct h.sql_exec_id) distinct_sql_exec_id
, h.sql_id
, h.sql_plan_hash_value
, SUM(10) ash_secs
, to_char(min(sample_time), 'MM_DD_YYYY_HH24_MI') start_time
, to_char(max(sample_time), 'MM_DD_YYYY_HH24_MI') end_time
, min(snap_id) min_snap_id
, max(snap_id) max_snap_id
, min(sample_id) min_sample_id
FROM dba_hist_active_sess_history h
WHERE 1=1
  and sql_id = '988a1hx9zc5rr'
  and snap_id in (select snap_id from dba_hist_snapshot where trunc(begin_interval_time) = to_date('06-APR-2022', 'DD-MON-YYYY'))
GROUP BY h.sql_id, h.sql_plan_hash_value
ORDER BY min_sample_id, sql_id --ash_secs DESC 
;
 
DISTINCT_SQL_EXEC_ID SQL_ID        SQL_PLAN_HASH_VALUE    ASH_SECS START_TIME       END_TIME         MIN_SNAP_ID MAX_SNAP_ID MIN_SAMPLE_ID
-------------------- ------------- ------------------- ----------- ---------------- ---------------- ----------- ----------- -------------
                 380 988a1hx9zc5rr          1093133531        3800 04_06_2022_14_19 04_06_2022_15_22        5647        5648      16907194
                   3 988a1hx9zc5rr          2609620195          30 04_06_2022_19_25 04_06_2022_20_21        5652        5653      16925514
                 368 988a1hx9zc5rr          4175758320        3680 04_06_2022_21_09 04_06_2022_22_10        5654        5655      16931744

 

I round the time intervals out to full hours (I have hourly snapshots) and label the intervals.  I’ll use these to subset the data when running the DOPA process (a query for listing the day’s snapshots is sketched just after the intervals):

BAD INTERVAL:  04_06_2022_14_00 to 04_06_2022_16_00 ; snaps: 5647 to 5648

GOOD INTERVAL: 04_06_2022_19_00 to 04_06_2022_21_00 ; snaps: 5652 to 5653

BAD Interval2: 04_06_2022_21_00 to 04_06_2022_23_00 ; snaps: 5654 to 5655
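
The snapshot boundaries above come from DBA_HIST_SNAPSHOT; a simple query along these lines (assuming the same 06-APR-2022 test day) lists the day’s snapshots so the intervals can be rounded to snapshot boundaries:

-- list the AWR snapshots for the test day to pick interval/snap boundaries
SELECT snap_id,
       to_char(begin_interval_time, 'MM_DD_YYYY_HH24_MI') begin_time,
       to_char(end_interval_time,   'MM_DD_YYYY_HH24_MI') end_time
  FROM dba_hist_snapshot
 WHERE trunc(begin_interval_time) = to_date('06-APR-2022', 'DD-MON-YYYY')
 ORDER BY snap_id;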

Running the DOPA process with SQL-Level Metrics Instrumented:

I’ve written the code to run from SQL*Plus, and for convenience I separated the bind variable settings into a file of their own, separate from the SQL statement.
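
The file names below are hypothetical (the split into two scripts is the point); in SQL*Plus the run is simply:

-- hypothetical script names: the first sets the bind variables shown below,
-- the second contains the single DOPA SQL statement
@dopa_sql_binds.sql
@dopa_sql_statement.sql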

Test Bad Interval 1 Against the Normal/Good Interval:

-- sqlid=988a1hx9zc5rr
--:sql_id_string_comma_sep_lst
undefine sql_id_string_comma_sep_lst
variable sql_id_string_comma_sep_lst varchar2(4000)
execute :sql_id_string_comma_sep_lst := '988a1hx9zc5rr';
 
--:sessions_for_interval
undefine sessions_for_interval
variable sessions_for_interval varchar2(4000)
execute :sessions_for_interval := '';
 
--:sessions_for_normal 
undefine sessions_for_normal
variable sessions_for_normal varchar2(4000)
execute :sessions_for_normal := '';
 
-- ====================================================================================================
--:stats_days_back_only_Y_N
undefine stats_days_back_only_Y_N                
variable stats_days_back_only_Y_N varchar2(20)
execute :stats_days_back_only_Y_N := 'Y';
 
--:stats_days_back
undefine stats_days_back
variable stats_days_back number
execute :stats_days_back := 30;
 
--:allint_st_MM_DD_YYYY_HH24_MI
undefine allint_st_MM_DD_YYYY_HH24_MI
variable allint_st_MM_DD_YYYY_HH24_MI varchar2(20)
execute :allint_st_MM_DD_YYYY_HH24_MI := '04_06_2022_00_00';
 
--:allint_end_MM_DD_YYYY_HH24_MI
undefine allint_end_MM_DD_YYYY_HH24_MI
variable allint_end_MM_DD_YYYY_HH24_MI varchar2(20)
execute :allint_end_MM_DD_YYYY_HH24_MI := '04_07_2022_00_00';
 
-- testing sql_id 988a1hx9zc5rr  
-- problem: 04_06_2022_14_00 04_06_2022_16_00
--:intrvl_st_MM_DD_YYYY_HH24_MI
undefine intrvl_st_MM_DD_YYYY_HH24_MI
variable intrvl_st_MM_DD_YYYY_HH24_MI varchar2(20)
execute :intrvl_st_MM_DD_YYYY_HH24_MI := '04_06_2022_14_00';
 
--:intrvl_end_MM_DD_YYYY_HH24_MI
undefine intrvl_end_MM_DD_YYYY_HH24_MI
variable intrvl_end_MM_DD_YYYY_HH24_MI varchar2(20)
execute :intrvl_end_MM_DD_YYYY_HH24_MI := '04_06_2022_16_00';
 
-- testing sql_id 988a1hx9zc5rr 
-- normal:  04_06_2022_19_00 04_06_2022_21_00
--:normRng_st_MM_DD_YYYY_HH24_MI
undefine normRng_st_MM_DD_YYYY_HH24_MI
variable normRng_st_MM_DD_YYYY_HH24_MI varchar2(20)
execute :normRng_st_MM_DD_YYYY_HH24_MI := '04_06_2022_19_00';
 
--:normRng_end_MM_DD_YYYY_HH24_MI
undefine normRng_end_MM_DD_YYYY_HH24_MI
variable normRng_end_MM_DD_YYYY_HH24_MI varchar2(20)
execute :normRng_end_MM_DD_YYYY_HH24_MI := '04_06_2022_21_00';
 
undefine metric_name_ash
variable metric_name_ash varchar2(100)
execute :metric_name_ash := '%';
 
undefine flagged_values_only_Y_N
variable flagged_values_only_Y_N varchar2(20)
execute :flagged_values_only_Y_N := 'Y';
 
undefine flag_ratio
variable flag_ratio number
execute :flag_ratio := 0.00;
 
-- ====================================================================================================
select * from 
(
select 01 ord, ':sql_id_string_comma_sep_lst' param , :sql_id_string_comma_sep_lst value, 'Enter a comma separated list of sql_id''s' descr from dual union
select 02 ord, ':sessions_for_interval' param, nvl(:sessions_for_interval, 'null') value, 'optional: comma separated list of sid|serial# for problem interval' descr  from dual union
select 03 ord, ':sessions_for_normal' param ,  nvl(:sessions_for_normal, 'null') value, 'optional: comma separated list of sid|serial# for normal interval' descr  from dual union
select 04 ord, ':stats_days_back_only_Y_N' param ,  nvl(to_char(:stats_days_back_only_Y_N), 'null') value, 'optional: easy way to set a problem interval date range' descr  from dual union
select 05 ord, ':stats_days_back' param ,  nvl(to_char(:stats_days_back), 'null') value, 'optional: specify a number of days back' descr  from dual union
select 06 ord, ':allint_st_MM_DD_YYYY_HH24_MI' param ,  nvl(:allint_st_MM_DD_YYYY_HH24_MI, 'null') value, 'Start date/time for all metrics' descr  from dual union
select 07 ord, ':allint_end_MM_DD_YYYY_HH24_MI' param ,  nvl(:allint_end_MM_DD_YYYY_HH24_MI, 'null') value, 'End date/time for all metric' descr  from dual union
select 08 ord, ':intrvl_st_MM_DD_YYYY_HH24_MI' param ,  nvl(:intrvl_st_MM_DD_YYYY_HH24_MI, 'null') value, 'Start date/time for the problem interval' descr  from dual union
select 09 ord, ':intrvl_end_MM_DD_YYYY_HH24_MI' param ,  nvl(:intrvl_end_MM_DD_YYYY_HH24_MI, 'null') value, 'End date/time for the problem interval' descr  from dual union
select 10 ord, ':normRng_st_MM_DD_YYYY_HH24_MI' param ,  nvl(:normRng_st_MM_DD_YYYY_HH24_MI, 'null') value, 'Start date/time for the normal ranges interval' descr  from dual union
select 11 ord, ':normRng_end_MM_DD_YYYY_HH24_MI' param ,  nvl(:normRng_end_MM_DD_YYYY_HH24_MI, 'null') value, 'End date/time for the normal ranges interval' descr  from dual union
select 12 ord, ':metric_name_ash' param ,  nvl(:metric_name_ash, 'null') value, 'to subset :metric_name like <>' descr  from dual union
select 13 ord, ':flagged_values_only_Y_N' param ,  nvl(to_char(:flagged_values_only_Y_N), 'null') value, 'usually set to Y  - exception: "N" for Metric Time-Series' descr  from dual union
select 14 ord, ':flag_ratio' param ,  nvl(to_char(:flag_ratio), 'null') value, 'to subset on metrics with flag_ratio values > a specified flag_ratio' descr  from dual 
)
order by ord
;

 

Output listing the parameters (bind variables), their values and descriptions.






 

You may have noticed this test compares:

BAD INTERVAL1:  04_06_2022_14_00 to 04_06_2022_16_00 ; snaps: 5647 to 5648; that used the slow index, to

GOOD INTERVAL: 04_06_2022_19_00 to 04_06_2022_21_00 ; snaps: 5652 to 5653; which used the fast index.

I’ll compare the Good interval to BAD Interval2 (04_06_2022_21_00 to 04_06_2022_23_00) in a later example.

Running the anomaly detection code yields the following results:





Discussion on this test run follows:

-        Out of about 90 or so instrumented metrics, the engineered feature from SQLSTAT “total io_interconnect (meg)” is the metric with the biggest departure from the normal range, with a flag ratio of 8.32, followed by “total iowait (sec)” and, from ASH, “total time in wait class: User I/O (sec)”.

-        The DOPA process counts intervals where the metric was outside of the normal range; in this case, out of 3 intervals, one interval recorded values for the metric that were outside the normal range.

-        I also aggregate all the metrics beginning with “total” across all intervals, prefix each new metric with “SUM: ”, do a straight comparison to those same metrics in the normal interval, and put the result of the comparison in flag_eval (a sketch of this comparison follows this list).

-        For the metrics beginning with “SUM:”, I expected to see everything flagged in the problem interval, but found some metrics flagged in the normal interval.  Let me explain my interpretation: Since the query using the fast index was so much better than the query using the slow index, there were many more executions completed, and thus more fetches, rows processed, physical reads, etc.
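
To make the “SUM:” comparison a little more concrete, here is a minimal sketch; unpivoted_metrics and interval_type are illustrative stand-ins, not the actual DOPA code:

-- minimal sketch of the cross-interval "SUM:" comparison; names are illustrative
SELECT 'SUM: ' || metric_name AS metric_name,
       SUM(CASE WHEN interval_type = 'PROBLEM' THEN metric_value ELSE 0 END) AS problem_total,
       SUM(CASE WHEN interval_type = 'NORMAL'  THEN metric_value ELSE 0 END) AS normal_total,
       CASE WHEN SUM(CASE WHEN interval_type = 'PROBLEM' THEN metric_value ELSE 0 END)
              >  SUM(CASE WHEN interval_type = 'NORMAL'  THEN metric_value ELSE 0 END)
            THEN 'flagged: higher in problem interval'
            ELSE 'flagged: higher in normal interval' END AS flag_eval
  FROM unpivoted_metrics
 WHERE metric_name LIKE 'total%'
 GROUP BY metric_name;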

 

Test Bad Interval 2 Against the Normal/Good Interval:

In the next test I’ll compare:

BAD Interval2: 04_06_2022_21_00 to 04_06_2022_23_00 ; snaps: 5654 to 5655; which used no index, to

GOOD INTERVAL: 04_06_2022_19_00 to 04_06_2022_21_00 ; snaps: 5652 to 5653; which used the fast index.

Parameters follow:




Running the anomaly detection code yields the following results:



Discussion on this test run follows:

-        Out of about 90 or so instrumented metrics, the engineered feature from ASH “total max pga_allocated (meg)” is the metric with the biggest departure from the normal range, but with such a low flag ratio (0.03) I’d say that observation is insignificant; in fact, you can tweak the model and subset out, say, flag ratios below 1 [example below].

-        Most of the anomalies are flagged for the cross-interval aggregate metrics beginning with “SUM:”.  Again, for the metrics flagged in the normal interval: since the query using the fast index was so much better than the query doing a full table scan, there were many more executions completed, and thus more fetches, rows processed, physical reads, etc.

-        I also found it interesting that ASH “SUM: total is_session_on_cpu (sec)” correlates almost exactly with SQLSTAT “SUM: total cpu_time (sec)”, as does ASH “SUM: total in_plsql_execution (sec)” with SQLSTAT “SUM: total plsexec_time (sec)”.  I suppose one should expect this correlation because they are more or less measuring the same thing, but internally (as I understand it), the “equivalent” metrics in ASH and SQLSTAT are not measured the same way.

For now, I’ll leave the rest of the analysis to the reader.

Tweaking the Model:

I did discuss tweaking the model; in this example I’m ignoring all results with a flag ratio lower than 1.00 by resetting the value of :flag_ratio to 1.00.  This version of the anomaly detection model gives us a somewhat better focus on more meaningful metric anomalies.  Note that only 25 anomalies are listed, where previously 33 anomalies were listed.  Obviously, this can be tweaked even further if desired.
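
Concretely, the only change needed is to reset the :flag_ratio bind from the parameter script and then re-run the DOPA SQL statement:

-- raise the flag-ratio cutoff from 0.00 to 1.00, then re-run the model
execute :flag_ratio := 1.00;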


 

What’s next:

I did promise a jet tour of the Dynamic SQL-Level Analysis Code, so buckle your seat belts!!!

 

