Discovering Performance Anomalies - Index Compression Overhead in High-Volume Insert Workloads

Preamble on Anomaly Detection:

The quote commonly attributed to William Butler Yeats, "Education is not the filling of a bucket, but the lighting of a fire," resonates with the concept of personal experimentation and anomaly detection in performance analysis. Like education, solving performance anomalies is not about simply accumulating knowledge or blindly applying preconceived solutions. It is about igniting curiosity, embracing a dynamic approach, and delving deep into the anomalies to uncover their underlying meaning. Just as education sparks a fire within us, detecting anomalies and deep diving into the root causes of performance problems fuels our quest to understand the intricacies of complex systems. By actively exploring and experimenting with the anomalies, we can uncover insights that go beyond the known boundaries, driving innovation, optimization, and improved performance.

Here's to catching fire for learning the intricacies of Oracle performance.

 

Sidebar Note: Anomaly Detection for Analyzing Arbitrary Performance Problems:

The anomaly detection results provide valuable insight into this case: understanding the impact of index compression-related statistics on the slow insert rates. In general, using anomaly detection to analyze a large number of performance metrics allows for a more comprehensive understanding of a problem than traditional small-model approaches provide, because you gain easy access to tens of thousands of performance metrics and can flag the anomalies that need further investigation.

 

Subsequent deep dives into obscure metrics, researching their meaning in the Oracle documentation and other resources, are crucial for gaining a better understanding of the performance behavior and learning the intricacies of Oracle. This deep diving illustrates how one can take a proactive approach to problem-solving and leverage one's knowledge of relational database management systems and computer science.

 

It's worth mentioning that the anomaly detection results themselves do not provide a definitive solution to the problem. They serve as a starting point for deeper analysis and investigation. The identified metrics should be correlated with other relevant factors, such as the partitioning scheme, primary key validation, system resource utilization, and query execution plans, to form a holistic understanding of the performance issues.

 

In my view, using anomaly detection to identify potential problem areas, combined with in-depth analysis of specific metrics, is indispensable. By leveraging this information and combining it with further investigation, you can make informed decisions and take targeted actions to improve insert performance in the problem workloads.

 

Problem Statement and Challenges:

The performance issue is slow insert rates into a large, partitioned table that utilizes compression, accompanied by high CPU usage. The main challenges are experienced during the insert process, where the table's size, partitioning, compression, and CPU utilization all play a significant role.

 

The table in question is of substantial size and contains a significant number of partitions. This complexity adds to the overall overhead and potentially impacts insert performance. The table and its index are compressed using advanced compression, which reduces storage requirements but may introduce additional processing overhead during inserts.
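
To ground the discussion, here is a hypothetical DDL sketch approximating the setup described; all object names, columns, and partition values are illustrative assumptions (the real table reportedly has nearly 10,000 list partitions). A non-unique index is used to enforce the primary key here, a choice that matters for a maintenance pattern sketched later in this article:

-- Advanced row compression on the table; advanced low compression on
-- the local index that backs the primary key. (Hypothetical names.)
CREATE TABLE txn_stage (
  batch_id NUMBER        NOT NULL,
  txn_id   NUMBER        NOT NULL,
  payload  VARCHAR2(200)
)
ROW STORE COMPRESS ADVANCED
PARTITION BY LIST (batch_id)
( PARTITION p_batch_1 VALUES (1),
  PARTITION p_batch_2 VALUES (2) );

CREATE INDEX txn_stage_pk_ix ON txn_stage (batch_id, txn_id)
  COMPRESS ADVANCED LOW LOCAL;

ALTER TABLE txn_stage ADD CONSTRAINT txn_stage_pk
  PRIMARY KEY (batch_id, txn_id) USING INDEX txn_stage_pk_ix;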

 

During the insert process, the sessions are mostly ON CPU, indicating that the insert process is computationally heavy. The execution plan for the merge appears reasonable, although it uses nested loops rather than a hash join. Additionally, some I/O waits are recorded, suggesting that a smaller portion of the delay lies in input/output operations during the inserts.

 

The observed insert rates vary depending on the number of rows being inserted. Larger insert operations, such as inserting 1 million rows, take considerably more time than smaller inserts of around 150 thousand rows (upwards of 2 minutes compared to about 20 seconds). This behavior indicates that insert performance does not scale linearly with the number of rows.

 

The usual insert rate is around 8,600 rows per second, with occasional peaks reaching the 20,000 range. However, the expectation is that the performance should be significantly higher than these numbers [as was the case when inserting into a newly created table], leading to concerns about the efficiency of the insert process.

 

Furthermore, during problem workloads, there are high values for specific dba_hist_sysstat metrics related to index compression operations. This suggests that the compression and decompression activities, especially related to the indexes, may be contributing to the slow insert rates and increased CPU usage.

 

Summary of Anomaly Detection Results for Index Compression-Related Statistics in Problem Workloads:

Understanding and addressing the underlying causes of the slow insert rates, such as the impact of partitioning, compression, and index operations, will be crucial to improving the overall performance of the insert process. So where do I look for root-cause indicators? As you are probably aware, there are tens of thousands of Oracle performance metrics exposed by DBA_HIST views from the Automatic Workload Repository (AWR). With such a massive set of performance metrics, it is nearly impossible to solve arbitrary performance problems using traditional small-model approaches, where you look at a small set of well-known metrics.

Using the anomaly detection method I developed (detailed in my book, “Dynamic Oracle Performance Analytics”), I applied this dynamic approach to analyze anomalies from several problem insert workloads compared to several normal insert workloads. I should note that anomaly detection among tens of thousands of metrics does not give you the answer to the problem; however, it does point you in the right direction for further analysis.

Once you have the results of the anomaly detection model, it is time to put on your thinking cap. Analyzing the model may involve diving deep into obscure metrics that are scantily documented. It is not uncommon to encounter metrics with which you have little to no prior experience; when that happens, we go to the Oracle docs and other resources to research the meaning of the metrics (a background in relational database management systems and computer science helps).
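
To illustrate the kind of comparison involved, here is a minimal sketch, assuming two AWR snapshot windows identified from DBA_HIST_SNAPSHOT; the snapshot ID ranges 1000-1010 (problem) and 2000-2010 (normal) are placeholders, and a single database and instance are assumed. Because sysstat counters are cumulative, each window's delta is taken as max minus min, which is valid only if the instance was not restarted within the window:

-- Compare per-window deltas of every sysstat counter between a problem
-- window and a normal window, largest ratio first.
WITH deltas AS (
  SELECT stat_name,
         CASE WHEN snap_id BETWEEN 1000 AND 1010 THEN 'PROBLEM'
              ELSE 'NORMAL' END AS workload,
         MAX(value) - MIN(value) AS delta
  FROM   dba_hist_sysstat
  WHERE  snap_id BETWEEN 1000 AND 1010
     OR  snap_id BETWEEN 2000 AND 2010
  GROUP  BY stat_name,
         CASE WHEN snap_id BETWEEN 1000 AND 1010 THEN 'PROBLEM'
              ELSE 'NORMAL' END
)
SELECT p.stat_name,
       p.delta AS problem_delta,
       n.delta AS normal_delta,
       ROUND(p.delta / NULLIF(n.delta, 0), 1) AS problem_to_normal
FROM   deltas p
JOIN   deltas n ON n.stat_name = p.stat_name
WHERE  p.workload = 'PROBLEM'
AND    n.workload = 'NORMAL'
ORDER  BY problem_to_normal DESC NULLS LAST;

This is only a crude two-window comparison, not the full anomaly detection model, but it shows how the raw material is pulled from AWR.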

Interestingly, in this case, I found that many values for index compression-related statistics exposed via the view dba_hist_sysstat (see below) were extremely high for the problem workloads compared to the normal workloads [to be totally honest, prior to this analysis I didn’t know that these metrics existed, so I had to look them up in the Oracle docs]:

- index cmph sc, ffs decomp buffer rows avail

- index cmph sc, ffs decomp buffer rows used

- index cmph sc, ffs decomp buffers

- index cmph sp, leaf recompress

- index cmph sp, deci norecomp donotrecomp bit

- index cmph sp, leaf norecomp limit

- index cmph ld, lf blks w/o unc r

- index cmph cu, uncomp sentinels

- index crx upgrade (prefetch)

 

I did a deep dive on these statistics to understand their impact on performance (details below**). The analysis of index compression-related statistics revealed significant anomalies in the problem insert workloads compared to the normal workloads, indicating potential issues with index compression and its impact on performance. The high values observed in these statistics suggest frequent decompression operations, heavy access to compressed indexes, and limitations in the recompression process. These factors can result in increased CPU usage, potential latency, and suboptimal realization of compression benefits. Monitoring and investigating these anomalies provides valuable insight for optimizing index compression strategies, improving storage efficiency, and enhancing overall system performance.
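
For a quick spot-check outside of AWR, the same counters can be read live from V$SYSSTAT. They are cumulative since instance startup, so the useful approach is to capture them before and after a test insert and difference the values:

-- Live values of the index compression / prefetch upgrade counters;
-- snapshot before and after a test run and compare.
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'index cmph%'
   OR  name LIKE 'index crx%'
ORDER  BY name;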

 

Summary of Factors Influencing the Slow Inserts:

The main concern is the slow insert rates, especially with the increasing number of partitions. High CPU usage, along with a smaller amount of I/O waits, contributes to the performance issue. The compression settings, the merge statement, and the index compression operations may also play a role. Addressing these factors will be crucial to improving insert performance and preventing application timeouts.

 

The slow insert rate in this scenario could be influenced by several factors. Here are a few potential reasons for the observed behavior:

 

1.      Compression Overhead: While compression can reduce storage requirements, it also introduces overhead during the compression and decompression operations. In this case, the high values for dba_hist_sysstat metrics related to index compression operations indicate that the compression operations are consuming significant CPU resources. This could contribute to slower insert rates, especially when dealing with larger amounts of data.

a.      Possible Solution:  Evaluate whether compression is needed for this table and its index. It could be that compression is not adding much space savings in comparison to the costs it introduces (see the first sketch after this list).

 

2.      Primary Key Constraints: Since the table has a primary key defined, the database performs uniqueness checks during the insert process to ensure that no duplicate key values are inserted. This validation process adds overhead, especially when inserting a large number of rows, and the primary key (PK) validation can become more expensive with a compressed index if decompression is required during validation.  Rationale:  When a row is inserted or modified in a table with a primary key constraint, the database needs to ensure that the new or modified key value does not violate the uniqueness constraint. This involves validating the key against the existing data in the index. If the index is compressed, the database may need to decompress the relevant index blocks to perform the primary key validation. Decompression requires additional CPU and potentially I/O resources, which can slow down the validation process and reduce the insert rate, particularly in high-volume scenarios. It is worth noting that performance is worse when the number of rows being inserted is high (and thus the size of the compressed index is increased).

a.      Possible Solution 1: It may be of value to test adjusting the compression settings, or not compressing the index at all, to optimize the primary key validation process and mitigate the overhead caused by the compressed index and PK validation (sketched after this list).  N.B. If you are interested in improving the compression ratio, there is some indication in the Oracle docs that ordering the inserts can also improve the compression ratio.

b.      Possible Solution 2: Consider whether you can make the index partition unusable during the merge, then rebuild it after the merge (also sketched after this list).  Rationale: this would eliminate the index lookup in the PK validation.


 

3.      Partitioning: While the size of the partitioned table itself does not directly impact insert performance, it can indirectly affect performance through factors such as data redistribution, partition maintenance, index maintenance, and disk I/O considerations. I did not see any anomalies in the metrics that were directly related to size.  Having said that, partition-based inserts can sometimes be slower in a table with a large number of partitions, such as in this case of nearly 10,000 partitions. This is because each partition is managed as a separate segment in the database, and the overhead of managing a large number of partitions can impact the performance of insert operations. 

a.      Metadata Operations:  The database will need to perform metadata operations such as maintaining partition statistics and managing space allocation; with a large number of partitions, these metadata operations can become more time-consuming and impact the overall insert performance.

b.      Index Maintenance: For tables with indexes, the database needs to update the index entries for each inserted row. With a large number of partitions, the index maintenance overhead can increase, affecting the overall insert performance.

c.      Locking and Serialization: In general, the database needs to coordinate concurrent inserts into multiple partitions. This can result in locking and serialization overhead, especially if multiple sessions are concurrently inserting into different partitions. As the number of partitions increases, contention for resources can slow down the insert operations. 

                                                    i.     N.B. In this case there were some concurrency-related metrics flagged in the anomaly detection, but it was difficult to say whether these came from other processes in the workload.  The wait events for the sessions doing the merges did not indicate any concurrency-class wait events.

d.      List Partitioning: In general, list partitioning can have an impact on insert performance, particularly if the partitioning scheme is not efficiently aligned with the insert patterns. If the data being inserted does not align well with the partition boundaries, it can result in suboptimal performance due to the need for data redistribution across partitions. It is worth evaluating if the partitioning strategy is suitable for the insert workload. 

                                                    i.     N.B. In this case the merge inserts data into a single new partition, and the update would also be within an existing single partition, thus the merge pattern aligns perfectly with the partitions.

 

4.      Global Temporary Table: The use of a global temporary table for inserting data can introduce some overhead, depending on the complexity of the operations performed on the temporary table. It is worth evaluating if the operations on the temporary table, such as the merge statement, involve complex queries or significant data transformations that could impact performance.

a.      N.B. In this case it did not appear that the GTT had much of a negative impact on performance; however, it may be of benefit to split the merge into separate insert and update operations (see the final sketch after this list).

 

5.      System Resource Limitations: It is important to assess the overall system resources, including CPU, memory, and I/O, to ensure that they are sufficient to handle the workload. If the system is resource-constrained, it can lead to slower insert rates due to contention and increased wait times.

a.      N.B. In this case there is plenty of capacity during problem workloads.
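
To make the possible solutions above concrete, here are minimal sketches using the hypothetical txn_stage table and txn_stage_pk_ix index from earlier; all names are illustrative assumptions, not the application team's actual code. First, checking the current compression settings and rebuilding an index partition uncompressed (the compression-related options under items 1 and 2):

-- Where is compression currently enabled?
SELECT index_name, compression
FROM   dba_indexes
WHERE  table_name = 'TXN_STAGE';

SELECT partition_name, compression
FROM   dba_ind_partitions
WHERE  index_name = 'TXN_STAGE_PK_IX';

-- Rebuild a single local index partition without compression.
ALTER INDEX txn_stage_pk_ix REBUILD PARTITION p_batch_1 NOCOMPRESS;

Next, the unusable-then-rebuild pattern from item 2, Possible Solution 2. Two caveats: Oracle will not skip an unusable UNIQUE index (DML raises ORA-01502), which is why this sketch assumes the PK is enforced through a non-unique local index with SKIP_UNUSABLE_INDEXES left at its default of TRUE; and duplicate checking is suspended while the constraint is disabled, so the load must be known to be clean:

-- Take the index partition out of play for the duration of the merge.
ALTER TABLE txn_stage DISABLE CONSTRAINT txn_stage_pk KEEP INDEX;
ALTER INDEX txn_stage_pk_ix MODIFY PARTITION p_batch_1 UNUSABLE;

-- ... run the merge / bulk insert here ...

ALTER INDEX txn_stage_pk_ix REBUILD PARTITION p_batch_1;
-- ENABLE re-validates existing rows; ENABLE NOVALIDATE avoids the re-scan.
ALTER TABLE txn_stage ENABLE CONSTRAINT txn_stage_pk;

Finally, the idea from item 4 of splitting the merge into separate update and insert operations, assuming a global temporary table txn_gtt holding the incoming rows (unique per key):

-- Update rows that already exist in the target ...
UPDATE txn_stage t
SET    t.payload = (SELECT g.payload
                    FROM   txn_gtt g
                    WHERE  g.batch_id = t.batch_id
                    AND    g.txn_id   = t.txn_id)
WHERE  EXISTS (SELECT 1 FROM txn_gtt g
               WHERE  g.batch_id = t.batch_id
               AND    g.txn_id   = t.txn_id);

-- ... then insert the rows that do not yet exist.
INSERT INTO txn_stage (batch_id, txn_id, payload)
SELECT g.batch_id, g.txn_id, g.payload
FROM   txn_gtt g
WHERE  NOT EXISTS (SELECT 1 FROM txn_stage t
                   WHERE  t.batch_id = g.batch_id
                   AND    t.txn_id   = g.txn_id);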

 

Solution Experimentation

In light of the prior hypothesis that the size of the table and the number of partitions indirectly contribute to overhead related to metadata operations, index maintenance, locking, and serialization, the application team conducted solution experiments to validate this claim. By dropping a significant number of partitions in the first experiment, they observed a substantial reduction in the merge insert time for 500,000 rows, indicating that the reduced table size and partition count alleviated the associated overhead. A subsequent experiment with a smaller number of partitions further corroborated this finding, yielding even faster insert times. These results support the hypothesis and emphasize the impact of table and partition size on the aforementioned performance factors.

Additionally, the team's plan to explore the NOCOMPRESS option for the table and index in an upcoming POC also follows from the hypothesis, aiming to minimize the overhead introduced by compression-related operations. By actively investigating these factors through experimentation, the team aims to optimize the system's performance by addressing the indirect overhead resulting from table size, partition count, and compression.
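
For reference, a hedged sketch of the kinds of statements such experiments involve, again using the hypothetical names from earlier; the team's actual partition names, retention rules, and POC steps are not shown here:

-- Experiment: reduce the partition count by dropping partitions that
-- are no longer needed (UPDATE INDEXES keeps any global indexes usable).
ALTER TABLE txn_stage DROP PARTITION p_batch_1 UPDATE INDEXES;

-- Planned POC: switch a partition and its index to NOCOMPRESS. Note
-- that MODIFY PARTITION ... NOCOMPRESS affects newly inserted rows
-- only; existing rows would need an ALTER TABLE ... MOVE to decompress.
ALTER TABLE txn_stage MODIFY PARTITION p_batch_2 NOCOMPRESS;
ALTER INDEX txn_stage_pk_ix REBUILD PARTITION p_batch_2 NOCOMPRESS;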

 

** Detailed Analysis of Index Compression Anomalies

The anomaly detection analysis of the problem insert workload compared to the normal insert workload revealed high values for several index compression-related statistics in the dba_hist_sysstat data. The following statistics exhibited significant anomalies:

 

- "index cmph sc, ffs decomp buffer rows avail": This statistic indicates the number of rows in the decompressed buffer available for fast full scan (FFS) operations on the index. During a fast full scan, Oracle Database reads index blocks directly instead of accessing data blocks, allowing for efficient scanning of the index structure. However, if the index is compressed, the database needs to decompress the compressed data before it can be utilized for the FFS operation. A high value for this statistic suggests that a significant amount of data needs to be decompressed during the problem workload. This could be an indication that the compressed index is being heavily accessed, resulting in frequent decompression operations. The excessive decompression may impact the overall performance of the system, as it requires additional CPU resources and potentially introduces latency.

 

- "index cmph sc, ffs decomp buffer rows used": This statistic indicates the number of rows in the decompressed buffer used for a fast full scan of the index. The high value suggests that the workload is heavily accessing the compressed index, requiring frequent decompression of index data. This can impact performance due to the additional CPU resources required for decompression and the potential latency introduced by the decompression process.

 

- "index cmph sc, ffs decomp buffers": This statistic denotes the number of blocks decompressed for index fast full scans. If the value of this statistic is high, it suggests that a significant number of index blocks are being decompressed during fast full scans. This indicates a substantial amount of data requiring decompression, which can have an impact on performance. Decompression requires additional CPU resources and introduces potential latency during the decompression process.

 

- "index cmph sp, leaf recompress": This statistic reflects the number of times a leaf block Compression Unit was recompressed. Recompression occurs when the compression algorithm determines that a Compression Unit can be further compressed, leading to potential space savings and improved storage efficiency. The high value suggests frequent instances during the problem workload where the compression algorithm identified opportunities for additional compression within the leaf blocks of the index; each such recompression is additional CPU work performed during index maintenance.

 

- "index cmph sp, deci norecomp donotrecomp bit": This statistic indicates the presence of the "donotrecomp" bit for decimal prefix compressed leaf blocks. The "donotrecomp" bit serves as a flag indicating whether recompression is allowed for these blocks: when it is set, the affected leaf blocks will not undergo recompression, even if subsequent operations or modifications could otherwise trigger it. The high value suggests that a significant number of decimal prefix compressed leaf blocks are marked to prevent recompression in the problem workload. Recompression is an important aspect of index compression, as it reduces the size of the index structure for efficient storage and improved performance; when it is disabled via the "donotrecomp" bit, the potential benefits of compression are limited. Ideally, in a well-optimized system, the "donotrecomp" bit should be set sparingly, allowing recompression to occur when necessary. A high value for this statistic therefore suggests that recompression is being frequently bypassed, which may indicate suboptimal use of compression or specific conditions that prevent recompression. It is worth investigating the reasons behind the frequent setting of the "donotrecomp" bit to ensure that the compression strategy aligns with the goals of efficient storage and performance optimization.

 

- "index cmph sp, leaf norecomp limit": This statistic represents the number of times leaf block recompression has reached a predefined limit where no further recompression occurs. Higher values for this statistic indicate that the compression algorithm has encountered leaf blocks that have already reached the recompression limit and, as a result, no further recompression is performed on those blocks.

 

- "index cmph ld, lf blks w/o unc r": This statistic denotes the number of index leaf blocks that have been flushed without any uncompressed rows; in other words, it counts the leaf blocks in the index that have been written to disk or storage without including any rows in their uncompressed form. When the value of this statistic is high, it suggests that a significant number of leaf blocks in the index contain no uncompressed rows, meaning the compression algorithm is working efficiently and has successfully compressed all the rows within those leaf blocks, resulting in space savings. While that demonstrates the effectiveness of index compression in optimizing storage utilization, in this context the statistic is also a useful indicator that the performance problem (i.e., where the CPU is going) is related to index compression.

 

- "index cmph cu, uncomp sentinels": This statistic represents the number of Compression Units created with uncompressed sentinels in the context of index compression. In an Oracle database, index compression involves the use of sentinels, special markers used to represent repeated values within a Compression Unit; sentinels reduce the storage space required by replacing repetitive occurrences of a value with a shorter representation. In some cases, however, certain Compression Units cannot be effectively compressed, due to factors such as the nature of the data or limitations imposed by the compression algorithm, and are therefore created with uncompressed sentinels, meaning the values within those Compression Units are stored in their original, uncompressed form. A higher value for this statistic suggests that a considerable number of Compression Units could not be compressed effectively. This statistic helps in assessing the effectiveness of index compression and identifying cases where compression is not achieving the desired level of space savings, highlighting areas where the compression algorithm encounters difficulties with certain types of data.

 

- "index crx upgrade (prefetch)": This statistic indicates the number of index blocks upgraded due to prefetching. The upgrade/prefetch process is invoked during query/data retrieval in an attempt to optimize data access by fetching additional data blocks into the memory cache before they are actually needed, minimizing disk I/O and improving query performance. Higher values for this statistic indicate that prefetching has been utilized more frequently, suggesting that the optimizer is successfully identifying opportunities for prefetching and taking advantage of them to enhance data retrieval efficiency. Although index prefetching and index compression are separate mechanisms, it is possible to observe high values for statistics related to both during problem workloads. This could be due to the nature of the workload and the specific characteristics of the indexed data. For example, a workload with frequent index accesses and complex query patterns may trigger more aggressive prefetching of index blocks to optimize data retrieval, while at the same time the high values for index compression-related statistics indicate that the compressed indexes are being actively exercised during query execution.
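
To see when this index compression activity spikes relative to the problem windows, the same counters can be trended per snapshot. A minimal sketch; since the counters are cumulative, LAG computes per-snapshot deltas (a negative delta simply indicates an instance restart, and a single instance is assumed; add instance_number to the PARTITION BY for RAC):

-- Per-snapshot deltas for the index compression counters.
SELECT s.snap_id,
       s.begin_interval_time,
       st.stat_name,
       st.value - LAG(st.value) OVER (PARTITION BY st.stat_name
                                      ORDER BY st.snap_id) AS delta
FROM   dba_hist_sysstat st
JOIN   dba_hist_snapshot s
  ON   s.snap_id         = st.snap_id
 AND   s.dbid            = st.dbid
 AND   s.instance_number = st.instance_number
WHERE  st.stat_name LIKE 'index cmph%'
ORDER  BY st.stat_name, s.snap_id;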

 

The identification of specific index compression-related statistics with extremely high values in the problem workloads compared to normal workloads is significant. These metrics provide indicators of the root causes of bottlenecks or inefficiencies related to index compression operations. This insight can help guide further investigation and potential optimization strategies to address the observed performance issues during the problem insert workload and hopefully lead to solutions to the slow insert rates.

 