Cache

CPU cache metrics

Bandwidth

Cache bandwidth represents the rate at which data is transferred to and from the CPU caches. It provides insights into how efficiently the caches handle data movement, including loads and evictions.

Metric [e.g. GByte/s]	Description
L2 bandwidth	Overall bandwidth of the Level 2 cache
L3 bandwidth	Overall bandwidth of the Level 3 cache
L2D load bandwidth	Bandwidth used by load operations on the L2D (Level 2 Data) cache
L2D evict bandwidth	Bandwidth used by eviction operations on the L2D cache (data evicted to lower memory levels)
L3 load bandwidth	Bandwidth used by load operations on the L3 cache
L3 evict bandwidth	Bandwidth used by eviction operations on the L3 cache (data evicted to lower memory levels)

High cache bandwidth shows that data moves quickly between the CPU and the cache hierarchy, which is crucial for performance in memory-bound workloads. However, bandwidth alone does not guarantee optimal performance: high bandwidth combined with frequent evictions can point to inefficient cache utilization. Analyzing cache bandwidth alongside other metrics such as cache hit rates and stalls helps in understanding cache behavior and identifying potential bottlenecks. Efficient use of the cache hierarchy can significantly reduce latency and improve overall system performance.

Data Volume

Cache data volume measures the total amount of data transferred through the L2 and L3 caches, including loads and evictions. It shows the workload's memory traffic at each cache level and helps in understanding the memory footprint and access patterns of an application.

Metric [e.g. GByte]	Description
L2 volume	Total data volume handled by the Level 2 cache
L3 volume	Total data volume handled by the Level 3 cache
L2D load volume	Volume of data loaded into the L2 Data (L2D) cache
L2D evict volume	Volume of data evicted from the L2D cache to lower memory levels
L3 load volume	Volume of data loaded into the L3 cache
L3 evict volume	Volume of data evicted from the L3 cache to lower memory levels

High data volumes suggest a workload with significant memory traffic, which could stress cache hierarchies and lead to performance bottlenecks. However, high volume alone does not imply inefficiency; it must be analyzed in the context of other metrics such as cache bandwidth, hit rates, and execution stalls to determine whether the cache subsystem is effectively supporting the workload. Optimizing data locality and minimizing evictions can help reduce unnecessary memory traffic and improve performance.
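The relationship between data volume and bandwidth can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the 64-byte cache line size, the counter values, and the function names are assumptions, and treating the overall L2 volume as the sum of load and evict volume is also assumed here.

```python
CACHE_LINE_BYTES = 64  # assumption: common line size on x86-64, check your CPU


def volume_gbyte(cache_lines: int, line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Convert a count of transferred cache lines into a volume in GByte."""
    return cache_lines * line_bytes / 1e9


def bandwidth_gbyte_per_s(volume: float, runtime_s: float) -> float:
    """Bandwidth is the transferred volume divided by the measurement time."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return volume / runtime_s


# Hypothetical counter readings for one measurement interval.
l2d_load_lines = 250_000_000
l2d_evict_lines = 50_000_000
runtime_s = 2.0

l2d_load_volume = volume_gbyte(l2d_load_lines)
l2d_evict_volume = volume_gbyte(l2d_evict_lines)
# Assumption: overall L2 volume = load volume + evict volume.
l2_volume = l2d_load_volume + l2d_evict_volume
l2_bandwidth = bandwidth_gbyte_per_s(l2_volume, runtime_s)

print(f"L2 volume:    {l2_volume:.1f} GByte")
print(f"L2 bandwidth: {l2_bandwidth:.1f} GByte/s")
```

The same volume spread over a longer runtime yields a lower bandwidth, which is why the two metrics are best read together.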

Miss Rate

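Cache miss rate measures the number of cache misses relative to the total number of instructions executed, expressed as a percentage. Each miss is a request that could not be fulfilled by the cache and required fetching data from a lower level in the memory hierarchy. The metric indicates how efficiently the cache system meets the data demands of the executed code.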

Metric [%]	Description	Formula
L2 miss rate	Percentage of Level 2 cache misses relative to total instructions	(L2 misses / total instructions) * 100
L3 miss rate	Percentage of Level 3 cache misses relative to total instructions	(L3 misses / total instructions) * 100

A high miss rate signifies inefficient cache usage and can result in significant performance penalties due to the increased latency of accessing lower memory levels. However, the impact of the miss rate depends on the workload; compute-bound tasks may tolerate higher miss rates than memory-bound tasks. Because the miss rate often depends on the nature of the algorithm, focus on lowering the cache miss ratio by improving cache reuse.
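As a small worked example of the formula in the table, the following sketch computes the miss rate from two raw counts. The function name and the counter values are hypothetical and only serve to illustrate the arithmetic.

```python
def miss_rate_percent(misses: int, instructions: int) -> float:
    """Cache misses relative to executed instructions, in percent."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses / instructions * 100.0


# Hypothetical counts: 2 million L2 misses in 1 billion instructions.
l2_miss_rate = miss_rate_percent(misses=2_000_000, instructions=1_000_000_000)
print(f"L2 miss rate: {l2_miss_rate:.2f} %")
```

Note that the denominator is the instruction count, not the number of cache requests, so code with few memory accesses per instruction shows a low miss rate even if most of those accesses miss.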

Miss Ratio

Cache miss ratio measures the proportion of memory requests to a cache level that result in a miss, requiring data to be fetched from a lower level in the memory hierarchy. It is calculated as the ratio of misses to total memory requests for that cache level.

Metric	Description	Formula
L2 miss ratio	Proportion of Level 2 cache misses relative to total L2 memory requests	L2 misses / total L2 requests
L3 miss ratio	Proportion of Level 3 cache misses relative to total L3 memory requests	L3 misses / total L3 requests

A low miss ratio is desirable, as it signifies effective use of the cache hierarchy and minimizes costly data fetches from slower memory levels. While the miss rate may be influenced by workload or algorithm design, the miss ratio can often be improved by increasing data locality and enhancing cache reuse. Strategies such as loop optimizations, blocking, and cache-aware algorithms can help reduce the miss ratio, improving overall system performance.
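The effect of data locality on the miss ratio can be illustrated with a toy cache model. The sketch below is not a model of any real CPU: it assumes a tiny, fully associative cache with LRU replacement, and the cache size, line size, and function name are chosen purely for illustration. It counts misses for a row-major and a column-major walk over the same 64 x 64 matrix stored row by row.

```python
from collections import OrderedDict


def simulated_miss_ratio(addresses, cache_lines=32, line_size=8):
    """Miss ratio of an access stream on a toy fully associative LRU cache.

    cache_lines: number of lines the cache can hold (assumed value)
    line_size:   number of elements per cache line (assumed value)
    """
    cache = OrderedDict()  # keys are line numbers, ordered by recency
    misses = 0
    requests = 0
    for addr in addresses:
        requests += 1
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses / requests if requests else 0.0


N = 64  # matrix dimension; element (i, j) lives at address i * N + j

# Row-major walk: consecutive accesses fall into the same cache line.
row_major = [i * N + j for i in range(N) for j in range(N)]
# Column-major walk: consecutive accesses are N elements apart.
col_major = [i * N + j for j in range(N) for i in range(N)]

print(simulated_miss_ratio(row_major))  # 0.125
print(simulated_miss_ratio(col_major))  # 1.0
```

Both walks issue the same number of requests, yet the row-major walk misses only once per cache line while the column-major walk misses on every access in this model. Blocking and loop reordering aim at moving an access pattern from the second case toward the first.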

Request Rate

Request rate measures the intensity of data accesses relative to the total instructions executed. It indicates how frequently memory requests are made to specific cache levels (L2 or L3) per instruction.

Metric [%]	Description	Formula
L2 request rate	Percentage of Level 2 cache requests relative to total instructions	(L2 requests / total instructions) * 100
L3 request rate	Percentage of Level 3 cache requests relative to total instructions	(L3 requests / total instructions) * 100

The request rate is a valuable indicator of an application's memory access behavior: a high request rate indicates frequent memory access relative to computation, which may suggest a memory-bound workload. A high L2 or L3 request rate can be the result of significant data dependencies, potentially leading to cache contention or stalls. While the request rate often depends on workload characteristics and algorithm design, reducing unnecessary memory accesses through data locality optimizations and cache-aware programming can improve performance. Balancing memory requests and computation is key to achieving efficient application execution.
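Request rate, miss ratio, and miss rate are linked: with the request rate defined per instruction as described above, the miss rate equals the miss ratio multiplied by the request rate. The following sketch checks this with hypothetical counter values; the function names are illustrative only.

```python
import math


def request_rate_percent(requests: int, instructions: int) -> float:
    """Cache requests relative to executed instructions, in percent."""
    return requests / instructions * 100.0


def miss_ratio(misses: int, requests: int) -> float:
    """Fraction of cache requests that miss."""
    return misses / requests


def miss_rate_percent(misses: int, instructions: int) -> float:
    """Cache misses relative to executed instructions, in percent."""
    return misses / instructions * 100.0


# Hypothetical L2 counter values.
instructions = 1_000_000_000
l2_requests = 40_000_000
l2_misses = 2_000_000

rate = request_rate_percent(l2_requests, instructions)
ratio = miss_ratio(l2_misses, l2_requests)
miss_rate = miss_rate_percent(l2_misses, instructions)

# misses / instructions == (misses / requests) * (requests / instructions)
assert math.isclose(miss_rate, ratio * rate)
print(f"request rate {rate:.1f} %, miss ratio {ratio:.2f}, miss rate {miss_rate:.2f} %")
```

This decomposition shows two separate levers: lowering the request rate (fewer accesses per instruction) and lowering the miss ratio (better reuse of cached data) both reduce the miss rate.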

L1I Rates

L1I rates measure various aspects of the Level 1 instruction cache's performance, including miss rate, request rate, and stall rate. These metrics provide insights into the efficiency of instruction fetching and the impact of instruction cache behavior on execution. The miss rate indicates the cache’s effectiveness in serving instruction fetches, the request rate shows the workload's demand for instruction fetching, and the stall rate quantifies the performance penalty due to cache misses.

Metric [%]	Description	Formula
L1I miss rate	Percentage of L1 instruction cache misses relative to total instructions	(L1I misses / total instructions) * 100
L1I request rate	Percentage of L1 instruction cache requests relative to total instructions	(L1I requests / total instructions) * 100
L1I stall rate	Percentage of instruction fetch stalls caused by L1I misses relative to total instructions	(L1I stalls / total instructions) * 100

Efficient use of the L1 instruction cache is critical for maintaining high CPU performance. High miss rates and stall rates can significantly degrade execution speed, especially in workloads with frequent instruction fetches. While some aspects of L1I performance are algorithm-dependent, strategies such as improving code locality, reducing branching, and optimizing instruction footprint can help lower miss rates and reduce stalls, ensuring more efficient use of the instruction cache.

L1I Miss Ratio

The L1I miss ratio provides insight into the efficiency of the L1 instruction cache in serving instruction fetch requests. Minimizing the L1I miss ratio is essential for maintaining fast and efficient instruction execution. While some degree of instruction cache misses is algorithm-dependent, strategies like improving instruction locality and reducing unnecessary code branching can help lower the miss ratio. Analyzing this metric alongside L1I miss rates and stall rates gives a comprehensive view of instruction cache behavior and its impact on performance.

Metric	Description	Formula
L1I miss ratio	Proportion of instruction fetch requests that miss the L1I cache and must be served from a lower cache level or memory	L1I misses / total L1I requests
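To view the L1I metrics side by side, the sketch below derives them from one set of raw counts. It is only an illustration: the class and field names and the sample values are hypothetical, and the rates are assumed to be normalized to the total instruction count in the same way as the L2 and L3 rates, with the stall rate based on a count of instruction fetch stalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1ICounters:
    """Hypothetical raw counts for one measurement interval."""
    instructions: int  # total instructions executed
    requests: int      # instruction fetch requests to the L1I cache
    misses: int        # L1I misses
    stalls: int        # assumption: instruction fetch stalls due to L1I misses


def l1i_metrics(c: L1ICounters) -> dict:
    """Derive L1I rates (percent, per instruction) and the miss ratio."""
    if c.instructions <= 0 or c.requests <= 0:
        raise ValueError("instructions and requests must be positive")
    return {
        "miss_rate_percent": c.misses / c.instructions * 100.0,
        "request_rate_percent": c.requests / c.instructions * 100.0,
        "stall_rate_percent": c.stalls / c.instructions * 100.0,
        "miss_ratio": c.misses / c.requests,
    }


sample = L1ICounters(
    instructions=1_000_000_000,
    requests=200_000_000,
    misses=1_000_000,
    stalls=5_000_000,
)
metrics = l1i_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reading the four values together helps separate a workload that simply fetches many instructions (high request rate) from one whose code footprint does not fit the cache (high miss ratio and stall rate).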
