Memory

Memory metrics

Memory bandwidth measures the rate at which data is transferred between the main memory (DRAM) and the CPU. This includes read, write, load, and eviction operations, and provides a detailed view of how efficiently the memory subsystem is utilized. It is a critical metric for identifying potential bottlenecks in memory-intensive applications.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Overall data transfer rate between the CPU and main memory, combining all read, write, load, and eviction operations |
| read | Rate of data read from main memory |
| write | Rate of data written to main memory |

High utilization of total bandwidth, particularly in read or write operations, may indicate that the workload is memory-bound rather than compute-bound. While maximizing bandwidth is desirable, inefficiencies like excessive evictions or poor data reuse can still hinder performance. Optimizing memory access patterns, reducing unnecessary data transfers, and improving cache utilization can help ensure effective use of available memory bandwidth. Use the Roofline Model to check whether your application is memory-bound.
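As a rough illustration of how the reported bandwidth relates to data volume and runtime, the sketch below (not part of XBAT; array size and kernel chosen arbitrarily) times a simple triad loop and divides the bytes it streams by the elapsed time:

```c
/* Minimal bandwidth sketch: streams three arrays through main memory
 * and reports GByte moved and GByte/s achieved. Compile with -O2. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t n = (size_t)1 << 26;          /* 64 Mi doubles per array, far larger than the caches */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)             /* triad: two reads + one write per element */
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gbyte   = 3.0 * n * sizeof(double) / 1e9;   /* lower bound on memory traffic */
    printf("volume: %.2f GByte  bandwidth: %.2f GByte/s  (check %.1f)\n",
           gbyte, gbyte / seconds, a[n / 2]);
    free(a); free(b); free(c);
    return 0;
}
```

Counter-based measurements such as XBAT's will usually report somewhat more traffic than this estimate, because write-allocate transfers and evictions are included.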

Memory data volume measures the total amount of data transferred between the CPU and main memory (DRAM). This includes the total, read, and write data volumes, providing insights into the workload's memory access patterns and intensity.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred to and from main memory, including reads and writes |
| read | Total amount of data read from main memory |
| write | Total amount of data written to main memory |

High data volume indicates workloads with significant memory demands, which may lead to memory bottlenecks if the system’s bandwidth is insufficient. However, high memory data volume is not inherently inefficient. Effective memory utilization depends on workload characteristics, data locality, and reuse. Reducing unnecessary memory transfers and optimizing data access patterns can help mitigate performance issues in memory-bound applications. This metric, in conjunction with bandwidth and cache performance, provides a comprehensive understanding of memory subsystem behavior.
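To judge whether a measured data volume is reasonable, it helps to compare it with the minimum volume the algorithm has to move. A hypothetical back-of-the-envelope example (sizes assumed, not taken from an XBAT measurement):

```c
/* Lower-bound estimate of memory data volume for a daxpy-style kernel
 * y[i] += s * x[i]: each element streams x (read) and y (read + write). */
#include <stdio.h>

int main(void) {
    size_t n = 100000000;                       /* assumed number of elements */
    double streams = 3.0;                       /* read x, read y, write y */
    double gbyte = streams * n * sizeof(double) / 1e9;
    printf("expected minimum volume: %.1f GByte\n", gbyte);   /* ~2.4 GByte */
    return 0;
}
```

If the measured total volume is much larger than such an estimate, the kernel is moving data it does not strictly need, for example cache lines that are evicted and reloaded.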

DRAM usage measures the percentage of available main memory (DRAM) currently utilized by the system and applications. It includes metrics for both direct memory usage and swap usage when the system resorts to using disk-based memory.

| Metric [%] | Description |
| --- | --- |
| usage | Percentage of DRAM capacity currently in use |
| swap usage | Percentage of swap space (secondary memory, such as disk) utilized when DRAM is insufficient to meet memory demands |

High DRAM usage is typical for memory-intensive applications but can lead to performance issues if it forces reliance on swap space. Monitoring swap usage is crucial, as it often indicates insufficient physical memory or poorly optimized memory usage. Balancing memory allocation, optimizing data structures, and ensuring sufficient DRAM capacity are essential strategies for minimizing swap dependency and maintaining system performance.

DRAM used measures the total amount of main memory (DRAM) currently utilized by the system, including allocated buffers and cached data. It provides a detailed breakdown of memory usage, helping identify how DRAM resources are allocated and consumed.

| Metric | Description |
| --- | --- |
| used | Total amount of DRAM currently in use by the system and applications |
| swap used | Total amount of swap space utilized when DRAM is insufficient to meet memory demands |
| buffers | Memory allocated for system buffers |
| cached | Memory allocated for caching frequently accessed data to improve performance |

High DRAM usage is common in memory-intensive applications but can lead to performance degradation if swap usage becomes significant. Monitoring the usage of buffers and cached memory is important for optimizing memory allocation and ensuring effective utilization of available DRAM. Strategies such as reducing unnecessary allocations, improving data locality, and increasing physical memory capacity can help maintain system performance and minimize swap dependency.
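On Linux, both the usage percentages and the absolute used, buffers, and cached figures can be derived from /proc/meminfo. The sketch below is an illustration of that interface, not the collector XBAT actually uses:

```c
/* Reads DRAM and swap figures from /proc/meminfo (values are in kB). */
#include <stdio.h>
#include <string.h>

static long read_kb(const char *key) {
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long val = -1;
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, key, strlen(key)) == 0) {
            sscanf(line + strlen(key), " %ld", &val);
            break;
        }
    fclose(f);
    return val;
}

int main(void) {
    long total   = read_kb("MemTotal:");
    long avail   = read_kb("MemAvailable:");
    long mfree   = read_kb("MemFree:");
    long buffers = read_kb("Buffers:");
    long cached  = read_kb("Cached:");
    long s_total = read_kb("SwapTotal:");
    long s_free  = read_kb("SwapFree:");

    printf("DRAM usage: %.1f %%\n", 100.0 * (total - avail) / total);
    printf("used:       %.2f GByte\n", (total - mfree - buffers - cached) / 1e6);  /* kB -> GByte */
    printf("buffers:    %.2f GByte\n", buffers / 1e6);
    printf("cached:     %.2f GByte\n", cached / 1e6);
    if (s_total > 0)
        printf("swap usage: %.1f %%\n", 100.0 * (s_total - s_free) / s_total);
    return 0;
}
```

Exact definitions of "used" and "cached" vary between tools (some also account for reclaimable slab memory), so small differences from the values reported in XBAT are expected.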

The load-to-store ratio measures the balance between memory load operations (data reads) and store operations (data writes) during program execution.

| Metric | Description | Formula |
| --- | --- | --- |
| load to store | Indicates how many memory read operations are performed relative to write operations | Number of Load Operations / Number of Store Operations |

The load-to-store ratio is a useful indicator for understanding workload characteristics and memory behavior. It helps in identifying potential memory bottlenecks, as workloads with unbalanced load or store patterns may stress certain parts of the memory hierarchy. Optimizing the ratio depends on workload requirements but often involves improving data locality, minimizing redundant memory accesses, and balancing read/write operations to align with system capabilities.
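As a hypothetical worked example (kernel and counts assumed, not measured), the expected ratio of a simple kernel can often be read off its access pattern:

```c
/* For the triad a[i] = b[i] + s * c[i], each iteration issues two loads
 * (b[i], c[i]) and one store (a[i]), so the expected ratio is 2.
 * Hardware counters may deviate slightly because of write-allocate
 * loads and register reuse. */
#include <stdio.h>

int main(void) {
    long long iterations = 100000000;       /* assumed iteration count */
    long long loads  = 2 * iterations;
    long long stores = 1 * iterations;
    printf("expected load-to-store ratio: %.2f\n", (double)loads / (double)stores);
    return 0;
}
```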

HBM bandwidth measures the rate of data transfer between the CPU/GPU and High Bandwidth Memory (HBM). It includes separate metrics for read bandwidth, write bandwidth, and total bandwidth, providing a detailed view of HBM performance.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Combined rate of data reads and writes to and from HBM |
| read | Rate at which data is read from HBM |
| write | Rate at which data is written to HBM |

HBM bandwidth metrics are critical for understanding memory performance in systems leveraging HBM, especially for memory-intensive applications like AI, HPC, and graphics workloads. High bandwidth utilization typically indicates effective use of HBM; however, bottlenecks may arise if workloads exceed memory bandwidth limits or if data locality is poorly optimized.
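On CPU systems that expose HBM as a separate memory kind, placing hot buffers explicitly in HBM ensures their traffic is served by, and counted against, HBM rather than DRAM. A minimal sketch using the memkind library's hbwmalloc interface, assuming the library is installed and HBM is configured as addressable memory (link with -lmemkind); this is not part of XBAT itself:

```c
/* Allocates a buffer in high-bandwidth memory so that accesses to it
 * appear in the HBM bandwidth and volume metrics. */
#include <stdio.h>
#include <hbwmalloc.h>

int main(void) {
    if (hbw_check_available() != 0) {
        fprintf(stderr, "no high-bandwidth memory available\n");
        return 1;
    }
    size_t n = (size_t)1 << 26;
    double *buf = hbw_malloc(n * sizeof *buf);   /* placed in HBM */
    if (!buf) return 1;

    for (size_t i = 0; i < n; i++)               /* traffic counted as HBM writes */
        buf[i] = (double)i;

    hbw_free(buf);
    return 0;
}
```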

HBM volume measures the total amount of data transferred between the CPU/GPU and High Bandwidth Memory (HBM). It includes metrics for total, read, and write data volumes, giving insight into the workload's memory traffic.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred to and from HBM, combining both reads and writes |
| read | Total amount of data read from HBM |
| write | Total amount of data written to HBM |

HBM volume metrics provide valuable insights into the memory footprint of workloads utilizing HBM. High read or write volumes suggest significant memory access requirements, which may impact performance if not well-optimized. While high data volumes are expected in memory-intensive workloads, optimizing data locality, reducing unnecessary transfers, and improving memory reuse can help maximize performance.

UPI bandwidth measures the rate of data transfer between CPUs or sockets using Intel's Ultra Path Interconnect (UPI). It tracks the total bandwidth, as well as data received and sent via UPI links.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Combined rate of all data transferred over UPI links, including both sent and received data |
| received | Rate of data received over UPI links |
| sent | Rate of data sent over UPI links |

UPI bandwidth is a critical metric for understanding inter-socket communication efficiency in multi-CPU systems. High bandwidth utilization reflects frequent data transfers between sockets, which can lead to performance bottlenecks if not optimized. Strategies such as reducing inter-socket dependencies, improving data locality, and NUMA-aware programming can help mitigate these challenges.
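One common way to reduce UPI traffic is NUMA-aware allocation: keeping a buffer on the NUMA node of the socket that works on it avoids remote accesses. A hedged sketch using libnuma (node number and buffer size are arbitrary; link with -lnuma):

```c
/* Binds a buffer to NUMA node 0 so that accesses from cores on that
 * socket stay local and do not cross the UPI links. */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    size_t bytes = (size_t)1 << 30;                  /* 1 GiByte example buffer */
    double *buf = numa_alloc_onnode(bytes, 0);       /* place on NUMA node 0 */
    if (!buf) return 1;

    /* Work on this buffer from cores of socket 0 (e.g. via process or
       thread pinning) so the accesses remain node-local. */
    for (size_t i = 0; i < bytes / sizeof(double); i++)
        buf[i] = 0.0;

    numa_free(buf, bytes);
    return 0;
}
```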

UPI data volume measures the total amount of data transferred between CPUs or sockets over Intel’s Ultra Path Interconnect (UPI). This includes metrics for total data, as well as data received and sent via UPI links.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred across UPI links, combining both sent and received data |
| received | Total amount of data received over UPI links |
| sent | Total amount of data sent over UPI links |

A high UPI data volume highlights significant inter-socket communication, which may impact performance due to increased latency and bandwidth limitations. While some workloads inherently require frequent inter-socket transfers, optimizing data placement and reducing inter-socket dependencies can help minimize UPI traffic. These metrics, when combined with UPI bandwidth and cache performance indicators, provide a comprehensive understanding of inter-CPU communication and opportunities for workload optimization.
