Memory

Memory metrics

Memory bandwidth measures the rate at which data is transferred between the main memory (DRAM) and the CPU. This includes read, write, load, and eviction operations, and provides a detailed view of how efficiently the memory subsystem is utilized. It is a critical metric for identifying potential bottlenecks in memory-intensive applications.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Overall data transfer rate between the CPU and main memory, combining all read, write, load, and eviction operations |
| read | Rate of data read from main memory |
| write | Rate of data written to main memory |

High utilization of total bandwidth, particularly in read or write operations, may indicate that the workload is memory-bound rather than compute-bound. While maximizing bandwidth is desirable, inefficiencies like excessive evictions or poor data reuse can still hinder performance. Optimizing memory access patterns, reducing unnecessary data transfers, and improving cache utilization can help ensure effective use of available memory bandwidth. Use the Roofline Model to check whether your application is memory-bound.
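As a rough illustration of how the reported bandwidth relates to data volume and runtime, the sketch below (not part of XBAT; array size and kernel chosen arbitrarily) times a simple triad loop and divides the bytes it streams by the elapsed time:

```c
/* Minimal bandwidth sketch: streams three arrays through main memory
 * and reports GByte moved and GByte/s achieved. Compile with -O2. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t n = (size_t)1 << 26;          /* 64 Mi doubles per array, far larger than the caches */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)             /* triad: two reads + one write per element */
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gbyte   = 3.0 * n * sizeof(double) / 1e9;   /* lower bound on memory traffic */
    printf("volume: %.2f GByte  bandwidth: %.2f GByte/s  (check %.1f)\n",
           gbyte, gbyte / seconds, a[n / 2]);
    free(a); free(b); free(c);
    return 0;
}
```

Counter-based measurements such as XBAT's will usually report somewhat more traffic than this estimate, because write-allocate transfers and evictions are included.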

Memory data volume measures the total amount of data transferred between the CPU and main memory (DRAM). This includes the total, read, and write data volumes, providing insights into the workload's memory access patterns and intensity.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred to and from main memory, including reads and writes |
| read | Total amount of data read from main memory |
| write | Total amount of data written to main memory |

High data volume indicates workloads with significant memory demands, which may lead to memory bottlenecks if the system’s bandwidth is insufficient. However, high memory data volume is not inherently inefficient. Effective memory utilization depends on workload characteristics, data locality, and reuse. Reducing unnecessary memory transfers and optimizing data access patterns can help mitigate performance issues in memory-bound applications. This metric, in conjunction with bandwidth and cache performance, provides a comprehensive understanding of memory subsystem behavior.
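To judge whether a measured data volume is reasonable, it helps to compare it with the minimum volume the algorithm has to move. A hypothetical back-of-the-envelope example (sizes assumed, not taken from an XBAT measurement):

```c
/* Lower-bound estimate of memory data volume for a daxpy-style kernel
 * y[i] += s * x[i]: each element streams x (read) and y (read + write). */
#include <stdio.h>

int main(void) {
    size_t n = 100000000;                       /* assumed number of elements */
    double streams = 3.0;                       /* read x, read y, write y */
    double gbyte = streams * n * sizeof(double) / 1e9;
    printf("expected minimum volume: %.1f GByte\n", gbyte);   /* ~2.4 GByte */
    return 0;
}
```

If the measured total volume is much larger than such an estimate, the kernel is moving data it does not strictly need, for example cache lines that are evicted and reloaded.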

DRAM usage measures the percentage of available main memory (DRAM) currently utilized by the system and applications. It includes metrics for both direct memory usage and swap usage when the system resorts to using disk-based memory.

| Metric [%] | Description |
| --- | --- |
| usage | Percentage of DRAM capacity currently in use |
| swap usage | Percentage of swap space (secondary memory, such as disk) utilized when DRAM is insufficient to meet memory demands |

High DRAM usage is typical for memory-intensive applications but can lead to performance issues if it forces reliance on swap space. Monitoring swap usage is crucial, as it often indicates insufficient physical memory or poorly optimized memory usage. Balancing memory allocation, optimizing data structures, and ensuring sufficient DRAM capacity are essential strategies for minimizing swap dependency and maintaining system performance.

DRAM used measures the total amount of main memory (DRAM) currently utilized by the system, including allocated buffers and cached data. It provides a detailed breakdown of memory usage, helping identify how DRAM resources are allocated and consumed.

| Metric | Description |
| --- | --- |
| used | Total amount of DRAM currently in use by the system and applications |
| swap used | Total amount of swap space utilized when DRAM is insufficient to meet memory demands |
| buffers | Memory allocated for system buffers |
| cached | Memory allocated for caching frequently accessed data to improve performance |

High DRAM usage is common in memory-intensive applications but can lead to performance degradation if swap usage becomes significant. Monitoring the usage of buffers and cached memory is important for optimizing memory allocation and ensuring effective utilization of available DRAM. Strategies such as reducing unnecessary allocations, improving data locality, and increasing physical memory capacity can help maintain system performance and minimize swap dependency.
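On Linux, both the usage percentages and the absolute used, buffers, and cached figures can be derived from /proc/meminfo. The sketch below is an illustration of that interface, not the collector XBAT actually uses:

```c
/* Reads DRAM and swap figures from /proc/meminfo (values are in kB). */
#include <stdio.h>
#include <string.h>

static long read_kb(const char *key) {
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long val = -1;
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, key, strlen(key)) == 0) {
            sscanf(line + strlen(key), " %ld", &val);
            break;
        }
    fclose(f);
    return val;
}

int main(void) {
    long total   = read_kb("MemTotal:");
    long avail   = read_kb("MemAvailable:");
    long mfree   = read_kb("MemFree:");
    long buffers = read_kb("Buffers:");
    long cached  = read_kb("Cached:");
    long s_total = read_kb("SwapTotal:");
    long s_free  = read_kb("SwapFree:");

    printf("DRAM usage: %.1f %%\n", 100.0 * (total - avail) / total);
    printf("used:       %.2f GByte\n", (total - mfree - buffers - cached) / 1e6);  /* kB -> GByte */
    printf("buffers:    %.2f GByte\n", buffers / 1e6);
    printf("cached:     %.2f GByte\n", cached / 1e6);
    if (s_total > 0)
        printf("swap usage: %.1f %%\n", 100.0 * (s_total - s_free) / s_total);
    return 0;
}
```

Exact definitions of "used" and "cached" vary between tools (some also account for reclaimable slab memory), so small differences from the values reported in XBAT are expected.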

The load-to-store ratio measures the balance between memory load operations (data reads) and store operations (data writes) during program execution.

| Metric | Description | Formula |
| --- | --- | --- |
| load to store | Indicates how many memory read operations are performed relative to write operations | Number of Load Operations / Number of Store Operations |

The load-to-store ratio is a useful indicator for understanding workload characteristics and memory behavior. It helps in identifying potential memory bottlenecks, as workloads with unbalanced load or store patterns may stress certain parts of the memory hierarchy. Optimizing the ratio depends on workload requirements but often involves improving data locality, minimizing redundant memory accesses, and balancing read/write operations to align with system capabilities.
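As a hypothetical worked example (kernel and counts assumed, not measured), the expected ratio of a simple kernel can often be read off its access pattern:

```c
/* For the triad a[i] = b[i] + s * c[i], each iteration issues two loads
 * (b[i], c[i]) and one store (a[i]), so the expected ratio is 2.
 * Hardware counters may deviate slightly because of write-allocate
 * loads and register reuse. */
#include <stdio.h>

int main(void) {
    long long iterations = 100000000;       /* assumed iteration count */
    long long loads  = 2 * iterations;
    long long stores = 1 * iterations;
    printf("expected load-to-store ratio: %.2f\n", (double)loads / (double)stores);
    return 0;
}
```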

HBM bandwidth measures the rate of data transfer between the CPU/GPU and High Bandwidth Memory (HBM). It includes separate metrics for read bandwidth, write bandwidth, and total bandwidth, providing a detailed view of HBM performance.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Combined rate of data reads and writes to and from HBM |
| read | Rate at which data is read from HBM |
| write | Rate at which data is written to HBM |

HBM bandwidth metrics are critical for understanding memory performance in systems leveraging HBM, especially for memory-intensive applications like AI, HPC, and graphics workloads. High bandwidth utilization typically indicates effective use of HBM; however, bottlenecks may arise if workloads exceed memory bandwidth limits or if data locality is poorly optimized.
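On CPU systems that expose HBM as a separate memory kind, placing hot buffers explicitly in HBM ensures their traffic is served by, and counted against, HBM rather than DRAM. A minimal sketch using the memkind library's hbwmalloc interface, assuming the library is installed and HBM is configured as addressable memory (link with -lmemkind); this is not part of XBAT itself:

```c
/* Allocates a buffer in high-bandwidth memory so that accesses to it
 * appear in the HBM bandwidth and volume metrics. */
#include <stdio.h>
#include <hbwmalloc.h>

int main(void) {
    if (hbw_check_available() != 0) {
        fprintf(stderr, "no high-bandwidth memory available\n");
        return 1;
    }
    size_t n = (size_t)1 << 26;
    double *buf = hbw_malloc(n * sizeof *buf);   /* placed in HBM */
    if (!buf) return 1;

    for (size_t i = 0; i < n; i++)               /* traffic counted as HBM writes */
        buf[i] = (double)i;

    hbw_free(buf);
    return 0;
}
```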

HBM volume measures the total amount of data transferred between the CPU/GPU and High Bandwidth Memory (HBM). It includes metrics for total, read, and write data volumes, giving insight into the workload's memory traffic.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred to and from HBM, combining both reads and writes |
| read | Total amount of data read from HBM |
| write | Total amount of data written to HBM |

HBM volume metrics provide valuable insights into the memory footprint of workloads utilizing HBM. High read or write volumes suggest significant memory access requirements, which may impact performance if not well-optimized. While high data volumes are expected in memory-intensive workloads, optimizing data locality, reducing unnecessary transfers, and improving memory reuse can help maximize performance.

UPI bandwidth measures the rate of data transfer between CPUs or sockets using Intel's Ultra Path Interconnect (UPI). It tracks the total bandwidth, as well as data received and sent via UPI links.

| Metric [e.g. GByte/s] | Description |
| --- | --- |
| total | Combined rate of all data transferred over UPI links, including both sent and received data |
| received | Rate of data received over UPI links |
| sent | Rate of data sent over UPI links |

UPI bandwidth is a critical metric for understanding inter-socket communication efficiency in multi-CPU systems. High bandwidth utilization reflects frequent data transfers between sockets, which can lead to performance bottlenecks if not optimized. Strategies such as reducing inter-socket dependencies, improving data locality, and NUMA-aware programming can help mitigate these challenges.
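One common way to reduce UPI traffic is NUMA-aware allocation: keeping a buffer on the NUMA node of the socket that works on it avoids remote accesses. A hedged sketch using libnuma (node number and buffer size are arbitrary; link with -lnuma):

```c
/* Binds a buffer to NUMA node 0 so that accesses from cores on that
 * socket stay local and do not cross the UPI links. */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    size_t bytes = (size_t)1 << 30;                  /* 1 GiByte example buffer */
    double *buf = numa_alloc_onnode(bytes, 0);       /* place on NUMA node 0 */
    if (!buf) return 1;

    /* Work on this buffer from cores of socket 0 (e.g. via process or
       thread pinning) so the accesses remain node-local. */
    for (size_t i = 0; i < bytes / sizeof(double); i++)
        buf[i] = 0.0;

    numa_free(buf, bytes);
    return 0;
}
```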

UPI data volume measures the total amount of data transferred between CPUs or sockets over Intel’s Ultra Path Interconnect (UPI). This includes metrics for total data, as well as data received and sent via UPI links.

| Metric [e.g. GByte] | Description |
| --- | --- |
| total | Total amount of data transferred across UPI links, combining both sent and received data |
| received | Total amount of data received over UPI links |
| sent | Total amount of data sent over UPI links |

A high UPI data volume highlights significant inter-socket communication, which may impact performance due to increased latency and bandwidth limitations. While some workloads inherently require frequent inter-socket transfers, optimizing data placement and reducing inter-socket dependencies can help minimize UPI traffic. These metrics, when combined with UPI bandwidth and cache performance indicators, provide a comprehensive understanding of inter-CPU communication and opportunities for workload optimization.
