Memory system design and optimizations for data intensive computing

Date

2020-12

Abstract

Emerging data-intensive applications, such as graph analytics and data mining, operate on very large datasets and exhibit irregular memory access patterns. Research has shown that these memory-bound applications cannot effectively exploit data locality and regular memory accesses in traditional cache-based memory systems to mitigate the “memory wall.” The growth in data volume has simultaneously driven a transition from monolithic architectures toward large-scale computing systems built from discrete, distributed nodes. As such, multi-layered software infrastructures have become essential to bridge the gap between heterogeneous commodity devices. However, performing inter-node memory operations through divergent interfaces inevitably adds latency and degrades performance. Furthermore, frequent one-sided remote memory accesses, as employed by modern distributed shared memory programming models (e.g., OpenSHMEM and MPI-RMA), bypass the operating system and the remote CPU and thereby expose security vulnerabilities. Existing Trusted Execution Environments (TEEs), or enclave systems, such as ARM TrustZone, Intel SGX, and RISC-V Keystone, provide local memory isolation. Unfortunately, these solutions do not provide the same protection for inter-node memory transactions.

In this dissertation research, we first explore in-memory optimizations based on 3D-stacked memory devices to enhance the performance of data-intensive workloads. We design a memory hotspot-aware manager (HAM) that provides in-memory request aggregation and hotspot prefetching. We then optimize the performance of intra-node memory systems by introducing a memory access coalescer (MAC), along with associated new architectures that utilize 3D-stacked memory devices such as the Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM). In addition to local memory systems, we introduce a global address space extension to the RISC-V instruction set architecture (ISA) that enables high-performance inter-node memory systems. We designate this RISC-V ISA extension the Extended Base Global Address Space, or xBGAS. The xBGAS extension provides native ISA-level support for direct access to remote shared data blocks by mapping remote objects into a system's extended address space. Finally, we introduce a scalable enclave design that combines the xBGAS infrastructure with Keystone, the open-source secure enclave framework for RISC-V systems. We present the design of the proposed scalable enclave system and compare it with existing work. We also analyze potential threat models and discuss how our design can defend against these threats.
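As a rough illustration of what coalescing memory accesses means in general, the sketch below groups individual byte addresses by the aligned block that contains them, so that several small requests can be served by one block-sized access. It is not the dissertation's MAC design: the function name `coalesce`, the 64-byte granularity, and the sample addresses are all assumptions made for this example.

```python
from collections import OrderedDict

# Assumed coalescing granularity in bytes; not taken from the dissertation.
BLOCK_BYTES = 64


def coalesce(addresses, block_bytes=BLOCK_BYTES):
    """Group byte addresses by aligned block, keeping first-seen block order.

    Returns an OrderedDict mapping each block base address to the list of
    original addresses that fall inside that block.
    """
    blocks = OrderedDict()
    for addr in addresses:
        base = addr - (addr % block_bytes)  # align down to the block boundary
        blocks.setdefault(base, []).append(addr)
    return blocks


# Hypothetical irregular request stream touching two 64-byte blocks.
requests = [0x1000, 0x1008, 0x2040, 0x1030, 0x2078]
merged = coalesce(requests)

for base, members in merged.items():
    print(f"block {base:#x}: {len(members)} request(s)")
```

With these sample addresses, five individual requests collapse into two block-sized accesses. A hardware coalescer would also have to bound how long requests wait to be merged, which this sketch ignores.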

The in-memory optimizations using HAM and the local memory system design with MAC exploit the inherently high throughput and memory-level parallelism (MLP) of 3D-stacked memory devices to meet the needs of bandwidth-driven applications. Given the irregular memory access patterns of data-intensive workloads, our intra-node memory system optimizations increase the potential for hiding memory access latency by overlapping computation and communication, rather than relying on latency reductions from cache prefetching. Orthogonally, xBGAS provides a scalable inter-node memory system for high-performance remote memory accesses. As a result, it holds great promise for optimizing data-intensive applications whose huge datasets are distributed over discrete nodes. Here, ISA-level inter-node data operations may increase the injection rate of remote requests, as well as network bandwidth utilization, in large-scale computing systems without introducing the burdensome software infrastructures otherwise needed to bridge heterogeneous devices. Moreover, the object-based data management model of xBGAS makes it well suited to datacenter-scale RISC-V servers and future exascale computing systems. Furthermore, by mapping data object IDs into the extended address space, inter-node data protection is simplified through unified memory accesses in which permission bits control access to each data object. In addition, the low-latency remote data operations of xBGAS offset the time cost of hardware permission checking. This helps alleviate the performance degradation associated with the physical memory isolation of enclave systems in inter-node environments.
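The idea of guarding each data object with permission bits can be pictured with a small software model. The sketch below is only an analogy for a check that the abstract describes as happening in hardware. The names `ExtendedAddress`, `ObjectTable`, `PERM_READ`, and `PERM_WRITE`, and the two-bit encoding, are hypothetical and do not come from the xBGAS specification or the dissertation.

```python
from dataclasses import dataclass

# Hypothetical permission encoding, chosen only for this illustration.
PERM_READ = 0b01
PERM_WRITE = 0b10


@dataclass(frozen=True)
class ExtendedAddress:
    """Assumed model of an extended address: an object ID plus an offset."""
    object_id: int
    offset: int


class ObjectTable:
    """Maps object IDs to permission bits; unknown objects get no access."""

    def __init__(self):
        self._perms = {}

    def grant(self, object_id, perms):
        self._perms[object_id] = perms

    def allows(self, addr, needed):
        # Access is allowed only if every requested bit is set for the object.
        granted = self._perms.get(addr.object_id, 0)
        return (granted & needed) == needed


table = ObjectTable()
table.grant(7, PERM_READ)  # object 7 is shared read-only

target = ExtendedAddress(object_id=7, offset=0x40)
can_read = table.allows(target, PERM_READ)
can_write = table.allows(target, PERM_WRITE)
```

Here a read of object 7 passes the check while a write is rejected, and any access to an object that was never granted permissions is denied by default.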


Embargo status: Restricted until January 2026.

Rights Availability

Restricted until January 2026.

Keywords

Memory system, Three dimensional (3-D) stacked memory, Data-intensive computing, Distributed shared memory, Partitioned global address space (PGAS), Extended Base Global Address Space (xBGAS)
