X hits on this document

PDF document

Scalable Locality-Conscious Multithreaded Memory Allocation - page 1 / 11





1 / 11

Scott Schneider

Christos D. Antonopoulos

Dimitrios S. Nikolopoulos

Department of Computer Science College of William and Mary

Department of Computer Science College of William and Mary

Department of Computer Science College of William and Mary




Scalable Locality-Conscious Multithreaded Memory Allocation


1. Introduction

We present Streamflow, a new multithreaded memory manager designed for low overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low over- head simultaneous allocation by multiple threads and adapts to se- quential allocation at speeds comparable to that of custom sequen- tial allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on seg- regated heaps. Streamflow introduces an innovative design which uses only synchronization-free operations in the most common case of local allocations and deallocations, while requiring minimal, non-blocking synchronization in the less common case of remote deallocations. Spatial locality at the cache and page level is favored by eliminating small objects headers, reducing allocator-induced conflicts via contiguous allocation of page blocks in physical mem- ory, reducing allocator-induced false sharing by using segregated heaps and achieving better TLB performance and fewer page faults via the use of superpages. Combining these locality optimizations with the drastic reduction of synchronization and latency over- head allows Streamflow to perform comparably with optimized se- quential allocators and outperform—on a shared-memory system with four two-way SMT processors—four state-of-the-art multi- processor allocators by sizeable margins in our experiments. The allocation-intensive sequential and parallel benchmarks used in our experiments represent a variety of behaviors, including mostly lo- cal object allocation-deallocation patterns and producer-consumer allocation-deallocation patterns.

Categories and Subject Descriptors

D.4.2 [Operating Systems]:






















Threads; ming






General Terms

Algorithms, Management, Performance


memory management, multithreading,



ory, synchronization-free, non-blocking

Efficient dynamic memory allocation is essential for desktop, server and scientific applications [27]. As more of these appli- cations use thread-level parallelism to exploit multiprocessors and emerging processors with multiple cores and threads, scalable mul- tiprocessor memory allocation becomes of paramount importance.

Dynamic memory allocation can negatively affect performance by adding overhead during allocation and deallocation operations, and by exacerbating object access latency due to poor locality. Therefore, effective memory allocators need to be optimized for both low allocation overhead and good object access locality. Scal- ability and synchronization overhead reduction has been the cen- tral consideration in the context of thread-safe memory allocators [3, 18], while locality has been the focal point of the design of se- quential memory allocators for more than a decade [11].

Multiprocessor allocators add synchronization overhead on the critical path of all allocations and deallocations. Synchronization is needed because a thread may need to access another thread’s heap in order to remotely release an object to the owning thread. Since such operations may be initiated concurrently by multiple threads, synchronization is used to arbitrate thread accesses to the data structures used for managing the heaps. Therefore, local heaps need to be protected with locks or updated atomically with read- modify-write operations such as cmp&swap. The vast majority of thread-safe allocators use object headers [3, 9, 15, 18, 25], which facilitate object deallocation in local heaps but pollute the cache in codes that allocate many small objects.

Locality-conscious sequential allocators segregate objects of different sizes to different page blocks allocated from the operating system [7]. Objects are allocated by merely bumping a pointer and no additional information is stored with each object. In general, the allocation order of objects does not necessarily match their access pattern. However, contiguous allocation of small objects works well in practice because eliminating object headers helps avoid fragmentation and cache pollution.

Efficient, thread-safe memory allocators use local heaps to re- duce contention between threads. The use of local heaps helps a multiprocessor allocator avoid false sharing, since threads tend to allocate and deallocate most of their objects locally [3]. At a lower level, page block allocation and recycling policies in thread-safe allocators are primarily concerned with fragmentation and blowup, without necessarily accounting for locality [3].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


June 10–11, 2006, Ottawa, Ontario, Canada.

Copyright 2006 ACM 1-59593-221-6/06/0006. . . $5.00.

The design space of thread-safe allocators that achieve both good scalability and good data locality merits further investigation. It is natural to consider combining scalable synchronization mech- anisms (such as lock-free management of heaps) with locality- conscious object allocation mechanisms (such as segregated heaps with headerless objects). Although the two design considerations of locality and scalability may seem orthogonal and complementary at first glance, combining them in a unified design is not merely an engineering effort. Several problems and trade-off’s arise in an at-

Document info
Document views49
Page views49
Page last viewedSun Jan 22 15:10:48 UTC 2017