Scalable Locality-Conscious Multithreaded Memory Allocation - page 4 / 11
[Figure 1 graphic omitted: per-thread heaps hold one doubly-linked page block list per object size class (e.g. 5-8, 9-12, 13-16 bytes); each list has an Active Head pointing at page blk 1 … page blk k, each page block carries a Remotely Freed list, and malloc/free requests arrive from Thread 1 … Thread n.]

Figure 1. Streamflow front-end design. Figure (a) is an overview of a heap, and Figure (b) is the detail for a particular page block within that heap.

serve the same object class. A simple page block rotation strategy guarantees that if there is enough free memory for the allocation of a specific object class, a page block with available memory will be found at the head of the list. More specifically, when a page block becomes full, it is transferred to the end of the list. Similarly, when an object is freed by the owner of the heap, the page block it belongs to is placed at the head of the list, if it is not already there. The block rotation is a fast operation involving exactly seven pointer updates and no atomic instructions.
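The rotation described above can be sketched as a circular doubly-linked list with a sentinel head. This is a hypothetical illustration (the names and the sentinel layout are assumptions, not Streamflow's actual code), but it shows why both moves are cheap, constant-time pointer surgery with no atomic instructions, since the list is private to the heap's owner thread:

```c
#include <stddef.h>

/* Sketch of the per-size-class page block list and its rotation. */
typedef struct page_blk {
    struct page_blk *prev, *next;
} page_blk;

typedef struct {
    page_blk head;   /* sentinel: head.next is the block served first */
} blk_list;

static void blk_list_init(blk_list *l) {
    l->head.prev = l->head.next = &l->head;
}

static void blk_unlink(page_blk *b) {            /* 2 pointer updates */
    b->prev->next = b->next;
    b->next->prev = b->prev;
}

static void blk_link_after(page_blk *pos, page_blk *b) { /* 4 updates */
    b->prev = pos;
    b->next = pos->next;
    pos->next->prev = b;
    pos->next = b;
}

/* When a block becomes full, rotate it to the tail of the list. */
static void blk_move_to_tail(blk_list *l, page_blk *b) {
    blk_unlink(b);
    blk_link_after(l->head.prev, b);
}

/* When the owner frees an object, move its block to the head,
 * unless it is already there. */
static void blk_move_to_head(blk_list *l, page_blk *b) {
    if (l->head.next != b) {
        blk_unlink(b);
        blk_link_after(&l->head, b);
    }
}
```

In this sketch unlinking takes two pointer updates and relinking four; the exact count quoted in the text depends on Streamflow's concrete list representation.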

The beginning of each page block is occupied by the page block header. The header consists of all the data structures and bookkeeping information necessary for the management of the page block. It contains: i) pointers for linking the page block into the doubly-linked list of page blocks for each object class, ii) pointers to free memory inside the page block (freed and unallocated), iii) an identifier of the owner thread of the page block (id), iv) the head of a LIFO list used for object deallocations to the page block by threads other than the owner thread (remotely freed), and v) bookkeeping information, such as the number of free objects in the page block and the size of each object. All fields in the header, with the exception of remotely freed, are accessed only by the page block's owner thread, so accesses and modifications of these fields require no synchronization.

Page blocks are always page-aligned. Their sizes vary, depending on the object class they serve. As a rule of thumb, each page block in Streamflow is large enough to accommodate 1024 objects; however, minimum and maximum page block size limits also apply. There is clearly a trade-off between the number of objects in each page block and the average amount of unused memory a page block may contain. The minimum page block size (16 KB in Streamflow) allows more than 1024 very small objects to be packed inside a single page block, given that the resulting page blocks are still small and the additional memory consumption is not a concern. This amortizes costly heap expansion operations over more object allocations. On the other hand, the maximum page block size (256 KB in our implementation) limits the memory requirements of page blocks which serve relatively large object classes; without a limit, page blocks for large objects could grow up to 2 MB. This limit reduces internal allocator fragmentation, which is the amount of memory reserved from the system yet never used inside each page block. The resulting page block size is always rounded to the nearest power of two.
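A hypothetical C rendering of the header fields i)-v) listed above may make the layout concrete; the field names and types are illustrative assumptions, not Streamflow's actual declarations:

```c
#include <stdint.h>

/* Illustrative sketch of a page block header. Only remotely_freed is
 * touched by threads other than the owner, so it is the only field
 * that would need atomic (lock-free LIFO) access. */
typedef struct page_blk_hdr {
    /* i) links into the per-object-class doubly-linked block list */
    struct page_blk_hdr *prev, *next;
    /* ii) free memory inside the page block */
    void    *freed;         /* objects freed by the owner thread */
    uint8_t *unallocated;   /* bump pointer into never-used space */
    /* iii) identifier of the owner thread */
    uintptr_t id;
    /* iv) LIFO list of objects freed by non-owner threads */
    void *remotely_freed;
    /* v) bookkeeping */
    uint32_t free_objects;  /* number of free objects in the block */
    uint32_t object_size;   /* size of each object in this block */
} page_blk_hdr;
```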

Headerless objects: When an object is freed, a memory allocator needs to discover whether the object is large or small, its size, and, if the object is small, the exact page block it originated from. A common technique is to attach a header to each object and encode the necessary information in the header. Some architectures impose an 8-byte alignment requirement for certain data types; accesses to these data types otherwise suffer significant performance penalties. This limits the minimum memory required for headers to 8 bytes and the minimum object granularity supported by the allocator to 16 bytes (including the header). As a result, the use of headers introduces two serious side-effects: a) significant space overhead, which can reach up to 300% (12 bytes of overhead for every 4-byte object), and b) fewer objects can be packed in a single cache line or a single page, hurting spatial locality.
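The 300% figure follows from the alignment arithmetic: a 4-byte object plus an 8-byte header, rounded up to the 8-byte alignment boundary, consumes 16 bytes, of which 12 are overhead. A minimal sketch (the helper name is hypothetical):

```c
#include <stddef.h>

/* Memory consumed per allocation when an 8-byte header is attached
 * and the result is rounded up to 8-byte alignment. */
static size_t allocated_size_with_header(size_t request) {
    const size_t header = 8;            /* minimum aligned header */
    size_t total = request + header;
    return (total + 7) & ~(size_t)7;    /* round up to 8 bytes */
}
```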

Streamflow eliminates headers from small objects using the BIBOP technique [24]. We introduce a global table with one byte for each virtual memory page in the system; accesses to the table are simply indexed by the page starting address. A single bit of each table cell characterizes objects allocated in the specific page as small or large. If the object is small, the remaining 7 bits encode the offset, in pages, of the header of the parent page block. This encoding is sufficient for realistic page block sizes (up to 512 KB, given a page size of 4 KB). As soon as the header of the parent page block is available, the memory manager has all
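The table lookup described above can be sketched as follows; the table name, its size, and the page-size constants are illustrative assumptions for a 4 GB address space with 4 KB pages, not Streamflow's actual definitions:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12                 /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* One byte per virtual page. Bit 0: 1 = small object, 0 = large.
 * Bits 1-7: distance, in pages, from this page back to the page
 * holding the parent page block's header; 7 bits suffice for page
 * blocks up to 128 pages = 512 KB. */
static uint8_t bibop_table[1u << 20]; /* 1M entries cover 4 GB */

static int is_small(void *obj) {
    size_t page = (uintptr_t)obj >> PAGE_SHIFT;
    return bibop_table[page] & 1;
}

static void *parent_block_header(void *obj) {
    uintptr_t addr = (uintptr_t)obj;
    size_t back = bibop_table[addr >> PAGE_SHIFT] >> 1; /* pages back */
    return (void *)((addr & ~(uintptr_t)(PAGE_SIZE - 1))
                    - back * PAGE_SIZE);
}
```

Both operations are a single table read plus a little shifting, which is why the lookup stays cheap on the deallocation fast path.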
