Lab: Memory Allocator

Assigned:: Tuesday, Feb 14, 2017
Due:: Tuesday, Feb 21, 2017 by 10:30pm (extended)
Collaboration:: Work with your assigned partner for this lab. You may use your classmates and their code as a resource, but please cite them. Sharing of complete or nearly-complete answers is not permitted. If you do not know whether it is acceptable use a specific resource you should ask.

Lab: Memory Allocator

Assigned:: Tuesday, Feb 14, 2017
Due:: Tuesday, Feb 21, 2017 by 10:30pm (extended)
Collaboration:: Work with your assigned partner for this lab. You may use your classmates and their code as a resource, but please cite them. Sharing of complete or nearly-complete answers is not permitted. If you do not know whether it is acceptable use a specific resource you should ask.

For this lab, you will write a simple implementation of the malloc and free functions. This simple memory allocation library will give you a much deeper understanding of pointers and memory management. Building an allocator can be a bit painful: odds are your favorite C functions call a memory allocator, which means you can’t use them inside your allocator or you’ll end up with never-ending recursion.

Please read the following instructions carefully. There are quite a few important details in the implementation of this lab, and getting them wrong can lead to some very difficult-to-diagnose errors.

Groups

Uzo, Clara, and Saung
JaeEun and Yazan
Anna and Black
Ajuna and Seth
Adam and Yuxi
Chiara and Maddie
Connor and Alex M.
Giang and Gemma
Jonah and Matt
Zachary and David
Rachel and Mattori
Kathryn and Mari
Larry and Alex F.
Niko and Marcel

Getting Started

The starter code at this lab is available on GitHub at grinnell-cs/213-bibop-allocator. The starter code includes a simple, somewhat-functional heap implementation in allocator.c, along with a wrapper that allows you to replace the default allocator with your custom allocator when you run a program.

The “somewhat-functional” heap uses the mmap system call to request memory from the operating system every time malloc is called. This is a terrible allocator! The mmap system call returns pages of memory to the program, meaning we’re requesting at least 4096 bytes of memory for every allocation, even if we only allocate an int. In addition to wasting an enormous amount of memory, free is not implemented so we never actually reclaim this space.

While the starter code implements a very bad allocator, it does work for some programs. Most notably, it does not work for programs that call malloc often enough that the operating system runs out of space. You can run some simple utities like ls or editors like vim if you don’t open a large file.

To run a program with your allocator instead of the default allocator, you will need to use the LD_PRELOAD environment variable. This is a special environment variable that ld, the dynamic linker (linker dynamic?), looks at before starting a program. Any libraries listed in this environment variable will be loaded before system libraries. That means any functions you implement in a preloaded library will replace the default implementation. There are some tricks to doing this correctly with a memory allocator (for one, the dynamic linker allocates memory, which creates some odd circular dependencies) so I have included a wrapper from a system called heap layers that takes care of most of this for you. You just need to implement functions called xxmalloc, xxfree, and xxmalloc_usable_size in allocator.c. Leave the files wrapper.h and gnuwrapper.cpp alone!

To run a program with your custom allocator, use a command line like the following (omit the dollar sign. This is how I show the shell prompt):

$ LD_PRELOAD=./myallocator.so ls

This assumes you’re running the command from the root of your project directory. Everything should run as it usually does. To test your allocator, you should break something. Go to the xxmalloc function inside allocator.c and add a line return (void*)-1; to the beginning of the function. After running make, test your allocator again. This time, the ls command should fail, most likely with a segmentation fault. Remove this intentional error now so it doesn’t cause problems later!

You can also run more complex programs like vim:

$ LD_PRELOAD=./myallocator.so vim

The program will probably start up okay, but if you open a large source file you’ll probably run out of memory and vim will crash. We’ll fix that later.

Speaking of crashes, you may find it useful to run programs with your allocator inside a debugger. To do this, you have to start things a little differently. This session starts the program ls with your custom allocator inside of gdb.

$ gdb ls
Lots of gdb startup messages go here
> set environment LD_PRELOAD=./myallocator.so
> run

If you want to start the program with command line options, they go immediately after the run command (on the same line).

In case your gdb skills are feeling a bit rusty, the cprogramming.com guide covers most of the essentials. I am also happy to walk you through some basic debugging techniques I use on a regular basis.

Implementation Overview

Starting with our Very Bad Allocator™, you will implement what is often called a BIBOP allocator. BIBOP, short for “Big Bag of Pages”, is an old and well-known technique for allocating memory that uses a size-segregated heap. The job of a memory allocator is to request large blocks of memory from the OS, then parcel out smaller pieces of that memory to programs that request it. You could imagine an allocator that cleverly splits large blocks into smaller pieces, called allocated objects, when needed. A clever allocator could then coalesce adjacent freed objects into larger free objects so they can be used to satisfy larger requests in the future. While this approach has its place, it’s often not worth the effort. A size-segregated designates each large block of memory to hold objects of a single size. If you need to allocate objects of a different size you have to get them from a different block. This simplifies the allocator, and actually has some nice performance advantages as well. For one, we can store the size of allocated objects once per block instead of next to every object, so there is less wasted space for metadata.

It’s easiest to describe the structure of a BIBOP allocator by explaining how it handles requests.

Allocating Memory

Our allocator enters the picture when the main program calls malloc(27). The code in gnuwrapper.cpp handles some boilerplate, and it then calls xxmalloc(27). We’re supposed to find a 27 byte space in memory and return it to the program. Because a BIBOP allocator dedicates an entire block to one size of object, we don’t want to keep blocks around for every odd object size; instead, our allocator rounds up to the next power of two. Some systems also require a minimum alignment for allocated memory of 16 bytes, so you should round any smaller requests up to 16.

Now we’re looking for 32 bytes of free memory. A BIBOP allocator keeps a separate list of blocks (those are the large chunks of memory from the OS) for each object size. We’ll have eight lists of blocks, one to hold blocks dedicated to 16 byte objects, one for 32 byte objects, one for 64 byte objects, and so on up to 2048 bytes. We’ll handle larger objects by calling mmap directly.

The beginning of each list of blocks is reachable through a pointer to the beginning of the block. Keeping an array of these pointers makes it easy to select the appropriate list; take the log2 of the object size, then subtract log2 of the minimum object size. This gives you the index of the appropriate block list. You can compute the log base two of a power of two integer using the __builtin_ctz function described on the GCC Builtins page. In fact, you can write all of the rounding and size manipulation rountines for this lab without loops or large numbers of if or switch/case statements. To receive full credit, you should use reasonable, concise implementations when dealing with array indices or rounding.

In our case, we use the block list at index 1. This pointer, if it isn’t NULL, points to the beginning of the first block designated to hold 32 byte objects.

If the size 32 block list is not `NULL`

At the beginnng of the first block in the block list, we have a block header. This structure should hold the size of the objects on this block, the next pointer that takes us to the next block, and the freelist pointer. The freelist pointer points to the first free object in this block, which is guaranteed to be 32 bytes because of how the heap is structured. We’re going to allocate this free block of memory, so we should save it to a temporary, look inside the object to find a pointer to the next free object, then use this next free object as the head of the free list. What is happening here is that we’ve used the free space to store a linked list to record all the free space our allocator currently has available. Yes, this is a little mind bending; I will draw some pictures of this on the board.

If the freelist was NULL, this block does not have any free objects so we should move on to the next one. If none of the available blocks have free objects, proceed on to the next step.

If the size 32 block list is `NULL`

If there are no size 32 blocks, we need to make one by asking the OS for memory. To do this, call mmap as in the provided starter code to allocate PAGE_SIZE bytes. Once you have a page of memory, you should put a block header at the beginning of this available memory. Break the remaining memory on the page into 32 byte chunks (or whichever size is appropriate for the block list you are currently looking in). For simplicity and correctness, you should divide the remaining space into 32 byte aligned chunks; in general, you should align these chunks to the block’s object size. If you don’t do this, your allocator may not work and freeing memory will be very difficult. Add each chunk (a free object) to the block’s freelist. You should use the available space inside each free object to store a linked list node.

Freeing Memory

At some point later, the program calls free. The code in gnuwrapper.cpp does some boilerplate again and calls along to xxfree. All we have at this point is a pointer to somewhere inside the object. We don’t know the size of the object, and we don’t know if the pointer is at the beginning of the object or somewhere in the middle. The first step is to find the size of the object.

Finding an object’s size

You will need to know an object’s size to free it, but you’ll also have to implement xxmalloc_usable_size for the allocator to work properly. You can call xxmalloc_usable_size from inside xxfree, so you might as well start out implementing this helper function. Block headers go at the beginnings of pages, and every page starts at a multiple of PAGE_SIZE. Because of this, you can round a pointer down to the next multiple of PAGE_SIZE to find its header. Once you have the header you can simply read out the object size from the block header.

Finding the beginning of an object

Now that we know the size of the object being freed, we need to figure out where it starts (the program may be freeing from an interior pointer). Because you carefully pareceled out free objects aligned to the object size (you did do that, right?) this just requires rounding down to the next multiple of the object size.

Finally, you have pointers the block header and the beginning of the object being freed, so you can just add it to the block’s free list. It’s usually best to reuse freed memory as soon as possible, so add the freed object to the front of the free list.

Part A: Working with Sizes

Before writing any code in your memory allocator, you should implement the computations to round object sizes to the next power of two and select the appropriate block list. This code is difficult to test inside of your allocator because you cannot call functions like printf when you are inside of xxmalloc. The only POSIX functions you are allowed to call are the async signal safe functions listed on this page. You can see that I have added a slight hack to print error messages (sometimes) when things go wrong. If you set use_emergency_block to true you can make exactly one call to a function like printf that may call malloc to display an error, but you should plan to quit the program after doing this because the heap will almost certainly be corrupted when printf frees this memory. It’s not a great way to debug your program, but sometimes our options are limited when we’re messing with the innards of the runtime or operating system.

I recommend implementing your rounding code in a separate test program to make sure it works, then move it into your allocator once you trust it.

Part B: Allocating from Blocks

Implement xxmalloc for objects up to 2048 bytes. You will need the code from part A to choose the appropriate block list, along with new code to allocate new blocks, initialize the new block’s freelist, and to take a free object off the freelist and return it to the program.

We don’t have any code to compute the size of an allocated object yet, but you can assume that any allocated memory is at least 16 bytes. Update xxmalloc_usable_size so it always returns 16. The gnuwrapper.cpp file defines realloc, which calls xxmalloc_usable_size to check if an object can be resized. Once you’ve updated xxmalloc_usable_size to return the conservative lower bound of 16 you should still be able to run some simple programs, even though freeing does not work yet.

If you test a program and it crashes, check to see if xxmalloc is being asked to allocate more than 2048 bytes at a time (try a gdb breakpoint or use the emergency error message hack included in xxmalloc). If so, you may have to skip ahead to part D before you can test your allocator; many large object allocations happen as part of program startup and can be difficult to avoid.

Part C: Freeing Memory

Implement the size calculation and freeing process described above. You should have a complete implementation of xxmalloc_usable_size after this step. You may want to implement and test your rounding down procedure in a separate program where you can print test results, instead of testing it live in your allocator.

Part D: Large Objects

We don’t want to deal with large objects using our BIBOP structure, so we’ll just fall back to mmap. For any object larger than 2048 bytes, just round the mmap request up to a multiple of PAGE_SIZE and return the result to the program.

One difficult issue with large objects is in the free implementation. Pages that hold large objects do not have a block header, but we may go looking for one. A standard technique to determine whether a page has a block header is to add a magic number to the beginning of each BIBOP block. Add a field to your block header struct and fill it in with a recognizable constant (I particularly like 0xD00FCA75 or 0xF00DFACE). In your size computation, check for this constant in the block header before returning a value. If it isn’t there, you know you are looking at a large object instead of a BIBOP block.

If a large object is passed to xxfree you can simply disregard it.

Extra Credit Opportunity

If you would like to earn some extra credit, you can implement a special large object header that stores the size of the allocated object. You may need to go looking backwards for this header because large objects could span multiple pages. To mark the beginning of a large object, use a different magic number. Once you’ve found this point, you can free the large object back to the OS by calling munmap.

There are other, better strategies for supporting large objects, like reserving a dedicated large object area. If you want to attempt one of these alternative strategies I can help you plan a reasonable implementation.

Part E: Evaluation

Your final submitted allocator should work on simple command line utilities, as well as with gcc, vim, top, and the lynx command line web browser. In addition to these real programs, I will run my own tests to make sure your allocator is well-behaved, so you may want to write some simple tests yourself. A few important properties to test are:

Memory that has been allocated but not freed should not be returned from malloc.
Freed memory should be reused at some point, ideally as soon as possible (memory for freed large objects does not need to be reused).
Allocated objects should not overlap.
The xxmalloc_usable_size function must return the actual available size in each allocated object.
Programs must be able to allocate and free a mix of small and large objects without violating any of these conditions.

If you test with other programs and run into problems, let me know. Many programs like Google Chrome use threads, which will almost certainly cause problems for your allocator. We’ll learn how to deal with parallel programs that access data structures in our next few labs.

As usual, once your lab is complete you should submit your work with a pull request.

Lab: Memory Allocator

Lab: Memory Allocator

Groups

Getting Started

Implementation Overview

Allocating Memory

If the size 32 block list is not NULL

If the size 32 block list is NULL

Freeing Memory

Finding an object’s size

Finding the beginning of an object

Part A: Working with Sizes

Part B: Allocating from Blocks

Part C: Freeing Memory

Part D: Large Objects

Extra Credit Opportunity

Part E: Evaluation

If the size 32 block list is not `NULL`

If the size 32 block list is `NULL`