Lab: Memory Allocator

Assigned
  • April 20, 2021
Due
  • April 27, 2021 by 11:59pm
Collaboration
    All labs in this class should be completed with your assigned group. You may ask for assistance from the instructor or mentor, but you may not discuss any aspect of your work on the lab with other students in the class unless they are in your group.
Submitting
    To submit this lab, run make zip in the VSCode terminal to create a zip archive of your work. Log in to Gradescope at https://gradescope.com and upload it to the Memory Allocator lab on Gradescope. You can resubmit as many times as you like before the deadline. Only one member of the group should submit the lab, but make sure to select all group members on Gradescope. Make sure you read and update group.txt before submitting your work for this lab.

Overview

For this lab, you will implement a full-featured memory allocator. That means you will be implementing malloc and free. This will give you a much deeper understanding of pointers and memory management, and hopefully eliminate some of the mystery around what happens inside of libc when you call these useful functions.

Building an allocator can be a bit painful: odds are your favorite C functions call a memory allocator, which means you can’t use them inside your allocator or you’ll end up with never-ending recursion. But this is also a fun challenge; building software in a limited environment to provide services to other programs is the essence of operating system implementation, and will help you fully appreciate the challenge and importance of careful system building.

Please read the following instructions carefully. There are quite a few important details in the implementation of this lab, and getting them wrong can lead to some very difficult-to-diagnose errors.

Getting Started

To obtain a copy of the starter code, one member of your lab group should perform the following steps:

  1. Open https://gpu.cs.grinnell.edu and log in with your Grinnell account. You will need to approve your login with Duo or enter in a one-time-password if you do not use Duo.
  2. Start an XFCE session. If you see a blank page or some other error for the session, close the tab and then click on your session again to re-open it. This fixes most issues, but please ask for help if you run into persistent issues.
  3. Open a terminal window in the XFCE Session.
  4. Use the git command to check out a copy of the starter code for the lab:
    $ git clone /home/curtsinger/csc213/labs/malloc ~/csc213/labs/
    
  5. And now you can use the code command to open the starter code with Visual Studio Code.
    $ code ~/csc213/labs/malloc
    
  6. A Visual Studio Code window should appear with the malloc directory open in the file browser. You may see a welcome message, which you can close. You can also close any prompts to upgrade to a new version of VSCode.
  7. Open a terminal inside of VSCode using the Terminal menu. By default, terminals appear on the bottom of the window. I find it more conveninent to move it to the right side; just right-click somewhere near the top of the panel that appears and choose Move Panel Right.
  8. Now you can run make in the terminal to build the starter code, or just type ctrl+shift+b to run the default build task (which just runs make).

Now you can use the share button at the top bar of your remote session to share with your lab partner. If you have any trouble logging in, sharing your session, or with any other part of the startup process please ask for help.

The “somewhat-functional” heap uses the mmap system call to request memory from the operating system every time malloc is called. This is a terrible allocator! The mmap system call returns pages of memory to the program, meaning we’re requesting at least 4096 bytes of memory for every allocation, even if we only allocate an int. In addition to wasting an enormous amount of memory, free is not implemented so we never actually reclaim this space.

While the starter code implements a very bad allocator, it does work for some programs. Most notably, it does not work for programs that call malloc often enough that the operating system runs out of space. You can run some simple utities like ls or editors like vim if you don’t open a large file.

To run a program with your allocator instead of the default allocator, you will need to use the LD_PRELOAD environment variable. This is a special environment variable that ld, the dynamic linker (linker dynamic?), looks at before starting a program. Any libraries listed in this environment variable will be loaded before system libraries. That means any functions you implement in a preloaded library will replace the default implementation. There are some tricks to doing this correctly with a memory allocator (for one, the dynamic linker allocates memory, which creates some odd circular dependencies) so I have included a wrapper from a system called heap layers that takes care of most of this for you. You just need to implement functions called xxmalloc, xxfree, and xxmalloc_usable_size in allocator.c. Leave the files wrapper.h and gnuwrapper.cpp alone!

To run a program with your custom allocator, use a command line like the following example:

$ LD_PRELOAD=./myallocator.so ls
[Running with custom allocator]
allocator.c heaplayers Makefile myallocator.so obj

As you can see, the above example includes the message [Running with custom allocator]. This is included with the library to confirm that you are actually running with the allocator; if you mistype the library path you may accidentally run a program without your allocator, which means you could end up testing the system allocator instead of your implementation.

You’ll eventually be able to run more complex programs but with this naive implementation of malloc the program quickly exhausts the memory the OS is willing to give it. The usual programs we would use to illustrate these limitations are not available on repl.it, so you’ll have to take my word for it this time.

Debugging Malloc

You may find it useful to run programs with your allocator inside a debugger. To do this, you have to start things a little differently. This session starts the program ls with your custom allocator inside of gdb.

$ gdb ls
...
Lots of gdb startup messages will appear here
...
> set environment LD_PRELOAD=./myallocator.so
> run

If you want to start the program with command line options, they go immediately after the run command (on the same line).

Implementation Overview

Starting with our very bad allocator, you will implement what is often called a BiBoP allocator. BiBoP, short for “Big Bag of Pages”, is an old and well-known technique for allocating memory that uses a size-segregated heap. The job of a memory allocator is to request large blocks of memory from the OS, then parcel out smaller pieces of that memory to programs that request it.

You could imagine an allocator that cleverly splits large blocks into smaller pieces, called allocated objects, when needed. A clever allocator could then coalesce adjacent freed objects into larger free objects so they can be used to satisfy larger requests in the future. While this approach has its place, it’s complicated, and has some significant downsides. For one, all that “cleverness” actually requires work every time a user calls malloc or free, so the complexity has a cost.

We’ll make our lives easier by grouping objects of the same size together into blocks. Our allocator will request big blocks of memory from the OS, and then divide the blocks up to hold a single specific object size. If you need to allocate objects of a different size you have to get them from a different block. This simplifies the allocator, and actually has some nice performance advantages as well. For one, we can store the size of allocated objects once per block instead of next to every object, so there is less wasted space for metadata.

It’s easiest to describe the structure of a BiBoP allocator by explaining how it handles requests.

Allocating Memory

For our example, consider a simple program that calls malloc(sizeof(char) * 27). The malloc call is routed through some wrapper code provided in the lab, but eventually finds its way to the xxmalloc function you’ve implemented. At this point, the allocator must find a spot in memory that is big enough to hold at least 27 bytes.

Our allocator design dedicates an entire block of memory to objects of one size. We could create a block specifically for 27-byte objects, but there probably won’t be many. Instead, we’ll create a fixed set of sizes and round the malloc request up to the first size that’s at least 27 bytes. We call each of the sizes our allocator supports as a “size class”. If the allocator has a 32-byte size class, our 27-byte malloc call would be satisfied by returning an object from the 32-byte size class (assuming there isn’t some other smaller size class that’s still at least 27 bytes).

There are some requirements for memory returned from malloc, which will influence our choice of size classes. The most important requirement for our allocator is alignment; Any pointer returned from malloc must be aligned to a 16-byte boundary on some systems, while others require eight-byte alignment. To make our allocator more portable, we’ll use 16-byte alignment. That means we have to make sure every small piece of memory we return occurs at an address that is a multiple of sixteen. Note that an address aligned to a multiple of 16 will have zeros in its four rightmost bits.

Since we will align each object to a 16-byte boundary, we should use 16 bytes as our smallest size class. That means any call to malloc that asks for less than 16 bytes will be rounded up to 16 bytes. If we had a smaller size class (i.e. eight bytes) we’d have to add eight bytes of padding after each object to satisfy the alignment constraint, so this size class would be pointless. Requests for more than 16 bytes of memory must also return 16-byte aligned addresses, so we need to choose size classes that make this alignment possible. The easiest choice is to use powers of two as our size classes. Consider the following example calls:

  • malloc(5) should be rounded up to a request for 16 bytes
  • malloc(16) should allocate 16 bytes
  • malloc(17) should be rounded up to a request for 32 bytes
  • malloc(48) should be rounded up to a request for 64 bytes
  • malloc(64) should allocate 64 bytes

Since our example request was for 27 bytes, the allocator will need to round that up to 32 bytes. Now that we know which size class we are looking for, we need to find some memory to return to the program. There are two places we can get this memory: from the operating system, or from a freelist. We’ll look at the freelist first.

Allocating from the Freelist

The freelist is a linked list of free memory locations. Because our allocator uses a fixed set of size classes, we’ll keep a separate freelist for each size; that makes it easy to find a free object for a given malloc call. Your allocator will have a freelist for each power-of-two size class, from 16 up to 2048 (we'll talk about larger objects later). The allocator only looks at the freelist for the size of the current request; a call to malloc(27)` should look in the freelist for 32-byte objects, but not other freelists. If there is at least one free 32-byte object on the freelist, the allocator should remove it from the front of the list and return it to the program.

The freelist is a linked list, but you may recall from earlier assignments and labs that linked lists require allocating memory. We’re writing an allocator, so calling malloc and free inside the allocator is would lead to infinite recursion. The key insight from our reading for this lab is that the freelist is used to track free memory, so we can use the actual free memory itself to store our linked list nodes. This technique, known as an embedded freelist, can be a bit difficult to grasp; we’ll spend some time in class discussing this technique. The basic ideas is that the address of the free memory itself is the value we are storing, and in that memory we can store a pointer to the next free object of the same size, or NULL if this is the last object in the freelist.

Requesting Memory from the OS

Returning to our example request for 27 bytes, what should the allocator do if it looks in the 32-byte freelist and finds that there are no free objects? There are two options: the allocator could return a larger object if one is available, or it could ask the OS for more memory; we’ll choose the second option.

So, if malloc looks in the freelist for 32-byte objects and finds that it is empty, we need to get more memory from the OS. The allocator will call mmap to request a larger block of memory. For our implementation, we’ll make blocks a single page, which is 4096 bytes on the computers we use for this class. That size is defined in the PAGE_SIZE constant in your starter code. Once the allocator has received a page of memory from the OS, it can divide it up into object-sizes chunks.

We’re working on a 32-byte allocation, so we can divide the 4096-byte page into 128 (4096 divided by 32) objects. But before we do that, we need to think ahead a bit; at some point, the program is going to pass the allocated pointer to free. At that point, we’ll need to know how big the object is so we know which freelist to put it in. We need to store some information about the allocated memory in a place where we can find it later.

All the objects we allocate will appear on pages, and the OS ensures that pages start at addresses that are a multiple of 4096; that means the last twelve bits of every page’s starting address will be zeros. So, given any address in a page, we can quickly find the beginning of the page. And because the entire page holds objects of the same size, we can keep information about the objects on that page in a page header.

For our example allocation of 27 bytes, we’re going to divide a page up into 32-byte objects. Instead of putting all 128 objects on the 32-byte freelist, we’ll reserve the first object at the beginning of the page for a header. In that header, we’ll store two things:

  1. A magic number that we can use to verify that a page came from this allocator.
  2. The size of all the objects on this page.

The magic number should be some value you can check to verify that you are looking at a real block from the heap. That way if a program ever passes a pointer to free that didn’t come from this allocator we can ignore it; it won’t appear on a page with our magic number, so we’ll know it’s not something we should free. The size is just that; an integer that reports the size of the objects on the block. Both fields should go at the beginning of the block, and must be written immediately after requesting a block of memory from the OS.

After placing the header at the front of the block, the allocator can divide the remaining memory up into free objects. You will need to skip over the first free object, since it overlaps the block header. The remaining space on the block is still usable, so you can walk through the block and add each object to the appropriate freelist. If you are dividing a block for 32-byte objects, those objects should begin 32, 64, 96, 128, … bytes from the start of the block. For a block that holds 128-byte objects, the objects should start at 128, 256, 384, 512, … bytes from the start of the block. Aligning objects to multiples of the object size is a requirement for this lab, and makes it much easier to implement free later on.

Once the allocator has divided the block into free objects and added them to the freelist, we are done with the block. There should now be objects on the 32-byte freelist, so your malloc function can return the first one.

Freeing Memory

At some point later, the program calls free(p). There is some wrapper code, but eventually our xxfree function will be called. All we have at this point is a pointer to somewhere inside an object. Your implementation of free will need to determine the size of the object, find its starting address, and add it to the appropriate freelist. We’ll revisit freeing in much more detail later.

Part A: Working with Sizes

Before writing any code in your memory allocator, you should implement the computations to round object sizes to the next power of two and select the appropriate freelist. The starter code includes a ROUND_UP macro, but you cannot use this macro to round to powers of two! The provided macro ROUND_UP(x, y) rounds x up to the next multiple of y; rounding to powers of two will require a different calculation.

This code is difficult to test inside of your allocator because you cannot call functions like printf when you are inside of xxmalloc. The only POSIX functions you are allowed to call are the async signal safe functions listed on this page. I’ve provided a log_message function you can use to safely print null-terminated strings without calling malloc or free. If you want to print formatted strings, you can use the snprintf function to produce a formatted string in a character array, then pass that to log_message to print it.

Becasue debugging inside your allocator is difficult, I recommend implementing your rounding code in a separate test program to make sure it works, then move it into your allocator once you trust it.

Warning: Rounding sizes is the number one cause of errors on this lab. Make sure your rounding code works correctly before moving on.

Part B: Allocating from Blocks

Implement xxmalloc for objects up to 2048 bytes. You’ll have eight size classes (16, 32, 64, 128, 256, 512, 1024, and 2048 bytes). You will need the code from part A to choose the appropriate freelist, along with new code to take an object off of a freelist, allocate new blocks, initialize the new block’s header, and add free objects from the block to the freelist.

We don’t have any code to compute the size of an allocated object yet, but you can assume that any allocated memory is at least 16 bytes. Update xxmalloc_usable_size so it always returns 16 (unless it is given a NULL pointer, in which case it should return zero). The gnuwrapper.cpp file defines realloc, which calls xxmalloc_usable_size to check if an object can be resized. Once you’ve updated xxmalloc_usable_size to return the conservative lower bound of 16 you should still be able to run some simple programs, even though freeing does not work yet.

If you test a program and it crashes, check to see if xxmalloc is being asked to allocate more than 2048 bytes at a time (gdb breakpoints will be helpful here). If so, you may have to skip ahead to part D before you can test your allocator; many large object allocations happen as part of program startup and can be difficult to avoid.

Part C: Freeing Memory

Now that you can allocate objects, it’s time to support freeing them. You will need to complete a full implementation of xxmalloc_usable_size to get the size of a freed object, then round to the start of the object and add it to the appropriate freelist.

When the function xxfree(void* ptr) is called, we don’t know if ptr points to the beginning of an allocated object, and we have no idea how big the object is. The first step is to find the size of the object.

Finding an object’s size

You will need to know an object’s size to free it, but you’ll also have to implement xxmalloc_usable_size for the allocator to work properly. You can call xxmalloc_usable_size from inside xxfree, so you should start by implementing this helper function. Recall that every block begins with a header, and that header stores the size of all objects in the block. Every block will begin at a multiple of PAGE_SIZE (4096 bytes), so you can simply round the pointer ptr down to the next lowest multiple of PAGE_SIZE.

The example code below shows how you can find the beginning of a page inside xxmalloc_usable_size:

size_t xxmalloc_usable_size(void* ptr) {
  // If ptr is NULL always return zero
  if (ptr == NULL) {
    return 0;
  }

  // Treat the freed pointer as an integer
  intptr_t free_address = (intptr_t)ptr;

  // Round down to the beginning of a page
  intptr_t page_start = free_address - (free_address % PAGE_SIZE);

  // Cast the page start address to a header struct
  page_header_t* header = (page_header_t*)page_start;

  ...

As you can see in the example above, we can construct a pointer to the page header by treating the freed pointer as an integer and rounding it down. This example assumes you’ve created a struct called page_header_t. That struct should hold the magic number and object size fields from earlier in the lab description.

Next, you’ll need to validate the magic number in the header; if it doesn’t match the expected magic number, the provided pointer didn’t come from our malloc implementation and xxmalloc_usable_size should return zero. Otherwise, return the size stored in the header.

Finding the beginning of an object

Now that you have an implementation of xxmalloc_usable_size, xxfree can quickly look up the size of an object beign freed. The next step is to figure out where it starts; the program that calls free could provide a pointer to the beginning of an allocated object, but it may pass in a pointer somewhere in the middle of an allocated obejct. Your implementation of free must work in either case. Because you carefully pareceled out free objects aligned to the object size (you did do that, right?) this just requires rounding down to the next multiple of the object size. You’ll have to figure out how to implement this, but you are welcome to borrow from the example provided for xxmalloc_usable_size.

Finally, you know the size of the object being freed, and you have a pointer to the start of that object. Now you can add it to the appropriate freelist. It’s usually best to reuse freed memory as soon as possible, so add the freed object to the front of the freelist.

Warning: Some programs will call free(NULL). In this case, your allocator should just return without doing anything. Accessing a block header for NULL will result in a segmentation fault, so you have to check for this case in xxfree and xxmalloc_usable_size.

Part D: Large Objects

One detail that all allocators need to work out is how to handle large objects. Our page structure can’t accommodate objects larger than 2048 bytes; we can only fit one of those on a page, and the next power of two is 4096, which leaves no room for the page header.

For our implementation, we’ll just fall back on mmap to satisfy larger allocation requests. The starter code should work as a basic large object allocator; it rounds the requested size up to a multiple of PAGE_SIZE, calls mmap, and returns the result.

One difficult issue with large objects is in the free implementation. Your allocator will not find a block header at the beginning of a large object, so free will do nothing. Just make sure your xxfree and xxmalloc_usable_size functions check for magic numbers; they should ignore objects that don’t have matching magic numbers. The end result will be that your program leaks large allocations.

Extra Challenge

If you’re looking for an extra challenge, you can try to implement a non-leaking large object allocator; one common trick is to use the allocator for smaller objects to build a data structure to track the large objects. For example, if you allocate a 8192-byte object in xxmalloc, it’s perfectly fine to call xxmalloc to request space for a 32-byte linked list node that you could use to track that large object.

Testing your Allocator

Your final submitted allocator should work on simple command line utilities, like these ones:

$ LD_PRELOAD=./myallocator.so ls
[Running with custom allocator]
allocator.c heaplayers Makefile myallocator.so obj
$ LD_PRELOAD=./myallocator.so whoami
[Running with custom allocator]
runner
$ LD_PRELOAD=./myallocator.so grep -R "malloc" .
[Running with custom allocator]
allocator.c:#include <malloc.h>
allocator.c:// The minimum size returned by malloc
allocator.c:// A utility logging function that definitely does not call malloc or free
allocator.c:void* xxmalloc(size_t size) {
...
$ LD_PRELOAD=./myallocator.so cat allocator.c
[Running with custom allocator]
#define _GNU_SOURCE

#include <assert.h>
#include <malloc.h>
#include <stdbool.h>
#include <stdint.h>
...
$ LD_PRELOAD=./myallocator.so md5sum allocator.c
[Running with custom allocator]
a7394a3c6991d4e4fc03b5bde6070f0d allocator.c

Once those commands are working (though your output for the last three will almost certainly be different), you can move on to some larger programs. The top command is a good test of your allocator, but others we typically use are not available on repl.it. I’ll let you know as I identify other good test programs that you can use.

In addition to testing your allocator with some simple programs, you can use the same testing script I will use to assign a portion of your grade on this lab. The test program is included in the malloc-test directory with your starter code. Run it like this:

$ LD_PRELOAD=./myallocator.so malloc-test/malloc-test
[Running with custom allocator]
BiBoP Allocator Test Results:

1. Is allocated memory writable?
  malloc(4) returned at least 4 bytes of writable memory. (+1 point)
  malloc(15) returned at least 15 bytes of writable memory. (+1 point)
...

The test program enforces some properties that are specific to this allocator design, so even the standard libc malloc will not receive full credit on this lab. The requirements for your allocator are:

  1. All allocated memory must be writable and readable
  2. Allocated memory up to 2048 bytes should be rounded up to a power of two size, with a minimum of 16 bytes. Allocations over 2048 bytes should be rounded up to the next multiple of 4096 bytes.
  3. Allocated objects up to 2048 bytes must be aligned to their power of two size. For example, malloc(24) should return a pointer to 32 bytes of memory that is aligned at an address that is a multiple of 32.
  4. Allocated memory should not overlap other allocated memory.
  5. Freed memory must be reused immediately. In other words, freeing an object should place it at the front of its size class’ freelist, and allocating a new object of that size should return the same pointer.
  6. Your allocator should return sufficient memory for large objects.

If you find that you are failing some of these tests, make sure your implementation of xxmalloc_usable_size is working correctly. This function is an important part of the testing process, so an incorrect implementation can cause many other tests to fail.