Lab: Meet x86

Assigned: Wednesday, Oct 25, 2017
Due: Monday, Oct 30, 2017 (by 5pm)
Collaboration: Work with your assigned partner for this lab. You may use your classmates as a resource, but please cite them. Sharing of complete or nearly-complete answers is not permitted. If you do not know whether it is acceptable to use a specific resource, you should ask.

Overview

In this lab, you will learn a little more about the x86 instruction set by writing and examining programs written in x86 assembly. We will use a few tools that may be somewhat unfamiliar: the GNU debugger, the GNU assembler, and makefiles.

Groups

Greyson and Mari
Clara and Jimin
Hoang and Prabir
Andrew and Reilly
Nick and Aditi
Zachary and Ryan
John and Rojina
Matt and Bazil
Gemma and Faizaan
Kathryn and Dennis
Paps and An
Lex and Nripesh

Resources

The third reference is a complete list of all x86 instructions, which you probably will not need. If you are looking for an instruction to perform a particular operation, you can search on the page for matches in the description column. There may be many instructions with the same description, but these will differ in their operand types. Pay close attention to operands; x86 has quite a few funny quirks, and these show up mainly in operands.

x86 Calling Conventions

There are many tricky details in the x86 calling convention used on Linux, but for this lab you just need to know three things: which registers parameters go into, where return values go, and how to call functions like printf that take a variable number of parameters.

The first six integer or pointer parameters to a function are placed in registers in this order:

%rdi
%rsi
%rdx
%rcx
%r8
%r9

Return values go into the %rax register.

When you call a function like printf, which takes a variable number of parameters, you use the same calling convention with one added detail. You have to set the %al register to zero, which indicates that you are not passing any parameters in vector registers (a special kind of register for performing operations in parallel). You can see an example of this in the second hint for part C.
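As a minimal sketch of these rules, the program below passes two parameters to a small function and then prints the value it returns. The names add_two and fmt are made up for this illustration and are not part of the lab; the sketch assumes the same gcc -no-pie build command used in Part A.

```asm
# Sketch only: add_two and fmt are hypothetical names, not part of the lab.
.data
fmt:
  .asciz "add_two(3, 4) is %lu\n"

.text
# add_two(a, b): a arrives in %rdi, b arrives in %rsi, the sum is returned in %rax
add_two:
  mov  %rdi, %rax       # Copy the first parameter into the return register
  add  %rsi, %rax       # Add the second parameter to it
  ret                   # Return; the caller finds the result in %rax

.global main
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  mov  $3, %rdi         # First parameter to add_two
  mov  $4, %rsi         # Second parameter to add_two
  call add_two          # The result comes back in %rax
  lea  fmt, %rdi        # First printf parameter: the format string
  mov  %rax, %rsi       # Second printf parameter: the value add_two returned
  movb $0, %al          # No parameters are passed in vector registers
  call printf           # Prints "add_two(3, 4) is 7"
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return
```

Note that the return value is copied out of %rax before %al is cleared, since %al is the low byte of %rax.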

Other x86 Hints

Many parts of x86 assembly language are similar to MIPS, but there are some differences that will show up quickly. Here are a few important ones:

Typically x86 operations take two parameters. The second parameter is both one of the inputs and the output. For example, the add %rax, %rbx instruction adds the values in %rax and %rbx and stores the result in %rbx.
x86 has built-in stack operations push and pop. The stack grows toward lower addresses, so the instruction push %rcx moves the stack pointer down by 8 bytes and copies the value in %rcx to the new bottom of the stack. The instruction pop %rcx copies the value at the bottom of the stack into %rcx, then moves the stack pointer back up by 8 bytes.
Functions in x86 typically start and end the same way. First, you take the current frame pointer, the pointer to the top of the calling function’s stack frame, and push it to the stack (push %rbp). Next, you take the current stack pointer and make this the frame pointer (mov %rsp, %rbp). At the end of the function, you need to undo this process. Luckily, x86 has a handy leave instruction that undoes this, then you execute a ret instruction to return.
Some instructions in x86 require a suffix to indicate how large a value you should use. When you’re moving between registers this is implicit; the sizes of the registers must match, and you know the size based on the name. Other times you have to explicitly use a “q”, “l”, “w”, or “b” suffix for quad-word, long-word, word, or byte, respectively. These suffixes indicate that an 8 byte, 4 byte, 2 byte, or 1 byte operation should be performed.
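The short program below is one way to see several of these points at once: a two-operand add, a push and pop pair, and an explicit size suffix on a store to memory. The label fmt and the constants are arbitrary choices for this sketch, and it assumes the gcc -no-pie build command from Part A.

```asm
# Sketch only: the label fmt and the constants are chosen for illustration.
.data
fmt:
  .asciz "sum is %lu, popped value is %lu\n"

.text
.global main
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer

  mov  $40, %rcx        # Register destination, so the 8-byte size is implied by the name
  mov  $2, %rsi         # %rsi = 2
  add  %rcx, %rsi       # %rsi = %rsi + %rcx = 42; the second operand is also the output

  push %rcx             # Move the stack pointer down 8 bytes and store 40 there
  movq $7, (%rsp)       # Memory destination: the q suffix says to write all 8 bytes
  pop  %rdx             # Load the value at the stack pointer (now 7), then move it back up

  lea  fmt, %rdi        # First printf parameter: the format string
  movb $0, %al          # No parameters are passed in vector registers
  call printf           # Prints "sum is 42, popped value is 7"
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return
```

Without the q suffix, the assembler has no register name to infer a size from in the store to (%rsp), which is why the suffix is required there.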

Part A: Hello World!

We’ll start out by constructing a simple “Hello World” program with x86 assembly to make sure you have all the tools up and running.

First, create a directory for your lab work today. Next, we’ll write our first program’s source code in that directory, in a file named hello.s:

# Begin the data section, where we put constants. This contains our string message.
.data
msg:
  .asciz "Hello World!\n"

# Begin the text section, which is where we put instructions
.text
.global main            # The main function should be visible outside this file
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  lea  msg, %rdi        # Take the address `msg` and put it into %rdi
  call puts             # Call the puts function to display the message
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return

You can use your favorite text editor to create this file, as long as MPLAB is not your favorite text editor. Eclipse is probably not a great choice either.

Now that we have our source code, we can compile it into an executable with the following command:

$ gcc -no-pie -o hello hello.s -lc

This tells the GNU compiler collection to translate hello.s from x86 assembly to an executable file named hello. This file uses the C standard library (for the function puts) so we pass -lc to link this library in. The gcc command will invoke the GNU Assembler (gas) and the GNU Linker (ld) to produce this file. The option -no-pie is required on newer versions of gcc, which will try to build a position-independent executable by default. We are using addresses directly (msg and puts) so our assembly is not position-independent.

While this works fine, we can use a Makefile to quickly rebuild the program when we edit its source code. Create a new file called Makefile and add this to the file using your favorite text editor:

all: hello

clean:
	rm -f hello

hello: hello.s
	gcc -no-pie -o hello hello.s -lc

This file declares three targets: all, clean, and hello. The all and clean targets are called “phony” targets because they do not name files that make produces: all is just shorthand for building other targets, and clean only runs a cleanup command. After the name of a target, we write a colon and then the list of files or targets it depends on. By writing all: hello we are telling make that in order to build all, it must build hello. The make tool will build the first target by default, so we usually set it up to build everything in the first target. You can build a specific target by typing make clean or make hello. Following our target name and its dependencies, we write the shell commands that make must execute to transform the dependencies into the target. These lines must be indented with tabs, not spaces.

If you run make in your shell, you’ll probably receive the message make: Nothing to be done for 'all'. That’s because we already built hello by hand. If you edit your hello.s file and then run make, it should re-execute the steps to transform hello.s into hello because the dependencies have changed. If you want to force make to rebuild hello you can type make clean then make all.

Once you’ve built your hello program using make, run it! Type the command ./hello into your shell and hit enter.

The last step in this process is to use gdb, the GNU Debugger, to step through our program’s execution. Run the following command to start our program with gdb. You’ll get a fairly long message, then a prompt from inside of gdb.

$ gdb ./hello
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) 

Now that we’re inside of gdb, we can send commands to gdb to control how the program executes. Run the start command as below:

(gdb) start
Temporary breakpoint 1 at 0x4004fa
Starting program: /home/curtsinger/x86/hello 

Temporary breakpoint 1, 0x00000000004004fa in main ()
(gdb) 

The start command brings the program to the start of our main function. It stops at that point by placing a breakpoint at the beginning of main. Now that we’re at the beginning of the main function, we can tell gdb to display the current instruction with this command:

(gdb) display/i $pc
1: x/i $pc
=> 0x4004fa <main+4>:	lea    0x601030,%rdi

This tells gdb to display the contents of memory at the program counter as an instruction. You can see that gdb stopped our program 4 bytes past the beginning of main. It skipped over the first two instructions of the function, which are often called the “prologue.” Now we can walk through one instruction at a time with the stepi command:

(gdb) stepi
0x0000000000400502 in main ()
1: x/i $pc
=> 0x400502 <main+12>:	callq  0x4003f0 <puts@plt>

At any point while the program is stopped, you can run info registers to see the value in each of the processor’s registers.

Keep going until you end up inside the puts function. From there, use the finish command to run puts to completion and stop once you are back in main.

Okay, good work! If anything did not run as expected, now is the time to get some help. If things looked okay, continue on to the next step.

Part B: Hello hello hello hello hello hello …

Using our message program as a starting point, you will write a program that uses a loop to print a message ten times. Write your new program in the file hellohello.s. You may not simply copy-paste the code ten times! You will need to add a target for hellohello, and add hellohello to the dependencies for the all target and the files removed by the clean target.

Hint: The x86 architecture performs conditional jumps in two stages, sort of like the slt instruction in MIPS. You can use the cmp instruction to compare two registers, which records the result of that comparison in a special set of “flags” held in the processor’s flags register. Conditional jumps that run after the comparison use these flags to decide whether to take a jump or not.
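To illustrate the cmp-then-jump pattern on a different problem, here is a minimal sketch that adds up the numbers 1 through 5 and prints the total. The labels fmt, loop_top, and loop_done are arbitrary names chosen for this example, and it assumes the gcc -no-pie build command from Part A.

```asm
# Sketch only: fmt, loop_top, and loop_done are arbitrary label names.
.data
fmt:
  .asciz "1 + 2 + ... + 5 is %lu\n"

.text
.global main
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  mov  $0, %rsi         # Running total
  mov  $1, %rcx         # Loop counter, starting at 1
loop_top:
  cmp  $5, %rcx         # Compare the counter to 5; this only sets flags
  jg   loop_done        # Leave the loop if the counter is greater than 5
  add  %rcx, %rsi       # total = total + counter
  inc  %rcx             # counter = counter + 1
  jmp  loop_top         # Go back to the bounds check
loop_done:
  lea  fmt, %rdi        # First printf parameter; the total is already in %rsi
  movb $0, %al          # No parameters are passed in vector registers
  call printf           # Prints "1 + 2 + ... + 5 is 15"
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return
```

This loop makes no function calls, so keeping the counter in %rcx is safe here. A loop that calls a function in its body would have to preserve the counter some other way, because registers used to pass parameters are not saved across calls.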

Once you have a working program, run it with gdb and stop after your loop bounds check. Print the register values with info registers at this point. Which flags are consistently set or un-set by the comparison when the loop is finished? What about when it isn’t finished?

Call over a mentor to demonstrate your program and report your answer to the flags question.

Part C: Factorial

In this part of the lab, you will implement a recursive factorial function, and a main function to test your factorial implementation. Your main function should count from zero up to ten, call fact on each of these values, and then print out a message showing the result. Here is how your completed fact program should work:

$ ./fact
fact(0) is 1
fact(1) is 1
fact(2) is 2
fact(3) is 6
fact(4) is 24
fact(5) is 120
fact(6) is 720
fact(7) is 5040
fact(8) is 40320
fact(9) is 362880
fact(10) is 3628800

Write your code in a file fact.s, and update your Makefile to build a program called fact from your code. Once you have a working implementation, show it to me or one of the mentors. Please do not use tail-call elimination for this procedure!

Hint: Recursive functions are quite a bit simpler in x86 than on MIPS. Dealing with the stack is pretty simple because we have push and pop instructions. You’ll still have to save at least one value on the stack or you will lose it after the recursive call. Registers used to pass parameters are not saved across procedure calls!
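The sketch below shows one way to keep a parameter alive across a call using push and pop. It is not recursive and is not a factorial: the functions square and square_plus_n and the label fmt are hypothetical names invented for this example. The extra 8 bytes of padding rely on a detail this handout does not cover, namely that the Linux calling convention expects the stack pointer to be 16-byte aligned at each call instruction.

```asm
# Sketch only: square, square_plus_n, and fmt are hypothetical names.
.data
fmt:
  .asciz "square_plus_n(6) is %lu\n"

.text
# square(n): returns n * n in %rax
square:
  mov  %rdi, %rax       # Copy n into the return register
  imul %rdi, %rax       # %rax = n * n
  ret

# square_plus_n(n): returns square(n) + n
square_plus_n:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  push %rdi             # Save n; a callee is allowed to overwrite %rdi
  sub  $8, %rsp         # Assumption: pad so the stack is 16-byte aligned at the call
  call square           # %rax = n * n
  add  $8, %rsp         # Remove the padding
  pop  %rdi             # Restore n
  add  %rdi, %rax       # %rax = n * n + n
  leave                 # Restore the frame pointer
  ret                   # Return

.global main
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  mov  $6, %rdi         # First parameter to square_plus_n
  call square_plus_n    # The result comes back in %rax
  lea  fmt, %rdi        # First printf parameter: the format string
  mov  %rax, %rsi       # Second printf parameter: the returned value
  movb $0, %al          # No parameters are passed in vector registers
  call printf           # Prints "square_plus_n(6) is 42"
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return
```

The same save-and-restore shape applies when a function calls itself: whatever you still need after the call has to be on the stack before the call.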

Hint: You will need to call printf to show your message. Here’s a sample program that calls printf:

# Begin the data section, where we put constants. This contains our string message.
.data
msg:
  .asciz "fact(%u) is %u\n"

# Begin the text section, which is where we put instructions
.text
.global main            # The main function should be visible outside this file
main:
  push %rbp             # Save the frame pointer
  mov  %rsp, %rbp       # Use the current stack pointer as the new frame pointer
  lea  msg, %rdi        # Get the location of our printf message and put it in the first parameter register
  mov  $5, %rsi         # Put the constant 5 into the second parameter register
  mov  $120, %rdx       # Put the constant 120 into the third parameter register
  movb $0, %al          # Put the constant 0 into the %al register, as required when calling functions that take variable arguments
  call printf           # Call the printf function to display the message
  mov  $0, %rax         # Set main to return zero
  leave                 # Restore the frame pointer
  ret                   # Return