In this lab, you will learn a little more about the x86 instruction set by writing and examining programs in x86 assembly. We will use a few tools that may be somewhat unfamiliar: the GNU debugger, the GNU assembler, and makefiles.
The third reference is a complete list of all x86 instructions, which you probably will not need. If you are looking for an instruction to perform a particular operation, you can search on the page for matches in the description column. There may be many instructions with the same description, but these will differ in their operand types. Pay close attention to operands; x86 has quite a few funny quirks, and these show up mainly in operands.
There are many tricky details in the x86 calling convention used on Linux, but for this lab you just need to know three things: which registers parameters go into, where return values go, and how to call functions like printf that take a variable number of parameters.
Parameters to functions are placed in registers in this order:
%rdi
%rsi
%rdx
%rcx
%r8
%r9
Return values go into the %rax register.
When you call a function like printf, which takes a variable number of parameters, you use the same calling convention with one added detail. You have to set the %al register to zero, which indicates that you are not passing any parameters in vector registers (a special kind of register for performing operations in parallel). You can see an example of this in the second hint for part C.
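As a minimal sketch of the first two rules, the hypothetical program below passes three integers to a function and gets the result back through %rax. The file name sum3.s and the function sum3 are made up for illustration and are not part of the lab. The .text and .global directives are explained with hello.s later in this lab.

```asm
# sum3.s: hypothetical example, not part of the lab
        .text
sum3:                           # long sum3(long a, long b, long c)
        mov     %rdi, %rax      # a arrives in %rdi; copy it to the return register
        add     %rsi, %rax      # b arrives in %rsi; %rax = a + b
        add     %rdx, %rax      # c arrives in %rdx; %rax = a + b + c
        ret                     # the caller finds the result in %rax

        .global main
main:
        push    %rbp            # Save the frame pointer
        mov     %rsp, %rbp      # Use the current stack pointer as the new frame pointer
        mov     $1, %rdi        # first parameter
        mov     $2, %rsi        # second parameter
        mov     $3, %rdx        # third parameter
        call    sum3            # %rax now holds 1 + 2 + 3
        leave                   # Restore the frame pointer
        ret                     # main returns the sum, which becomes the exit status
```

Assuming you save this as sum3.s and build it with gcc -no-pie -o sum3 sum3.s -lc, running ./sum3 and then echo $? should print 6, because the value main returns in %rax becomes the program's exit status.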
Many parts of x86 assembly language are similar to MIPS, but there are some differences that will show up quickly. Here are a few important ones:
Arithmetic instructions take two operands and overwrite the second one with the result. The add %rax, %rbx instruction adds the values in %rax and %rbx and stores the result in %rbx.
x86 has dedicated stack instructions, push and pop. The instruction push %rcx moves the stack down and copies the value in %rcx to the bottom of the stack. The instruction pop %rcx pulls the value off the bottom of the stack and stores it into %rcx, then moves the stack up.
Functions set up a frame pointer when they start. First, you save the old frame pointer on the stack (push %rbp). Next, you take the current stack pointer and make this the frame pointer (mov %rsp, %rbp). At the end of the function, you need to undo this process. Luckily, x86 has a handy leave instruction that undoes this, then you execute a ret instruction to return.

We’ll start out by constructing a simple “Hello World” program with x86 assembly to make sure you have all the tools up and running.
First, create a directory for your lab work today.
Next, we’ll write our first program’s source code in that directory, in a file named hello.s:
# Begin the data section, where we put constants. This contains our string message.
.data
msg:
.asciz "Hello World!\n"
# Begin the text section, which is where we put instructions
.text
.global main # The main function should be visible outside this file
main:
push %rbp # Save the frame pointer
mov %rsp, %rbp # Use the current stack pointer as the new frame pointer
lea msg, %rdi # Take the address `msg` and put it into %rdi
call puts # Call the puts function to display the message
mov $0, %rax # Set main to return zero
leave # Restore the frame pointer
ret # Return
You can use your favorite text editor to create this file, as long as MPLAB is not your favorite text editor. Eclipse is probably not a great choice either.
Now that we have our source code, we can compile it into an executable with the following command:
$ gcc -no-pie -o hello hello.s -lc
This tells the GNU compiler collection to translate hello.s from x86 assembly to an executable file named hello. Our program uses the C standard library (for the function puts) so we pass -lc to link this library in. The gcc command will invoke the GNU Assembler (gas) and the GNU Linker (ld) to produce this file.
The option -no-pie is required on newer versions of gcc, which will try to build a position-independent executable by default. We are using addresses directly (msg and puts) so our assembly is not position-independent.
While this works fine, we can use a Makefile to quickly rebuild the program when we edit its source code. Create a new file called Makefile and add this to the file using your favorite text editor:
all: hello

clean:
	rm -f hello

hello: hello.s
	gcc -no-pie -o hello hello.s -lc
This file declares three targets: all, clean, and hello. The all and clean targets are called “phony” targets because they do not name files that make produces; they are just convenient names for other targets or actions. After the name of a target, we write a colon and then the list of files or targets it depends on. By writing all: hello we are telling make that in order to build all, it must build hello. The make tool will build the first target by default, so we usually set it up to build everything in the first target. You can build a specific target by typing make clean or make hello. Following our target name and its dependencies, we write the shell commands that make must execute to transform the dependencies into the target. These lines must be indented with tabs, not spaces.
If you run make in your shell, you’ll probably receive the message make: Nothing to be done for 'all'. That’s because we already built hello by hand. If you edit your hello.s file and then run make, it should re-execute the steps to transform hello.s into hello because the dependencies have changed. If you want to force make to rebuild hello you can type make clean then make all.
Once you’ve built your hello program using make, run it! Type the command ./hello into your shell and hit enter.
The last step in this process is to use gdb, the GNU Debugger, to step through our program’s execution. Run the following command to start our program with gdb. You’ll get a fairly long message, then a prompt from inside of gdb.
$ gdb ./hello
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb)
Now that we’re inside of gdb, we can send commands to gdb to control how the program executes. Run the start command as below:
(gdb) start
Temporary breakpoint 1 at 0x4004fa
Starting program: /home/curtsinger/x86/hello
Temporary breakpoint 1, 0x00000000004004fa in main ()
(gdb)
The start command brings the program to the start of our main function. It stops at that point by placing a temporary breakpoint at the beginning of main.
Now that we’re at the beginning of the main function, we can tell gdb to display the current instruction with this command:
(gdb) display/i $pc
1: x/i $pc
=> 0x4004fa <main+4>: lea 0x601030,%rdi
This tells gdb to display the contents of memory at the program counter as an instruction. You can see that gdb stopped our program 4 bytes past the beginning of main. That skips over the first two instructions of the function, which are often called the “prologue.”
Now we can walk through one instruction at a time with the stepi command:
(gdb) stepi
0x0000000000400502 in main ()
1: x/i $pc
=> 0x400502 <main+12>: callq 0x4003f0 <puts@plt>
At any point while the program is stopped, you can run info registers to see the value in each of the processor’s registers.
Keep going until you end up inside the puts function. You can let puts run to completion and get back to main using the finish command.
Okay, good work! If anything did not run as expected, now is the time to get some help. If things looked okay, continue on to the next step.
Using our message program as a starting point, you will write a program that uses a loop to print a message ten times.
Write your new program in the file hellohello.s.
You may not simply copy-paste the code ten times!
You will need to add a target for hellohello to your Makefile, add hellohello to the dependencies for the all target, and add it to the files removed by the clean target.
Hint: The x86 architecture performs conditional jumps in two stages, sort of like the slt instruction in MIPS. You can use the cmp instruction to compare two registers, which records the result of that comparison in a special set of “flag registers.” Conditional jumps that run after the comparison use these flags to decide whether to take the jump or not.
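To illustrate the two stages without writing a loop, here is a minimal sketch of a branch that picks the larger of two values. The file name max2.s, the function max2, and the label max2_done are hypothetical names chosen for this example. Note the operand order: cmp %rsi, %rdi sets the flags based on %rdi minus %rsi, so the jge that follows is taken when %rdi is greater than or equal to %rsi (as signed numbers).

```asm
# max2.s: hypothetical example, not part of the lab
        .text
max2:                           # long max2(long a, long b)
        mov     %rdi, %rax      # start by assuming a is the answer
        cmp     %rsi, %rdi      # stage 1: set the flags based on a - b
        jge     max2_done       # stage 2: jump if a >= b, keeping a in %rax
        mov     %rsi, %rax      # fall through when a < b: the answer is b
max2_done:
        ret                     # the larger value is returned in %rax

        .global main
main:
        push    %rbp            # Save the frame pointer
        mov     %rsp, %rbp      # Use the current stack pointer as the new frame pointer
        mov     $3, %rdi        # first parameter
        mov     $7, %rsi        # second parameter
        call    max2            # %rax now holds the larger of the two
        leave                   # Restore the frame pointer
        ret                     # main returns that value as the exit status
```

Assuming the same gcc -no-pie command used for hello.s, running the result and then echo $? should print 7. The cmp instruction changes only the flags; both registers still hold their original values afterward.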
Once you have a working program, run it with gdb and stop after your loop bounds check. Print the register values with info registers at this point. Which flags are consistently set or un-set by the comparison when the loop is finished? What about when it isn’t finished?
Call over a mentor to demonstrate your program and report your answer to the flags question.
In this part of the lab, you will implement a recursive factorial function, and a main function to test your factorial implementation. Your main function should count from zero up to ten, call fact on each of these values, and then print out a message showing the result.
Here is how your completed fact program should work:
$ ./fact
fact(0) is 1
fact(1) is 1
fact(2) is 2
fact(3) is 6
fact(4) is 24
fact(5) is 120
fact(6) is 720
fact(7) is 5040
fact(8) is 40320
fact(9) is 362880
fact(10) is 3628800
Write your code in a file fact.s, and update your Makefile to build a program called fact from your code.
Once you have a working implementation, show it to me or one of the mentors. Please do not use tail-call elimination for this procedure!
Hint: Recursive functions are quite a bit simpler in x86 than on MIPS. Dealing with the stack is pretty simple because we have push and pop instructions. You’ll still have to save at least one value on the stack or you will lose it after the recursive call. Registers used to pass parameters are not saved across procedure calls!
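The sketch below shows the save-and-restore pattern on a non-recursive example, so it is not a solution to this part. The helper print_twice is a hypothetical function invented for illustration. It also relies on one detail the lab text does not cover: the Linux calling convention expects the stack pointer to be a multiple of 16 at every call, which is why the example reserves 8 extra bytes after its single push.

```asm
# twice.s: hypothetical example, not part of the lab
        .data
msg:
        .asciz "Saved across a call"

        .text
print_twice:                    # void print_twice(const char *s)
        push    %rbp            # Save the frame pointer
        mov     %rsp, %rbp      # Use the current stack pointer as the new frame pointer
        push    %rdi            # save s; puts is allowed to overwrite %rdi
        sub     $8, %rsp        # padding so the stack stays 16-byte aligned
        call    puts            # first call: s is still in %rdi
        add     $8, %rsp        # remove the padding
        pop     %rdi            # restore s from the stack
        call    puts            # second call with the restored parameter
        leave                   # Restore the frame pointer
        ret

        .global main
main:
        push    %rbp
        mov     %rsp, %rbp
        lea     msg, %rdi       # Take the address `msg` and put it into %rdi
        call    print_twice
        mov     $0, %rax        # Set main to return zero
        leave
        ret
```

Built the same way as hello.s, this should print the message on two lines. If you delete the push and pop, the second call to puts receives whatever the first call left in %rdi, which is the kind of lost value the hint above warns about.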
Hint: You will need to call printf to show your message. Here’s a sample program that calls printf:
# Begin the data section, where we put constants. This contains our string message.
.data
msg:
.asciz "fact(%u) is %u\n"
# Begin the text section, which is where we put instructions
.text
.global main # The main function should be visible outside this file
main:
push %rbp # Save the frame pointer
mov %rsp, %rbp # Use the current stack pointer as the new frame pointer
lea msg, %rdi # Get the location of our printf message and put it in the first parameter register
mov $5, %rsi # Put the constant 5 into the second parameter register
mov $120, %rdx # Put the constant 120 into the third parameter register
movb $0, %al # Put the constant 0 into the %al register, as required when calling functions that take variable arguments
call printf # Call the printf function to display the message
mov $0, %rax # Set main to return zero
leave # Restore the frame pointer
ret # Return