This is the first article in a series of at least three, possibly four. The aim is to describe, in as much detail as is reasonable, the process a GNU/Linux operating system goes through to load and initialize a new process from an executable file on storage. The plan is to start with some required background information, and then move on to the more interesting details: in particular, I'll talk about three types of executables: statically-linked, dynamically-linked, and position-independent statically-linked.

I'll use the AMD/Intel x86_64 platform [1], but almost all of this material will translate to other platforms: ARM, SPARC, etc. Some of the content of these articles is derived from my own reading of source code; I'll try my best to denote this information whenever possible. Otherwise, I will try to cite the information source. If I miss something, please contact me and I will update accordingly.

So, without further ado, onto the background material!

Assembly

For the purposes of these articles, I will assume a basic knowledge of 32-bit x86 assembly and the Intel architecture. Turning this into a full-blown introduction to Intel assembly would take far too long. However, as many people are not familiar with AMD/Intel 64-bit assembly, I'll start with a brief tangent into 64-bit assembly -- but before I do so, I'd like to call attention to the syntax I will use.

nasm/gas syntax

These days, there are two major syntaxes in which programmers write assembly for AMD/Intel architectures. On Linux systems, it seems to be somewhat traditional to use gas (AT&T) syntax. I personally prefer nasm (Intel) syntax, so that is what I will be using for the code examples here.

If you only know AT&T syntax, you should be able to follow the Intel syntax without too much trouble. Just remember that the operand order is reversed: in Intel syntax, the destination comes first . . .
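
For instance, here is the same pair of loads in each syntax (a quick sketch; the AT&T versions are written as nasm comments so the snippet still assembles as-is):

; nasm (Intel) syntax: destination first
mov     rax, 8                  ; load the constant 8 into rax
mov     rbx, [rax + 16]         ; load the quadword at address rax+16 into rbx

; the gas (AT&T) equivalents, for comparison: source first, '%' on registers
;   movq    $8, %rax
;   movq    16(%rax), %rbx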

Major differences between 32-bit and 64-bit assembly

64-bit assembly is really a superset of 32-bit assembly. Pretty much anything you write in 32-bit assembly will work flawlessly when compiled in 64-bit, with a few notable exceptions. [2]

In 64-bit assembly, one still has access to all of the registers available in 32-bit, plus several extras: the familiar general-purpose registers are extended to 64 bits (rax, rbx, rcx, rdx, rsi, rdi, rbp, and rsp), and eight new general-purpose registers, r8 through r15, are added.
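
As a quick illustration (a throwaway snippet, not tied to anything later in these articles), the new registers behave just like the familiar ones:

mov     r8, 0x1122334455667788  ; 64-bit immediates are allowed
mov     r9d, 42                 ; writing the 32-bit half (r9d) zero-extends into r9
add     r8, r9                  ; r8 = r8 + r9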

Also, many instructions formerly considered processor extensions are now standard in 64-bit AMD/Intel assembly, including MMX, SSE, and SSE2. If you are not aware of these, I recommend reading about them -- they are extremely useful. Again, however, they are outside the scope of these articles.

Most importantly, however, the calling conventions have changed. Traditionally, the Linux 32-bit calling convention was to push arguments onto the stack. On 64-bit systems, however, the situation becomes a little bit more complicated. [3] The basic version is this: what you would have passed on the stack before is now passed via registers, within reason (the first six integer arguments go in registers; anything beyond that still goes on the stack). Instead of something like this:

push    eax
push    dword [edx - 32]
call    somefunction
add     esp, 8

You now have:

mov     edi, eax                ; writing a 32-bit register zero-extends into rdi
mov     esi, dword [rdx - 32]
call    somefunction

The order of registers is slightly peculiar: rdi, rsi, rdx, rcx, r8, and then r9. Called functions are also required to preserve the contents of the registers rbp, rbx, r12, r13, r14, and r15. Everything else may be modified by the function while still remaining within the calling convention. [4]
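
To see what this looks like from the callee's side, here is a minimal sketch of a function (somefunction is just a made-up name) that adds its two arguments; because it uses rbx, it must save and restore it:

somefunction:
    push    rbx                 ; rbx is callee-saved, so preserve the caller's value
    mov     rbx, rdi            ; first argument
    add     rbx, rsi            ; second argument
    mov     rax, rbx            ; the return value goes in rax
    pop     rbx                 ; restore rbx before returning
    ret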

The User-space/Kernel-space divide

Now, onwards to some basic concepts from operating system development! Way back when, operating systems were pretty much there to load and execute other code. These days, they pretty much do the same thing -- load and execute other code. The difference lies in how they do it. Old OSes (CP/M, DOS, etc.) had a single global memory space. The OS sat in the same memory as the -- single -- program, which was in the same place as the drivers for hardware (those that were not already in hardware or the OS itself), etc. This, as you can imagine, is somewhat limiting -- only one program at a time? -- but is also fairly unstable: poorly-written programs could accidentally overwrite parts of the OS or drivers. Not exactly a good thing . . .

Later operating systems use various techniques (segments, paging, etc.) to achieve a setup where each program runs inside its own separate address space. This has some fairly major advantages, but primarily one that I haven't mentioned yet: hardware 'sandboxing'. Most single-program operating systems of that era provided direct hardware access to all programs. This is, to put it lightly, a security nightmare. But programs still have to access hardware in some fashion -- to read and write files on hard drives and external storage devices, make sounds via speakers, communicate with network cards . . .

This is the user-space/kernel-space distinction. Certain operations, such as direct hardware access, are considered privileged operations and are not available to programs. If a program wants to perform such actions, it has to make a request to the operating system's kernel. Such requests are often called system calls.

System calls on Linux

System calls usually provide a layer of abstraction on top of the hardware present, and oftentimes do not involve hardware at all! For example, one of the most widely-used system calls on Linux is the exit syscall, which, well . . . terminates the current process. fork/clone is another. exec, even. All of these are system calls that don't directly involve hardware access. In general, system calls are operations that require meta-process privileges (creating, destroying, signaling, inspecting) or hardware access (filesystem operations, memory allocation).

System calls, on 64-bit AMD/Intel Linux, are given numbers and invoked via one of two methods.

The legacy way

The traditional way to invoke a system call on Linux is by use of an interrupt, in particular, interrupt vector 0x80. The arguments to the system call are passed in via registers. Traditionally, on 32-bit Linuxes, the order of arguments was ebx, ecx, edx, esi, and then edi. This has since changed, and on 64-bit Linux the order is different -- but, of course, not the same as the userspace calling convention. That would be too easy! Instead, the order of arguments is rdi, rsi, rdx, r10, r8, and r9.

Each system call is given a number, and this number is placed into the rax register before the interrupt is issued, telling the kernel which syscall to execute. For example, syscall 60 is the exit() system call, and its first and only argument is the return code for the process. A full list of syscall numbers is available in the file arch/x86/include/generated/asm/unistd_64.h in a compiled Linux kernel source tree. So, if you wanted to exit a process, it would translate to something like:

mov     rax, 60
mov     rdi, 127 ; we failed for some reason.
int     0x80

The new, faster ways

The new standard way to invoke system calls on 64-bit Linux systems (and on 32-bit kernels as well) is by using one of two 'new' instructions (both about a decade old): syscall or sysenter. The calling convention is still the same. The syscall instruction was added by AMD as an alternative to Intel's sysenter, which is . . . somewhat clunky.

Thanks to some kernel magic, we can just replace all instances of int 0x80 with the syscall instruction, and everything will work. So the example above becomes:

mov     rax, 60
mov     rdi, 127 ; we failed for some reason.
syscall

A note on userspace wrappers

If you've used the POSIX API before, you may be going 'wait a moment here' -- it seems like these system calls do exactly the same thing as various glibc functions, like, well -- how about the exit() function?

In actuality, such functions are lightweight wrappers around system calls. They provide convenience and portability, and sometimes (brk, for example) perform a few small modifications of parameters/return values. Syscalls typically report errors by returning a negative value in the range -4095 to -1 (a negated errno code); the wrappers will set errno appropriately from these error values and return an appropriate failure value.
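
To make that concrete, here is a hypothetical hand-rolled wrapper around write() (a sketch only -- glibc's real wrapper is more involved, and errno_storage here is a made-up location standing in for the thread-local errno):

[section .bss]
errno_storage:  resq 1              ; stand-in for the real (thread-local) errno

[section .text]
my_write:                           ; called with the usual C convention: rdi, rsi, rdx
    mov     rax, 1                  ; write() is syscall number 1
    syscall                         ; the argument registers are already in place
    cmp     rax, -4095              ; results in [-4095, -1] are negated errno codes
    jae     .error                  ; (unsigned comparison catches exactly that range)
    ret                             ; success: return the byte count untouched
.error:
    neg     rax                     ; recover the positive errno value
    mov     [errno_storage], rax    ; store it where errno lives
    mov     rax, -1                 ; and return -1 to signal failure
    ret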

A simple example: Hello, World!

Now, just to wrap everything up, how about a simple 'Hello, World!' example in assembly, using system calls? We'll start off by printing the string to standard output (fd 1), and then exiting with a return code of zero.

The C code for this example is something like:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    const char *msg = "Hello, World!\n";
    write(1, msg, strlen(msg));
    exit(0);
}

The first thing to do is to translate the write and exit wrapper calls into system calls. write and exit have syscall numbers 1 and 60, respectively, so the resulting code skeleton is something like:

; write() syscall
mov     rax, 1
; set up more arguments here
syscall

; exit() syscall
mov     rax, 60
; set up more arguments here
syscall

We want to exit with return code zero, so let's set the first argument to zero:

; write() syscall
mov     rax, 1
; set up more arguments here
syscall

; exit() syscall
mov     rax, 60
mov     rdi, 0       ; {sh,c}ould be `xor rdi, rdi'
syscall

We now need to set up the parameters to the write call: the first is the file descriptor, 1 (stdout); the second is the address of the string; the third is the number of characters in the buffer. The resulting code, including the nasm headers etc., looks like the following:

[bits 64]
[global _start]
[section .text]
_start:
    ; write() syscall.
    mov     rax, 1
    mov     rdi, 1
    mov     rsi, string_address
    mov     rdx, string_end - string_address - 1
    syscall

    ; exit() syscall.
    mov     rax, 60
    mov     rdi, 0
    syscall

[section .rodata]
string_address: db "Hello, World!", 10, 0
string_end:

To compile this, we need nasm:

nasm -felf64 hello.asm -o hello.o

The -felf64 flag tells nasm to output a 64-bit relocatable object file, suitable for linking with other files to produce a proper ELF executable. I'll talk about ELF executables in the next article in more depth; for now, just think of an ELF executable as a program. Now that we have the object file, it needs to be linked to create a proper executable, which will be done with the ld linker:

ld hello.o -o hello

Running it, we get:

$ ./hello
Hello, World!

. . . as expected.

Summary

This article was intended to be a whirlwind introduction to some useful background regarding 64-bit AMD/Intel assembly, and some basic operating systems concepts. If you want more detail on any of this material, there are several decent resources available; I suggest the Intel Software Developer Manuals (and the AMD processor manuals as well) for more architecture/assembly details than you could possibly ever want. Consult a good textbook for more OS information.

Happy hacking,

- ethereal


  1. While the Intel 64-bit platform is technically licensed from AMD, I'll refer to the architecture as '64-bit Intel architecture' quite a lot, thus propagating the myth that it's actually an original Intel invention. This is more for brevity than anything else.

  2. See the Intel Software Developer manuals for documentation on instructions that lack long-mode support; the `notability' is personal opinion.

  3. If you want to see the full details, see the x86_64 ABI specification, in particular section 3.2. 

  4. Oddly, the spec seems to imply that a function is allowed to modify rsp however it wishes. This is something I've wanted to try for a while: create a copy of the stack elsewhere that is identical, set rsp to it -- leaving rbp unchanged -- and then watch the resulting 'non-standard-compliant' code explode. Or not -- perhaps gcc doesn't make such petty assumptions.