This is the first article in a series of at least three, possibly four. The aim is to describe, in as much detail as is reasonable, the process that a GNU/Linux operating system goes through to load and initialize a new process from an executable file on storage. The plan is to start off with some required background information, and then move on to the more interesting details: in particular, I'll talk about three types of executables: statically-linked, dynamically-linked, and position-independent statically-linked.
I'll use the AMD/Intel x86_64 platform 1, but almost all of this material will translate to other platforms: ARM, SPARC, etc. Some of the content of these articles is derived from original readings of source code; I'll try my best to denote this information whenever possible. Otherwise, I will try to quote the information source. If I miss something, please contact me and I will update accordingly.
So, without further ado, onto the background material!
Assembly
For the purposes of these articles, I will assume a basic knowledge of 32-bit x86 assembly and the Intel architecture. Turning this into a full-blown introduction to Intel assembly would take far too long. However, as many people are not familiar with AMD/Intel 64-bit assembly, I'll start with a brief tangent into 64-bit assembly -- but before I do so, I'd like to call attention to the syntax I will use.
nasm/gas syntax
These days, there are two major syntaxes in which programmers write assembly for AMD/Intel architectures. On Linux systems, it seems to be somewhat traditional to use gas (AT&T) syntax. I personally prefer nasm (Intel) syntax, so that is what I will be using for the code examples here.
If you only know AT&T syntax, you should be able to follow the Intel syntax without too much trouble. Just remember that mov instructions have the operands reversed . . .
Major differences between 32-bit and 64-bit assembly
64-bit assembly is essentially a superset of 32-bit assembly. Pretty much anything you write in 32-bit assembly will work flawlessly when assembled in 64-bit mode, with a few notable exceptions:2
- BCD support is no longer available.
- The bound instruction is not available.
- pushad and popad are not available.
In 64-bit assembly, one still has access to all of the registers available in 32-bit, plus several extras:
- Each of the usual registers (eax, ebx, ecx, edx, esi, edi, esp, ebp) has a corresponding 64-bit extension, with the e replaced with an r. Much like the extensions from 16 bits to 32 bits, there is no way to access the upper 32 bits of the register directly (a la al/ah).
- There are eight new general-purpose registers for use: r8 through r15. These are the 64-bit access names; you can access the low 8/16/32 bits by appending b, w, or d onto the end of the name, respectively (see the short example after this list).
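As a small illustration of the register naming (the constants here are arbitrary), the widths of one 'classic' register and one of the new registers look like this:
language: asm
; widths of a 'classic' register
mov rax, 1 ; full 64 bits
mov eax, 1 ; low 32 bits (writing a 32-bit register also zeroes the upper 32 bits)
mov ax, 1  ; low 16 bits
mov al, 1  ; low 8 bits
; widths of one of the new registers
mov r8, 1  ; full 64 bits
mov r8d, 1 ; low 32 bits
mov r8w, 1 ; low 16 bits
mov r8b, 1 ; low 8 bits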
Also, many instructions formerly considered processor extensions are now standard in 64-bit AMD/Intel assembly, including MMX, SSE, and SSE2. If you are not aware of these, I recommend reading about them -- they are extremely useful. Again, however, they are outside the scope of these articles.
Most importantly, however, the calling conventions have changed. Traditionally, the Linux 32-bit calling convention was to push arguments onto the stack. On 64-bit systems, however, the situation becomes a little bit more complicated. 3 The basic version is this: what you would have passed on the stack before is now passed via registers, within reason -- the first few integer arguments go in registers, and anything beyond that still goes on the stack. Instead of something like this:
language: asm
push eax
push dword [edx - 32]
call somefunction
add esp, 8
You now have:
language: asm
mov edi, eax ; a 32-bit write zero-extends into the full rdi
mov esi, dword [rdx - 32] ; likewise for rsi
call somefunction
The order of registers is slightly peculiar: rdi, rsi, rdx, rcx, r8, and then r9. Called functions are also required to preserve the contents of the registers rbp, rbx, r12, r13, r14, and r15. Everything else may be modified by the function and still lie within the calling convention.4
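As a rough sketch of what that preservation requirement means in practice (somefunction here is just a made-up name), a callee that wants to use one of the preserved registers must save and restore it itself:
language: asm
somefunction:
push rbx ; rbx is callee-saved, so stash it first
mov rbx, rdi ; now rbx can be used freely as scratch space
; ... do something useful with rbx ...
mov rax, rbx ; integer return values go in rax
pop rbx ; restore the caller's rbx before returning
ret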
The User-space/Kernel-space divide
Now, onwards to some basic concepts from operating system development! Way back when, operating systems were pretty much there to load and execute other code. These days, they pretty much do the same thing -- load and execute other code. The difference lies in how they do it. Old OSes (CP/M, DOS, etc.) had a single global memory space. The OS sat in the same memory as the -- single -- program, which was in the same place as the drivers for hardware (those that were not already in hardware or the OS itself), etc. This, as you can imagine, is somewhat limiting -- only one program at a time? -- but is also fairly unstable: poorly-written programs could accidentally overwrite parts of the OS or drivers. Not exactly a good thing . . .
Later operating systems use various techniques (segments, paging, etc.) to achieve a setup where each program runs inside its own separate address space. This has some fairly major advantages, but primarily one that I haven't mentioned yet: hardware 'sandboxing'. Most single-program operating systems of that era provided direct hardware access for all programs. This is, to put it lightly, a security nightmare. But programs still have to access hardware in some fashion -- to read/write files on hard drives and external storage devices, make sounds via speakers, communicate over network cards . . .
This is the user-space/kernel-space distinction. Certain operations, such as direct hardware access, are considered privileged operations and are not available to programs. If a program wants to perform such actions, it has to make a request to the operating system's kernel. Such requests are often called system calls.
System calls on Linux
System calls usually provide a layer of abstraction on top of the hardware present, and oftentimes do not actually involve hardware at all! For example, one of the most widely-used system calls on Linux is the exit syscall, which, well . . . terminates the current process. fork/clone is another. exec, even. All of these are system calls that don't directly involve hardware access. In general, system calls are operations that require meta-process privileges (creating, destroying, signaling, inspecting processes) or hardware access (filesystem operations, memory allocation).
System calls, on 64-bit AMD/Intel Linux, are given numbers and invoked via one of two methods.
The legacy way
The traditional way to invoke a system call on Linux is by use of an interrupt, in particular, interrupt vector 0x80. The arguments to the system call are passed in via registers. Traditionally, on 32-bit Linux, the order of arguments was ebx, ecx, edx, esi, and then edi. This has since changed, and on 64-bit Linux the order is different -- but, of course, not the same as the userspace calling convention. That would be too easy! Instead, the order of arguments is rdi, rsi, rdx, r10, r8, and r9.
Each system call is given a number, and this number is placed into the rax register before the interrupt is issued, telling the kernel which syscall to execute. For example, syscall 60 is the exit() system call, and its first and only argument is the return code for the process. A full list of syscall numbers is available in the file arch/x86/include/generated/asm/unistd_64.h in a compiled Linux kernel source tree. For example, if you wanted to exit a process, it would translate to something like:
language: asm
mov rax, 60
mov rdi, 127 ; we failed for some reason.
int 0x80
The new, faster ways
The new standard way to invoke system calls on 64-bit Linux systems (and on 32-bit kernels as well) is by using one of two 'new' instructions (both about a decade old): syscall or sysenter. The calling convention is still the same. The syscall instruction was added by AMD as an alternative to Intel's sysenter, which is . . . somewhat clunky.
Thanks to some kernel magic we can just replace all instances of int 0x80 with a syscall, and everything will work. So the example above becomes:
language: asm
mov rax, 60
mov rdi, 127 ; we failed for some reason.
syscall
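One detail worth calling out: the fourth argument slot is where the kernel convention visibly differs from the userspace one. As a hedged sketch (I'm assuming pread64 is syscall number 17 -- check unistd_64.h on your own tree -- and the buffer label is hypothetical), a four-argument syscall would look something like:
language: asm
mov rax, 17 ; assumed syscall number for pread64; verify in unistd_64.h
mov rdi, 3 ; fd (first argument)
mov rsi, buffer ; buf (second argument); 'buffer' is a made-up label
mov rdx, 64 ; count (third argument)
mov r10, 0 ; offset (fourth argument) -- r10, not rcx as in userspace
syscall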
A note on userspace wrappers
If you've used the POSIX API before, you may be going 'wait a moment here' -- it seems like these system calls do exactly the same thing as various glibc functions, like, well -- how about the exit() function?
In actuality, such functions are lightweight wrappers around system calls. They provide convenience and portability, and sometimes (brk, for example) perform a few small modifications of parameters/return values. Syscalls typically report errors by returning values in the range -4095 to -1 (negated errno codes); the wrappers set errno appropriately from these error values and return an appropriate failure value.
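To make that last point concrete, here is roughly the kind of check a wrapper performs on the raw return value (a sketch of the general technique, not glibc's actual code; check_result is a made-up label):
language: asm
; rax holds the raw return value from a syscall instruction
check_result:
cmp rax, -4095 ; raw results in [-4095, -1] are negated errno codes
jb .ok ; unsigned 'below' means a genuine success value
neg rax ; recover the positive errno value
; ... a real wrapper would store this in errno and return -1 ...
.ok:
ret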
A simple example: Hello, World!
Now, just to wrap everything up, how about a simple 'Hello, World!' example in assembly, using system calls? We'll start off by printing the string to standard output (fd 1), and then exiting with a return code of zero.
The C code for this example is something like:
language: C
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strlen() */
#include <unistd.h>   /* write() */

int main(int argc, char *argv[]) {
    const char *msg = "Hello, World!\n";
    write(1, msg, strlen(msg));
    exit(0);
}
The first thing to do is to translate the write and exit wrapper calls into system calls. write and exit have syscall numbers 1 and 60, respectively, so the resulting code skeleton is something like:
language: asm
; write() syscall
mov rax, 1
; set up more arguments here
syscall
; exit() syscall
mov rax, 60
; set up more arguments here
syscall
We want to exit with return code zero, so let's set the first argument to zero:
language: asm
; write() syscall
mov rax, 1
; set up more arguments here
syscall
; exit() syscall
mov rax, 60
mov rdi, 0 ; {sh,c}ould be `xor rdi, rdi'
syscall
We now need to set up the parameters to the write call: the first parameter is the FD, 1 (for stdout); the second is the string address; and the third is the number of characters in the buffer. The resulting code, including nasm headers etc., looks like the following:
language: asm
[bits 64]
[global _start]
[section .text]
_start:
; write() syscall.
mov rax, 1
mov rdi, 1
mov rsi, string_address
mov rdx, string_end - string_address - 1
syscall
; exit() syscall.
mov rax, 60
mov rdi, 0
syscall
[section .rodata]
string_address: db "Hello, World!", 10, 0
string_end:
To compile this, we need nasm:
language: sh
nasm -felf64 hello.asm -o hello.o
The -felf64 flag tells nasm to output a 64-bit ELF relocatable object file, suitable for linking with other files to produce a proper ELF executable. I'll talk about ELF executables in more depth in the next article; for now, just think of an ELF executable as a program. Now that we have the object file, it needs to be linked to create a proper executable, which will be done with the ld linker:
language: sh
ld hello.o -o hello
Running it, we get:
$ ./hello
Hello, World!
. . . as expected.
Summary
This article was intended to be a whirlwind introduction to some useful background regarding 64-bit AMD/Intel assembly, and some basic operating systems concepts. If you want more detail on any of this material, there are several decent resources available; I suggest the Intel Software Developer Manuals (and the AMD processor manuals as well) for more architecture/assembly details than you could possibly ever want. Consult a good textbook for more OS information.
Happy hacking,
- ethereal
1. While technically the Intel 64-bit platform is actually licensed from AMD, I'll refer to the architecture as '64-bit Intel architecture' quite a lot, thus propagating the myth that it's actually an original Intel invention. This is more for brevity than anything else. ↩
2. See the Intel Software Developer manuals for documentation on which instructions are unsupported in long mode; the 'notability' is personal opinion. ↩
3. If you want to see the full details, see the x86_64 ABI specification, in particular section 3.2. ↩
4. Oddly, the spec seems to imply that a function is allowed to modify rsp however it wishes. This is something I've wanted to try for a while: create a copy of the stack elsewhere that is identical, set rsp to it -- leaving rbp unchanged -- and then watch the resulting 'non-standard-compliant' code explode. Or not -- perhaps gcc doesn't make such petty assumptions. ↩