This is the second article in a multi-article series on Linux program loading. I originally thought it would be a three-article series, but it now seems like I may have enough material for as many as five. To keep things nice and ambiguous, I will call it a 'multi-part' article.

The first article was primarily background information. Now, time for some actual material: this time, we're talking about statically-linked executables.

A note on programs and processes

Just a quick word before we really begin -- a 'program' is a set of instructions, whereas a 'process' is a running instance of a 'program'. I'll try to make this distinction clear whenever possible, but I'm likely to mess up fairly often. Apologies in advance for this.

Executables on Linux: a.out and ELF

On Linux, there are actually two different types of executable files: those in the a.out format, and those in the ELF (Executable and Linking Format). The a.out format is older; the BSDs used it as their default for a long time, though they have since moved to ELF as well. Linux has moved to using ELF, and it is by far the more popular format there, so I'll talk about ELF instead of a.out. 1

It's worth noting here that the ELF is not only used for executable files, but also for relocatables (compiler object files), core dumps, and several other applications. So, exactly what is the ELF?

The ELF

The ELF is actually a very sane file format, all things considered. It consists of three parts:

For now, as we're focusing on statically-linked executables, the only ELF data structures of interest are the ELF sections and ELF program headers. We'll visit each in turn. First, however, some conventions should be established.

ELF data types

For portability concerns, the ELF specification defines a few typedefs. For the sake of completeness, I will give the types here. I'll be replacing these types with the stdint.h types throughout, as I prefer them.
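
Written out as typedefs, the 64-bit names map onto the stdint.h types as follows. This is a sketch based on the usual glibc definitions rather than a copy of elf.h, so check your own header if the exact spelling matters:

```c
#include <stdint.h>

/* 64-bit ELF type names and the stdint.h types used in their place below. */
typedef uint16_t Elf64_Half;    /* unsigned 16-bit integer */
typedef uint32_t Elf64_Word;    /* unsigned 32-bit integer */
typedef int32_t  Elf64_Sword;   /* signed 32-bit integer */
typedef uint64_t Elf64_Xword;   /* unsigned 64-bit integer */
typedef int64_t  Elf64_Sxword;  /* signed 64-bit integer */
typedef uint64_t Elf64_Addr;    /* virtual address */
typedef uint64_t Elf64_Off;     /* file offset */
```

So wherever elf.h says Elf64_Half, the structures in this article say uint16_t, and so on.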

The purpose of ELF

Before progressing further, a quick digression: what, exactly, is the purpose of an executable file? On Von Neumann architectures (i.e. all major general-purpose architectures designed within the last thirty years), programs are really just large chunks of memory values placed in specific locations. So, in essence, an executable file is just memory and location specifiers. How, exactly, we'll get to later. For the moment, let's start with the ELF header.

ELF header

The ELF header is arguably the most important piece of data in the entire file. On x86_64 platforms, it's defined as the following structure: (from file elf/elf.h in glibc distribution, types replaced as noted above)

typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  uint16_t e_type;          /* Object file type */
  uint16_t e_machine;      /* Architecture */
  uint32_t e_version;      /* Object file version */
  uint64_t e_entry;        /* Entry point virtual address */
  uint64_t e_phoff;        /* Program header table file offset */
  uint64_t e_shoff;        /* Section header table file offset */
  uint32_t e_flags;        /* Processor-specific flags */
  uint16_t e_ehsize;       /* ELF header size in bytes */
  uint16_t e_phentsize;    /* Program header table entry size */
  uint16_t e_phnum;        /* Program header table entry count */
  uint16_t e_shentsize;    /* Section header table entry size */
  uint16_t e_shnum;        /* Section header table entry count */
  uint16_t e_shstrndx;     /* Section header string table index */
} Elf64_Ehdr;

The names, coupled with the short documentation comments provided by the glibc developers, are, hmm -- self-explanatory if you know everything already, shall we say? Here's what some of them are, in a little bit more detail: (the rest will be explained later as we get to the data structures they reference)

That seems like enough meta-information, no? Let us move on to some more weighty topics.

ELF sections

As was noted earlier, programs are really just memory values and addresses. In the ELF, the memory values/addresses are specified in terms of ELF sections. A single section represents a contiguous region of memory that is intended to serve the same purpose -- that is, all data, all code, all meta-information, etc. Some are intended to be loaded and used by the target program, others only specify attributes about the program, still others ask for particular features to be enabled/disabled during load time, and so on. The general meta-information about a section is specified by the following structure:

typedef struct {
    uint32_t sh_name;        /* Section name (string tbl index) */
    uint32_t sh_type;        /* Section type */
    uint64_t sh_flags;       /* Section flags */
    uint64_t sh_addr;        /* Section virtual addr at execution */
    uint64_t sh_offset;      /* Section file offset */
    uint64_t sh_size;        /* Section size in bytes */
    uint32_t sh_link;        /* Link to another section */
    uint32_t sh_info;        /* Additional section information */
    uint64_t sh_addralign;   /* Section alignment */
    uint64_t sh_entsize;     /* Entry size if section holds table */
} Elf64_Shdr;

These entries specify general information about the section: what type it is, some special flags, the address the memory should be loaded to, where in the file the data for the section can be found, how large the section is, and what the data alignment should be. Other entries -- sh_link and sh_info, for example -- are highly type-specific.

One thing worth singling out in particular is the sh_name entry. This gives a reference to a C-style string that can be used to distinguish between ELF sections of the same type, and also to provide a human-readable description of the section. The sh_name field is not an offset into the file, but is instead an offset into the data of a special 'string table' ELF section. Since one cannot find a section's name without, well, knowing which section holds the string table, this seems to be an issue. Thankfully, it was solved by adding an entry into the ELF header, e_shstrndx -- the index of the section that holds the section-name string table. This ELF section is simply a number of C-style strings concatenated together -- each ending in a NUL byte, of course.
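
As a rough illustration of that lookup, here is a minimal sketch that assumes the string table's data has already been read into memory. The function name and its parameters are made up for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns the name stored at offset sh_name in a string table, or NULL if the
 * offset is out of range or the string is not terminated inside the table.
 * strtab and strtab_size describe the string table section's data.
 */
const char *section_name(const char *strtab, size_t strtab_size,
                         uint32_t sh_name) {
    if (sh_name >= strtab_size) return NULL;
    /* make sure a terminating NUL exists before the end of the table */
    if (memchr(strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
        return NULL;
    return strtab + sh_name;
}
```

The two checks are there because sh_name comes straight from the file, and a damaged file could point it anywhere.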

Also -- where exactly are these section headers in the ELF file? They are located at a file offset specified in the ELF header (e_shoff to be exact), and the number of them is also specified by the ELF header (e_shnum).
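
In other words, section header number i starts at e_shoff + i * e_shentsize. A small sketch of that arithmetic with some bounds checking follows; the function name and the file_size parameter are assumptions for illustration:

```c
#include <stdint.h>

/*
 * Returns the file offset of section header number index, or 0 if that header
 * would not fit inside a file of file_size bytes. (Offset 0 always holds the
 * ELF header, so it can double as the failure value.)
 */
uint64_t section_header_offset(uint64_t e_shoff, uint16_t e_shentsize,
                               uint16_t e_shnum, uint16_t index,
                               uint64_t file_size) {
    if (index >= e_shnum) return 0;
    uint64_t off = e_shoff + (uint64_t)index * e_shentsize;
    if (off < e_shoff) return 0;                  /* arithmetic wrapped around */
    if (off > file_size || file_size - off < e_shentsize) return 0;
    return off;
}
```

The same pattern applies to the program headers, with e_phoff, e_phentsize and e_phnum in place of the section fields.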

ELF section types vary: there are about twenty specified and recognized by most Linux utilities. There are a few that may be of interest for the moment:

We'll talk more about ELF sections in the future some time; this should be enough general information for the moment. This is still relevant information, but a depth-first approach here will leave this article sitting well above textbook-size. Such an approach will not only leave my fingers dead from exhaustion, but likely will not be terribly interesting.

ELF program headers

If it seems like a fair amount of work to iterate through the sections of an ELF executable and copy memory around, you're thinking along the same lines as the fine people behind the ELF specification. Since a lot of sections will actually be loaded into contiguous regions of a process's address space, it seems a waste to copy it over bits at a time, no?

Enter program headers. Their main purpose is to provide a simple way of specifying how a program should be loaded into memory. What to do with a string table, after all, is a little ambiguous -- is it loader-specific information? Should it be available for use at run-time by the program? Program headers solve this problem by demarcating (hopefully) large swathes of memory to be loaded at once. The information in a program header is given by this structure:

typedef struct {
  uint32_t p_type;   /* Segment type */
  uint32_t p_flags;  /* Segment flags */
  uint64_t p_offset; /* Segment file offset */
  uint64_t p_vaddr;  /* Segment virtual address */
  uint64_t p_paddr;  /* Segment physical address */
  uint64_t p_filesz; /* Segment size in file */
  uint64_t p_memsz;  /* Segment size in memory */
  uint64_t p_align;  /* Segment alignment */
} Elf64_Phdr;

A few points of note:

Entry point

So suppose we've got a program all loaded into memory. We have to start executing from somewhere, right? The address that the OS is intended to begin execution at is specified in the ELF header, as the value e_entry. This isn't always the case, really, but it's a good enough answer for the moment. This changes when program interpreters get involved, which they will when we tackle dynamically-linked executables later. For now, though, we'll concern ourselves with statically-linked executables.

Statically-linked ELF executables

So what is a statically-linked ELF executable? Put simply, it is an executable that requires no external resources to load. Once it begins executing, it very well may require other data files etc, but insofar as the loading process is concerned, it is entirely self-contained. I'm likely beginning to sound like a broken record, but . . . we'll talk about the advantages and disadvantages of static linking once we encounter its alternative, dynamically-linked executables.

For now, let's do something interesting!

(Manually) constructing an ELF executable: Hello, World!

In particular, how about we make an ELF executable manually -- that is to say, with a hex editor? Seems like a fairly good method of exercising information about the ELF specification, no?

So, what do we need? Obviously, the header is required. Nothing we can do about that. Since we don't particularly care about making the program easy for humans to disassemble, we can discard the ELF sections. So all we really need are three things: the ELF header, the program headers, and the data itself.

The data

The program we'll use for this demonstration is the canonical 'Hello, World!' program that does naught but print this fabled message to the standard output channel. In fact, it'll be a minor modification of that which was used in the previous article:

[bits 64]
[org 0x400078]
[global _start]
_start:
    ; write() syscall.
    mov     rax, 1
    mov     rdi, 1
    mov     rsi, string_address
    mov     rdx, string_end - string_address - 1
    syscall

    ; exit() syscall.
    mov     rax, 60
    mov     rdi, 0
    syscall

string_address: db "Hello, World!", 10, 0
string_end:

Note the addition of the nasm directive [org 0x400078] -- this instructs nasm to make the code assume it was loaded at that address. (Needed for the absolute reference to string_address; the length, string_end - string_address - 1, is a difference between two labels and comes out the same either way.) Before, the linker was doing this bit; now we have to do it manually. The value 0x400078 is magic right now; we'll get to why it is this value in a moment.

If this were a real program, we'd probably split the code and data up so that the string was not present inside an executable page. Right now, however, if we have them both in the same place, only one program header is required.

If we compile the above code with nasm and then disassemble the result, we get the following:

0000000000400078 (05) b801000000               MOV EAX, 0x1
000000000040007d (05) bf01000000               MOV EDI, 0x1
0000000000400082 (10) 48be9f00400000000000     MOV RSI, 0x40009f
000000000040008c (05) ba0e000000               MOV EDX, 0xe
0000000000400091 (02) 0f05                     SYSCALL
0000000000400093 (05) b83c000000               MOV EAX, 0x3c
0000000000400098 (05) bf00000000               MOV EDI, 0x0
000000000040009d (02) 0f05                     SYSCALL
000000000040009f (03) 48656c                   INS BYTE [RDI], DX
00000000004000a2 (01) 6c                       INS BYTE [RDI], DX
00000000004000a3 (01) 6f                       OUTS DX, DWORD [RSI]
00000000004000a4 (02) 2c20                     SUB AL, 0x20
00000000004000a6 (01) 57                       PUSH RDI
00000000004000a7 (01) 6f                       OUTS DX, DWORD [RSI]
00000000004000a8 (02) 726c                     JB 0x400116
00000000004000aa (03) 64210a                   AND [FS:RDX], ECX
00000000004000ad (01) 00                       DB 0x0

So we have 0x36 (54) bytes of code/data that we need to load, in particular the hex sequence:

B8 01 00 00  00 BF 01 00  00 00 48 BE  9F 00 40 00
00 00 00 00  BA 0E 00 00  00 0F 05 B8  3C 00 00 00 
BF 00 00 00  00 0F 05 48  65 6C 6C 6F  2C 20 57 6F
72 6C 64 21  0A 00

ELF Header

Now let us construct the ELF header for this executable. The ELF identification is not terribly interesting right now -- it'll have all of the 'standard' values for a 64-bit Intel/AMD Linux program, so instead consider the elements in the Elf64_Ehdr structure.

This gives us, after little-endian conversion, the following hexadecimal values for the ELF header:

e_ident:     7F 45 4C 46  02 01 01 00  00 00 00 00  00 00 00 00
e_type:      02 00
e_machine:   3E 00
e_version:   01 00 00 00
e_entry:     78 00 40 00  00 00 00 00
e_phoff:     40 00 00 00  00 00 00 00
e_shoff:     00 00 00 00  00 00 00 00
e_flags:     00 00 00 00
e_ehsize:    40 00
e_phentsize: 38 00
e_phnum:     01 00
e_shentsize: 40 00
e_shnum:     00 00
e_shstrndx:  00 00

Which, translated into a solid block of data expressed in hex, is: (offsets in first column)

0x00: 7F 45 4C 46  02 01 01 00  00 00 00 00  00 00 00 00
0x10: 02 00 3E 00  01 00 00 00  78 00 40 00  00 00 00 00
0x20: 40 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
0x30: 00 00 00 00  40 00 38 00  01 00 40 00  00 00 00 00

So that's the first 64 bytes settled.
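
The sixteen e_ident bytes at the start of that block read as: the magic number 7F 'E' 'L' 'F', then 02 for a 64-bit file, 01 for little-endian data, and 01 for the ELF version, followed by zero padding. A minimal sketch of the check a loader could make on them is below. The enum and function names are made up for this example; elf.h has its own names for the same indices:

```c
#include <stdint.h>
#include <string.h>

/* Indices into e_ident, and what the values at those indices mean. */
enum {
    IDENT_CLASS = 4,    /* 1 = 32-bit, 2 = 64-bit */
    IDENT_DATA = 5,     /* 1 = little-endian, 2 = big-endian */
    IDENT_VERSION = 6   /* always 1 */
};

/* Returns 1 if the 16 identification bytes describe a 64-bit LE ELF file. */
int ident_is_elf64_le(const uint8_t ident[16]) {
    static const uint8_t magic[4] = { 0x7F, 'E', 'L', 'F' };
    if (memcmp(ident, magic, sizeof magic) != 0) return 0;
    return ident[IDENT_CLASS] == 2
        && ident[IDENT_DATA] == 1
        && ident[IDENT_VERSION] == 1;
}
```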

Program headers

We now need to place the program header somewhere in the executable, but before doing that, perhaps we should decide on some values?

Here's a copy of the Elf64_Phdr definition for reference:

typedef struct {
  uint32_t p_type;   /* Segment type */
  uint32_t p_flags;  /* Segment flags */
  uint64_t p_offset; /* Segment file offset */
  uint64_t p_vaddr;  /* Segment virtual address */
  uint64_t p_paddr;  /* Segment physical address */
  uint64_t p_filesz; /* Segment size in file */
  uint64_t p_memsz;  /* Segment size in memory */
  uint64_t p_align;  /* Segment alignment */
} Elf64_Phdr;

This gives us the hex data:

p_type:   01 00 00 00
p_flags:  05 00 00 00
p_offset: 00 00 00 00  00 00 00 00
p_vaddr:  00 00 40 00  00 00 00 00
p_paddr:  00 00 40 00  00 00 00 00
p_filesz: AE 00 00 00  00 00 00 00
p_memsz:  AE 00 00 00  00 00 00 00
p_align:  00 00 20 00  00 00 00 00

Which translates to the hex region:

0x40: 01 00 00 00  05 00 00 00  00 00 00 00  00 00 00 00
0x50: 00 00 40 00  00 00 00 00  00 00 40 00  00 00 00 00
0x60: AE 00 00 00  00 00 00 00  AE 00 00 00  00 00 00 00
0x70: 00 00 20 00  00 00 00 00
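
The numbers used so far all follow from the sizes of the two headers. The single segment maps file offset 0 at 0x400000, the ELF header and one program header occupy the first 0x40 + 0x38 bytes, and the code comes straight after them. A small sketch of that arithmetic, with names invented for the example:

```c
#include <stdint.h>

enum {
    EHDR_SIZE = 0x40,     /* size of Elf64_Ehdr, the e_ehsize value */
    PHDR_SIZE = 0x38,     /* size of Elf64_Phdr, the e_phentsize value */
    PAYLOAD_SIZE = 0x36   /* code plus string, from the nasm output */
};

#define LOAD_BASE 0x400000ULL  /* p_vaddr of the single segment */

/* File offset of the first byte after the ELF header and one program header. */
uint64_t payload_offset(void) { return EHDR_SIZE + PHDR_SIZE; }

/* Address that byte ends up at when file offset 0 is mapped at LOAD_BASE. */
uint64_t payload_address(void) { return LOAD_BASE + payload_offset(); }

/* Total file size, which is also p_filesz and p_memsz here. */
uint64_t file_size(void) { return payload_offset() + PAYLOAD_SIZE; }
```

That gives 0x78 for the offset of the code in the file, 0x400078 for the org directive and e_entry, and 0xAE for p_filesz and p_memsz.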

Executable data

After all of this, it's time to place the executable data itself, which was found via nasm earlier:

0x70: .. .. .. ..  .. .. .. ..  B8 01 00 00  00 BF 01 00
0x80: 00 00 48 BE  9F 00 40 00  00 00 00 00  BA 0E 00 00
0x90: 00 0F 05 B8  3C 00 00 00  BF 00 00 00  00 0F 05 48
0xa0: 65 6C 6C 6F  2C 20 57 6F  72 6C 64 21  0A 00

The overall file in hex

Just by copy-pasting from the above, this gives us the resulting file:

0x00: 7F 45 4C 46  02 01 01 00  00 00 00 00  00 00 00 00
0x10: 02 00 3E 00  01 00 00 00  78 00 40 00  00 00 00 00
0x20: 40 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
0x30: 00 00 00 00  40 00 38 00  01 00 40 00  00 00 00 00 
0x40: 01 00 00 00  05 00 00 00  00 00 00 00  00 00 00 00
0x50: 00 00 40 00  00 00 00 00  00 00 40 00  00 00 00 00
0x60: AE 00 00 00  00 00 00 00  AE 00 00 00  00 00 00 00
0x70: 00 00 20 00  00 00 00 00  B8 01 00 00  00 BF 01 00
0x80: 00 00 48 BE  9F 00 40 00  00 00 00 00  BA 0E 00 00
0x90: 00 0F 05 B8  3C 00 00 00  BF 00 00 00  00 0F 05 48
0xa0: 65 6C 6C 6F  2C 20 57 6F  72 6C 64 21  0A 00

Executing it!

And the result:

$ awk -F: '{print $2}' hex-file | xxd -r -p > binary
$ chmod +x binary
$ ./binary
Hello, World!

As you'd expect! Take a moment to bask in the glory of manually creating a 'Hello, World!' program using naught but a hex editor and an assembler, and then let it sink in just how long that took . . . and realize there's a reason we have linkers.
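
For comparison, the same 0xAE bytes can be assembled from the structure definitions rather than typed in by hand. The following is only a sketch: it assumes glibc's elf.h is available and that the host is little-endian (so the structs can be copied out as-is), and build_hello_elf is a name made up for this example:

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Machine code and string from the nasm listing (0x36 bytes). */
static const uint8_t payload[] = {
    0xB8, 0x01, 0x00, 0x00, 0x00,               /* mov eax, 1 */
    0xBF, 0x01, 0x00, 0x00, 0x00,               /* mov edi, 1 */
    0x48, 0xBE, 0x9F, 0x00, 0x40, 0x00,         /* mov rsi, 0x40009f */
    0x00, 0x00, 0x00, 0x00,
    0xBA, 0x0E, 0x00, 0x00, 0x00,               /* mov edx, 14 */
    0x0F, 0x05,                                 /* syscall */
    0xB8, 0x3C, 0x00, 0x00, 0x00,               /* mov eax, 60 */
    0xBF, 0x00, 0x00, 0x00, 0x00,               /* mov edi, 0 */
    0x0F, 0x05,                                 /* syscall */
    'H', 'e', 'l', 'l', 'o', ',', ' ',
    'W', 'o', 'r', 'l', 'd', '!', '\n', 0
};

/* Fills out (at least 0xAE bytes) and returns the number of bytes written. */
size_t build_hello_elf(uint8_t *out) {
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    size_t total = sizeof eh + sizeof ph + sizeof payload;

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);        /* 7F 'E' 'L' 'F' */
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = 0x400000 + sizeof eh + sizeof ph;
    eh.e_phoff = sizeof eh;                     /* program header follows */
    eh.e_ehsize = sizeof eh;
    eh.e_phentsize = sizeof ph;
    eh.e_phnum = 1;
    eh.e_shentsize = sizeof(Elf64_Shdr);        /* no sections, size only */

    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;
    ph.p_flags = PF_R | PF_X;
    ph.p_vaddr = 0x400000;
    ph.p_paddr = 0x400000;
    ph.p_filesz = total;
    ph.p_memsz = total;
    ph.p_align = 0x200000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + sizeof eh + sizeof ph, payload, sizeof payload);
    return total;
}
```

Writing the buffer to a file and marking it executable should give the same binary as the hex dump above.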

And now for something a little different: loading an executable

Let's turn this around a little bit, shall we? How about a program that will load another program? That is, how about we simulate the process the kernel goes through to load a program.

Basic idea

As mentioned before, we're still only thinking about statically-linked executables. The process of dynamic linking/loading is pretty complicated, and we'll talk about it in the future. For the moment, we'll just concentrate upon how to load a statically-linked executable.

The real question is: where will this executable be loaded to? This is where things will start to get a bit complicated, but bear with me here. The idea is that we'll load another program into the same address space as the current program. That is, we'll essentially replace the current program with another one!

So how will this work? Here's the basic outline:

  1. Read input ELF file program headers.
  2. Map memory where ELF will be loaded to.
  3. Copy contents of file into memory.
  4. Jump to ELF entry point.

We'll need to leave the initial environment as untouched as we can. There are some fairly sensitive values in there; getting them all correct is a bit of a touchy process. We'll talk about this in more detail sometime in the future, probably the next article.

There's one major thing that needs to be addressed, though . . . what happens if the target ELF executable specifies that it wants to be loaded into some address range that happens to be where our loader is present? After all, we're working within the same address space, so this might not work as expected, right?

Well, yes. There are a few ways to get around this, but for the moment, we'll just sort of hack it by setting the loader to be loaded to a non-standard address.

Finally, fair warning: this code cuts corners in an effort to be as straightforward as possible. There's little to no error checking, and it is all-around not very good code. You've been warned.

Assembly code

Because we want to not touch the initial program state, this means using assembly code for that extra little bit of control. The bulk will still be written in C, of course, but the 'driver' code will be in assembly, along with one very useful routine (file wrapper.s):

[bits 64]

[extern load]
[extern elf_entry]
[extern toload]

[section .text]
[global _start]
_start:
    ; the only things we're interested in saving are rsp and rdx.
    push rdx

    ; load first argument into toload pointer
    mov rax, [rsp + 0x18]
    mov [toload], rax

    call load

    pop rdx
    mov rax, [elf_entry]
    jmp rax

[global invoke_syscall]
invoke_syscall:
    ; save registers rbx, r12, r13, r14, and r15.
    sub rsp, 0x28
    mov [rsp + 0x00], rbx
    mov [rsp + 0x08], r12
    mov [rsp + 0x10], r13
    mov [rsp + 0x18], r14
    mov [rsp + 0x20], r15

    ; syscall number
    mov rax, rdi
    ; rearrange arguments to match syscall interface.
    ; userspace order: rdi, rsi, rdx, rcx, r8, r9, memory.
    ; kernel ordering: rdi, rsi, rdx, r10, r8, r9.

    ; But we've replaced the first argument with the syscall number.
    ; So shift . . .
    mov rdi, rsi
    mov rsi, rdx
    mov rdx, rcx
    mov r10, r8
    mov r8, r9
    mov r9, [rsp + 0x30]

    syscall

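    ; release the save area. The registers stored above are never reloaded,
    ; which is harmless only because nothing in this routine modifies them.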
    add rsp, 0x28

    ret

[section .bss]
initial_rdx: resq 1

There isn't terribly much to say about this. The assembly contains both the primary 'driver' function (_start) and a utility function that allows the C code to invoke system calls. Just two things that are perhaps not obvious:

As for why we only save rsp and rdx to preserve the startup environment . . . that's in the next article.

C code

Now it's time for the corresponding C code: (file loader.c)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <linux/elf.h>

extern uint64_t invoke_syscall(uint64_t number, uint64_t arg1, uint64_t arg2,
    uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6);

// used by wrapper
uint64_t elf_entry;
const char *toload;

// useful userspace functions
size_t strlen(const char *s);

// syscall wrappers
int open(const char *path, int oflag, ...);
int close(int fd);
int write(int fd, const void *buffer, int chars);
void exit(int code);
void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off);

void load() {
    // toload will contain pointer to first cmdline argument
    if(toload == NULL) {
        const char *message = "No filename passed as argument.\n";
        write(1, message, strlen(message));
        exit(1);
    }

    {
        const char *message = "Loading . . .\n";
        write(1, message, strlen(message));
    }

    int fd = open(toload, O_RDONLY);
    // these aren't the userspace wrappers, they return -errno instead of
    // -1 and setting the variable errno.
    if(fd < 0) {
        const char *message = "Failed to open input file.\n";
        write(1, message, strlen(message));
        exit(1);
    }

    // just want to read the headers. Assume they're in the first 4KB.
    void *header_memory = mmap(NULL, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0);

    if((int64_t)header_memory < 0) {
        const char *message = "Failed to map input file into memory.\n";
        write(1, message, strlen(message));
        exit(1);
    }

    // time to start mapping: header first.
    Elf64_Ehdr *elf_header = header_memory;

    // really ought to verify ELF magic etc. here . . .

    // this is an ugly cast to grab a pointer to the beginning of the pheaders
    Elf64_Phdr *pheaders =
        (void *)((uint8_t *)header_memory + elf_header->e_phoff);

    // try loading everything
    for(int i = 0; i < elf_header->e_phnum; i ++) {
        // only care about loadable segments.
        if(pheaders[i].p_type != PT_LOAD) continue;

        // convert from pheader permissions to mmap permissions
        // (the bit values differ: PF_X is 1 and PF_R is 4, whereas
        // PROT_READ is 1 and PROT_EXEC is 4, so a plain copy won't do)
        int permissions = 0;
        if(pheaders[i].p_flags & PF_R) permissions |= PROT_READ;
        if(pheaders[i].p_flags & PF_W) permissions |= PROT_WRITE;
        if(pheaders[i].p_flags & PF_X) permissions |= PROT_EXEC;

        // this is where having p_offset be page-aligned is useful.
        // also, this doesn't take into account the fact that p_memsz and
        // p_filesz can be drastically different . . .
        void *target = mmap((void *)pheaders[i].p_vaddr, pheaders[i].p_memsz,
            permissions, MAP_FIXED | MAP_PRIVATE, fd, pheaders[i].p_offset);

        // check for errors.
        if((int64_t)target < 0) {
            const char *message = "Failed to map input file contents.\n";
            write(1, message, strlen(message));
            exit(1);
        }
    }

    close(fd);
    elf_entry = elf_header->e_entry;
}

size_t strlen(const char *s) {
    if(!s) return 0;

    size_t count = 0;
    while(*s++) count ++;
    return count;
}

int open(const char *path, int oflag, ...) {
    return (int)invoke_syscall(2, (uint64_t)path, oflag, 0, 0, 0, 0);
}

int close(int fd) {
    return invoke_syscall(3, fd, 0, 0, 0, 0, 0);
}

int write(int fd, const void *buffer, int chars) {
    return invoke_syscall(1, fd, (uint64_t)buffer, chars, 0, 0, 0);
}

void exit(int code) {
    invoke_syscall(60, code, 0, 0, 0, 0, 0);
}

void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off) {
    return (void *)invoke_syscall(9, (uint64_t)addr, len, prot, flags, fd, off);
}

This one deserves a bit more explanation than the assembly. There are really two parts to this source file; the wrapper/helper functions and then the actual loading code itself.

Why the wrappers/helpers? We're writing this as essentially 'portable assembly'-style C, and as a result, don't have access to the standard libraries that would otherwise do this for us. It's worth noting that the syscall wrappers return exactly the value that the system call itself returned. Linux kernel code breaks from userspace tradition here: rather than returning -1 and setting errno, a system call returns its result (zero or some other non-negative value) on success and -errno on failure (the maximum value of errno is 4095, for reference). A quirk of the x86_64 architecture (see canonical addresses for more details), and the Linux implementation for the platform, means that valid userspace addresses will not have the highest bit set, so (for syscalls that return memory addresses) the signed 2's complement interpretation will only be negative if the system call failed.
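
The loader simply tests whether the returned value is negative. A slightly stricter test, sketched here with made-up helper names, treats only the window from -4095 to -1 as an error:

```c
#include <stdint.h>

#define MAX_ERRNO 4095  /* largest errno value that is encoded this way */

/* Returns 1 if a raw syscall return value encodes an error. */
int syscall_failed(uint64_t ret) {
    /* -4095 .. -1, seen as unsigned, are the top 4095 values of the range */
    return ret >= (uint64_t)-MAX_ERRNO;
}

/* Returns the errno value for a failed call, or 0 for a successful one. */
int syscall_errno(uint64_t ret) {
    return syscall_failed(ret) ? (int)-(int64_t)ret : 0;
}
```

For example, a raw return value of -2 from open decodes to errno 2 (ENOENT), while any file descriptor or mapped address falls outside the window.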

Now, the loading code. It can be broken down into three main parts:

The process of loading the ELF headers is pretty straightforward. It's a common paradigm in low-level code to map the file into memory instead of using read/seek etc. If you've never run into memory-mapped file I/O before, the idea is pretty simple: treat reads from a region of memory as if they were reads from a file at that offset (i.e. redirect the read to the file itself), and vice-versa for writes. See the mmap(2) manpage for more details.

As noted before, there are several types of program headers. For the purposes of this extremely simple loader, all that really matters is getting the program content into memory. As a result, the only program headers of interest are those that denote loadable regions, i.e. those of type PT_LOAD. Each of these loadable regions is intended for a certain purpose -- code, data, read-only data, etc -- and as a result has particular permissions. These requested permissions are honoured, and an mmap() is used to load the contents of these program headers into memory.
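
The loader above passes p_vaddr and p_offset straight to mmap(), which only works when both are page-aligned. A more careful loader would round the request down to a page boundary first. The following is a sketch of that arithmetic only, not a drop-in replacement. It assumes 4 KiB pages, glibc's elf.h and Linux's sys/mman.h; the struct and function names are invented for illustration:

```c
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE 0x1000ULL  /* assumption: 4 KiB pages */

struct map_plan {
    uint64_t addr;      /* page-aligned address for mmap */
    uint64_t offset;    /* page-aligned file offset for mmap */
    uint64_t file_len;  /* bytes backed by the file, counted from offset */
    uint64_t total_len; /* whole pages needed to cover p_memsz */
    int prot;           /* PROT_* bits matching p_flags */
};

/* Assumes p_vaddr and p_offset are congruent modulo PAGE, which the ELF
 * specification asks of loadable segments. */
struct map_plan plan_segment(const Elf64_Phdr *ph) {
    struct map_plan plan;
    uint64_t slack = ph->p_vaddr & (PAGE - 1);  /* distance into first page */

    plan.addr = ph->p_vaddr - slack;
    plan.offset = ph->p_offset - slack;
    plan.file_len = ph->p_filesz + slack;
    plan.total_len = (ph->p_memsz + slack + PAGE - 1) & ~(PAGE - 1);

    /* The bit values differ between the two sets of flags, so translate. */
    plan.prot = 0;
    if (ph->p_flags & PF_R) plan.prot |= PROT_READ;
    if (ph->p_flags & PF_W) plan.prot |= PROT_WRITE;
    if (ph->p_flags & PF_X) plan.prot |= PROT_EXEC;
    return plan;
}
```

When p_memsz is larger than p_filesz, everything past the file-backed part is expected to read as zero. A real loader would have to arrange for that separately, which this sketch does not attempt.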

Compiling and linking

$ gcc -ffreestanding loader.c -c
$ nasm -felf64 wrapper.s -o wrapper.o
$ ld wrapper.o loader.o -o loader -Ttext=0x800100

We instruct gcc that this is a 'freestanding' executable, i.e. that we're running outside of the traditional libc environment. The third line tells ld that we want to place the beginning of the code at 0x800100. Why 0x800100? Because it needs to start somewhere . . . and placing it at an offset of 0x100 is sufficient space for the ELF headers. Remember that sections have alignments that have to be satisfied in memory? For ease of loading, ld ensures that these alignments are also satisfied inside the ELF file as well, so unless we want a lot of padding and whatnot, placing it just above a page boundary is best. We use the base 0x800000 to try and ensure that it doesn't overlap the memory of the program that is being loaded.

Then we confirm that it works (by re-using our earlier example program): 3

$ ./loader binary
Loading . . .
Hello, World!
$ ./loader ~/c/hello-static
Loading . . .
./loader: Hello, World!


Modifying the target program

How about something else interesting: using our loader to modify the program as it is being loaded? This would be the first step towards several things -- among them, writing a dynamic loader, or writing an interpreter virus . . . both interesting.

For illustrative purposes, we'll modify the above loader to search for the string "Hello, World!" and change it to something else. We need some extra functions for this, so we'll end up changing three parts of the above program. First, add the following two helper function prototypes:

void *memmem(const void *haystack, size_t haystack_len,
    const void *needle, size_t needle_len);
void *memcpy(void *dest, const void *src, size_t n);

Next, this syscall wrapper prototype:

int mprotect(void *addr, size_t len, int prot);

The helper function implementations:

void *memmem(const void *haystack, size_t haystack_len,
    const void *needle, size_t needle_len) {

    const uint8_t *h8 = haystack;
    const uint8_t *n8 = needle;
    // do an inefficient O(mn) search.
    for(size_t off = 0; off+needle_len <= haystack_len; off ++) {
        int valid = 1;
        for(size_t i = 0; i < needle_len; i ++) {
            if(h8[off+i] != n8[i]) { valid = 0; break; }
        }
        if(valid) return (void *)&h8[off];
    }

    return NULL;
}

void *memcpy(void *dest, const void *src, size_t n) {
    uint8_t *d8 = dest;
    const uint8_t *s8 = src;
    for(size_t i = 0; i < n; i ++) {
        d8[i] = s8[i];
    }
    return dest;
}

Now for the syscall wrapper implementation:

int mprotect(void *addr, size_t len, int prot) {
    return (int)invoke_syscall(10, (uint64_t)addr, len, prot, 0, 0, 0);
}

Finally, the modification code (which should be added to the end of the program header processing loop):

const char *target_str = "Hello, World!";
void *search = memmem((void *)pheaders[i].p_vaddr,
    pheaders[i].p_memsz, target_str, strlen(target_str));
if(search != NULL) {
    mprotect((void *)((uint64_t)search & ~0xfff), 0x1000,
        PROT_READ | PROT_WRITE);
    const char *repl = "Modified, eh?";
    memcpy(search, repl, strlen(target_str));
    mprotect((void *)((uint64_t)search & ~0xfff), 0x1000,
        permissions);
}
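
One shortcut in the snippet above: it changes the protection of exactly one page, so it quietly assumes the string does not straddle a page boundary. A sketch of how the range could be computed so that it always covers the whole string, assuming 4 KiB pages and using helper names invented for the example:

```c
#include <stdint.h>

#define PAGE 0x1000ULL  /* assumption: 4 KiB pages, as in the loader */

/* First byte of the page containing addr. */
uint64_t page_floor(uint64_t addr) {
    return addr & ~(PAGE - 1);
}

/* Length, in whole pages, of the range that covers [addr, addr + len). */
uint64_t page_span(uint64_t addr, uint64_t len) {
    uint64_t end = (addr + len + PAGE - 1) & ~(PAGE - 1);
    return end - page_floor(addr);
}
```

The two mprotect() calls would then take page_floor of the match address as the address and page_span of the match address and string length as the length. For the string in our hand-made binary the result is still a single page.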

And, when we compile/run this modified loader, we get what we'd expect:

$ gcc -ffreestanding loader-mod.c -c
$ nasm -felf64 wrapper.s -o wrapper.o
$ ld wrapper.o loader-mod.o -o loader-mod -Ttext=0x800100
$ ./loader-mod ./binary
Loading . . .
Modified, eh?

Summary

This article should hopefully give a brief overview of how ELF files are put together, how one can be constructed by hand, and how they're loaded. We haven't touched the really interesting stuff yet -- i.e. dynamic linking/loading -- but this is much easier to digest.

Also, a secondary point that I wanted to use this article to illustrate: it's hard to do this stuff manually. But it's pretty easy to see how you can write a program to take care of this placement/arrangement . . . which is exactly what we'll do in a future article somewhere. But for now . . .

Happy hacking,

- ethereal


  1. I'm led to believe that the a.out format is very similar in spirit to the ELF, though I haven't studied it in any great detail.

  2. Here the physical address doesn't really matter -- the Linux program loader actually just ignores it. This is, after all, an environment with per-process memory addresses, and Linux doesn't allow userspace processes any control over what physical memory gets used. I like to set the virtual/physical addresses to be the same, primarily because that's what GNU binutils does . . . 

  3. Note that some static executables won't work properly due to the fact that we don't actually change the values in the auxiliary vector. We'll get around to fixing this in the future . . . the hello-static that I have here happens to not depend on this.