What’s the purpose of a base address in an executable?
For example, in Microsoft Visual C++’s linker, you can set a base address, or use the default of 0x1000000. But with virtual memory, why would a base address be needed?
Why would you not just link it to 0x0 and put the initialization routine (mainCRTStartup on Windows) there, which would make it so writing to NULL (0x0) would just overwrite the startup routine (a method that will never be called again)? Plus, you could just set the code’s pages to execute only on an x86 processor so writing will cause an exception.
Traditionally executables get mapped at their desired base address, if that’s not in use yet. You can’t map at 0, since the lowest part of the address space is reserved (so that a null pointer dereference crashes instead of accessing memory you didn’t intend to access).
Since only a single .exe file is loaded into a process, .exe files don’t need unique base addresses. So most .exe files keep the default value 0x400000. They often lack a relocation table, and thus must be loaded at that address.
For DLLs things are a bit more complicated, since their target address can be in use already. In that case they’ll get mapped at another address, and a relocation table is used to patch the parts of the code that now need to point to other addresses. This fixup costs both CPU time and memory (executables are mapped as copy-on-write, and the fixup means your application gets a private copy of the patched pages of the DLL). So base addresses for DLLs are typically chosen to avoid overlaps and thus relocation.
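To make the fixup step concrete, here is a minimal sketch in Python of what applying a relocation table amounts to. It is a toy model, not the real PE format: the 16-byte "image", the list of offsets standing in for the relocation table, the `rebase` helper and both base addresses are all made up for illustration.

```python
import struct

PREFERRED_BASE = 0x10000000  # base the toy DLL was linked for (illustrative value)
ACTUAL_BASE = 0x20000000     # where the loader had to map it instead (illustrative)

def build_toy_image():
    """Return a fake image plus the offsets of the absolute addresses inside it."""
    image = bytearray(16)
    # Two 32-bit slots that hold absolute addresses of data at offsets 0x8 and 0xC.
    struct.pack_into("<I", image, 0, PREFERRED_BASE + 0x8)
    struct.pack_into("<I", image, 4, PREFERRED_BASE + 0xC)
    reloc_offsets = [0, 4]  # stand-in for the relocation table
    return image, reloc_offsets

def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every address listed in the relocation table."""
    delta = actual_base - preferred_base
    for offset in reloc_offsets:
        (address,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (address + delta) & 0xFFFFFFFF)
    return delta

image, relocs = build_toy_image()
rebase(image, relocs, PREFERRED_BASE, ACTUAL_BASE)
patched = struct.unpack_from("<II", image, 0)
print([hex(a) for a in patched])  # ['0x20000008', '0x2000000c']
```

Every slot that gets patched is a write into the mapped image, which is why a relocated DLL ends up with private copies of those pages. If the delta is zero, nothing is written and the pages can stay shared.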
Nowadays things are different. RAM is much cheaper, so the cost of relocation isn’t that important anymore. On the other hand security is very important now. So we use a technique called ASLR (address space layout randomization), where executables are mapped at unguessable addresses. This makes the base address the executable specifies rather meaningless.
Thus executables being able to choose their own base address is a performance optimization that was useful on slow computers with little RAM but isn’t useful on modern systems anymore since MS decided to trade a bit of performance for increased security.
Base addresses are needed to compute the virtual addresses of objects in the file after it has been loaded into virtual memory. Remember that the loader may have to map many PE files into one process, and each one may request to be loaded at 0x0; this means that a PE file has to cope with being located at different places in virtual memory.
The base address is used to make the other addresses relative. So if you have a vtable, the code refers to it by an absolute address, say 0x1000FF00. Then, when the dynamic linker rebases the executable because something was already mapped at its base address, it knows how to modify this address to point to the new location.
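The arithmetic behind this can be sketched in a few lines of Python. The vtable address 0x1000FF00 is the one from the example above; the linked base 0x10000000, the new base 0x6A000000 and the helper names `to_rva` and `to_va` are assumptions chosen for illustration.

```python
LINKED_BASE = 0x10000000  # assumed base address recorded in the file
NEW_BASE = 0x6A000000     # assumed address the image actually got mapped at
VTABLE_VA = 0x1000FF00    # absolute address the linker wrote into the code

def to_rva(virtual_address, image_base):
    """Relative virtual address: the offset of an object from the image base."""
    return virtual_address - image_base

def to_va(rva, image_base):
    """Absolute virtual address of an object once the image base is known."""
    return image_base + rva

vtable_rva = to_rva(VTABLE_VA, LINKED_BASE)  # 0xFF00, independent of load address
rebased_va = to_va(vtable_rva, NEW_BASE)     # where the vtable lives after rebasing
print(hex(vtable_rva), hex(rebased_va))      # 0xff00 0x6a00ff00
```

The offset from the base never changes, so knowing the base the file was linked for is enough to recompute every absolute address for whatever base the image ends up at.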