I’m currently experimenting with ELF files, and I’m confused about how PT_LOAD segments are loaded into memory, specifically how p_offset, p_filesz, p_vaddr, and p_memsz are used.
First things first, here is my program header output from **readelf**:
```
➜ ~ readelf -l /usr/bin/cat

Elf file type is DYN (Shared object file)
Entry point 0x31f0
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000016e0 0x00000000000016e0  R      0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000004431 0x0000000000004431  R E    0x1000
  LOAD           0x0000000000007000 0x0000000000007000 0x0000000000007000
                 0x00000000000021d0 0x00000000000021d0  R      0x1000
  LOAD           0x0000000000009a90 0x000000000000aa90 0x000000000000aa90
                 0x0000000000000630 0x00000000000007c8  RW     0x1000
  DYNAMIC        0x0000000000009c38 0x000000000000ac38 0x000000000000ac38
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000358 0x0000000000000358 0x0000000000000358
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x000000000000822c 0x000000000000822c 0x000000000000822c
                 0x00000000000002bc 0x00000000000002bc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000009a90 0x000000000000aa90 0x000000000000aa90
                 0x0000000000000570 0x0000000000000570  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .plt.got .plt.sec .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06     .dynamic
   07     .note.gnu.property
   08     .note.gnu.build-id .note.ABI-tag
   09     .note.gnu.property
   10     .eh_frame_hdr
   11
   12     .init_array .fini_array .data.rel.ro .dynamic .got
```
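For reference, p_offset, p_vaddr, p_filesz and p_memsz are plain 64-bit fields in each 56-byte Elf64_Phdr entry of the program header table (which starts at e_phoff, 64 in the output above). Below is a minimal Python sketch that reads the PT_LOAD entries out of a file image, assuming a little-endian ELF64 file such as this /usr/bin/cat; `load_segments` and `Phdr` are hypothetical names, and 32-bit or big-endian files are not handled.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # p_type value of a loadable segment


class Phdr(NamedTuple):
    """One Elf64_Phdr entry, fields in on-disk order."""
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def load_segments(data: bytes) -> List[Phdr]:
    """Return the PT_LOAD program headers of a little-endian ELF64 image."""
    # e_ident: magic, EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)  # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    loads = []
    for i in range(e_phnum):
        # Elf64_Phdr: two 32-bit fields, then six 64-bit fields (56 bytes)
        entry = Phdr(*struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize))
        if entry.p_type == PT_LOAD:
            loads.append(entry)
    return loads
```

Something like `load_segments(open("/usr/bin/cat", "rb").read())` should give back the same four LOAD rows that readelf prints.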
In the output above there are four LOAD segments. I used gdb to examine how they are mapped into memory. After starting the program, I checked the process mappings:
```
(gdb) shell cat /proc/18331/maps
555555554000-555555556000 r--p 00000000 103:03 3276961 /usr/bin/cat
555555556000-55555555b000 r-xp 00002000 103:03 3276961 /usr/bin/cat
55555555b000-55555555e000 r--p 00007000 103:03 3276961 /usr/bin/cat
55555555e000-555555560000 rw-p 00009000 103:03 3276961 /usr/bin/cat
7ffff7fc9000-7ffff7fcd000 r--p 00000000 00:00 0 [vvar]
7ffff7fcd000-7ffff7fcf000 r-xp 00000000 00:00 0 [vdso]
7ffff7fcf000-7ffff7fd0000 r--p 00000000 103:03 3325699 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7fd0000-7ffff7ff3000 r-xp 00001000 103:03 3325699 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ff3000-7ffff7ffb000 r--p 00024000 103:03 3325699 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ffc000-7ffff7ffe000 rw-p 0002c000 103:03 3325699 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
```
Let’s look at the fourth LOAD segment. From the process mappings, I can see that it is mapped as a 0x2000-byte region starting at offset 0x9000 in /usr/bin/cat. What confuses me is the difference between its p_offset (0x9a90) and its p_vaddr (0xaa90).
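The size and offset quoted for this mapping come straight from its maps line: the first field is the start-end address range and the third is the file offset of the first mapped byte. A small Python sketch of that reading; `parse_maps_line` is a hypothetical helper, and it assumes the usual layout of five fields plus an optional path.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its numeric fields."""
    fields = line.split()
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "size": end - start,              # length of the mapping in bytes
        "perms": fields[1],
        "offset": int(fields[2], 16),     # file offset of the first mapped byte
        "path": fields[5] if len(fields) > 5 else "",  # anonymous maps have none
    }


line = "55555555e000-555555560000 rw-p 00009000 103:03 3276961 /usr/bin/cat"
m = parse_maps_line(line)
print(hex(m["size"]), hex(m["offset"]))  # 0x2000 0x9000
```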
After reading through a lot of material, here is what I think is going on. Because this segment’s p_offset is 0x9a90, its first bytes sit in the same file page as the end of the previous segment (which ends at 0x7000 + 0x21d0 = 0x91d0). So p_vaddr is moved 0x1000 bytes forward to put the segment in a page of its own in memory. That would mean the address passed to mmap is the load bias plus p_vaddr, rounded down to the nearest page boundary; the file offset is p_offset minus (p_offset mod p_align); and the length is p_filesz plus (p_offset mod p_align), rounded up to a whole page:
`mmap(PAGESTART(bias + p_vaddr), PAGE_ALIGN(p_filesz + p_offset % p_align), FLAGS, p_offset - p_offset % p_align)`
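The guessed formula can be checked against all four LOAD rows at once. Below is a Python sketch under a few assumptions: 4 KiB pages (0x1000, which is also p_align of every LOAD segment here); a load bias taken from the start of the first /usr/bin/cat mapping (0x555555554000, since the first LOAD segment has p_vaddr 0); and hypothetical helper names. The prot, flags and file-descriptor arguments are left out because only the address, length and offset are in question.

```python
PAGE = 0x1000  # assumed 4 KiB pages; equals p_align of every LOAD row here


def page_start(x: int) -> int:
    """Round down to the start of the containing page."""
    return x & ~(PAGE - 1)


def page_align(x: int) -> int:
    """Round up to a whole number of pages."""
    return (x + PAGE - 1) & ~(PAGE - 1)


def guessed_mmap(bias, p_offset, p_vaddr, p_filesz, p_align):
    """(addr, length, file_offset) computed with the formula in the question."""
    # The ELF spec says p_vaddr should equal p_offset modulo p_align
    assert p_vaddr % p_align == p_offset % p_align
    delta = p_offset % p_align
    return (page_start(bias + p_vaddr),
            page_align(p_filesz + delta),
            p_offset - delta)


# (p_offset, p_vaddr, p_filesz, p_align) of the four LOAD rows from readelf
LOADS = [
    (0x0000, 0x0000, 0x16e0, 0x1000),
    (0x2000, 0x2000, 0x4431, 0x1000),
    (0x7000, 0x7000, 0x21d0, 0x1000),
    (0x9a90, 0xaa90, 0x0630, 0x1000),
]
BIAS = 0x555555554000  # assumed: start of the first /usr/bin/cat mapping

for seg in LOADS:
    addr, length, off = guessed_mmap(BIAS, *seg)
    print(f"{addr:x}-{addr + length:x} offset {off:08x}")
```

The four printed lines have the same address ranges and offsets as the four /usr/bin/cat lines of the maps output (the last one is 55555555e000-555555560000 at offset 00009000), so the guess is at least consistent with this particular process.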
So here is my first question: is the above guess right?
Moving on, the fourth segment contains several sections:
`.init_array .fini_array .data.rel.ro .dynamic .got .data .bss`
I checked this with objdump; the output is as follows:
```
➜ ~ objdump --disassemble-all /usr/bin/cat --start-address=0x9000 --stop-address=0xd000

/usr/bin/cat: file format elf64-x86-64

Disassembly of section .eh_frame:

0000000000009000 <.eh_frame+0xb18>:

Disassembly of section .init_array:

000000000000aa90 <.init_array>:
    ...

Disassembly of section .fini_array:

000000000000aa98 <.fini_array>:
    ...

Disassembly of section .data.rel.ro:

000000000000aaa0 <quoting_style_args@@Base-0x140>:
    ...
```
The point is that .init_array starts at 0xaa90 (the segment’s p_vaddr), which is not what I expected: I thought this section should start at 0x9a90, the segment’s p_offset.
So if the mapping’s offset is 0x9000 and its size is 0x2000, doesn’t that mean I will miss some bytes of this segment entirely? mmap only maps from 0x9000 to 0xb000, but .init_array starts at 0xaa90, which means the segment should end at 0xaa90 + p_filesz = 0xaa90 + 0x630 = 0xb0c0.
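The arithmetic in this question involves both file offsets (p_offset, the offset column of the maps line) and virtual addresses (p_vaddr, the addresses objdump prints). The Python sketch below lays the fourth segment’s numbers out in both spaces side by side. It uses only values copied from the readelf and maps output plus one assumption: the load bias is the start of the first /usr/bin/cat mapping. The constant names and `show` are hypothetical.

```python
# Fourth LOAD segment, copied from the readelf output
P_OFFSET, P_VADDR, P_FILESZ, P_MEMSZ = 0x9a90, 0xaa90, 0x630, 0x7c8

# Fourth /usr/bin/cat line of the maps output
MAP_START, MAP_END, MAP_OFFSET = 0x55555555e000, 0x555555560000, 0x9000
# Assumed load bias: start of the first mapping (the first LOAD has p_vaddr 0)
BIAS = 0x555555554000

# File-offset ranges: what the segment occupies vs. what the mapping covers
seg_file = (P_OFFSET, P_OFFSET + P_FILESZ)                   # 0x9a90-0xa0c0
map_file = (MAP_OFFSET, MAP_OFFSET + (MAP_END - MAP_START))  # 0x9000-0xb000

# Virtual-address ranges, bias subtracted so they compare with readelf/objdump
seg_vaddr = (P_VADDR, P_VADDR + P_FILESZ)                    # 0xaa90-0xb0c0
map_vaddr = (MAP_START - BIAS, MAP_END - BIAS)               # 0xa000-0xc000

# Distance between file offset and virtual address on each side
seg_shift = P_VADDR - P_OFFSET                               # 0x1000
map_shift = (MAP_START - BIAS) - MAP_OFFSET                  # 0x1000


def show(label, rng):
    """Print a half-open [start, end) range in hex."""
    print(f"{label:<10} {rng[0]:#x}-{rng[1]:#x}")


for label, rng in [("seg file", seg_file), ("map file", map_file),
                   ("seg vaddr", seg_vaddr), ("map vaddr", map_vaddr)]:
    show(label, rng)
print("shifts:", hex(seg_shift), hex(map_shift))
print("p_memsz end:", hex(P_VADDR + P_MEMSZ))
```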