I’ve been looking at the RISC-V assembly generated from my C code, and it does not seem to be as optimal as it could be.
I am using the following GCC compiler:
riscv-none-embed-gcc.exe --version
riscv-none-embed-gcc.exe (GNU MCU Eclipse RISC-V Embedded GCC, 64-bits) 7.2.0
I have the compiler optimisation level set to optimise for size (-Os).
The C code is essentially a getter function:
uint16_t Get_Value(void)
{
    return (structA.structB.structC.structD.Value);
}
The generated assembly is:
4001ef38 <Get_Value>:
4001ef38: 800087b7 lui a5,0x80008
4001ef3c: dc478793 addi a5,a5,-572 # 80007dc4
4001ef40: 9747d503 lhu a0,-1676(a5)
4001ef44: 8082 ret
My understanding of this is as follows:
- lui a5,0x80008
  - Load 0x80008 into the upper 20 bits of register a5
  - a5 is 0x80008000
- addi a5,a5,-572
  - Add -572 to a5
  - a5 is 0x80007DC4
- lhu a0,-1676(a5)
  - Load the half word value stored at a5 - 1676
  - Value loaded into a0 comes from the address 0x80007738
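To double-check the arithmetic in that walkthrough, here is a small C sketch that models the address calculation of the three instructions on a 32-bit target. The helper names (lui, add_imm12, gcc_sequence_address) are made up for illustration and only model the address maths, not the actual memory load.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LUI: put a 20-bit immediate in bits 31..12, low 12 bits are zero. */
static uint32_t lui(uint32_t imm20)
{
    return (imm20 & 0xFFFFFu) << 12;
}

/* Model of the add done by ADDI and by a load's offset:
   base plus a sign-extended 12-bit immediate (-2048..2047), wrapping at 32 bits. */
static uint32_t add_imm12(uint32_t base, int32_t imm12)
{
    assert(imm12 >= -2048 && imm12 <= 2047);
    return base + (uint32_t)imm12;
}

/* Address read by the sequence the compiler generated. */
static uint32_t gcc_sequence_address(void)
{
    uint32_t a5 = lui(0x80008u);   /* a5 = 0x80008000 */
    a5 = add_imm12(a5, -572);      /* a5 = 0x80007DC4 */
    return add_imm12(a5, -1676);   /* address used by lhu */
}
```

Calling gcc_sequence_address() should give 0x80007738, which matches the address I worked out by hand.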
Given that lui loads the upper 20 bits and the addi/lhu instructions each take a signed 12-bit immediate, it should be possible to reach the location 0x80007738 with one fewer instruction.
I believe the following would achieve the outcome:
<Get_Value>:
lui a5,0x80007
lhu a0,1848(a5)
ret
Summary: Load 0x80007000 into a5, then load the unsigned half word at an offset of 1848 bytes from a5 (the address would be 0x80007738).
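For reference, this is roughly how I arrived at the 0x80007 / 1848 pair: split the 32-bit address into a 20-bit upper part for lui and a signed 12-bit remainder for the load offset. The sketch below is my own illustration (hi_lo_t and split_hi_lo are hypothetical names), assuming a 32-bit address and the usual rounding so that the low part fits in -2048..2047.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20-bit part for lui and signed 12-bit part for the load/addi offset. */
typedef struct {
    uint32_t hi20;
    int32_t  lo12;
} hi_lo_t;

static hi_lo_t split_hi_lo(uint32_t addr)
{
    hi_lo_t parts;

    /* Round up by 0x800 so the remainder lands in the signed 12-bit range. */
    parts.hi20 = ((addr + 0x800u) >> 12) & 0xFFFFFu;

    /* Low 12 bits, sign-extended. */
    parts.lo12 = (int32_t)(addr & 0xFFFu);
    if (parts.lo12 >= 0x800) {
        parts.lo12 -= 0x1000;
    }

    /* Sanity check: (hi20 << 12) + lo12 must rebuild the address (mod 2^32). */
    assert((parts.hi20 << 12) + (uint32_t)parts.lo12 == addr);
    return parts;
}
```

For 0x80007738 this gives hi20 = 0x80007 and lo12 = 1848, i.e. the two-instruction sequence above. For 0x80007DC4 it gives 0x80008 and -572, which are the immediates in the lui/addi pair the compiler emitted.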
Why would the compiler not choose a more appropriate address to load into a5?
I have tried all the different optimisation levels. They all generate the same output, except for no optimisation and optimise for debug, which produce much larger versions.
I have many getters, as I thought this was good practice, so this inefficiency adds up.