I’m working with the Texas Instruments (TI) RM57L843 microcontroller, where I’m observing the execution time of a for loop that performs matrix multiplication. I compiled the same code using TI’s proprietary compiler (armcl) in Code Composer Studio and the GNU GCC ARM Toolchain v10.3. However, I obtained different execution times with each compiler.
Here is a simple piece of code I wrote that toggles PIN_OUT, a GIO pin on the microcontroller, while executing a for loop:
while (1) {
    gioSetBit(gioPORTB, PIN_OUT, 0);
    for (cntr = 0; cntr < 20; cntr++)
        MatMult_AxB(C, A, B, Rows, Cols, Rows, Cols);
    gioSetBit(gioPORTB, PIN_OUT, 1);
    for (cntr = 0; cntr < 20; cntr++)
        MatMult_AxB(C, A, B, Rows, Cols, Rows, Cols);
}
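The body of MatMult_AxB is not shown above. As a minimal sketch of what such a routine typically looks like, assuming row-major float arrays and a straightforward triple loop (the element type, the parameter names, and the loop order are all assumptions here, not the actual implementation):

```c
#include <assert.h>

/* Hypothetical sketch: C = A * B for row-major float matrices.
 * A is rowsA x colsA, B is rowsB x colsB, C is rowsA x colsB.
 * The element type and parameter names are assumed for illustration. */
void MatMult_AxB(float *C, const float *A, const float *B,
                 unsigned int rowsA, unsigned int colsA,
                 unsigned int rowsB, unsigned int colsB)
{
    unsigned int i, j, k;

    /* The product is only defined when the inner dimensions agree. */
    assert(colsA == rowsB);

    for (i = 0; i < rowsA; i++) {
        for (j = 0; j < colsB; j++) {
            float sum = 0.0f;  /* accumulate one output element */
            for (k = 0; k < colsA; k++)
                sum += A[i * colsA + k] * B[k * colsB + j];
            C[i * colsB + j] = sum;
        }
    }
}
```

With this signature, the call in the loop above passes Rows and Cols for both operands, so the dimension check only holds when Rows == Cols, that is, for square matrices.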
I connected an oscilloscope to PIN_OUT of my microcontroller and observed the following execution times: with all optimizations disabled, the loop took 6.6 ms with armcl and 17 ms with GCC. With maximum optimization enabled, armcl achieved 2.8 ms, while GCC achieved 8.2 ms.
For maximum optimization in armcl, I enabled the -O4 (whole-program optimization) and --opt_for_speed=5 flags. For GCC, I enabled the -O3 flag.
Why is armcl faster than GCC? And what can I do to optimize GCC to match armcl’s performance?