I found in more than one SIMD program the instruction __attribute__((aligned(16))). When I looked for an explanation I found:
The keyword attribute allows you to specify special attributes of
variables or structure fields.
Apparently variables have attributes. Are these attributes language-specific? What attributes can a variable have? Where are these attributes stored?
Regarding aligned(16), I found that it
causes the compiler to allocate [a variable] on a 16-byte
boundary
But is it mandatory for SIMD variables (__m128i for example) to be aligned on a 16-byte boundary?
If yes, I suppose this is the reason why we use __attribute__((aligned(16))) like so:
int a[16] __attribute__((aligned(16))) = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
__vector signed int *va = (__vector signed int *) a;
Attributes like the one above are an extension to the standard C language (here, C99) provided by GCC and a few other compilers, e.g. Clang/LLVM. The aligned attribute forces the compiler to align that variable (your array a) to the specified alignment. The GCC documentation lists the attributes you can give, and you could even extend your GCC compiler with a plugin to add your own attributes (the MELT extension also allowed this, but MELT was abandoned in 2017).
This __attribute__((aligned(16))) is indeed useful (on x86-64) for vector instructions like SSE3 or AVX, since it aligns your array a to 16 bytes. It should enable the optimizing compiler to generate more efficient machine code. (You probably also need to pass some optimization flags like -mtune=native -O3 to gcc.) So using it is useful for “SIMD” code. It is not always mandatory, since unaligned vector loads exist, but a pointer cast to a vector type, as in your example, is assumed to be suitably aligned when it is dereferenced.
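To make the point about the cast concrete, here is a minimal sketch. It uses the GCC/Clang generic vector extension (vector_size) instead of the AltiVec __vector syntax from the question, so that it is not tied to one architecture. The names v4si, is_aligned, sum16 and data are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang generic vector type: four ints in 16 bytes.
   may_alias lets a v4si pointer view an ordinary int array. */
typedef int v4si __attribute__((vector_size(16), may_alias));

/* Hypothetical helper: nonzero if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Sums 16 ints by adding four 4-int vectors lane by lane,
   then adding the four lanes of the result. */
static int sum16(const int *a)
{
    /* The cast below is only valid for a 16-byte aligned pointer. */
    assert(is_aligned(a, 16));
    const v4si *va = (const v4si *)a;
    v4si acc = va[0] + va[1] + va[2] + va[3];
    return acc[0] + acc[1] + acc[2] + acc[3];
}

/* The attribute guarantees that the array starts on a 16-byte boundary. */
static int data[16] __attribute__((aligned(16))) =
    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
```

Calling sum16(data) returns 136. Without the attribute, an int array is only guaranteed the alignment of int, so the assertion in sum16 could fail depending on where the compiler happens to place the array.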
Attributes are compile-time annotations. Of course they change the behavior of the compiler, but such attributes do not exist at run time (just as C types do not exist at run time either).
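One way to see that the attribute is purely a compile-time matter is to let the compiler itself check its effect. The sketch below assumes GCC or Clang (for the __alignof__ extension applied to an object) and a C11 compiler (for static_assert); the array names plain and padded are invented for the example.

```c
#include <assert.h>   /* static_assert (C11) */

int plain[16];
int padded[16] __attribute__((aligned(16)));

/* Both checks are evaluated by the compiler; no code runs for them. */
static_assert(sizeof padded == sizeof plain,
              "the attribute does not change the size of the array");
static_assert(__alignof__(padded) == 16,
              "the requested alignment is known at compile time");
```

If either condition were false, the translation unit would fail to compile. Nothing about the attribute is stored alongside the variable in the running program; only the placement of the variable in memory is affected.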
If you didn’t align your array a, I guess that the compiler would have to emit additional instructions, and they would run slower. BTW, you could compile with gcc -O3 -fverbose-asm -S (and other optimization flags) and look at the generated assembly code to see what that attribute changes. You could also benchmark your application (on Linux, see time(1) and time(7)…).
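As a small experiment for the assembly comparison suggested above, you could compile something like the following sketch. It relies on the GCC/Clang builtin __builtin_assume_aligned to tell the compiler that the pointers are 16-byte aligned; the function names add_unknown and add_aligned are made up here. Whether and how the two loops differ in the output depends on your compiler version, target and flags.

```c
#include <stddef.h>

/* The compiler knows nothing about the alignment of dst and src. */
void add_unknown(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Same loop, but the caller promises 16-byte aligned pointers.
   Passing misaligned pointers here is undefined behavior. */
void add_aligned(int *dst, const int *src, size_t n)
{
    int *d = __builtin_assume_aligned(dst, 16);
    const int *s = __builtin_assume_aligned(src, 16);
    for (size_t i = 0; i < n; i++)
        d[i] += s[i];
}
```

Both functions compute the same result for aligned inputs; any difference should only show up in the emitted instructions and possibly in the timing.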