I’m curious about this. Let’s say I have:
00000000001 90 nop
00000000002 90 nop
00000000003 90 nop
Is it executed exactly the same as this?
00000000001 0F1F00 nop dword [ds:rax]
What effect would the second example have as opposed to the first?
It depends on the machine architecture. The classic KA-10 (PDP-10) had lots of NOP codes, probably a consequence of its highly regular instruction set and the fact that it was implemented entirely with discrete components, not microcode. Some NOPs referenced memory; some were skip tests that never skipped but nonetheless tested the condition that might have caused a skip; and so on. “JFCL 0,” was advertised in the manual as the fastest NOP.
It appears that the second example should run faster than the first. In the first example, three separate instructions are executed; in the second, only one. The multi-byte NOPs are intended to be used for CPU “hints” (exactly how and when is apparently confidential). They are useful for alignment purposes (for example, to start a tight loop on a cache-line boundary), but currently they have no other use. It’s unclear whether the CPU actually evaluates the operands, so it’s not possible to say whether this form increases instruction decoding time or incurs a memory access penalty. Does anybody with a good ICE want to test one of these and see what addresses show up on a bus trace?
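To make the alignment use concrete, here is a rough Python sketch of how an assembler might choose padding. The byte sequences are the recommended multi-byte NOP encodings listed in Intel's manual; the helper names `nop_padding` and `bytes_to_next_boundary` and the 64-byte default cache-line size are my own assumptions for illustration, not anything a particular assembler exposes.

```python
# Recommended NOP encodings by length in bytes (Intel SDM, NOP entry).
NOPS = {
    1: bytes.fromhex("90"),                  # nop
    2: bytes.fromhex("6690"),                # 66 nop
    3: bytes.fromhex("0f1f00"),              # nop dword [rax]
    4: bytes.fromhex("0f1f4000"),            # nop dword [rax+0]
    5: bytes.fromhex("0f1f440000"),          # nop dword [rax+rax*1+0]
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def bytes_to_next_boundary(addr, align=64):
    """Number of padding bytes needed so that addr lands on an align-byte boundary.

    align=64 is an assumed cache-line size; adjust for the target CPU.
    """
    return (-addr) % align


def nop_padding(n):
    """Return a list of NOP encodings totalling n bytes, longest forms first.

    Fewer, longer instructions mean fewer instructions to decode and retire
    than the same number of bytes filled with single-byte 0x90 NOPs.
    """
    if n < 0:
        raise ValueError("padding length must be non-negative")
    out = []
    while n > 0:
        size = min(n, max(NOPS))
        out.append(NOPS[size])
        n -= size
    return out


if __name__ == "__main__":
    gap = bytes_to_next_boundary(0x1003, align=16)
    for insn in nop_padding(gap):
        print(insn.hex())
```

For the three-byte case in the question, `nop_padding(3)` yields the single `0F 1F 00` instruction, while the naive fill would be three separate `90` instructions occupying the same three bytes.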