Why can’t we move a 64-bit immediate value to memory?

First, I am a little confused about the difference between movq and movabsq. My textbook says:

The regular movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers. This value is then sign extended to produce the 64-bit value for the destination. The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

I have two questions about this.

Question 1

The movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers.

So it means that we can’t do:

<code>movq    $0x123456789abcdef, %rbp</code>

and we have to do:

<code>movabsq $0x123456789abcdef, %rbp</code>

But why is movq designed not to work with 64-bit immediate values? That seems to go against the purpose of the q (quad word) suffix, and needing a separate movabsq just for this looks like a hassle.

Question 2

Since the destination of movabsq has to be a register, not memory, we can’t move a 64-bit immediate value to memory like this:

<code>movabsq $0x123456789abcdef, (%rax)</code>

but there is a workaround:

<code>movabsq $0x123456789abcdef, %rbx
movq    %rbx, (%rax)   # the source operand is a register, not an immediate constant, and the destination of movq can be memory
</code>

So why is the rule designed to make things harder?

Yes, mov to a register and then to memory for immediates that won’t fit in a sign-extended 32-bit immediate. (Values like -1, aka 0xFFFFFFFFFFFFFFFF, do fit.) The why part is an interesting question, though:

Remember that assembly only lets you do what’s possible in machine code. Thus it’s really a question about ISA design. Such decisions often involve what’s easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)

It’s not designed to make things harder, it’s designed to not need any new opcodes for mov, when AMD was extending x86 to 64-bit and aiming not to need a whole separate decoder unit for different modes. And also to limit 64-bit immediates to one special instruction format. mov is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).

Check out Intel’s manual for the forms of mov (note that it uses Intel syntax, destination first, and so will my answer.) I also summarized the forms (and their instruction lengths) in Difference between movq and movabsq in x86-64, as did @MargaretBloom in answer to What’s the difference between the x86-64 AT&T instructions movq and movabsq?.

Allowing an imm64 along with a ModR/M addressing mode would also make it possible to run into the 15-byte upper limit on instruction length pretty easily, e.g. REX + opcode + imm64 is 10 bytes, and ModRM+SIB+disp32 is 6. So mov [rdi + rax*8 + 1234], imm64 would not be encodable even if there was an opcode for mov r/m64, imm64.
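The byte counting above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant names and the function are made up for illustration, and the mov r/m64, imm64 form it measures is hypothetical, since no such opcode exists.

```python
# Byte counts for the pieces of a hypothetical "mov r/m64, imm64" encoding.
# The opcode does not exist; the sizes are the ones quoted in the text.
MAX_INSTRUCTION_LENGTH = 15  # architectural upper limit on x86 instruction length

REX_PREFIX = 1
OPCODE = 1     # assumes a 1-byte opcode was repurposed for this form
IMM64 = 8
MODRM = 1
SIB = 1
DISP32 = 4


def hypothetical_mov_mem_imm64_length() -> int:
    """Length of mov [rdi + rax*8 + 1234], imm64 if such a form existed."""
    return REX_PREFIX + OPCODE + IMM64 + MODRM + SIB + DISP32


length = hypothetical_mov_mem_imm64_length()
print(length, length <= MAX_INSTRUCTION_LENGTH)
```

The total comes out one byte over the limit, and that is before any segment-override or address-size prefix is added.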

And that’s assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. aaa), which might be inconvenient for the decoders (and instruction-length pre-decoders) because in other modes those opcodes don’t take a ModRM byte or an immediate.

movq is for the forms of mov with a normal ModRM byte to allow an arbitrary addressing mode as the destination. (Or as the source for movq r64, r/m64). AMD chose to keep the immediate for these as 32-bit, same as with 32-bit operand size¹.

These forms of mov are the same instruction format as other instructions like add. For ease of decoding, this means a REX prefix doesn’t change the instruction length for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.

So movq is 64-bit operand-size but otherwise the same instruction format mov r/m64, imm32 (becoming the sign-extended-immediate form, same as every other instruction which only has one immediate form), and mov r/m64, r64 or mov r64, r/m64.

movabs is the 64-bit form of the existing no-ModRM short form mov reg, imm32. This one is already a special case (because of the no-ModRM encoding, with register number from the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte mov eax, 123 / AT&T mov $123, %eax in 32 or 64-bit mode). And having a 64-bit absolute mov is useful so it makes sense AMD did that.

Since there’s no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.

From one POV, be grateful you get a mov with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it’s a repeating bit-pattern; AArch64 is actually pretty cool in that respect, unlike earlier RISCs like MIPS64 or PowerPC64.)

If AMD64 was going to introduce a new opcode for mov, mov r/m, sign_extended_imm8 would be vastly more useful to save code size. It’s not at all rare for compilers to emit multiple mov qword ptr [rsp+8], 0 instructions to zero a local array or struct, each one containing a 4-byte 0 immediate. Putting a small non-zero number in a register is also fairly common, and such an encoding would make mov eax, 123 a 3-byte instruction (down from 5), and mov rax, -123 a 4-byte instruction (down from 7). It would also make zeroing a register without clobbering FLAGS 3 bytes.

Allowing mov imm64 to memory would be useful rarely enough that AMD decided it wasn’t worth making the decoders more complex. I agree with them in this case, but AMD was very conservative with adding new opcodes. So many opportunities to clean up x86 warts were missed; widening setcc, for example, would have been nice. (Intel finally got around to this with APX providing REX2 and EVEX prefixes for a zero-upper form of setcc.) But I think AMD wasn’t sure AMD64 would catch on and didn’t want to be stuck needing a lot of extra transistors and/or power to support a feature if people didn’t use it.

Footnote 1:
Using 32-bit immediates in general is pretty obviously a good decision for code size. It’s very rare to want to add an immediate that’s outside the ±2GiB range. Wider immediates could be useful for bitwise stuff like AND, but for setting/clearing/flipping a single bit the bts / btr / btc instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don’t want sub rsp, 1024 to be an 11-byte instruction; 7 is already bad enough.
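To illustrate the point about single-bit masks, the sketch below lists which bit positions of a 64-bit operand can be set (OR with 1 << n) or cleared (AND with ~(1 << n)) using a mask that fits a sign-extended 32-bit immediate. The helper name is invented for this example.

```python
MASK64 = (1 << 64) - 1


def fits_sign_extended_imm32(value: int) -> bool:
    """True if bits 31..63 of the 64-bit pattern are all zeros or all ones."""
    top = (value & MASK64) >> 31      # the top 33 bits
    return top == 0 or top == (1 << 33) - 1


# Bit positions whose set mask (for OR) is encodable as a sign-extended imm32.
set_mask_ok = [n for n in range(64) if fits_sign_extended_imm32(1 << n)]

# Bit positions whose clear mask (for AND) is encodable as a sign-extended imm32.
clear_mask_ok = [n for n in range(64) if fits_sign_extended_imm32(~(1 << n))]

print(set_mask_ok == clear_mask_ok, max(set_mask_ok))
```

Only bits 0 through 30 can be handled with an immediate mask; for bits 31 through 63 a bit-position instruction such as bts or btr avoids having to load a 64-bit mask into a register first.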

Giant instructions? Not very efficient

At the time AMD64 was designed (early 2000s), CPUs with uop caches weren’t a thing. (Intel P4 with a trace cache did exist, but it is regarded as a mistake in hindsight.) Instruction fetch/decode happens in chunks of up-to-16 bytes, so having one instruction that’s nearly 16 bytes isn’t much better for the front end than movabs $imm64, %reg.

Of course, if the back-end isn’t keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.

Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there’s a 64-bit immediate and a 32-bit displacement in the addressing mode, that’s a lot of bits. Normally an instruction needs at most 64 bits of space for an imm32 + a disp32.

BTW, there are special no-ModRM opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special, see this for more history and explanation). It would have been a plausible design for those add/sub/cmp/and/or/xor/... rax, sign_extended_imm32 forms with no ModRM to instead use a full imm64. The most common case for RAX, immediate uses an 8-bit sign-extended immediate (-128..127), not this form anyway, and it only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.

For the first question:

From the official documentation of the GNU assembler:

In 64-bit code, movabs can be used to encode the mov instruction with the 64-bit displacement or immediate operand.

mov reg64, imm (in Intel syntax, destination first) is the only instruction that accepts a 64-bit immediate value as an operand. That’s why you can’t write a 64-bit immediate value directly to memory, only to a register. That form of mov uses an opcode that includes a register number, rather than specifying a reg/mem destination via a ModRM byte.

For the second question:

For other destinations, for example a memory location, a 32-bit immediate can be sign-extended to a 64-bit immediate (which means the top 33 bits are the same there). In this case, you use the movq instruction.
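The "top 33 bits are the same" rule can be written down as a small check. The following Python is a minimal sketch with made-up helper names; it models the sign extension that movq applies to its 32-bit immediate and is not taken from any assembler.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32_to_64(imm32: int) -> int:
    """Sign-extend the low 32 bits of imm32 to an unsigned 64-bit pattern."""
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:            # bit 31 set: copy it into bits 32..63
        return imm32 | 0xFFFFFFFF00000000
    return imm32


def fits_sign_extended_imm32(value: int) -> bool:
    """True if truncating to 32 bits and sign-extending gives the value back."""
    value &= MASK64                   # accept negative Python ints as well
    return sign_extend_32_to_64(value) == value


def top_33_bits_equal(value: int) -> bool:
    """Equivalent test: bits 31..63 are all zeros or all ones."""
    top = (value & MASK64) >> 31
    return top == 0 or top == (1 << 33) - 1


for v in (0x7FFFFFFF, 0xFFFFFFFF, -1, 0x123456789ABCDEF):
    print(hex(v & MASK64), fits_sign_extended_imm32(v), top_33_bits_equal(v))
```

Both tests agree on every input: 0x7FFFFFFF and -1 can be encoded with movq, while 0xFFFFFFFF and 0x123456789abcdef cannot.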

The sign-extended 32-bit immediate form of movq can also be used when the target is a register, saving 3 bytes:

<code>48 B8 FF FF FF 7F 00 00 00 00   movabs $0x7FFFFFFF, %rax
48 C7 C0 FF FF FF 7F            movq   $0x7FFFFFFF, %rax
</code>

For the 64-bit immediate 0xFFFFFFFF, the top 33 bits are not the same (00...01), so movq with a sign-extended 32-bit immediate cannot be used here. That’s why I chose 0x7FFFFFFF in this example. But there is another option:

When writing to a 32-bit register (the lower part of a 64-bit register), the upper 32 bits of the register are zeroed. For a 64-bit immediate whose upper 32 bits are zero, movl can therefore also be used, which saves another byte:

<code># with mov $imm32, reg/mem32.  Assemblers won't use this for a register destination
C7 C0 FF FF FF FF               movl   $0xFFFFFFFF, %eax
</code>

A further byte is saved by the assembler using the special case mov-to-register encoding. (movabs-immediate is the REX.W form of this opcode.)

<code># the mov $imm32, reg  short-form encoding with no ModRM
B8 FF FF FF FF                  movl   $0xFFFFFFFF, %eax
</code>

GAS and other assemblers will automatically use the shortest encoding for the instruction you actually wrote, e.g. they’ll encode mov $-1, %eax in 5 bytes.

But GAS does not automatically optimize %rax to %eax. For example, mov $0x00000000FFFFFFFF, %rax will use 10-byte movabsq, not movl.

It can also choose between movabs and movq if you use mov, depending on the size of the immediate, e.g. mov $1, %rax. But it won’t optimize that to a 5-byte mov-immediate with 32-bit operand-size.

But if you use as -Os (or gcc -Wa,-Os), GAS will use the 5-byte movl $-1, %eax encoding for mov $0xFFFFFFFF, %rax. It has the same architectural effect (one instruction that makes RAX=0x00000000FFFFFFFF), but it’s spelled differently in the asm source, using a different operand-size and thus a different register name.

NASM does this optimization (to a different operand-size) by default.
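The choice between the three encodings discussed in this answer can be sketched as a small selection function. This is an illustrative model only, not how GAS or NASM are implemented: the function name is hypothetical and it handles only RAX/EAX as the destination. The opcode bytes are the ones shown in the listings above.

```python
import struct

MASK64 = (1 << 64) - 1


def encode_mov_imm_to_rax(value: int) -> bytes:
    """Pick the shortest encoding that leaves the 64-bit value in RAX."""
    value &= MASK64                   # accept negative Python ints as well
    if value <= 0xFFFFFFFF:
        # B8 + imm32: movl $imm32, %eax (upper 32 bits are zeroed implicitly)
        return b"\xB8" + struct.pack("<I", value)
    if value >= 0xFFFFFFFF80000000:
        # REX.W C7 C0 + imm32: movq $imm32, %rax (immediate is sign-extended)
        return b"\x48\xC7\xC0" + struct.pack("<I", value & 0xFFFFFFFF)
    # REX.W B8 + imm64: movabs $imm64, %rax
    return b"\x48\xB8" + struct.pack("<Q", value)


for v in (0xFFFFFFFF, -1, 0x123456789ABCDEF):
    code = encode_mov_imm_to_rax(v)
    print(len(code), code.hex(" "))
```

The three sample values come out as 5, 7, and 10 bytes respectively, covering the zero-extending, sign-extending, and full 64-bit cases.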
