From Tanenbaum’s Structured Computer Organization:
Most instructions can be divided into one of two categories: register-memory or register-register.
Register-memory instructions allow memory words to be fetched into registers, where, for example, they can be used as ALU inputs
in subsequent instructions. (‘‘Words’’ are the units of data moved
between memory and registers. A word might be an integer. We will
discuss memory organization later in this chapter.) Other
register-memory instructions allow registers to be stored back into
memory. A typical register-register instruction fetches two operands from the registers, brings them to the ALU input registers, performs
some operation on them (such as addition or Boolean AND), and stores
the result back in one of the registers. The process of running two
operands through the ALU and storing the result is called the data
path cycle and is the heart of most CPUs. To a considerable extent,
it defines what the machine can do. Modern computers have multiple
ALUs operating in parallel and specialized for different functions.
The faster the data path cycle is, the faster the machine runs.
Are there memory-memory instructions?
Or is a memory-memory “operation” implemented as two register-memory instructions (one for read and the other for write)?
Isn’t this less efficient than moving data directly between two places in the same memory without going via a register?
Lots of machine architectures have memory-memory instructions.
The IBM System/360 and its successors have a whole set of instructions that operate on two locations in memory (the Storage-to-Storage (SS) group). The “Move Character” (MVC) instruction copies up to 256 bytes from one memory location to another, and even has a clear definition for when the source and destination ranges overlap. Similarly, there are Compare Logical Character (CLC), which does a string comparison, and OR Character (OC), AND Character (NC), and XOR Character (XC), which are bitwise logical operators, etc. They also have a set of decimal arithmetic instructions, which operate only on memory; there aren’t any registers for decimal math.
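As a rough illustration (a minimal C sketch, not the architecture’s authoritative definition, which is in IBM’s Principles of Operation), MVC is usually described as copying one byte at a time, left to right, storing each byte before fetching the next. That ordering is what makes the overlap behaviour well defined:

```c
#include <stddef.h>

/* Sketch of MVC-like copy semantics: move len bytes (1..256 on S/360),
 * one byte at a time, left to right, storing each byte before fetching
 * the next.  With dst == src + 1 this propagates the first byte through
 * the whole field (the classic S/360 idiom for filling a field with a
 * repeated character). */
static void mvc_like(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```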
Then there are the memory-immediate instructions, which have one operand in memory and the other in the instruction itself. The DEC PDP-10 had Add One to Storage (AOS) and Subtract One from Storage (SOS). The IBM S/360 family had a wide range of Storage Immediate (SI) instructions, in which one operand was a memory location and the other was an 8-bit quantity in the instruction.
Memory chips do not have a mechanism for transferring data directly from one memory location to another. Hence, the processor must read the data from memory, and then write it to the new location.
In computer systems having DMA controllers, it is possible to perform memory transfers without involving the CPU. There are potential complications, such as cache coherency.
The Motorola 68000 (“68K”) architecture had an orthogonal instruction set, and both operands could specify absolute memory addresses. You could also do things like directly increment or decrement the value at a specific memory location, whereas with a more RISC-like architecture you’d still be required to load memory into a register, increment the register, then write (store) the register back to memory.
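As a rough C-level illustration (the assembly in the comments is schematic pseudo-code, not exact 68K or ARM syntax), incrementing a counter that lives in memory can be a single read-modify-write instruction on a 68K-style machine, but needs a load, an add, and a store on a load/store machine:

```c
/* Increment a counter held in memory.
 *
 * 68K-style (memory-destination) code, roughly one instruction:
 *     addq   #1, counter       ; read-modify-write of the memory location
 *
 * Load/store (RISC-like) code, roughly three instructions:
 *     load   r1, [counter]
 *     add    r1, r1, 1
 *     store  r1, [counter]
 */
void bump_counter(volatile unsigned long *counter)
{
    *counter += 1;
}
```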
The ColdFire architecture is the heir/successor to the 68K, and I think they might have trimmed away some of the more exotic instructions and addressing modes.
Of all the 32-bit and 64-bit CPUs produced each year, most use the ARM architecture.
The ARM architecture, like the DLX and RISC-V architectures and other load/store architectures,
has only three kinds of instructions: (1) instructions that have no effect on memory (“register-register instructions”), (2) instructions that LOAD from external memory into a register (and do practically nothing else), and (3) instructions that STORE a value from a register into external memory (and do practically nothing else). Kinds (2) and (3) are the “register-memory instructions”.
Is a memory-memory “operation” implemented as two register-memory
instructions (one for read and the other for write)?
Yes and no.
Computers built with the most common 32-bit or 64-bit CPUs have no memory-memory instructions.
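To make the “yes” part concrete, here is a hedged C sketch: on a load/store machine a one-word memory-to-memory “move” has no single instruction, so the compiler emits a register-memory load followed by a register-memory store (the mnemonics in the comment are schematic):

```c
/* Copy one word between two memory locations on a load/store architecture.
 * There is no memory-to-memory instruction, so this compiles to roughly:
 *     LDR  rX, [src]     ; register-memory: memory -> register
 *     STR  rX, [dst]     ; register-memory: register -> memory
 */
void copy_word(int *dst, const int *src)
{
    *dst = *src;
}
```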
Some read-modify-write-memory operations are very useful in building non-blocking algorithms on systems with more than one processor connected to the same memory.
Some less-common CPU architectures, such as the 32-bit x86 and 64-bit x86-64 architectures, do have memory-memory instructions. In particular, some can perform read-modify-write-memory in a single instruction, such as compare-and-swap.
ARM processors intended for use in multi-processor systems can perform read-modify-write-memory operations, but not as a single instruction: they split them up into multiple instructions, such as load-linked/store-conditional, where any one instruction either LOADs or STOREs, not both.
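As a hedged sketch using C11 atomics: the same read-modify-write (an atomic add) can be written as a compare-and-swap retry loop. On x86-64 the compare-exchange typically compiles to a single LOCK CMPXCHG, an instruction that both reads and writes memory, while classic ARM lowers it to a load-exclusive/store-exclusive loop in which every instruction either loads or stores, never both:

```c
#include <stdatomic.h>

/* A read-modify-write (atomic add) built from compare-and-swap.
 * On x86-64 the compare-exchange below typically becomes LOCK CMPXCHG
 * (one instruction that reads and writes memory); on classic ARM it is
 * lowered to an LDREX/STREX (load-linked/store-conditional) retry loop. */
static int atomic_add_via_cas(_Atomic int *counter, int delta)
{
    int observed = atomic_load(counter);
    /* If another core changed *counter between the load and the
     * compare-exchange, observed is refreshed and we try again. */
    while (!atomic_compare_exchange_weak(counter, &observed, observed + delta))
        ;
    return observed + delta;
}
```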
two register-memory
instructions (one for read and the other for write)? Isn’t this
less efficient than moving data directly between two places in the same
memory without going via a register?
Yes, this inefficiency is part of the von Neumann bottleneck.
Commodity DRAM only allows one address at a time to be selected,
so even those less-common CPUs that have memory-to-memory operations in a single instruction are forced to implement those instructions as multiple memory cycles: one memory cycle for the read, and a second memory cycle for the write.
In a small loop where instructions are being read from the instruction cache, a single instruction that does both doesn’t run any faster than the two separate instructions that would be required on an ARM processor.
Simply copying data from one place to another is extremely common,
so several techniques have been developed for speeding it up, bypassing some or all of the von Neumann bottleneck:
- Some DMA hardware directly copies data from one chip to a different chip in a single memory cycle, typically reading from main memory and writing to some peripheral, or reading from some peripheral and writing into main memory. This requires sending different addresses and different READ/WRITE Enable signals to the two chips.
- Displaying stuff on screen has historically used a variety of hardware speed-ups — character ROMs, tiled rendering, hardware sprites, blitter hardware for speeding up bit blit operations, dual-ported video DRAM, etc. Some of these techniques involve reading data from one chip and sending it directly to a different chip during a single memory cycle.
- Some Computational RAM chips can copy large blocks of data from one location to another inside the memory chip, much faster than reading that data (from one address) out of the chip, then writing that data (to a different address) back into the chip.
A single load or store operation is hard enough to implement. It is actually one of the most important things to get both right and fast. There are alignment, caches, address translation, communication with other cores, exception handling, and memory-mapped hardware to deal with. It’s more complicated than most other instructions.
A modern ARM processor has load/store and nothing else. A modern x86 processor has more complex instructions (“add register x to memory address y”), but that kind of operation gets internally split into micro-operations that each only load or store.
An operation moving data from memory to memory must contain two addresses, and addresses are complicated, so you either get massive instructions on x86 or instructions that just won’t fit into a 32-bit word on ARM. Such an operation must do the whole load/store logic twice, and a single instruction could take two page faults for aligned accesses, or four for unaligned accesses.
It is just an enormous amount of complexity for very little gain compared to just having two instructions.