How have languages influenced CPU design? [closed]

We are often told that the hardware doesn’t care what language a program is written in, as it only sees the compiled binary code; however, this is not the whole truth. For example, consider the humble Z80: its extensions to the 8080 instruction set include instructions like CPIR, which is useful for scanning C-style (NUL-terminated) strings, e.g. to implement strlen(). The designers must have identified that running C programs (as opposed to Pascal, where the length of a string is stored in a header) was something their design was likely to be used for. Another classic example is the Lisp Machine.

What other examples are there? E.g. instructions, number and type of registers, addressing modes, that make a particular processor favour the conventions of a particular language? I am particularly interested in revisions of the same family.

15

The existing answers focus on ISA changes. There are other hardware changes, too. For instance, C++ commonly uses vtables for virtual calls. Starting with the Pentium M, Intel has an “indirect branch predictor” component which accelerates virtual function calls.

3

The Intel 8086 instruction set includes a variation of “ret” which adds a value to the stack pointer after popping the return address. This is useful for many Pascal implementations, where the caller of a function pushes arguments onto the stack before making the call and the callee pops them off afterward. If a routine accepted e.g. four bytes’ worth of parameters, it could end with “RET 0004” to clean up the stack. Without such an instruction, that calling convention would likely have required the code to pop the return address into a register, update the stack pointer, and then jump through that register.

Interestingly, most code (including OS routines) on the original Macintosh used the Pascal calling convention despite the lack of a facilitating instruction in the 68000. Using this calling convention saved 2-4 bytes of code at a typical call site, but required an extra 4-6 bytes of code at the return site of every function that took parameters.

8

One example is MIPS, which has both add and addu, for trapping and ignoring overflow respectively (likewise sub and subu). It needed the first type of instruction for languages like Ada (I think; I’ve never actually used Ada, though) which deal with overflow explicitly, and the second type for languages like C that ignore overflow.

If I remember correctly, the actual CPU has some additional circuitry in the ALU for keeping track of overflows. If the only language people cared about was C, it wouldn’t need this.

3

The Burroughs 5000 series was designed to efficiently support ALGOL, and Intel’s iAPX-432 was designed to efficiently execute Ada. The Inmos Transputer had its own language, Occam. I think the Parallax “Propeller” processor was designed to be programmed in its own language, Spin.

It’s not a language, but the VAX-11 instruction set has a single instruction to load a process context, which was designed after a request from the VMS design team. I don’t remember the details, but ISTR it took so many instructions to implement that it put a serious upper limit on the number of processes they could schedule.

5

One thing nobody seems to have mentioned so far is that advances in compiler optimization (where the base language is largely irrelevant) drove the shift from CISC instruction sets (which were largely designed to be coded by humans) to RISC instruction sets (which were largely designed to be coded by compilers.)

The Motorola 68000 family introduced auto-increment address modes that made copying data through the CPU very efficient and compact.

[Updated example]

Here is some C-style code of the kind that influenced the 68000 instruction set:

while(someCondition)
    destination[destinationOffset++] = source[sourceOffset++]

implemented in conventional assembler (pseudocode; I forget the exact 68000 mnemonics):

addressRegister1 = source
addressRegister2 = destination
while(someCondition) {
    move akku, (addressRegister1)    ; load next source byte into the accumulator
    move (addressRegister2), akku    ; store it at the destination
    increment(addressRegister1, 1)
    increment(addressRegister2, 1)
}

with the new address mode it became something similar to

addressRegister1 = source
addressRegister2 = destination
while(someCondition) {
    move akku, (addressRegister1++)    ; load and post-increment the source pointer
    move (addressRegister2++), akku    ; store and post-increment the destination pointer
}

Only two instructions per loop instead of four.

4

IBM’s Z series mainframe is the descendant of the IBM 360 from the 1960s.

There were several instructions which were put there specifically to speed up COBOL and Fortran programs. The classic example is BXLE, “Branch on Index Low Or Equal”, which encapsulates most of a Fortran DO loop or a COBOL PERFORM VARYING x FROM 1 BY 1 UNTIL x > n in a single instruction.

There is also a whole family of packed decimal instructions to support fixed point decimal arithmetic common in COBOL programs.

5

Early Intel CPUs had the following features, many of them now obsolete in 64-bit mode:

  • ENTER, LEAVE and RET nn instructions [early manuals stated explicitly that these were introduced for block-structured languages, e.g. Pascal, which supports nested procedures]
  • instructions for speeding up BCD arithmetic (AAA, AAM, etc.); also BCD support in x87
  • JCXZ and LOOP instructions for implementing counted loops
  • INTO, for generating a trap on arithmetic overflow (e.g., in Ada)
  • XLAT for table lookups
  • BOUND for checking array bounds

The sign flag, found in the status register of many CPUs, exists so that both signed and unsigned arithmetic can be handled easily.

The SSE4.2 instruction set introduced instructions for string processing, both counted and zero-terminated (PCMPESTRI, PCMPISTRI, etc.).

Also, I could imagine that a number of system-level features were designed to support the safety of compiled code (segment limit checking, call gates with parameter copying, etc.).

Some ARM processors, mainly those in mobile devices, include(d) the Jazelle extension, a hardware JVM interpreter that executes Java bytecode directly. A Jazelle-aware JVM can use the hardware to speed up execution and eliminate much of the JIT, but a fallback to the software VM is still ensured if bytecode cannot be interpreted on-chip.

Processors with such a unit have the BXJ instruction, which puts the processor into a special “Jazelle mode”; if activating the unit fails, it is simply interpreted as a normal branch instruction. The unit reuses ARM registers to hold JVM state.

The successor to the Jazelle technology is ThumbEE.

As far as I know this was more common in the past.

There is a Q&A session in which James Gosling says that there were people trying to build hardware that could deal better with JVM bytecode, but these people would then find a way to do it with common “generic” Intel x86 chips (perhaps by compiling the bytecode in some clever way).

He mentioned that there is an advantage in using a generic popular chip (such as Intel’s) because it has a large corporation throwing huge sums of money at the product.

The video is worth checking out. He talks about this at minute 19 or 20.

I did a quick page search and it seems that no one has mentioned CPUs developed specifically to execute Forth. The Forth programming language is stack-based, compact, and used in control systems.

The Intel iAPX CPU was specifically designed for OO languages. Didn’t quite work out, though.

The iAPX 432 (intel Advanced Processor architecture) was Intel’s first 32-bit microprocessor design, introduced in 1981 as a set of three integrated circuits. It was intended to be Intel’s major design for the 1980s, implementing many advanced multitasking and memory management features, and was therefore referred to as a “Micromainframe”.

The iAPX 432 was “designed to be programmed entirely in high-level languages”, with Ada being primary and it supported object-oriented programming and garbage collection directly in hardware and microcode. Direct support for various data structures was also intended to allow modern operating systems for the iAPX 432 to be implemented using far less program code than for ordinary processors. These properties and features resulted in a hardware and microcode design that was much more complex than most processors of the era, especially microprocessors.

Using the semiconductor technology of its day, Intel’s engineers weren’t able to translate the design into a very efficient first implementation. Along with the lack of optimization in a premature Ada compiler, this resulted in computer systems that were slow but expensive, performing typical benchmarks at roughly 1/4 the speed of the new 80286 chip at the same clock frequency (in early 1982).

This initial performance gap to the rather low profile and low priced 8086-line was probably the main reason why Intel’s plan to replace the latter (later known as x86) with the iAPX 432 failed. Although engineers saw ways to improve a next generation design, the iAPX 432 Capability architecture had now started to be regarded more as an implementation overhead rather than as the simplifying support it was intended to be.

The iAPX 432 project was a commercial failure for Intel…

1

The 68000 had MOVEM, which was well suited to pushing multiple registers onto the stack in a single instruction, which is what many languages expected.

If you saw MOVEM (MOVE Multiple) preceding JSR (Jump to SubRoutine) throughout the code, then you generally knew that you were dealing with compiled C code.

MOVEM supports pre-decrement of the destination register when pushing registers onto the stack, and post-increment of the source register when pulling them back off, so successive uses keep stacking onto, or unwinding from, the stack.

http://68k.hax.com/MOVEM

Atmel’s AVR architecture was designed from the ground up to be suitable for programming in C. For example, this application note elaborates further.

IMO this is closely related to rockets4kids’ excellent answer, with the early PIC16 parts being developed for direct assembler programming (40 instructions total) and later families targeting C.

When the 8087 numerical coprocessor was designed, it was fairly common for languages to perform all floating-point math using the highest-precision type, and only round the result to lower precision when assigning it to a lower-precision variable. In the original C standard, for example, the sequence:

float a = 16777216, b = 0.125, c = -16777216;
float d = a+b+c;

would promote a and b to double, add them, promote c to double, add it, and then store the result rounded to float. Even though it would often have been faster for a compiler to generate code that performed operations directly on type float, it was simpler to have one set of floating-point routines that operated only on type double, along with routines to convert to/from float, than to have separate sets of routines for float and double. The 8087 was designed around that approach to arithmetic, performing all arithmetic operations using an 80-bit floating-point type. [80 bits was probably chosen because:

  1. On many 16- and 32-bit processors, it’s faster to work with a 64-bit mantissa and a separate exponent than with a value that splits a byte between the mantissa and the exponent.

  2. It’s very difficult to perform computations which are accurate to the full precision of the numerical types one is using; if one is trying to compute something like log10(x), it’s easier and faster to compute a result which is accurate to within 100 ulp of an 80-bit type than one accurate to within 1 ulp of a 64-bit type, and rounding the former result to 64-bit precision will yield a 64-bit value which is more accurate than the latter.]

Unfortunately, later versions of the language changed the semantics of how floating-point types should work. The 8087 semantics would have been very nice if languages had supported them consistently, but many compiler authors took it upon themselves to make long double an alias for the 64-bit double type rather than the compiler’s 80-bit type (and provided no other means of creating 80-bit variables), and, given functions f1(), f2(), etc. returning type float, to arbitrarily evaluate something like:

double f = f1()*f2() - f3()*f4();

in any of the following ways:

double f = (float)(f1()*f2()) - (extended_double)f3()*f4();
double f = (extended_double)f1()*f2() - (float)(f3()*f4());
double f = (float)(f1()*f2()) - (float)(f3()*f4());
double f = (extended_double)f1()*f2() - (extended_double)f3()*f4();

Note that if f3 and f4 return the same values as f1 and f2, respectively, the original expression should clearly yield zero, but many of the latter expressions may not. This led to people condemning the “extra precision” of the 8087, even though the last formulation would generally be superior to the third and, with code that used the extended double type appropriately, would rarely be inferior.

In the intervening years, Intel has responded to languages’ (IMHO unfortunate) trend toward forcing intermediate results to be rounded to the operands’ precision by designing their later processors to favor that behavior, to the detriment of code that would benefit from using higher precision in intermediate calculations.

4
