I know absolutely nothing about low-level stuff, so this will be a very newbie question. Please excuse my ignorance.
Is machine language – the series of numbers that tell the physical computer exactly what to do – always binary? I.e. always composed of only zeros and ones? Or could it also be composed of numbers such as 101, 242, 4, etc.?
Everything in a computer (to be precise, in any typical contemporary computer) is binary, at a certain level. “1s and 0s” is an abstraction, an idea we use to represent a way of distinguishing between two values. In RAM, that means higher and lower voltage. On the hard drive, that means distinct magnetic states, and so on. Using Boolean logic and a base 2 number system, a combination of 1s and 0s can represent any number, and other things (such as letters, images, sounds, etc) can be represented as numbers.
But that’s not what people mean when they say “binary code.” That has a specific meaning to programmers: “Binary” code is code that is not in text form. Source code exists as text; it looks like a highly formalized system of English and mathematical symbols. But the CPU doesn’t understand English or mathematical notation; it understands numbers. So the compiler translates source code into a stream of numbers that represent CPU instructions that have the same underlying meaning as the source code. This is properly known as “machine code,” but a lot of people call it “binary”.
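For instance, here is a minimal C sketch (my own illustration, assuming an ASCII system) that prints a letter as the number it is actually stored as, in decimal, in hex, and bit by bit:

#include <stdio.h>

int main(void) {
    /* A character is stored as a number; 'A' is 65 on an ASCII system. */
    char letter = 'A';
    printf("'%c' is stored as %d (hex %X)\n", letter, letter, (unsigned)letter);

    /* The same value written out bit by bit, most significant bit first. */
    printf("as bits: ");
    for (int i = 7; i >= 0; i--)
        printf("%d", (letter >> i) & 1);
    printf("\n");
    return 0;
}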
Let’s have a look at an actual machine instruction. Suppose we have an ARM CPU and we want to add 143 to the value in register 2, placing the result in register 1. In ARM assembly language that’s written
ADD R1, R2, #143
This assembly instruction can be encoded as a single machine instruction. The specification of how that’s done is on physical page 156 of the ARM ARM, the amusingly-named Acorn RISC Machine Architecture Reference Manual. It’s also necessary to look at the definition of “shifter operand”, which begins on physical page 444.
| 31-28 | 27-26 | 25 | 24-21 | 20 | 19-16 | 15-12 | 11-0 |
| Cond | 0 0 | I | 0 1 0 0 | S | Rn | Rd | shifter operand |
As you seem to already understand, machine instructions are numbers, and on the ARM, they are numbers of a fixed size: 32 bits, divided into several fields. To encode the above ADD, we fill in the fields like this:
| cond | fmt | I | opcode | S | Rn | Rd | rot | imm |
| E | 00 | 1 | 0100 | 0 | 2 | 1 | 0 | 143 |
(The “shifter operand” got divided into “rot” and “imm” because I set I=1.) Now, to make that into a single 32-bit number, we have to expand it out to binary, because many of the fields are not tidy numbers of bits long:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
 1  1  1  0  0  0  1  0  1  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  1  0  0  0  1  1  1  1
To humans that is a big blur; hexadecimal is easier for us to understand:
1110 0010 1000 0010 0001 0000 1000 1111
   E    2    8    2    1    0    8    F
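If you want to check the encoding yourself, a small C sketch can pack the field values from the table above into one 32-bit word with shifts and ORs (this is only an illustration of the bit layout, not how an assembler is actually written):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    /* Field values from the table above. */
    uint32_t cond = 0xE, I = 1, opcode = 0x4, S = 0;
    uint32_t Rn = 2, Rd = 1, rot = 0, imm = 143;

    uint32_t insn = (cond   << 28) |   /* bits 31-28: condition        */
                    (0u     << 26) |   /* bits 27-26: fixed 0 0        */
                    (I      << 25) |   /* bit  25:    immediate form   */
                    (opcode << 21) |   /* bits 24-21: 0100 = ADD       */
                    (S      << 20) |   /* bit  20:    set flags        */
                    (Rn     << 16) |   /* bits 19-16: source register  */
                    (Rd     << 12) |   /* bits 15-12: destination      */
                    (rot    <<  8) |   /* bits 11-8:  rotate amount    */
                    imm;               /* bits  7-0:  immediate value  */

    printf("0x%08" PRIX32 "\n", insn);   /* prints 0xE282108F */
    return 0;
}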
And so usually, in communication with other humans, we say that the “machine instruction” corresponding to ADD R1, R2, #143 is the hexadecimal number E282 108F. We could equally say that it is the decimal number 3,800,174,735, but that obscures the pattern of fields more than hex does. (Someone with a lot of practice debugging on the bare metal on ARM would be able to pick condition code E, source and destination registers 2 and 1, and immediate operand 8F = 143 out of E282 108F with relative ease.)
All of the above representations encode the same machine instruction! I have only changed how I wrote it down.
In terms of “ones and zeroes”, if you load a program containing this instruction into RAM on a real computer, somewhere in memory the bit pattern 1110 0010 1000 0010 0001 0000 1000 1111 will appear (possibly backwards, because of endianness). But it is equally valid to say that somewhere in memory the hexadecimal number E282 108F, or the decoded instruction ADD R1, R2, #143, appears. Bit patterns in RAM have no meaning in themselves; meaning comes from context. Conversely, that bit pattern / hexadecimal number isn’t necessarily an instruction at all! It would also appear in a program that made use of the unsigned 32-bit integer 3,800,174,735, or the single-precision IEEE floating-point number -1.199634951 × 10²¹, as data.
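A short C sketch can make the point concrete, printing the very same 32-bit word as hex, as decimal, and reinterpreted as a float (assuming the usual 32-bit IEEE 754 float, which is true on virtually all current hardware):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

int main(void) {
    uint32_t word = 0xE282108Fu;                /* the encoded ADD R1, R2, #143 */

    printf("hex:     %08" PRIX32 "\n", word);   /* E282108F   */
    printf("decimal: %" PRIu32 "\n", word);     /* 3800174735 */

    /* Reinterpret the same bits as a single-precision float. */
    float f;
    memcpy(&f, &word, sizeof f);
    printf("float:   %g\n", f);                 /* about -1.19963e+21 */
    return 0;
}

All three lines are produced from the identical bit pattern; only the interpretation changes.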
When someone uses the phrase “the ones and zeros”, in most contexts and especially this one, they are, in my opinion, significantly misrepresenting what’s going on, and thus creating confusion.
The computer doesn’t really just read “the ones and zeros” any more than when you read a book, you are reading “the letters”. Sure, both are strictly true, but those statements are leaving out a substantial piece of information: the structure of each.
In the case of English, the letters are structured into words, and the words make up sentences, according to a set of rules. The order of letters in words and the order of words in sentences can completely change the meaning.
A similar process is in play with computers and with machine language. The computer looks at the ones and zeros in discrete chunks: bytes, and groups of bytes.
Other posters have mentioned various ways that numbers can be encoded as individual bits. There are ints, floating-point numbers, text strings, etc., which give structure to the stream of bits and bytes.
Ultimately, the computer is conceptually looking at groups of bits, so it’s rarely ever looking at “10101010”; it’s looking at 101, 242, or 4, etc. What those numbers mean depends on their context in the given ‘sentence’ they are part of.
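As a rough C sketch of that idea (my own illustration, with arbitrarily chosen bytes), the same four bytes can be read as individual small numbers, as one 32-bit integer, or as a piece of text, depending on the context you impose on them:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

int main(void) {
    /* Four bytes with no built-in meaning of their own. */
    unsigned char bytes[4] = { 0x48, 0x69, 0x21, 0x00 };

    /* Read as four small numbers... */
    printf("bytes:  %u %u %u %u\n", bytes[0], bytes[1], bytes[2], bytes[3]);

    /* ...as one 32-bit integer (the value depends on endianness)... */
    uint32_t n;
    memcpy(&n, bytes, sizeof n);
    printf("uint32: %" PRIu32 "\n", n);

    /* ...or as a NUL-terminated text string ("Hi!"). */
    printf("string: %s\n", (char *)bytes);
    return 0;
}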
All numbers stored in most computers are technically stored in binary form. At the hardware level everything is represented as a series of high and low voltage signals: high voltage signals are ones/true values, low voltage signals are zeros/false values. These are the bits (short for binary digits) referred to when we talk about 32-bit or 64-bit machines; the number (32 or 64) refers to how many bits the processor can fetch from memory and operate on at a time.
So in most modern computers the machine code is just ordinary values stored in memory, but all of memory is made of bits.
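One quick, if rough, way to see the word size of the machine you are compiling for is to print the width of a pointer in bits (a small sketch of mine, not a definitive test):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* On a typical 64-bit build this prints 64; on a 32-bit build, 32. */
    printf("pointer width: %zu bits\n", sizeof(void *) * CHAR_BIT);
    return 0;
}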
Almost all “computers” these days use binary logic. However, since World War II “computer” has come to mean a computing device with persistent storage and stored programs, rather than just a simple computing engine like a calculator.
A few examples of the exceptions are:
- There might be a few odd-ball ternary (or higher-radix) logic systems in labs.
- There are a few analog computing systems in use.
- An example of a future high-performance computing system not using binary logic might be the D-Wave quantum annealing systems.
Machine language is not a universal language but rather a strictly CPU-specific one – the language that particular CPU understands.
You could design a CPU that has 42 states instead of 2 for its smallest element of memory. The problem is that you cannot come up with a good enough implementation of such a CPU. In fact, some of the first computers (including ENIAC) were decimal machines that implicitly used a decimal machine language.
Whether it is binary, decimal, or something else depends on the number of states the smallest element of memory (a bit) can take. Two was not chosen for CPU-design reasons; it was imposed by the electronic implementation: a transistor operates much better and faster with only 2 voltage levels instead of 10 (or any other number larger than 2).