The original binary search algorithm in the JDK used 32-bit integers and had an overflow bug if (low + high) > INT_MAX
(http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html).
If we rewrote the same binary search algorithm using (signed) 64-bit integers, can we assume that low + high
will never exceed INT64_MAX because it’s physically impossible to have 10^18 bytes of memory?
When using (signed) 64-bit integers to represent physical quantities, is it reasonable to assume underflow and overflow can’t happen?
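For context, here is a minimal Java sketch (illustrative, not the actual JDK source) of the midpoint step the linked article describes, showing the overflow-prone form next to the safe one:

    class Midpoint {
        // Overflow-prone form: low + high wraps negative once it exceeds Integer.MAX_VALUE,
        // which can happen for arrays longer than about 2^30 elements.
        static int midpointBuggy(int low, int high) {
            return (low + high) / 2;
        }

        // Safe form: high - low cannot overflow when 0 <= low <= high.
        // The article also notes the Java-specific fix (low + high) >>> 1.
        static int midpointSafe(int low, int high) {
            return low + (high - low) / 2;
        }

        public static void main(String[] args) {
            int low = 1 << 30, high = (1 << 30) + 2;       // their sum exceeds Integer.MAX_VALUE
            System.out.println(midpointBuggy(low, high));  // negative: the addition overflowed
            System.out.println(midpointSafe(low, high));   // 1073741825, as expected
        }
    }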
13
The short answer is no. However, for some applications your assumption might be correct.
For a signed 64-bit int, the range tops out just below 2^63, which (with commas added for some clarity) is 9,223,372,036,854,775,808. So it's roughly 9 * 10^18. 10^18 is an "Exa", as in exabyte.
Wikipedia says “As of 2013, the World Wide Web is estimated to have reached 4 zettabytes.[12]”, which is 4000 Exabytes. Therefore, the WWW is roughly 400 times larger than 2^63 bytes.
Therefore, there is at least one physical quantity that is much larger than a signed (or unsigned) 64-bit integer, assuming your units are bytes. If your units were something much larger, like gigabytes, then you'd be OK, but your precision of measurement would be low.
For another example, consider far-away galaxies. The Andromeda Galaxy is actually one of the close ones, and it is 2.5 * 10^6 light years away. If your unit were miles, that would be roughly 14.7 * 10^18 (a light year is about 5.9 * 10^12 miles), more than a 64-bit signed integer can hold. Now, obviously it depends on the units you use for your measurements, but some galaxies are far further away than Andromeda. (The farthest known one is about 13 * 10^9 light years away.) Depending on the precision you want for your measurement, it could overflow a 64-bit integer.
(Added) Yes, miles are a lousy unit for astronomical distance. A more usual unit is the Astronomical Unit, roughly 93 million miles. Using that unit of measurement, the farthest known galaxy is roughly 10^15 A.U. (if my math is right), which would fit into a 64-bit int. However, if you also wanted to measure the distance to the Moon or to nearby orbiting satellites, that unit is too large.
One more example from electronics: the Farad (F), a unit of capacitance. Large capacitors range up to 5 kF, and this number will likely increase over time as hybrid cars, "smart grids", etc. improve. One can measure capacitance as small as 10^-18 F. So the overall range in "real" capacitance that we can measure today spans a factor of about 5 * 10^21, larger than a 64-bit integer.
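To put those magnitudes next to the 64-bit limit, here is a quick Java sketch (my own illustration; the constants are just the rough figures quoted above):

    class Magnitudes {
        public static void main(String[] args) {
            double longMax = Long.MAX_VALUE;             // ~9.22 * 10^18
            double wwwBytes = 4e21;                      // ~4 zettabytes, the Wikipedia estimate above
            double andromedaMiles = 2.5e6 * 5.88e12;     // 2.5 million light years * ~5.88e12 miles per light year
            double capacitanceRange = 5e3 / 1e-18;       // 5 kF down to 10^-18 F

            System.out.printf("WWW vs 2^63 bytes: ~%.0fx%n", wwwBytes / longMax);   // roughly 400x
            System.out.printf("Andromeda in miles: %.2e%n", andromedaMiles);        // ~1.5e19, beyond Long.MAX_VALUE
            System.out.printf("Capacitance range:  %.0e%n", capacitanceRange);      // 5e21, also beyond Long.MAX_VALUE
        }
    }

Each figure is only approximate, but the point stands: all three comfortably exceed what a 64-bit integer can count in the stated units.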
12
You don't even need to go cosmic once combinatorics is involved: there are about 2^95 possible deals in a game of bridge, and that is on the small side as combinatorial problems go.
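For anyone who wants to check that figure, a small sketch with java.math.BigInteger (the formula 52!/(13!)^4 counts the ways to split the deck into four 13-card hands):

    import java.math.BigInteger;

    class BridgeDeals {
        static BigInteger factorial(int n) {
            BigInteger f = BigInteger.ONE;
            for (int i = 2; i <= n; i++) f = f.multiply(BigInteger.valueOf(i));
            return f;
        }

        public static void main(String[] args) {
            // Deals in bridge: 52 cards split into four hands of 13, i.e. 52! / (13!)^4.
            BigInteger deals = factorial(52).divide(factorial(13).pow(4));
            System.out.println(deals);              // ~5.36 * 10^28
            System.out.println(deals.bitLength());  // 96, i.e. the count is about 2^95
        }
    }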
4
The most relevant physical quantity for your question is computer RAM.
Windows Server 2012 supports up to 4 TB of physical memory. That's 2^42 bytes. If RAM capacities continue doubling about every year, then in only 17 years from now "Windows Server 2032" will support 2^62 bytes of physical memory, at which time your low + high will reach 2^63 - 2 and kiss the max signed 64-bit integer.
I hope no safety-critical systems fail, having assumed 64 bits will always be enough.
For a slightly more general use, the most relevant physical quantity is memory address space. (It's useful to have a much larger address space than physical memory, e.g. to put many stacks in memory, all with room to grow.) Current x86-64 implementations support 48-bit virtual addresses, so we have only 14 years before such CPUs reach that 2^62 byte address-space limit.
And then there’s distributed shared memory “where the (physically separate) memories can be addressed as one (logically shared) address space”.
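To see how low + high behaves at that scale, here's a tiny Java sketch using hypothetical indices near the 2^62 mark discussed above:

    class LongMidpoint {
        public static void main(String[] args) {
            // Hypothetical indices into a 2^62-element range.
            long low  = 1L << 62;
            long high = (1L << 62) + 2;

            long naive = (low + high) / 2;        // low + high wraps past Long.MAX_VALUE and goes negative
            long safe  = low + (high - low) / 2;  // the difference stays small, so no overflow

            System.out.println(naive);  // a negative number
            System.out.println(safe);   // 4611686018427387905, as expected
        }
    }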
13
Is it reasonable to assume that any physical quantity can be represented by a 64-bit integer without overflow or underflow?
Not exactly. There are plenty of quantities that are larger than that range and plenty that are smaller than its smallest step, which is why we have floating point numbers. Floating point numbers trade away precision for range.
In the specific example that you cited, it is highly unlikely that you would ever need a number that’s larger than that. 64 bits corresponds to roughly 18 quintillion elements. But never say never.
Your assumption won't handle physical quantities that can only be represented by floating point numbers. And even if you decided to scale all numbers, say by multiplying all numbers by 10000 (so the values are still integers but can be represented in ten-thousandths), this scheme still fails for numbers very close to zero, for example the electron mass (9.1094 * 10^-31 kg).
That is a very real (and extremely small) physical quantity, and there are plenty more like it that you're going to have trouble with. And if you argue that the electron mass isn't a "real" physical quantity (even though it is expressed in kg), consider:
10 kg (obviously a physical quantity)
1 kg (same)
10^-2 kg (1/100 kg, or about 1/3 ounce) (also quite real)
So you see where I'm going with this: each step down in magnitude is still a perfectly real physical quantity, and well before you get down to the electron mass a fixed scale factor can no longer represent it.
Of course, you could have a special field within the number to scale an integer portion up or down by a variable multiplier; gee now you just invented floating point.
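A quick sketch of that failure mode in Java (the 10000 scale factor is the one from the paragraph above; the electron mass is just the example value quoted there):

    class Scaling {
        public static void main(String[] args) {
            final long SCALE = 10_000;            // the "ten-thousandths" fixed scaling scheme

            double electronMassKg = 9.1094e-31;   // electron rest mass in kg
            long scaled = (long) (electronMassKg * SCALE);

            System.out.println(scaled);           // 0 -- the scaled integer loses the quantity entirely
            System.out.println(electronMassKg);   // 9.1094E-31 -- a double keeps the magnitude (with limited precision)
        }
    }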
5
First, I would answer the question: what physical values can (or should) be represented by an integer?
An integer is a representation of a natural number (and of differences between natural numbers) in a computer system, so applying it to anything else is wrong. Thus, invoking distances or other quantities that belong to a continuous domain is not an argument; for such quantities there are real-number representations, and you can always choose an arbitrarily large unit and fit any value at a given precision.
So which physical values are natural numbers, and can they overflow a 64-bit integer?
I can think of two: the number of physical objects (such as atoms), and the number of energy levels a quantum system can be in. Those are strictly integer. Now, I know you can split an atom, but it still produces an integer count, and you cannot split it indefinitely. Both of these can easily surpass the range of an unsigned 64-bit integer: the number of atoms is the larger of the two, and each atom can be in more than one energy state.
Whether information is physical or not is very debatable. I would say it is not, so I wouldn't count the amount of information as a physical thing, nor the amount of RAM or anything like that. Even if you did allow it, the number of atoms easily surpasses it, because with today's technology you need more than one atom to store a single bit.
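To put a number on the atom-counting point, here's a small sketch using Avogadro's number (the count of atoms in a single mole, about 6.022 * 10^23):

    import java.math.BigInteger;

    class AtomCount {
        public static void main(String[] args) {
            // Avogadro's number: atoms in one mole of a substance.
            BigInteger avogadro = new BigInteger("602214076000000000000000");
            BigInteger longMax  = BigInteger.valueOf(Long.MAX_VALUE);

            // Even a single mole of atoms is tens of thousands of times
            // beyond the 64-bit range, never mind a macroscopic object.
            System.out.println(avogadro.divide(longMax));  // roughly 65000
        }
    }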
4
In addition to Jerry101’s answer, I would like to offer this very simple and practical test for correctness:
Suppose you allocate some memory via malloc in a 64-bit OS, and the memory allocator decides to return a valid memory block of the size you requested, but one where the 63rd bit is set.
In other words, suppose there exist some programming environments where addresses of the form 0xFFFFFFFFxxxxxxxx are legitimate memory ranges that might be returned from a call to malloc.
The question is, will your code still work as intended?
When the analogous situation occurred on 32-bit operating systems, some software did not operate correctly when given memory addresses "in the upper half". Originally, such memory addresses were thought to be available only to privileged code (operating systems, device drivers, and peripheral hardware), but because of the 32-bit address-space crunch, OS vendors decided to make part of that reserved space available to applications that ask for it.
Fortunately, this situation is quite unlikely to arise for 64-bit programs for a while, certainly not within a decade.
When that situation finally happens, it will mean that 128-bit-addressable processors and operating systems have become mainstream, and that they can provide a "64-bit emulation environment" allowing those "legacy applications" to operate under assumptions similar to today's 64-bit operating systems.
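Java doesn't expose malloc, but the same trap can be sketched with signed longs standing in for addresses (the address values below are made up for illustration):

    class UpperHalf {
        public static void main(String[] args) {
            // A hypothetical "upper half" address of the form 0xFFFFFFFFxxxxxxxx,
            // stored in a plain (signed) long, next to an ordinary low address.
            long upperHalf = 0xFFFFFFFF00001000L;
            long lowerHalf = 0x0000000000001000L;

            // Code that assumes addresses behave like ordinary non-negative numbers misbehaves:
            System.out.println(upperHalf > lowerHalf);                       // false -- upperHalf is negative as a signed long
            System.out.println(Long.compareUnsigned(upperHalf, lowerHalf));  // positive -- the correct unsigned ordering
        }
    }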
Finally, note that this discussion only focuses on memory addresses. A similar issue with timestamps calls for even more caution, because certain timestamp formats allocate a lot of bits of precision to the sub-second part and therefore leave fewer bits available to represent time in the future. These issues are summarized in the Wikipedia article on the Year 2038 problem.
This is a question you need to ask on a case-by-case basis. You should not rely on a general assumption that 64-bit arithmetic will not overflow, because even when correct quantities lie in a much smaller range, a malicious data source could still hand you unreasonable quantities that do overflow, and it's better to be prepared for that situation than to be hit by it unexpectedly.
There are some cases where it makes sense to write code that depends on non-overflow of 64-bit numbers. The main class of examples I know of is counters, where the counter is incremented each time it's used. Even at a rate of one increment per nanosecond (not practical), it would take more than a century to overflow.
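The arithmetic behind that "more than a century" claim, as a short Java check:

    class CounterLifetime {
        public static void main(String[] args) {
            double incrementsToOverflow = Math.pow(2, 63);     // a signed 64-bit counter
            double nanosPerYear = 365.25 * 24 * 3600 * 1e9;

            // At one increment per nanosecond (faster than any practical counter):
            System.out.printf("%.0f years to overflow%n", incrementsToOverflow / nanosPerYear);  // ~292 years
        }
    }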
Note that while it may at first seem “always wrong in principle” to rely on “time until failure” for the correctness of a system, we do this all the time with authentication/login. Given enough time (for brute forcing), any such system (whether based on passwords, private keys, session tokens, etc.) is broken.
Is it POSSIBLE for a physical quantity not to fit in 64 bits? Of course. Others have pointed out counting the number of atoms in the sun or the number of millimeters to the next galaxy. Whether such cases are relevant to your application depends on what your application is. If you're counting the number of items in any given bin in your warehouse, 16 bits would likely be sufficient. If you're compiling statistics about the number of people in the world meeting various conditions, you need to be able to record billions, so you'll need more than 32 bits, at which point you'd presumably go to 64 (as few computers have built-in support for, say, 37-bit numbers). If this is a chemistry application that's counting moles' worth of atoms, 64 bits will not be sufficient.
Technically, just because no computer today has 2^64 bytes of memory doesn't necessarily mean that an array index could never exceed 2^64. There's a concept called a "sparse array", where many of the elements of the array are not physically stored anywhere and such unstored values are assumed to have some default value like null or zero. But if you are writing a function to search an array or list of some kind, and the integer type you use to hold the index can represent values more than double the largest possible index, then checking for overflow when adding two indexes is not strictly necessary.
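One way to apply that observation in Java: array indexes are at most Integer.MAX_VALUE, so doing the midpoint arithmetic in a 64-bit long cannot overflow. A small sketch:

    class WideIndex {
        // Midpoint over indices known to fit in 32 bits. Because low and high are each
        // at most Integer.MAX_VALUE (~2.1e9) and the arithmetic is done in 64-bit longs,
        // low + high cannot overflow here.
        static long midpoint(long low, long high) {
            return (low + high) / 2;
        }

        public static void main(String[] args) {
            long low = Integer.MAX_VALUE - 1, high = Integer.MAX_VALUE;
            System.out.println(midpoint(low, high));  // 2147483646, no wraparound
        }
    }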
1
It is unreasonable to assume a 64-bit integer can hold every number. Multiple reasons:
- The maximum and minimum 64-bit integers are finite numbers, and for every finite number a larger and a smaller finite number exist.
- Calculations with 128-bit and 256-bit numbers are currently used in various places, and many processors have specific instructions that operate on 128-bit integers (see the sketch after this list).
- 20 years ago, a 1 GB disk was considered "large". Today a 1 TB disk is considered small. 20 years ago average desktops had about 16 MB of RAM; my current desktop has more than 16 GB. Hard disk space and RAM have grown exponentially in the past and are predicted to keep growing exponentially in the future. Unless somebody can come up with a very good reason why that growth should stop, it doesn't make sense to assume it will.
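As a small illustration of the 128-bit point in Java terms (Math.multiplyHigh has been available since Java 9; the values here are arbitrary):

    class Wide128 {
        public static void main(String[] args) {
            long a = Long.MAX_VALUE, b = 3;

            // The full product of two 64-bit values needs up to 128 bits.
            // Math.multiplyHigh returns the upper 64 bits of the 128-bit product;
            // a plain multiplication keeps only the lower 64 bits and silently wraps.
            long high = Math.multiplyHigh(a, b);
            long low  = a * b;

            System.out.println(high);  // 1 -- the product spilled into the upper 64-bit word
            System.out.println(low);   // 9223372036854775805 -- the lower 64 bits only
        }
    }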