I’m working on designing a new programming language and trying to decide how I will do variable comparisons. Along with many different types of languages, I’ve used PHP for years and personally had zero bugs related to its comparison operations other than situations where 0 = false. Despite this, I’ve heard a lot of negativity towards its method of comparing types.
For example, in PHP:
2 < 100 # True
"2" < "100" # True
"2" < 100 # True
In Python, string comparison goes like this:
2 < 100 # True
"2" < "100" # False
"2" < 100 # False in Python 2; raises TypeError in Python 3
I don’t see any value in Python’s implementation (how often do you really need to see which of two strings is lexicographically greater?), and I see almost no risk in PHP’s method and a lot of value. I know people claim it can create errors, but I don’t see how. Is there ever really going to be a situation where you are testing if (100 == "100") and you don’t want the string to be treated as a number? And if you really did, you could use === (which I’ve also heard people complain about but without any substantial reason).
So, my question is, not counting some of PHP’s weird conversion and comparison rules dealing with 0’s and nulls and strings mixed with characters and numbers, are there any substantial reasons that comparing ints and strings like this is bad, and are there any real reasons having a === operator is bad?
The biggest problem is that an equivalence relation, the mathy term for things like ==, is supposed to satisfy 3 laws.
- reflexivity: a == a
- symmetry (commutativity): a == b means b == a
- transitivity: a == b and b == c means a == c
All of these are very intuitive and expected. And PHP doesn’t follow them.
'0' == 0  // true
0 == ''   // true (before PHP 8)
'0' == '' // false, AHHHH
So it’s not actually an equivalence relationship, which is a pretty distressing realization for some mathy people (including me).
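To make the three laws concrete, here is a rough sketch in Python that checks them mechanically. The `loose_eq` function is my own simplified model of PHP's pre-8 loose equality for ints and strings only (an assumption for illustration, not PHP's actual implementation), and `to_int` is a hypothetical helper.

```python
import re
from itertools import product

# Simplified, hypothetical model of PHP's pre-8 "==" for ints and strings only.
_INT = re.compile(r"[+-]?\d+")


def to_int(s):
    """Cast a string the way this model assumes: leading integer, else 0."""
    m = _INT.match(s)
    return int(m.group()) if m else 0


def loose_eq(a, b):
    if isinstance(a, str) and isinstance(b, str):
        # Two numeric strings compare as numbers; otherwise as plain strings.
        if _INT.fullmatch(a) and _INT.fullmatch(b):
            return int(a) == int(b)
        return a == b
    # Mixed int/string: cast the string side to int first.
    a = to_int(a) if isinstance(a, str) else a
    b = to_int(b) if isinstance(b, str) else b
    return a == b


values = ["0", 0, ""]
reflexive = all(loose_eq(a, a) for a in values)
symmetric = all(loose_eq(a, b) == loose_eq(b, a)
                for a, b in product(values, repeat=2))
transitive = all(loose_eq(a, c)
                 for a, b, c in product(values, repeat=3)
                 if loose_eq(a, b) and loose_eq(b, c))
print(reflexive, symmetric, transitive)  # True True False
```

Under this model only transitivity breaks, via the chain '0' == 0 and 0 == '' but '0' != ''.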
It also hints at one of the things that people really hate about implicit casts: they often behave unexpectedly when combined with the mundane. It’s basically just an arbitrary set of rules. Because it’s unprincipled in this sense, weird stuff happens and it all needs to be specified case by case.
Basically we sacrifice consistency, and the developer has to shoulder the extra burden of making sure there are no funny (and expensive) conversions happening behind the scenes. To quote this article:
Language consistency is very important for developer efficiency. Every
inconsistent language feature means that developers have one more
thing to remember, one more reason to rely on the documentation, or
one more situation that breaks their focus. A consistent language lets
developers create habits and expectations that work throughout the
language, learn the language much more quickly, more easily locate
errors, and have fewer things to keep track of at once.
EDIT:
Another gem I stumbled across
NULL == 0
NULL < -1
So if you try to sort anything, it’s nondeterministic: the result depends entirely on the order in which comparisons are made. E.g., suppose we use bubble sort.
bubble_sort([NULL, -1, 0]) // [NULL, -1, 0]
bubble_sort([0, -1, NULL]) // [-1, 0, NULL]
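Here is a small Python sketch that reproduces the two results above. `php_gt` is a hypothetical stand-in for PHP's `>` restricted to NULL and ints, assuming the rule that when either side is NULL both sides are compared as booleans; treat it as a model, not as PHP itself.

```python
def php_gt(a, b):
    """Hypothetical model of PHP's ">" for NULL (None) and ints only."""
    if a is None or b is None:
        # Assumed rule: with NULL involved, both sides compare as booleans.
        return bool(a) > bool(b)
    return a > b


def bubble_sort(items):
    """Plain bubble sort that swaps neighbours whenever php_gt says so."""
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if php_gt(items[i], items[i + 1]):
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


print(bubble_sort([None, -1, 0]))  # [None, -1, 0]
print(bubble_sort([0, -1, None]))  # [-1, 0, None]
```

The same three values come out in different orders because NULL is never "greater" than its neighbour in this model, so where it ends up depends only on where it started.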
It’s the old practitioners-versus-theoreticians conflict.
In computer science theory, strong typing and the like are considered good, and there are a lot of theoretical (and some practical) ambiguities that arise with loose comparison, e.g. do you really want (“2.000” == “2.0”) to evaluate to true?
However, in practice it’s just plain nice to be able to code:
if (user_choice == 2)
rather than the equivalent Java
try {
    if (Integer.parseInt(user_choice) == 2) {
        isTwo = true;
    } else {
        isTwo = false;
    }
} catch (NumberFormatException nex) {
    // Input that is not a number can never equal 2.
    isTwo = false;
}
One of the more complicated projects I have undertaken was embedding the PHP library into Google’s V8 so that JavaScript could create and access PHP objects directly, without any form of bridge and without any interpretation (outside of JavaScript, that is). This required access to the Zend tables of a PHP object, and all properties and methods of a PHP object were directly accessible from JavaScript in this way.
From this experience I also got a taste of the challenge of handling PHP data types in C (the lower-level structures, that is) in a way that is useful at a more dynamic, abstract level above. That level was JavaScript rather than PHP at this point, but the values were still the true, internal representation of a PHP object.
There are many challenges in these situations, including when it comes to comparisons. In the end though, at least in my opinion, it was more important to respect and adhere to the dynamic nature of PHP. This was a matter of function, not a matter of logical interpretation or intuitive perception etc.
The discussion of whether “0” == true needs to be based on what “0” really is. As a numerical type, 0 is the binary value for false. However, as a string, “0” is valid, non-null, and non-empty. So in the end, it’s a matter of whether you think “0” should be seen as a string, as a numerical type, or, as a third option, whether you should try to handle it as both in different situations. Each view has valid, empirical reasons for why one way is more or less useful than the other, yet in the end there is no real or “factual” answer to this, and it will always come down to how you, I, and the other guy value “0”.
Edit: Also keep in mind that comparisons are one of the most basic operations of a language. While it is possible to make comparisons more “intelligent”, this requires more logic within the actual comparison operation itself. With the frequency that comparisons are used, this can have a negative effect on performance, especially with string types. So even this side of the discussion would likely become a matter of preference and/or priority.
Edit 2: To answer the root of your question more fully, dynamic types need either a form of dynamic comparison or a larger variety of comparison operators.
- ‘==’ alone will only suffice if the operation ‘==’ performs is smart enough to know how to handle each type under all use scenarios, specifically if the left operand and the right operand are different types. The downside to the more dynamic operation is a likely performance hit. The smarter the operation is, the more noticeable the cost is likely to be.
- more forms of operators (such as the addition of ‘===’) may seem foreign to people new to your language and will likely receive criticism.
- a third option would be forcing appropriate comparisons for each type at all times, but this will sacrifice dynamism and will go against the goals of a dynamic language (at least in my opinion).
PHP tries to handle comparisons dynamically, but includes ‘===’ to allow strict comparisons as well. The only other option would be to make the default comparisons ‘smarter’ which will cost performance.
While it’s true that the additional operators add complication and dynamic comparisons leave room for error, a junior high student should be able to grasp the concept (and many do). ‘==’ means that left and right operands can be different types, but can lead to different results in specific, defined situations. ‘===’ means that left and right must be the same types. As for practice, it’s better to use ‘===’ when checking the return value of a function, as well as is_string, is_int, is_bool, etc. This is the downside of a dynamic language, and not doing so will likely lead to unexpected comparison results at times.
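As an illustration of why the strict check matters for return values, here is a sketch in Python, which has a similar quirk: `0 == False` is true there because `bool` is a subclass of `int`. The `find_position` helper is hypothetical and only mimics the "index or false" return style; `is` is used as the nearest Python analogue of a type-and-value check, which is my assumption for the example.

```python
def find_position(haystack, needle):
    """Hypothetical helper mimicking the 'index or false' return style."""
    index = haystack.find(needle)
    return False if index == -1 else index


pos = find_position("abc", "a")  # the match is at index 0

# Loose check: 0 == False is True, so a match at index 0 looks like "not found".
found_loose = not (pos == False)

# Strict check: 0 and False are different objects of different types.
found_strict = pos is not False

print(found_loose, found_strict)  # False True
```

The loose version silently reports a miss for a match at the very start of the string, which is exactly the kind of defined-but-surprising result described above.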
But in the end, a ‘good’ comparison will always follow a defined, documented behavior. A language can use Klingon for comparisons as long as the results are well defined.
Data on the Web is transmitted as strings, for example a GET request:
www.foo.bar/?number=2
Even though we intuitively know that what is sent in this request is a number and not a string, technically it is a string, since that is the only type of data you can get from a URL.
IMO, Web languages such as PHP and JavaScript have dynamic typing that allows automatic conversion of types depending on context mainly because of that. This is a pragmatic approach, and I think it is also why they vastly dominate the Web.
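A quick way to see this, sketched with Python's standard `urllib.parse` module. The `http://` scheme is added only so the URL parses cleanly; that part is my assumption, not part of the example above.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.foo.bar/?number=2"  # scheme added for parsing (assumption)

# parse_qs maps each parameter name to a list of string values.
params = parse_qs(urlsplit(url).query)
raw = params["number"][0]
print(type(raw).__name__, repr(raw))  # str '2'

# Any numeric meaning has to be added by an explicit conversion.
number = int(raw)
print(number + 1)  # 3
```

In a language without implicit conversion, that `int(...)` call is the step the programmer has to write by hand for every such parameter.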
I think that it’s likely that if PHP had been created for another environment than the Web (like Python, C, whatever…) the typing rules would have been different and probably more strongly typed and less dynamic, but this is a Web language and dynamic typing fits the ecosystem it is in.
Another point is this. When I have
"x" == "x"
"2" < "100"
I’d expect also that
"x2" < "x100"
"2x" < "100x"
In other words, the comparison result should remain the same when I add the same string to the front or the end. This is the same with numbers:
if a < b then so is 2*a < 2*b and 2+a < 2+b and a-2 < b-2
=== EDIT ===
It looks like I got downvoted because people read the above in the following way:
String operations must behave like numerical ones.
or
String operations must behave like I said because numerical ones do.
And then give the “counterexample” of
2+a == a+2 therefore “2x” == “x2”??
No, that’s not what I said.
Here it is more formally:
If e and § form a Monoid on some type T, then we should have it that, for any x,y,z of type T the following holds:
(z § x) < (z § y) <=> x < y
(x § z) < (y § z) <=> x < y
Since
0 and + form a Monoid on Integers
1 and * form a Monoid on Integers
"" and append form a Monoid on Strings
[] and list concatenation form a Monoid on lists, ...
we could expect the above general laws to hold. They don’t in PHP, so that’s why we say PHP string comparison is broken.
Observe that
a § b == b § a
is not a requirement for Monoids, though it is true for addition and multiplication of numbers. Specifically, it is invalid to conclude that string append must be commutative because integer addition is commutative.
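For what it's worth, the first of the two laws can be checked mechanically. The sketch below is in Python and tests only the prepend form, (z § x) < (z § y) <=> x < y, on a handful of sample strings; the append-at-the-end form is not checked here. `php_like_lt` is a hypothetical, simplified model of PHP's `<` on strings (digit-only strings compare as numbers, everything else lexicographically), not PHP's real rule set.

```python
import re
from itertools import product


def php_like_lt(a, b):
    """Hypothetical model: digit-only strings compare as numbers, others as text."""
    if re.fullmatch(r"\d+", a) and re.fullmatch(r"\d+", b):
        return int(a) < int(b)
    return a < b


def lexicographic_lt(a, b):
    """Plain lexicographic string comparison."""
    return a < b


def prepend_law_holds(lt, samples):
    """Check (z + x) < (z + y) <=> x < y for every x, y, z in samples."""
    return all(lt(z + x, z + y) == lt(x, y)
               for x, y, z in product(samples, repeat=3))


samples = ["2", "100", "x", ""]
print(prepend_law_holds(lexicographic_lt, samples))  # True
print(prepend_law_holds(php_like_lt, samples))       # False
```

The failing case under the model is the one from above: "2" < "100" holds numerically, but "x2" < "x100" does not once both sides stop being numeric strings.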
You ask what situation would ever come up in which you want to compare 100 to “105” and want the string to be converted to a number, but I’m having more trouble coming up with a situation in which you’d want to deliberately compare 100 to “105” at all.
This situation happens in PHP all the time, of course, as well as in many other programming languages, but convert-and-compare is almost never what you actually want. Very nearly 100% of the time, you want to compare two numbers or two strings, and the only reason the types don’t match is that something’s wrong in the way you process the user input.
PHP tries to Do What I Mean, which it assumes to be “convert the string to a number, and compare the numbers.” At first blush, this sounds like a reasonable way of doing things, as long as PHP’s assumptions about what you want are correct. But the moment you depart from those assumptions, this comes back to bite you. It might not even cause harm during that particular operation, instead wreaking havoc much later down the line, but the earlier the error can be caught, the easier it is to find the source and fix it.
And that, ultimately, is the problem with PHP’s type comparison: in a well-meaning attempt to Do What I Mean, it masks errors that come up earlier in the code. Hidden bugs stay hidden because of it, only surfacing later to cause trouble.
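One way to picture the alternative is to convert user input exactly once, at the boundary, and let bad input fail right there. This is only a sketch in Python; `parse_quantity` and the variable names are hypothetical, made up for the example.

```python
def parse_quantity(raw):
    """Hypothetical boundary helper: turn user input into an int or fail loudly."""
    try:
        return int(raw.strip())
    except ValueError:
        # Surface the problem where the bad input enters the program.
        raise ValueError(f"expected a whole number, got {raw!r}") from None


limit = 100
quantity = parse_quantity("105")  # converted once, at the edge

# From here on, both operands are ints and the comparison is unambiguous.
over_limit = quantity > limit
print(over_limit)  # True
```

Input such as "105 apples" raises a ValueError at the point of entry instead of being quietly compared as 105 somewhere further down the line.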