I was reading an article about bad programming practices.
It mentioned –
“Yo-Yo code” that converts a value into a different representation, then converts it back to where it started (eg: converting a decimal into a string and then back into a decimal, or padding a string and then trimming it)
I don’t understand why the particular example he gives is a bad way to write programs. It seems okay to me to convert back if the situation requires it so that the value can be used.
Can anyone explain more about this?
20
Even if you do need both the numeric and the string representation of a number, it’s better to convert just once and also hang on to the original value, instead of converting again every time you need one or the other.
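For instance, a minimal C sketch (the struct and helper names are my own) of converting once and keeping both representations around:

#include <stdio.h>

/* Hypothetical example: keep both representations instead of
   re-converting every time one of them is needed. */
struct price {
    double value;     /* the original numeric value             */
    char   text[32];  /* its string form, computed exactly once */
};

static struct price make_price(double value)
{
    struct price p;
    p.value = value;
    snprintf(p.text, sizeof p.text, "%.2f", value);  /* convert once */
    return p;
}

int main(void)
{
    struct price p = make_price(19.99);
    printf("numeric: %f, display: %s\n", p.value, p.text);  /* no further conversions */
    return 0;
}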
The principle is, as always, that code that doesn’t exist cannot have subtle defects, while code that exists often does. That may sound paranoid, but experience teaches us that it’s appropriate. If you approach programming with a permanent, low-level anxiety of “I’m not really smart enough to understand this complex system”, you’re on the right track.
22
It’s bad for three major reasons:
- It suggests you haven’t thought about what type/format the variable should actually be, but are instead converting it to whatever you need at that moment. This shows a lack of design thought.
- It’s probably wasteful. You are almost certainly wasting cycles and lines of code on conversions that don’t need to be there. This will make your code slower and more bloated than it needs to be.
- Type conversions are prone to subtle errors. By scattering these conversions through your code you are increasing the likelihood of error.
I suspect reason 1 is the one your source had in mind, based on the context in which it was mentioned.
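As an illustration of reason 3, here is a small C sketch of a round trip that silently loses information (the default six-digit precision of %f is what bites here):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double original = 0.1234567890123;
    char buf[64];

    /* Yo-yo: number -> string -> number.  "%f" keeps only six digits
       after the decimal point, so the round trip silently loses data. */
    snprintf(buf, sizeof buf, "%f", original);
    double roundtripped = strtod(buf, NULL);

    printf("original:      %.13f\n", original);      /* 0.1234567890123 */
    printf("round-tripped: %.13f\n", roundtripped);  /* 0.1234570000000 */
    return 0;
}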
I would rephrase the description as “code which converts a value to a different representation for the purpose of doing something which could have been done just as well, or better, in the original representation, and then converts it back”. There are many situations where converting something to a different type, acting upon it, and converting it back is entirely appropriate, and failure to do so would result in incorrect behavior.
As an example where conversion is good:
One has four float values of arbitrary signs, whose magnitudes may differ by a factor of up to 1,000, and one needs to compute the sum to within 0.625 units in the last place. Converting all four values to double, computing the sum, and converting the result back to float will be much more efficient than any approach using float alone.
Floating-point values are at best accurate to 0.5 units in the last place (ULP). This example requires that the worst-case rounding error be no more than 25% above that optimum. Summing in double yields a value accurate to within 0.5001 ULP. While a 0.625 ULP requirement might seem contrived, such requirements are often important in successive-approximation algorithms: the more tightly the error bound is specified, the lower the worst-case iteration count.
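A sketch of that approach in C (the function name is my own):

#include <stdio.h>

/* Sum four floats by accumulating in double and rounding once at the end.
   Each float-to-double conversion is exact, the double additions introduce
   only a tiny relative error, and the single final double-to-float rounding
   costs at most 0.5 ULP. */
static float sum4(float a, float b, float c, float d)
{
    double sum = (double)a + (double)b + (double)c + (double)d;
    return (float)sum;
}

int main(void)
{
    /* Values of mixed sign whose magnitudes differ by up to ~1000x. */
    printf("%.9g\n", sum4(1000.25f, -999.875f, 0.625f, -1.0f));
    return 0;
}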
As an example where conversion is bad:
One has a floating-point number, and wishes to output a string which will represent its value uniquely. One approach is to convert the number to a string with a certain number of digits, try to convert it back, and see if the result matches.
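In C, that naive approach might look something like this (the function name is mine; strtof and FLT_DECIMAL_DIG come from the standard library):

#include <stdio.h>
#include <stdlib.h>
#include <float.h>

/* Naive approach: keep adding significant digits until parsing the
   string back appears to reproduce the original float. */
static void format_by_roundtrip(float value, char *buf, size_t bufsize)
{
    for (int digits = 1; digits <= FLT_DECIMAL_DIG; digits++) {
        snprintf(buf, bufsize, "%.*g", digits, value);
        if (strtof(buf, NULL) == value)   /* relies on the parser's rounding */
            return;
    }
}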
But this is actually a poor approach. If a decimal string represents a value which sits almost precisely on the halfway point between two floating-point values, it is fairly expensive for a string-to-float method to guarantee that it will always yield the nearer float value, and many such conversion methods don’t uphold such a guarantee (among other things, doing so would in some cases require reading all the digits of a number, even if it were billions of digits long).
It is much cheaper for a conversion method to guarantee only that it will return a value within 0.5625 units in the last place (ULP) of the represented value. A robust “reversible” decimal-to-string formatting routine should therefore compute how far the output value is from the correct value, and continue outputting digits until the result is within 0.375 ULP, if not 0.25 ULP. Otherwise, it may output a string which some conversion methods will process correctly, but which other conversion methods won’t.
It is better to sometimes output a digit that might not be “necessary” than to output a value that might be misinterpreted. The key point is that the decision of how many digits to output should be based upon numeric calculations related to the output process, rather than upon the result of one particular method’s attempt to convert the string back to a number.
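A rough C sketch of that idea, using strtod into a double purely to measure how far the printed decimal lies from the float’s exact value (not as a float round-trip test); a production routine would fold this calculation into digit generation and handle edge cases such as infinities and NaN:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>

/* Emit digits until the decimal string is numerically within 0.375 ULP of
   the float's exact value.  double has ample precision to act as the
   reference at these digit counts. */
static void format_float_within_ulp(float value, char *buf, size_t bufsize)
{
    double exact = value;                                 /* float-to-double is exact */
    double ulp = nextafterf(fabsf(value), INFINITY) - fabsf(value);

    for (int digits = FLT_DIG; digits <= FLT_DECIMAL_DIG + 2; digits++) {
        snprintf(buf, bufsize, "%.*e", digits - 1, value);
        double printed = strtod(buf, NULL);               /* value the string denotes */
        if (fabs(printed - exact) <= 0.375 * ulp)         /* close enough to be unambiguous */
            return;
    }
}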
1
Various reasons:
- It’s pointless and adds complexity, both in the amount of code to write and maintain and in the amount of CPU time needed.
- It can lose accuracy or, worse, corrupt the value entirely.
- It wastes memory (potentially, depending on the language), as you end up storing more representations of a number than you need.
A good practice is to keep only the first, most accurate representation possible of any data you receive. Perform any calculations using this data, and only convert it when you need to output it or display it in an easier-to-read format.
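For example, a minimal C sketch (the scenario is invented): parse the input once into the most accurate type available, compute with that value, and format only at the point of display.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *input = "3.14159265358979";   /* data as received */

    /* Convert once, at the boundary, to the most accurate representation. */
    double pi = strtod(input, NULL);

    /* All calculations use the numeric value... */
    double area = pi * 2.0 * 2.0;              /* area of a circle, r = 2 */

    /* ...and formatting happens only when displaying the result. */
    printf("area: %.6f\n", area);
    return 0;
}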
3
Why? Because even the best of us can make mistakes.
Look at what happened when Microsoft tried to implement a “round-trip” format specifically for making sure float <-> string conversions were safe: https://stackoverflow.com/q/24299692/541686
When I was at school (and after school, in electrical engineering) we were taught to divide after multiplying: division often produces results with many digits that get rounded, and multiplying after a division multiplies the division error.
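A quick C illustration of the effect, using integer division (where the truncation is explicit) in place of the rounded hand arithmetic:

#include <stdio.h>

int main(void)
{
    /* Divide first and the truncation error gets multiplied up;
       multiply first and the single division keeps the error small. */
    int divide_first   = (100 / 3) * 3;   /* 33 * 3  = 99  */
    int multiply_first = (100 * 3) / 3;   /* 300 / 3 = 100 */

    printf("divide first:   %d\n", divide_first);
    printf("multiply first: %d\n", multiply_first);
    return 0;
}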
Type conversions are the same: you risk losing data. CInt(1.3) = 1.
In my language, Basic, type conversion is practically all we do (a VB6 program spends 90% of its time doing ANSI/Unicode conversions for all the API calls the runtime makes). Type conversion is implied in everything we do.
Print 5
The string “5” gets printed from the numeric literal.
form1.caption = "My Form"
The Unicode string literal gets converted into an ANSI string and passed to SetWindowTextA by the forms package.
Even this works in Basic:
a = "5"
b = 3
c = a + b    ' = 8 (the string is coerced to a number)
I’m a Variant programmer these days; I don’t even think about types. I just rely on the auto-conversions.
Anyway, my 3 pet peeves are:
- Assigning string literals to variables just to use them (it wastes memory and is slow).
- Pointless functions where the code could be inline (the compiler will probably undo your function and inline it anyway).
- Setting all objects to Nothing in the last lines before an End Function or the end of the program.
And a 4th, for short programs:
- Pointlessly Dimming your 3 variables in a 5-line program.