Simon Peyton Jones himself recognizes that reasoning about performance in Haskell is hard because of its non-strict semantics.
I have yet to write a significant project in Haskell, so I wonder: can I reason about performance only at the beginning of a project (when choosing basic data structures and an IO library) and, whenever a problem arises, deal with it using the profiler?
To put it differently, is it possible (i.e. not too painful) to postpone dealing with performance until you have performance issues, or do you have to learn to predict how GHC will run your code (for example, infer what the strictness analyser will decide)?
1
The other answers provide broad advice about performance reasoning. This answer specifically addresses non-strict semantics.
While laziness does make it harder to reason about performance, it isn’t as complicated as you might think. Although laziness is quite useful in some situations, most of the time a lazy language gets used in the same way that a strict language would be used. Consequently, performance reasoning for strict languages can be applied (with a few adjustments) to lazy languages.
In terms of time complexity, eager evaluation does at least as much work as lazy evaluation, because lazy evaluation only skips computations whose results are never needed. Both strategies produce the same result in most cases. (More precisely, if eager evaluation terminates without errors, it produces the same result as lazy evaluation.) Therefore, to reason about the time complexity of a Haskell program, you can pretend that it evaluates eagerly and treat the result as an upper bound. In those infrequent situations where laziness matters, this estimate will be too high and should be revised downwards.
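As a minimal sketch of one of those situations where the eager estimate is too high, consider the following. The names `expensive` and `firstThree` are made up for illustration. Reasoning eagerly, `map` would process a million elements; under lazy evaluation only the three elements that `take` demands are ever computed.

```haskell
-- Hypothetical stand-in for a costly per-element computation.
expensive :: Int -> Int
expensive x = x * x + 1

-- An eager reading says: map over 1,000,000 elements, then keep 3.
-- Lazily, `take 3` demands only the first three elements, so
-- `expensive` runs three times.
firstThree :: [Int]
firstThree = take 3 (map expensive [1 .. 1000000])

main :: IO ()
main = print firstThree  -- prints [2,5,10]
```

The eager estimate (linear in the list length) is still a valid upper bound here; laziness only makes the actual cost smaller.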
While lazy evaluation never gives you worse time complexity than eager evaluation, it sometimes gives you higher space complexity, which shows up as space leaks: unevaluated computations pile up in memory until something finally demands their result. Higher space complexity can usually be fixed by adding strictness annotations to make a program execute more eagerly. Profiling tools are pretty good at tracking down the cause of space leaks. I’d categorize this as either correctness debugging or performance debugging, depending on the severity.
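A classic illustration is a summing loop with an accumulator. The sketch below uses the `BangPatterns` extension as the strictness annotation; `sumLazy` and `sumStrict` are hypothetical names. Note the assumption: the leak in `sumLazy` is what you would expect without optimization. With `-O`, GHC's strictness analysis may well remove it on its own, which is part of why these leaks can be hard to predict.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: without optimization, `acc + x` is not evaluated
-- at each step, so a chain of suspended additions proportional to the
-- list length can build up before the final result is demanded.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Strict accumulator: the bang pattern forces `acc` on every call,
-- so the running total is always a plain number and the loop runs in
-- constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1 .. 1000000])  -- prints 500000500000
```

Both functions return the same value; only their memory behaviour differs. For this particular pattern, `foldl'` from `Data.List` gives the same strict behaviour without a hand-written loop.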
4
Optimize the big stuff before you code, and the little things when you’re finished.
For example, before you start coding you should be thinking about the following:
- Are the libraries/frameworks that you’re going to use decently fast?
- Try to keep your data structures simple.
- Try to keep your algorithms and design patterns as simple as possible.
- How fast is the language I’m using?
…and so on.
Then, when you’re almost finished with your code, think about the little things, such as which built-in function is faster or whether an area of code should be rewritten to make it more efficient.
This is true for any language, and it really depends on what type of software you’re writing. Haskell is a general-purpose language, so you’re probably (correct me if I’m wrong) not making anything that needs to be extremely fast. If you are, you should be using a lower-level language. If speed is an issue, but not enough that you should be using a low-level language, then you should be optimizing your performance based on the above.
Of course, everybody does things differently, so if your personal experience makes you think that you should be doing it differently based on your situation, then you probably should.
4
I’d like to add to Dynamic’s answer:
- Make your code modular and loosely coupled.
- Hide implementation details of your modules.
When you realize later in development what bottlenecks your code has, it can be really painful to refactor the whole project. If your code is well structured, the bottlenecks are easier to find and to optimize/fix.
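In Haskell, hiding implementation details usually means giving a module an explicit export list that leaves out the data constructors. Here is a minimal sketch, assuming a hypothetical `Queue` module that is not taken from any particular library:

```haskell
-- Only the type name and three operations are exported. The `Queue`
-- data constructor stays private, so callers cannot depend on the
-- two-list representation.
module Queue (Queue, empty, push, pop) where

-- Front list (next elements to pop) and back list (most recent pushes,
-- stored in reverse order).
data Queue a = Queue [a] [a]

empty :: Queue a
empty = Queue [] []

-- Add an element at the back of the queue.
push :: a -> Queue a -> Queue a
push x (Queue front back) = Queue front (x : back)

-- Remove the oldest element, if there is one.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue [] [])            = Nothing
pop (Queue [] back)          = pop (Queue (reverse back) [])
pop (Queue (x : front) back) = Just (x, Queue front back)
```

If the profiler later shows this queue to be a bottleneck, the representation can be swapped (for example for `Data.Sequence`) without touching any code that imports the module, because nothing outside it can pattern-match on the constructor.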