Why do diff programs work line by line rather than hierarchically?
All code can be expressed as a hierarchy, even if that is not immediately apparent.
Most of the data we work with is hierarchical as well.
What are the potential issues with building a piece of software that diffs based on hierarchy?
- Because diff tools are usually built to compare any file, not only hierarchically organized source code or data.
- Because obtaining a tree from source code requires parsing it first. Any application can read lines. Parsing C++, Ada, Java, COBOL, Haskell, and hundreds of other programming and non-programming languages is much harder.
- Because showing some code as a tree would be extremely ugly. Imagine PHP code mixed with HTML in a deep hierarchy (including PHP code inside HTML attributes).
But in particular contexts where the set of languages is known to be limited, such as Visual Studio, a tree-based diff would indeed be nice to have (as an option, with a choice between text-based and tree-based diff).
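To make the trade-off concrete, here is a minimal sketch in Python, assuming the only language involved is Python itself so that the standard `ast` module can serve as the parser. The function names `line_diff` and `same_tree` and the sample snippets are made up for illustration; this is not how any particular diff tool works.

```python
import ast
import difflib

# Two versions of the same function: only the formatting differs.
OLD = """\
def area(w, h):
    return w * h
"""

NEW = """\
def area(w,
         h):
    return (w * h)
"""


def line_diff(old, new):
    """Return the added/removed lines a plain line-based diff reports."""
    return [
        line
        for line in difflib.ndiff(old.splitlines(), new.splitlines())
        if line.startswith(("- ", "+ "))
    ]


def same_tree(old, new):
    """Compare the parsed syntax trees; works only for valid Python source.

    ast.dump() omits line/column attributes by default, so layout and
    redundant parentheses do not affect the result.
    """
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


if __name__ == "__main__":
    print("line-based changes:", len(line_diff(OLD, NEW)))
    print("trees identical:", same_tree(OLD, NEW))
```

The line-based function works on any text, while `same_tree` raises `SyntaxError` as soon as it is handed something that is not Python. That is the parsing cost described above: a general tree-based diff would need an equivalent parser for every format it claims to support.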