The context of this question is choosing tools for writing design specifications for software projects.
These documents will be written and maintained by architects and developers; I’m not talking about marketing requirements. Some of them may be shared outside the team, but only in a processed, non-editable form (PDF; all the docs are assumed to be exportable to this format).
These are architecture documents, describing the structure of code components, implementation methods, protocols, data formats, etc. They take the form of text with diagrams, identifier names and the occasional code snippet; this isn’t about API documentation that might be generated from source files.
The docs will be under source control; fortunately nobody here needs to be educated about that. It’s inevitable that versioning will arise over time: we do maintain old versions of the software. Issue tracking might not be adhered to as strictly for documentation as for code.
How important is it to be able to easily compare and merge such documents? We have diverging opinions in the team, ranging from “nobody ever merges documentation and if needed Word has a merge tool” to “merging is crucial and git merge must work”. Note “Word has a merge tool” is something that I quote but don’t agree with, having had the painful experience of merging two bugfixes that I had made to two copies of the same document.
I have a vague memory of a rule in some company (Google, perhaps, since it’s so often cited) that “if you can’t merge it, it doesn’t exist”, but I’m unable to find it now.
I’m looking for either well-reasoned arguments, or authoritative-looking (and preferably motivated) citations.
10
For the discussion in your case, it is important to take into account that a diff feature is already available in your “setup” (more on that below).
Consider refocusing the discussion about diffs on how easy you need it to be or, more precisely, on whether you need the “extra” usability features provided by diff tools targeted at working with code (more on that below).
The diff capability is there because all the docs are assumed to be exportable to PDF, and there are tools capable of producing PDF diffs.
- For the sake of completeness it is worth noting that PDF also can be saved as plain text, which in theory would make things essentially the same as working with routine code diffs, but since all my attempts to use it led to meaningless garbage, I won’t go further here.
In my experience, working with PDF diffs was pretty close to working with regular code diffs, in the sense that one can refer to, describe and discuss them in a meaningful way: “In the 7th diff at page 654, word foo should be replaced with bar”. Reviewing some 200-300 diffs in 2-3 files in 2-3 hours has been perfectly doable this way.
- Given that PDF also supports annotations, reviewing PDF diffs may feel almost like working with a real code review tool like Crucible or Code Collaborator.
The question you should ask here is, again, how easy do you need it to be? Does the anticipated usage of doc diffs involve scenarios where code review tools offer a substantial benefit over PDF diffs?
Say, having 2-3 “rounds” of review of 1000 diffs in 100 files in a day is perfectly sensible with a decent code review tool, but I cannot even imagine this being done with PDF diffs, as there is unlikely to be demand for a tool capable of that, given typical PDF usage.
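For comparison, here is a minimal sketch of what the plain-text route would look like if a clean text form of the docs were available. That is an assumption: as noted above, text exported from PDF did not give usable results for me. It uses only Python's standard `difflib`; the function name and the file names `spec_v1.txt` and `spec_v2.txt` are made up for illustration.

```python
import difflib


def text_diff(old_text: str, new_text: str,
              old_name: str = "spec_v1.txt",
              new_name: str = "spec_v2.txt") -> str:
    """Return a unified diff of two plain-text document versions.

    The file names are hypothetical labels that only appear in the
    diff header; nothing is read from disk.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_name, tofile=new_name)
    )


if __name__ == "__main__":
    old = "The parser reads foo records.\nEach record has a header.\n"
    new = "The parser reads bar records.\nEach record has a header.\n"
    print(text_diff(old, new))
```

This gives the same kind of line-oriented output a code review tool works with, which is where the “extra” usability features come from. Whether it is worth it depends on how often you expect to review doc changes at that scale.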
2
For your particular case, I’d say it doesn’t matter that much for the following reasons:
- It’s already a version controlled document, so you can reach back in time to review old versions as necessary.
- It’s already strictly access controlled for writable copies.
- The pain driving you to consider a more easily merged form appears to be a low frequency event as evidenced by the conflicting responses from your fellow teammates. If they had all experienced that problem on a regular basis then they would support a more easily merged format.
As an alternative solution, consider generating an additional redline document that highlights the changes between versions. Despite its many frustrations, MS Word provides change tracking capabilities and can track changes from multiple users within a document. To keep your main line document branch clean, you could create a separate branch for redline documents.
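If the document text can be extracted in plain form, the redline idea can also be sketched outside Word. The following is only an illustration under that assumption, not how Word's change tracking works: it compares two versions word by word with Python's standard `difflib` and marks deletions as `[-...-]` and insertions as `{+...+}`. The function name `redline` and the marker style are my own choices.

```python
import difflib


def redline(old_text: str, new_text: str) -> str:
    """Return a word-level redline of two text versions.

    Deleted words are wrapped in [-...-] and inserted words in {+...+}.
    Whitespace and line breaks are normalized to single spaces, so this
    is a reading aid rather than a lossless diff.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        # A "replace" is shown as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(redline("word foo is used", "word bar is used"))
```

The output could be committed to the separate redline branch alongside the exported PDF, so reviewers see what changed without opening two copies side by side.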
Here are some contraindications that would make me change my answer to “yes, it matters incredibly!”
- If the document could be edited by people outside of your core team and injected back into the workflow stream, I would insist on easy merge & compare capabilities. For example, if you were exchanging design documents with a client and the requirements were in high flux then merging would be critical.
- If multiple versions of the document exist and are edited at the same time. The classic scenario here is emailing out a draft spec which is edited by multiple recipients. Those editors then email out their changes only to their own cadre of reviewers, who then repeat the cycle. (And I’ve been there, done that on this one. Document control here hurts!)
- If you didn’t have a central repository to provide a known, good, latest copy of the document.
- If there was a lot of content overlap in the various documents and the overlapping content was subject to frequent change, so multiple document updates were required for a single code change.
2
To answer your question properly, we first have to assess whether “easily” could ever be used to describe the process of comparing and merging documents.
Documents are not code. There are many obvious differences between merging code and merging documents, but some subtle ones also: if a mistake is made when merging code there are many automated ways of detecting this, ranging from compiler errors to formal unit-testing. Conversely, with documents, a visual inspection will almost always be required; just because document parts don’t appear to conflict (and still conform to DITA rules in this scenario), there is no guarantee that a change in a title (or even the way it is rendered) in one part of the document will not leave another part redundant or invalid.
The above is an excerpt from a recent blog post of mine:
Reviewing XML Documents for an N-Way Merge
N-Way merging was considered because documents have narrative content. As such, they are often edited and reviewed in a more ad-hoc way than code, which has a very strict structure (so far as a compiler or interpreter is concerned).
My view after working on a solution for this problem is that this kind of merge is unlikely ever to be ‘easy’ but it is certainly possible to make it manageable.
To help illustrate some of the challenges, here is one view of a prototype web app (the one used for the blog) for exploring review methods for an N-Way document merge: