Importance of diffing and merging for design specifications documentation

The context of this question is choosing tools for writing design specifications for software projects.

These documents will be written and maintained by architects and developers; I’m not talking about marketing requirements. Some of them may be shared outside the team, but only in a processed, non-editable form (PDF; all the docs are assumed to be exportable to this format).

These are architecture documents, describing the structure of code components, implementation methods, protocols, data formats, etc. They take the form of text with diagrams, identifier names and the occasional code snippet; this isn’t about API documentation that might be generated from source files.

The docs will be under source control; fortunately nobody here needs to be educated about that. It’s inevitable that parallel versions will arise over time: we do maintain old versions of the software. Issue tracking might not be adhered to as strictly for documentation as for code.

How important is it to be able to easily compare and merge such documents? We have diverging opinions in the team, ranging from “nobody ever merges documentation and if needed Word has a merge tool” to “merging is crucial and git merge must work”. Note that “Word has a merge tool” is something that I quote but don’t agree with, having had the painful experience of merging two bugfixes that I had made to two copies of the same document.

I have a vague memory of a rule in some company (Google, perhaps, since it’s so often cited) that “if you can’t merge it, it doesn’t exist”, but I’m unable to find it now.

I’m looking for either well-reasoned arguments, or authoritative-looking (and preferably motivated) citations.

For the discussion in your case, it is important to take into account that a diff feature is already available in your “setup” (more on that below).

Consider refocusing the discussion about diffs on how easy you need them to be or, more precisely, on whether you need the “extra” usability features provided by diff tools that target code (more on that below).

Diff capability is there because all the docs are assumed to be exportable to PDF, and there are tools ^{1, 2} capable of producing PDF diffs.

(For the sake of completeness, it is worth noting that a PDF can also be saved as plain text, which in theory would make things essentially the same as working with routine code diffs, but since all my attempts to use it produced meaningless garbage, I won’t go further here.)

In my experience, working with PDF diffs was pretty close to working with regular code diffs, in the sense that one can refer to, describe and discuss them in a meaningful way: “In the 7th diff on page 654, the word foo should be replaced with bar.” Reviewing some 200-300 diffs in 2-3 files in 2-3 hours has been perfectly doable this way.
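To make the “7th diff on page 654” style of addressing concrete, here is a minimal sketch of how such numbered, per-page diffs could be produced. It is not how any particular PDF diff tool works. It assumes the page text has already been extracted from both PDF versions by some other means and that both versions have the same number of pages; the function name `numbered_page_diffs` and the sample text are made up for illustration.

```python
import difflib

def numbered_page_diffs(old_pages, new_pages):
    """Return (page, diff_no, old_text, new_text) for every changed run of words.

    old_pages / new_pages are lists of per-page text. Assumption: they were
    extracted from the two PDF versions beforehand and have equal length.
    """
    results = []
    for page_no, (old, new) in enumerate(zip(old_pages, new_pages), start=1):
        old_words, new_words = old.split(), new.split()
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        diff_no = 0  # diffs are numbered from 1 within each page
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            diff_no += 1
            results.append((page_no, diff_no,
                            " ".join(old_words[i1:i2]),
                            " ".join(new_words[j1:j2])))
    return results

# Hypothetical two-page document in two versions.
old = ["The parser reads foo records.", "Frames are big-endian."]
new = ["The parser reads bar records.", "Frames are little-endian by default."]

for page, n, before, after in numbered_page_diffs(old, new):
    print(f"page {page}, diff {n}: {before!r} -> {after!r}")
```

Each tuple gives reviewers a stable handle (page and diff number) to quote in comments, which is all the review style described above really needs.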

Given that PDF also supports ³ annotations, reviewing PDF diffs may feel almost like working with a real code review tool like Crucible or Code Collaborator.

The question you should ask here is, again: how easy do you need it to be? Does the anticipated usage of doc diffs involve scenarios where code review tools offer a substantial benefit over PDF diffs?

Say, 2-3 review “rounds” of 1000 diffs in 100 files in a day are perfectly sensible with a decent code review tool, but I cannot even imagine doing this with PDF diffs, as there is unlikely to be demand for a tool capable of that, given typical PDF usage.

For your particular case, I’d say it doesn’t matter that much for the following reasons:

It’s already a version controlled document, so you can reach back in time to review old versions as necessary.
It’s already strictly access controlled for writable copies.
The pain driving you to consider a more easily merged form appears to be a low frequency event as evidenced by the conflicting responses from your fellow teammates. If they had all experienced that problem on a regular basis then they would support a more easily merged format.

As an alternative solution, consider generating an additional redline document that highlights the changes between versions. Despite its many frustrations, MS Word provides change tracking capabilities and can track changes from multiple users within a document. To keep your main line document branch clean, you could create a separate branch for redline documents.

Here are some contraindications that would make me change my answer to “yes, it matters incredibly!”

If the document could be edited by people outside of your core team and injected back into the workflow stream, I would insist on easy merge & compare capabilities. For example, if you were exchanging design documents with a client and the requirements were in high flux then merging would be critical.
If multiple versions of the document exist and are edited at the same time. The classic scenario here is emailing out a draft spec which is edited by multiple recipients. Those editors then email out their changes only to their own cadre of reviewers, who then repeat the cycle. (And I’ve been there, done that on this one. Document control here hurts!)
If you didn’t have a central repository to provide a known, good, latest copy of the document.
If there was a lot of content overlap in the various documents and the overlapping content was subject to frequent change so multiple document updates were required for a single code change.

To answer your question properly, we first have to assess whether “easily” could ever be used to describe the process of comparing and merging documents.

Documents are not code. There are many obvious differences between merging code and merging documents, but some subtle ones also: if a mistake is made when merging code there are many automated ways of detecting this, ranging from compiler errors to formal unit-testing. Conversely, with documents, a visual inspection will almost always be required; just because document parts don’t appear to conflict (and still conform to DITA rules in this scenario), there is no guarantee that a change in a title (or even the way it is rendered) in one part of the document will not leave another part redundant or invalid.
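As an illustration of that last point, consider a merge where one editor renamed a section title while another added a reference to the old title elsewhere. The two changes touch different lines, so a textual merge reports no conflict, yet the result is broken. The sketch below shows the kind of narrow automated check that can catch this one case; it is only an example under assumed conventions (headings written as `# Title`, references written as `see "Title"`), not DITA and not part of the tool described in the blog post.

```python
import re

# Assumed, purely illustrative markup conventions.
HEADING = re.compile(r"^#+\s+(.*\S)\s*$", re.MULTILINE)  # e.g. "# Storage"
XREF = re.compile(r'see "([^"]+)"')                      # e.g. see "Storage"

def dangling_references(doc):
    """Return referenced section titles that no heading in doc defines."""
    titles = set(HEADING.findall(doc))
    return sorted({ref for ref in XREF.findall(doc) if ref not in titles})

# Result of a conflict-free merge: one branch renamed the first heading,
# the other branch added the reference in the second section.
merged = """\
# Wire Protocol v2
Frames are length-prefixed.

# Storage
Records use the framing rules (see "Wire Protocol").
"""

print(dangling_references(merged))
```

A check like this only covers the failure it was written for. A paragraph made redundant or contradictory by a change elsewhere still needs a human reader, which is the point of the excerpt.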

The above is an excerpt from a recent blog post by myself:

Reviewing XML Documents for an N-Way Merge

N-Way merging was considered because documents have narrative content. As such, they are often edited and reviewed in a more ad-hoc way than code, which has a very strict structure (so far as a compiler or interpreter is concerned).

My view after working on a solution for this problem is that this kind of merge is unlikely ever to be ‘easy’ but it is certainly possible to make it manageable.

To help illustrate some of the challenges, here is one view of a prototype web app (the one used for the blog) for exploring review methods for an N-Way document merge:

Filed under: softwareengineering - @ 14:15

Tags: comparison, configuration-management, documentation, merging
