Say, I have a branch with several commits in it.
- In versions A and B, I have a file with mixed line endings (mostly LF, but some CRLF), and made various other changes to it.
- In version C, I made additional changes and also accidentally converted a big chunk of the line endings to Windows (CRLF) style (as if I had run the ‘unix2dos’ program). Oops!
- In versions D and E and so forth up to Z, I continued to make changes.
Now, I just did a diff when I compared version E to version A and noticed that all the line endings that were not CRLF got unintentionally changed to CRLF. Oops. That’s not good. It will make it really hard to use ‘git blame’ and similar tools later on. Some detective work reveals to me that this happened in version C.
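For anyone wanting to repeat that detective work, counting line endings per revision is one way to spot where the conversion happened. This is only a minimal sketch, assuming a POSIX shell and awk; `count_eol` and `eol_history` are made-up helper names, and the path and revision names in the usage comment are placeholders.

```shell
# Hypothetical helper: count CRLF-terminated and LF-only lines on stdin.
count_eol() {
    awk '{ if (sub(/\r$/, "")) crlf++; else lf++ }
         END { printf "crlf=%d lf=%d\n", crlf, lf }'
}

# Hypothetical helper: print the counts for one file at each given revision.
# Usage (placeholder names): eol_history path/to/file A B C D E
eol_history() {
    path=$1
    shift
    for rev in "$@"; do
        printf '%s: ' "$rev"
        git show "$rev:$path" | count_eol
    done
}
```

The first revision where the `crlf` count jumps is the one that introduced the conversion.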
What I wanted to end up with was versions C … Z having all lines except for my intentional changes unaffected (keeping their original line endings). But now, C … Z all contain only CRLF lines, and will create huge merge conflicts when I eventually merge my results back into the master branch. I could just add another revision to the end that would fix the accidental line endings back to what they should have been, but that would interfere with further attempts to use ‘git blame’ and similar tools in the future, so I’d prefer to avoid that. I’d like to rewrite history so that my goof never happened.
So, what do I do now to fix the damage I did by the accidental change to line endings in version C? I want to revise history so that C … Z get at least most of the original line endings that they would have had if I hadn’t accidentally converted all line endings to CRLF in revision C. I don’t care about line endings for the code I added or changed, but I want to keep all other lines of code with the line endings they originally had.
I have tried a wide variety of ‘interactive rebase’ scenarios, and nothing seems to fix it well. What I find is that the merge algorithm used by the interactive rebase sees all of the changes to the lines that were changed from LF to CRLF as being lines that were ‘changed’, so it sees huge blocks of adjacent lines of code as having been deleted with an LF line ending and then inserted with a CRLF line ending. These huge blocks really mess with the ‘diff’ algorithm’s ability to identify blocks of code that were changed for other reasons.
- I tried doing an interactive rebase to the version before C, using ‘git rebase -i B’, and then fixed the whitespace problems in C, and then tried to continue from there with ‘git rebase --continue’, but I immediately hit massive conflicts in revision D due to all the changed line endings. I tried fixing them in D (took over an hour: this is a big file) and did another ‘git rebase --continue’ and then hit the same problem in revisions E and F, and then gave up.
- Using ‘git rebase -i -s recursive -Xignore-space-at-eol’, on the other hand, makes the diffs look much better (in Visual Studio) because it doesn’t see the changes to line endings, and could therefore do a better job at figuring out changes to the code that didn’t involve line ending changes. However, when all conflicts have been resolved for a given revision, the result of each merge ends up not making any end-of-line whitespace changes at all.
  This is because, according to the rules of -Xignore-space-at-eol, when it is comparing a line with only whitespace changes to one that is unchanged, it always picks the unchanged one for the final merged output. So my changes to whitespace get discarded in the final merged file, and my only option is to (manually, painfully) add them back in after the merge is done by diffing against the original file A (did I mention that this file is huge?), at which point I do ‘git rebase --continue’ and hit another merge conflict for the next revision, and so forth. (So basically as much work as the version without -Xignore-space-at-eol.)
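To experiment with the two attempts above on something much smaller than the real file, a throwaway repository with the same shape can be built. This is a sketch only: the file name `demo.txt`, its contents, and the tags A to D are invented for illustration, and it assumes `git` and `mktemp` are available.

```shell
# Sketch: build a small throwaway repository with the same shape as the
# problem. File name, contents, and tags A..D are invented for illustration.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
git config core.autocrlf false    # keep line endings byte-for-byte
git config commit.gpgsign false

commit_as() {    # usage: commit_as TAG
    git add demo.txt
    git commit -q -m "version $1"
    git tag "$1"
}

# A: mixed endings (mostly LF, one CRLF line).
printf 'one\ntwo\r\nthree\nfour\nfive\n' > demo.txt
commit_as A

# B: an ordinary edit that leaves the endings alone.
printf 'one\ntwo\r\nthree\nfour\nfive\nsix\n' > demo.txt
commit_as B

# C: an intentional edit plus the accidental unix2dos-style conversion.
printf 'one\r\ntwo\r\nTHREE\r\nfour\r\nfive\r\nsix\r\n' > demo.txt
commit_as C

# D: a later edit on top of the all-CRLF file.
printf 'one\r\ntwo\r\nTHREE\r\nfour\r\nfive\r\nsix\r\nseven\r\n' > demo.txt
commit_as D

# From here the two attempts can be retried by hand, for example:
#   git rebase -i B
#   git rebase -i -s recursive -Xignore-space-at-eol B
```

The rebase commands are left as comments because they are interactive; the script only sets up the history.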
What I really want is a Git option that isn’t ignore-space-at-eol, but is instead prefer-space-at-eol, which is to say an option that would consider lines to be equivalent for the purposes of diff regardless of their line endings, but when choosing between equivalent lines to be included in the final merged file, in a comparison between a line that had a change in its line ending and a matched equivalent one that didn’t, would always choose the one that had a change (whereas ignore-space-at-eol always chooses the one that doesn’t have a change).
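To make the wished-for rule concrete, here is a toy illustration of it. It is not an existing Git feature, and `prefer_eol` is a made-up name. It assumes a POSIX shell and awk, and it assumes the three inputs (base, ours, theirs) have the same number of lines, aligned one-to-one, which a real merge cannot assume.

```shell
# Toy illustration of the wished-for rule, NOT an existing Git feature.
# Assumes BASE, OURS and THEIRS are line-aligned files with plain path names.
prefer_eol() {    # usage: prefer_eol BASE OURS THEIRS
    awk -v ours="$2" -v theirs="$3" '
        # Drop trailing blanks and the CR so only the content is compared.
        function strip(s) { sub(/[ \t\r]+$/, "", s); return s }
        {
            base = $0
            if ((getline o < ours) <= 0 || (getline t < theirs) <= 0) {
                status = 2    # inputs are not line-aligned
                exit
            }
            if ((strip(o) "") != (strip(t) "")) {
                # Real content difference: out of scope for this toy.
                print "needs a real merge at line " NR > "/dev/stderr"
                status = 1
                print o
                next
            }
            # Same content modulo line ending: keep the side whose bytes
            # differ from base, i.e. the side that changed the ending.
            pick = ((o "") != (base "")) ? o : t
            print pick
        }
        END { exit status }
    ' "$1"
}
```

With an unchanged side and a side that only changed an ending, the output keeps the changed ending regardless of which side it is on, which is the opposite of what ignore-space-at-eol does as described above.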
I’ve spent several days trying to figure out a way to do this, and googling to see if anybody else has solved this problem, and found nothing. Most ‘solutions’ seem to assume I can either run dos2unix or unix2dos on each revision as I am rebasing to quickly normalize the line endings and make the merge go cleanly, but I can’t do that because the file had mixed line endings to start with. Approaches like filter-branch or filter-repo don’t seem helpful either.
Is there some straightforward way to push whitespace changes (indentation changes, end-of-line changes, trailing whitespace, etc.) through multiple revisions as part of an interactive rebase without confusing the diff algorithm and thus creating lots of merge conflicts for each changed block of lines? Something like the prefer-space-at-eol suggestion I made above?
Any ideas?