How to git fetch pull requests only after a given PR?
As a project maintainer, when you review pull requests from GitHub in order to potentially merge them, you typically add the following line to your .git/config file. There may be other methods; this is just the one I use.
[remote "origin"]
url = ....
fetch = +refs/heads/*:refs/remotes/origin/*
this => fetch = +refs/pull/*/head:refs/pull/origin/*
At every git fetch (or git pull), all new PRs are fetched and you can work on them.
If you work in a fresh git work area, the first git fetch downloads all existing PRs since the beginning.
There is one situation where this is a problem: my repo grew fat because of past mistakes with binary files. The .git subdirectory was 2 GB and any git clone operation took ages. The history was rewritten and pushed, the branches were cleaned up, etc. Now, the .git history is only 43 MB. So far, so good. Except that when you add the fetch = +refs/pull/*/... line, it fetches all past PRs, including those from before the history rewrite, which are based on the old fat history. The first git fetch takes ages and the .git subdirectory is back to 2 GB. All the old fat history was downloaded only because of old PRs.
Assuming that we are no longer interested in PRs before a given one, how would you configure your local repo to fetch only PRs which are more recent than that one?
It is possible to fetch a selected PR with the following line in .git/config. Being based on the current (rewritten) history, it does not inflate the repo more than necessary.
fetch = +refs/pull/1507/head:refs/pull/origin/1507
However, we need to add such a line by hand for each new PR.
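For illustration, such a line can also be added from the command line instead of editing .git/config by hand. This is a minimal sketch, assuming the remote is named origin; add_pr_refspec is a hypothetical helper name, not a git command.

```shell
# Hypothetical helper (not part of git): add a fetch refspec for one PR
# to the remote named "origin", unless that exact refspec is already present.
add_pr_refspec() {
    pr=$1
    spec="+refs/pull/${pr}/head:refs/pull/origin/${pr}"
    # --get-all prints one configured refspec per line; -x matches whole lines,
    # -F treats the refspec as a fixed string rather than a regex.
    if git config --get-all remote.origin.fetch | grep -qxF -e "$spec"; then
        return 0
    fi
    git config --add remote.origin.fetch "$spec"
}
```

Calling add_pr_refspec 1507 and then git fetch should have the same effect as the hand-written line above. It still has to be repeated for every new PR.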
Is there a way to say “fetch all PRs starting from 1507 onwards”? Or any other PR selection criteria, such as time?
The basic idea is to avoid all PRs before a given point.
Killing the GitHub repo and recreating it is not an option. This is an open source project with too much history, releases, issues, discussions, etc. If you are interested, this is https://github.com/tsduck/tsduck
Is there a way to say “fetch all PRs starting from 1507 onwards”? Or any other PR selection criteria, such as time?
I don’t think there is a native way, which means you need to script it.
The gh pr commands should be helpful, both for:
- listing the PRs: gh pr list
- checking out the ones you need: gh pr checkout
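Such a script could look roughly like the sketch below. It assumes the gh CLI is installed and authenticated for the repository, that the remote is named origin, and that a --limit of 1000 covers all PRs of interest. pr_refspecs and fetch_prs_from are hypothetical helper names.

```shell
# Hypothetical helper: read PR numbers on stdin (one per line) and print
# a fetch refspec for every number greater than or equal to $1.
pr_refspecs() {
    min=$1
    while read -r n; do
        # Skip empty or non-numeric lines.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$n" -ge "$min" ]; then
            printf '+refs/pull/%s/head:refs/pull/origin/%s\n' "$n" "$n"
        fi
    done
}

# Hypothetical helper: fetch all PRs numbered $1 or higher in a single git fetch.
fetch_prs_from() {
    specs=$(gh pr list --state all --limit 1000 --json number --jq '.[].number' |
        pr_refspecs "$1")
    # Nothing to fetch: avoid a bare "git fetch origin", which would use
    # the configured default refspecs instead.
    [ -n "$specs" ] || return 0
    # Word splitting is intended here: one refspec per argument.
    # shellcheck disable=SC2086
    git fetch origin $specs
}
```

Running fetch_prs_from 1507 inside the work area would then fetch only PR 1507 and later ones into refs/pull/origin/, without adding anything to .git/config. The same filter could be applied to a date by asking gh for the createdAt field instead of the number.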
An easy way to dramatically cut clone/fetch size in cases like this is
git clone -n --filter=tree:0 "$url" "$path"
cd "$path"
git config --add remote.origin.fetch '+refs/pull/*/head:refs/pull/origin/*' # quoted so the shell leaves the * alone
git fetch # to tee up the barest sketch of the pulls
git fetch --filter=blob:limit=32k
git checkout
This starts from just the commit metadata for pretty much everything, which is pleasingly compact; the last fetch then resets the filter, and the checkout uses it.
If you want to play around with a local repo, you can set it up to allow local filtering with e.g.
git config uploadpack.allowfilter 1
git config uploadpack.allowanysha1inwant 1
git clone -n --filter=tree:0 file://$PWD `mktemp -d`; cd $_ # history-sketch no-checkout clone
git verify-pack -v .git/objects/pack/*.idx # show exactly what got fetched
git fetch --filter=blob:limit=32k
git verify-pack -v .git/objects/pack/*.idx # this gets just the tip tree, no checkout yet
git checkout
git verify-pack -v .git/objects/pack/*.idx # now you've added just the checked-out tree
Add any quoting you need if you’ve got spaces in your own paths.
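To make the verify-pack output easier to compare between steps, it can be summarized by object type. This is a small sketch, assuming the usual verify-pack -v line format (object name, type, sizes, offset); count_pack_objects is a hypothetical helper name.

```shell
# Hypothetical helper: read "git verify-pack -v" output on stdin and print
# how many objects of each type it lists, sorted by type name.
count_pack_objects() {
    # Object lines have the type in the second field; summary lines such as
    # "non delta: N objects" or "<pack>: ok" do not match and are ignored.
    awk '$2 ~ /^(commit|tree|blob|tag)$/ { n[$2]++ }
         END { for (t in n) print t, n[t] }' | sort
}
```

For example, git verify-pack -v .git/objects/pack/*.idx | count_pack_objects can be run after each of the steps above to see whether a step pulled in trees or blobs.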