Reduce fetch and checkout times in git
Some repos can be huge, like Azure/azure-sdk-for-net (at the time this was written) due to a number of factors, like history, old binaries, or other large files. A repo could also have a relatively small history but a huge amount of files that take a very long time to check out. You can both reduce the time it takes to fetch such a repo and how long it takes to check out files.
Reduce how much you fetch
One way to reduce how much much you fetch is to fetch a single branch. For example, assuming your upstream
and origin
remotes are set to the primary repository and your fork respectively, it’s rarely necessary to fetch more than the main branch e.g., git fetch upstream main
to fetch just the upstream
remote’s main
branch. This will avoid fetching any other branches and all the objects referenced by those branches, tags, etc. I take advantage of this in my git sync
alias, for example.
But if you often find yourself just running something like git pull
, it’s beneficial to set remote tracking refspecs in your configuration file to only those branches of interest.
If you open your .git/config file from the root of your repo, you should see something like this somewhere:
[remote "upstream"]
url = https://github.com/Azure/azure-sdk-for-net.git
fetch = +refs/heads/*:refs/remotes/upstream/*
This means that any remote branches from upstream
will be fetched if not specified explicitly. You can change that pattern and add more, for example:
[remote "upstream"]
url = https://github.com/Azure/azure-sdk-for-net.git
fetch = +refs/heads/main:refs/remotes/upstream/main
fetch = +refs/heads/release/*:refs/remotes/upstream/release/*
This will fetch only main
and any branch starting with release/
from the upstream
remote if no branch is specified explicitly.
If you work in a large team, this can save a lot of time if people are often pushing branches to a shared remote like your typical upstream
remote.
Shallow clones
Another way to reduce how much you fetch is to create a shallow clone:
git clone --depth=1 https://github.com/heaths/azure-sdk-for-net.git
cd azure-sdk-for-net
git remote add upstream https://github.com/Azure/azure-sdk-for-net.git
Note that none of the history prior to the cloned branch - often main
for many repos - will not be fetched, nor will any objects those commits reference. This could affect some commands that depend on the history, though that’s probably unlikely in most cases. Any commits after that time will continue to accrue, however.
You can run git fetch --unshallow
at any time to restore full history. There are also ways you can make an existing repo a shallow clone, but by then you’ve already fetched the brunt of the repo. Given that fact, and that it’s not straight forward, I’ll not cover that now.
Reduce how much you check out
When you run git checkout {branch}
or similar, it checks out all files in the HEAD
of that branch in your local repo. To limit how many files are checked out, you can either create a sparse clone or set up a sparse checkout on an existing repo.
Sparse checkout when cloning
To create a sparse checkout when you initially clone, run the following command:
git clone --sparse https://github.com/heaths/azure-sdk-for-net.git
cd azure-sdk-for-net
git remote add upstream https://github.com/Azure/azure-sdk-for-net.git
By default, this creates a .git/info/sparse-checkout file with the default content:
/*
!/*/
The file format is similar to .gitignore but with the default cone
mode you can only specify directories. This default value checks out any files - including dotfiles - directly under the repo root directory, but no subdirectories. You can use the git sparse-checkout add
command to add patterns, but these will be merely appended to the end of the file. If you add a directory that already exists, it will be added to the end while the old entry/entries remain resulting in duplicates. git sparse-checkout add
also checks out files immediately, so if you want to negate any paths beneath it may waste a bit of time.
Instead, I find it easier to just open .git/info/sparse-checkout and modify it by hand. For example, in the repo I’ve been using as an example, I might want to only check out engineering system files and services I’m working on:
/*
!/*/
/.config/
/.vscode/
/common/
/eng/
/sdk/cognitivelanguage/
/sdk/keyvault/
After you make modifications, run git sparse-checkout reapply
to affect changes.
Non-cone mode
Though the default content checks out all files under the root, it seems no other path can specify files when using cone
mode. If you have files under a directory e.g., sdk/*
that you need and want to negate the rest e.g., !sdk/*/
you’ll need to pass --no-cone
to git sparse-checkout set
along with all the patterns you want to enable; though, git sparse-checkout set --no-cone
enables options to disable cone
mode so you could still edit .git/info/sparse-checkout by hand afterward.
git sparse-checkout set --no-cone '/*' '!/*/' '/.config' '/.vscode' '/common' '/eng' '/sdk/*' '!/sdk/*/' '/sdk/cognitivelanguage' '/sdk/keyvault'
git sparse-checkout reapply
Converting an existing repo to sparse checkouts
If you have already cloned a repo, you can create a sparse checkout by running git sparse-checkout set
. Optionally it can take patterns on the command line or from stdin with --stdin
, but I still personally find manually editing the .git/info/sparse-checkout file afterward is easier.
Note that if you use worktrees - another way to reduce how much you fetch if you need multiple clones of a single repository - git sparse-checkout set
will create worktree-specific configuration to avoid adversely affecting other worktrees.
Combining
You can, of course, combine both approaches to really trim how much you fetch and checkout:
git clone --depth=1 --sparse https://github.com/heaths/azure-sdk-for-net.git
cd azure-sdk-for-net
git remote add upstream https://github.com/Azure/azure-sdk-for-net.git
You can even do this with the GitHub CLI, which conveniently clones your fork (if any, which is recommended) as origin
and the upstream
automatically:
gh repo clone azure-sdk-for-net -- --depth=1 --sparse
History
- 2022-11-09 - Updated now that non-
cone
mode is deprecated.