Remove Files From Git Repo
Why Remove Files From A Repo?
I encountered issues migrating repositories from Bitbucket to GitHub because some
repos contained large files that had been committed long ago, exceeding GitHub’s
file size limits. After searching for a solution, I found that my options were to
use git LFS to manage the files and allow for pushing to GitHub or to remove the
files from commit history. For the latter, the best method is to use
git-filter-repo. Surprisingly, this tool isn’t the first option mentioned in
most guides, even though it’s the quickest and easiest solution I’ve come across.
WARNING: A major caveat with this method is that the commit hashes will change, as the history is being rewritten.
This was acceptable for my scenario, but be sure to understand this before proceeding.
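Because the rewrite cannot be undone in place, it can be worth keeping an untouched copy of the repo first. The following is only a minimal sketch of that idea, not part of my original migration; the function name and both paths are hypothetical.

```shell
# Hypothetical helper: keep a full mirror (all branches and tags) of a repo
# before rewriting its history, so the original commit hashes stay recoverable.
backup_before_rewrite() {
  local repo_dir="$1"     # path or URL of the repo about to be rewritten
  local backup_dir="$2"   # where the bare mirror copy should be created
  git clone --quiet --mirror "$repo_dir" "$backup_dir"
}

# Example (placeholder paths):
#   backup_before_rewrite ./example_repo ./example_repo.backup.git
```

A mirror clone is bare, so it takes no working-tree space and can be deleted once the rewritten repo has been verified.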
The Issue
While migrating a repository, I cloned the Bitbucket repo:
$ git clone ssh://git@bitbucket.org/example_project/example_repo.git
Cloning into 'example_repo'...
remote: Enumerating objects: 12024, done.
remote: Counting objects: 100% (12023/12023), done.
remote: Compressing objects: 100% (4717/4717), done.
remote: Total 12023 (delta 7084), reused 8839 (delta 4984), pack-reused 0
Receiving objects: 100% (12024/12024), 198.48 MiB | 17.58 MiB/s, done.
Resolving deltas: 100% (7041/7041), done.
Updating files: 100% (462/462), done.
Then I tried to push it to GitHub:
$ git push --mirror --verbose ssh://git@github.com/example_org/example_repo.git
Pushing to ssh://github.com/example_org/example_repo.git
Enumerating objects: 12023, done.
Counting objects: 100% (12023/12023), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4717/4717), done.
Writing objects: 100% (12023/12023), done.
Total 12023 (delta 7084), reused 12015 (delta 7077), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (7041/7041), done.
remote: error: Trace: <REDACTED>
remote: error: See https://gh.io/lfs for more information.
remote: error: File files/unwanted-file.txt is 215.47 MB; this exceeds GitHub's file size limit of 100.00 MB
To ssh://github.com/example_org/example_repo.git
! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'ssh://github.com/example_org/example_repo.git'
It failed because files/unwanted-file.txt exceeded GitHub’s 100.00 MB limit.
This file was unnecessary and should never have been committed. Unfortunately,
mistakes happen – we’re only human.
If you’re interested in finding out when the file was committed, you can track
its origin using git log:
$ git log --oneline -- files/unwanted-file.txt
8940776 Update files
In my case, the file was committed back in 2019, meaning there was a lot of history after this point.
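If you don’t yet know which files are the problem, plain git can list oversized blobs across the whole history. This is a rough sketch rather than something I ran during the migration: the function name is made up for illustration, and the default threshold of 100 MiB is my assumption for where GitHub draws the line.

```shell
# Hypothetical helper: print "<size-in-bytes> <path>" for every blob in the
# history of any ref that is larger than a threshold, largest first.
list_large_blobs() {
  local threshold_bytes="${1:-104857600}"   # assumed default: 100 MiB
  # rev-list prints "<sha> <path>" for every reachable object;
  # cat-file looks up the type and size and echoes the path back via %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$threshold_bytes" '$1 == "blob" && $2 > min' |
    sort -k2,2nr |
    cut -d' ' -f2-
}

# Example, run from inside the cloned repo:
#   list_large_blobs              # blobs over the assumed 100 MiB default
#   list_large_blobs 50000000     # blobs over roughly 50 MB
```

Files that were later deleted still show up here, because the check walks commit history rather than the current working tree.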
How To Resolve?
- Install the git-filter-repo add-on:
$ pip install git-filter-repo
- Analyze the repo (this only writes size reports to .git/filter-repo/analysis/ and does not change any history):
$ git filter-repo --analyze
- Remove the file in question from every commit (--invert-paths keeps everything except the matched path):
$ git filter-repo --force --invert-paths --path-match files/unwanted-file.txt
files/unwanted-file.txt
Parsed 719 commits
New history written in 0.14 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 625eaf8 Merge pull request #49 in example_org/example_repo from develop to master
Enumerating objects: 12023, done.
Counting objects: 100% (12023/12023), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4717/4717), done.
Writing objects: 100% (12023/12023), done.
Total 12023 (delta 7084), reused 12015 (delta 7077), pack-reused 0 (from 0)
Completely finished after 17.40 seconds.
- Now you should be able to push your changes to GitHub:
$ git push --mirror --verbose ssh://git@github.com/example_org/example_repo.git
Pushing to ssh://github.com/example_org/example_repo.git
Enumerating objects: 12023, done.
Counting objects: 100% (12023/12023), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4717/4717), done.
Writing objects: 100% (12023/12023), done.
Total 12023 (delta 7084), reused 12015 (delta 7077), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (7041/7041), done.
To ssh://github.com/example_org/example_repo.git
+ 1331952...ca25ae3 main -> main (forced update)
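To double-check the rewrite, you can confirm that no commit on any branch or tag still touches the removed path. The helper below is a sketch with a hypothetical name; it only inspects reachable history, so it says nothing about unreferenced objects that have not been garbage-collected yet.

```shell
# Hypothetical helper: exit 0 if no commit reachable from any ref touches the
# given path, exit 1 if some commit still does, exit 2 if git itself fails.
path_absent_from_history() {
  local path="$1"
  local hits
  hits="$(git log --all --oneline -- "$path")" || return 2
  [ -z "$hits" ]
}

# Example, run from inside the rewritten repo:
#   path_absent_from_history files/unwanted-file.txt && echo "history is clean"
```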
Conclusion
If the large files you’re dealing with are actually necessary for your project, you’ll want to look into using Git LFS (Large File Storage). Git LFS is designed specifically to handle large files like these, allowing you to track them without exceeding GitHub’s file size limits. It’s a great option if the files need to stay in the repo, but just remember, it does require some additional setup and configuration.
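For a rough idea of what that setup involves, here is a minimal sketch. It assumes the git-lfs client is already installed; the function name and the example pattern are hypothetical, and it only covers files committed from now on, not ones already in history.

```shell
# Hypothetical helper: start tracking a file pattern with Git LFS in the
# current repo. Returns 127 if the git-lfs client is not installed.
setup_lfs_tracking() {
  local pattern="$1"   # e.g. "*.bin"; pick whatever matches your large files
  command -v git-lfs >/dev/null 2>&1 || {
    echo "git-lfs is not installed" >&2
    return 127
  }
  git lfs install --local &&      # enable the LFS hooks for this repo only
    git lfs track "$pattern" &&   # records the pattern in .gitattributes
    git add .gitattributes        # the tracking rule must be committed too
}

# Example, run from inside the repo:
#   setup_lfs_tracking "*.bin"
#   git commit -m "Track binary files with Git LFS"
```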