I have a Git repository which contains a number of subdirectories. Now I have found that one of the subdirectories is unrelated to the others and should be detached to a separate repository.

How can I do this while keeping the history of the files within the subdirectory?

I guess I could make a clone and remove the unwanted parts of each clone, but I suppose this would give me the complete tree when checking out an older revision etc. This might be acceptable, but I would prefer to be able to pretend that the two repositories don't have a shared history.

24 Answers
You want to clone your repository and then use git filter-branch to mark everything but the subdirectory you want in your new repo to be garbage-collected.

To clone your local repository:

git clone /XYZ /ABC

(Note: the repository will be cloned using hard-links, but that is not a problem since the hard-linked files will not be modified in themselves - new ones will be created.)

Now, let us preserve the interesting branches which we want to rewrite as well, and then remove the origin to avoid pushing there and to make sure that old commits will not be referenced by the origin:
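A minimal sketch of this step, run against a throwaway repository so it can be tried safely. The branch name feature and the temporary paths are placeholders, not names from the original answer; substitute the branches you actually want to keep.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Build a tiny stand-in for the original repository with one extra branch.
git init -q "$work/XYZ"
git -C "$work/XYZ" symbolic-ref HEAD refs/heads/main
mkdir "$work/XYZ/ABC"
echo one > "$work/XYZ/ABC/file.txt"
git -C "$work/XYZ" add .
git -C "$work/XYZ" commit -q -m "add ABC"
git -C "$work/XYZ" branch feature

# Clone it, as in the step above.
git clone -q "$work/XYZ" "$work/ABC"
cd "$work/ABC"

# Create a local branch for every remote branch that should survive the rewrite.
for b in feature; do
    git branch -q -t "$b" "origin/$b"
done

# Drop the remote: nothing can be pushed back by accident, and no
# origin/* refs keep the old commits alive.
git remote rm origin
```

After this, git branch lists main and feature, and git remote prints nothing.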

Now you might also want to remove tags which have no relation to the subproject. You can do that later, but you might need to prune your repo again. I did not do so and got a WARNING: Ref 'refs/tags/v0.1' is unchanged for all tags (since they were all unrelated to the subproject). Removing such tags also lets more space be reclaimed. Apparently git filter-branch should be able to rewrite other tags, but I could not verify this. If you want to remove all tags, use git tag -l | xargs git tag -d.

Then use filter-branch and reset to exclude the other files, so they can be pruned. Let's also add --tag-name-filter cat --prune-empty to remove empty commits and to rewrite tags (note that this will have to strip their signatures):
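Here is a sketch of what that looks like, using the flags named above together with --subdirectory-filter. The repository is a throwaway one built on the spot; the directory other and the tag v0.1 are only there to have something to filter, and FILTER_BRANCH_SQUELCH_WARNING merely silences the notice that newer Git versions print before running filter-branch.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)

# A small repository with the subdirectory we want (ABC) and one we do not.
git init -q "$work/ABC"
cd "$work/ABC"
git symbolic-ref HEAD refs/heads/main
mkdir ABC other
echo one > ABC/file.txt
echo x > other/unrelated.txt
git add .
git commit -q -m "add ABC and other"
echo y > other/unrelated.txt
git commit -q -am "touch only other"
echo two >> ABC/file.txt
git commit -q -am "change ABC"
git tag v0.1

# Rewrite every branch and tag so that ABC becomes the project root.
# Commits that never touched ABC disappear from the rewritten history.
git filter-branch --tag-name-filter cat --prune-empty \
    --subdirectory-filter ABC -- --all

# Make the working tree match the rewritten HEAD.
git reset -q --hard
```

In this toy history the three commits shrink to two, and file.txt sits at the top level instead of under ABC/. The old commits are still kept under refs/original/ until you delete those refs and garbage-collect.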

And now you have a local Git repository of the ABC subdirectory with all its history preserved.

Note: For most uses, git filter-branch should indeed have the added parameter -- --all. Yes, that's really --space--all. These need to be the last parameters of the command. As Matli discovered, this keeps the project branches and tags included in the new repo.

Edit: various suggestions from comments below were incorporated to make sure, for instance, that the repository is actually shrunk (which was not always the case before).

Very good answer. Thanks! And to really get exactly what I wanted, I added "-- --all" to the filter-branch command.
– matli Dec 12 '08 at 9:17

Why do you need --no-hardlinks? Removing one hardlink won't affect the other file. Git objects are immutable too. Only if you'd change owner/file permissions you need --no-hardlinks.
– vdboor Feb 1 '10 at 9:58

An additional step I would recommend would be "git remote rm origin". This would keep pushes from going back to the original repository, if I'm not mistaken.
– Tom Apr 5 '10 at 19:51

Another command to append to filter-branch is --prune-empty, to remove now-empty commits.
– Seth Johnson Sep 12 '11 at 2:31

Like Paul, I did not want project tags in my new repo, so I did not use -- --all. I also ran git remote rm origin, and git tag -l | xargs git tag -d before the git filter-branch command. This shrank my .git directory from 60M to ~300K. Note that I needed to run both of these commands in order to get the size reduction.
– saltycrane Nov 17 '11 at 21:18

The Easy Way™

It turns out that this is such a common and useful practice that the overlords of git made it really easy, but you have to have a newer version of git (>= 1.7.11 May 2012). See the appendix for how to install the latest git. Also, there's a real-world example in the walkthrough below.
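A sketch of that easy way, assuming it refers to git subtree split (which ships in Git's contrib tree, so it may need to be installed separately on some systems). The repository and the names subproject, subproject-only, big-repo and new-repo are made up for the demo.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A small "big repo" with the folder to extract and an unrelated one.
git init -q "$work/big-repo"
cd "$work/big-repo"
git symbolic-ref HEAD refs/heads/main
mkdir subproject other
echo one > subproject/file.txt
echo x > other/unrelated.txt
git add .
git commit -q -m "initial"
echo two >> subproject/file.txt
git commit -q -am "change subproject"

# Put the history of subproject/ on its own branch, with the folder's
# contents promoted to the top level. The big repo's other branches stay as-is.
git subtree split -P subproject -b subproject-only

# Create the new repository and pull that branch into it.
git init -q "$work/new-repo"
cd "$work/new-repo"
git pull -q "$work/big-repo" subproject-only
```

The new repository ends up with just the two commits that touched subproject/, and file.txt at its root.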

Note: <name-of-folder> must NOT contain leading or trailing path characters such as ./ or /. For instance, the folder named subproject MUST be passed as subproject, NOT ./subproject/

Note for windows users: when your folder depth is > 1, <name-of-folder> must have *nix style folder separator (/). For instance, the folder named path1\path2\subproject MUST be passed as path1/path2/subproject

Note: This leaves all the historical references in the repository. See the Appendix below if you're actually concerned about having committed a password or you need to decrease the file size of your .git folder.

...

Walkthrough

These are the same steps as above, but following my exact steps for my repository instead of using <meta-named-things>.

Here's a project I have for implementing JavaScript browser modules in node:

Next I create a new repo on GitHub or Bitbucket, or wherever, and add it as the origin (btw, "origin" is just a convention, not part of the command - you could call it "remote-server" or whatever you like)

clearing your history

By default, removing files from git doesn't actually remove them from its history; it just commits that they aren't there anymore. If you want to actually remove the historical references (e.g. you have committed a password), you need to do this:
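One way to do that is filter-branch with a --tree-filter that deletes the folder in every commit. This is a sketch on a throwaway repository; secret-folder stands in for <name-of-folder>, and the other file names are invented for the demo.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)

# A repository where one folder should never have been committed.
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
mkdir keep secret-folder
echo a > keep/a.txt
echo hunter2 > secret-folder/password.txt
git add .
git commit -q -m "initial"
echo hunter3 > secret-folder/password.txt
git commit -q -am "rotate password"

# Check out every commit on the current branch in turn, delete the folder,
# and re-commit. Commits that become empty are dropped by --prune-empty.
git filter-branch --prune-empty --tree-filter 'rm -rf secret-folder' HEAD
```

Here the "rotate password" commit only touched secret-folder, so it vanishes entirely and a single commit remains. An --index-filter with git rm --cached does the same job faster on large repositories because it skips the checkout.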

After that you can check that your file or folder no longer shows up in the git history at all

git log -- <name-of-folder> # should show nothing

However, you can't push the rewritten history to GitHub and the like with a plain git push. If you try, you'll get an error and you'll have to git pull before you can git push - and then you're back to having everything in your history.

So if you want to delete history from the "origin" - meaning to delete it from GitHub, Bitbucket, etc - you'll need to delete the repo and re-push a pruned copy of the repo. But wait - there's more! - If you're really concerned about getting rid of a password or something like that you'll need to prune the backup (see below).

making .git smaller

The aforementioned delete history command still leaves behind a bunch of backup files - because git is all too kind in helping you to not ruin your repo by accident. It will eventually delete orphaned files over the days and months, but it leaves them there for a while in case you realize that you accidentally deleted something you didn't want to.

So if you really want to empty the trash to reduce the clone size of a repo immediately you have to do all of this really weird stuff:
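A sketch of the usual clean-up sequence, demonstrated on a throwaway repository: remove the backup refs that filter-branch leaves under refs/original/, expire the reflog, then garbage-collect with immediate pruning. The file names are invented, and the filter-branch call is only there to produce some garbage to collect.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
mkdir big
echo keep > keep.txt
echo "pretend this is huge" > big/blob.bin
git add .
git commit -q -m "initial"

# Remember the object ID of the file we are about to purge.
old=$(git rev-parse HEAD:big/blob.bin)

# Rewrite history without big/ (index-filter variant of the delete step).
git filter-branch --prune-empty \
    --index-filter 'git rm -r -q --cached --ignore-unmatch big' HEAD

# 1. Delete the backup refs filter-branch created.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin
# 2. Forget that the old commits were ever checked out.
git reflog expire --expire=now --all
# 3. Repack and drop every object that is now unreachable.
git gc -q --aggressive --prune=now
```

Once this has run, git cat-file -e "$old" fails because the blob is really gone from the object store.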

That said, I'd recommend not performing these steps unless you know that you need to - just in case you did prune the wrong subdirectory, y'know? The backup files shouldn't get cloned when you push the repo, they'll just be in your local copy.

make that git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch ABC" --prune-empty HEAD and it will be much faster. index-filter works on the index while tree-filter has to checkout and stage everything for every commit.
– fmarc Sep 17 '09 at 19:58

in some cases messing up the history of repository XYZ is overkill ... just a simple "rm -rf ABC; git rm -r ABC; git commit -m'extracted ABC into its own repo'" would work better for most people.
– Evgeny Oct 28 '10 at 23:24

You probably wish to use -f (force) on this command if you do it more than once, e.g., to remove two directories after they have been separated. Otherwise you will get "Cannot create a new backup."
– Brian Carlton Apr 18 '11 at 17:59

If you're doing the --index-filter method, you may also want to make that git rm -q -r -f, so that each invocation won't print a line for each file it deletes.
– Eric Naeseth Oct 12 '11 at 19:55

I would suggest editing Paul's answer, only because Paul's is so thorough.
– Erik Aronesty Mar 5 '14 at 15:38

This does not answer the question. The docs say "The result will contain that directory (and only that) as its project root." and indeed this is what you will get, i.e. the original project structure is not preserved.
– NicBright Jun 2 '17 at 13:11

@NicBright Can you illustrate your issue with XYZ and ABC as in the question, to show what's wrong?
– Adam Oct 26 '17 at 16:02

git-subtree is now part of Git, although it's in the contrib tree, so not always installed by default. I know it is installed by the Homebrew git formula, but without its man page. apenwarr thus calls his version obsolete.
– echristopherson May 10 '13 at 16:04

Note: <name-of-folder> must NOT contain leading or trailing path characters such as ./ or /. For instance, the folder named subproject MUST be passed as subproject, NOT ./subproject/

Note for windows users: when your folder depth is > 1, <name-of-folder> must have *nix style folder separator (/). For instance, the folder named path1\path2\subproject MUST be passed as path1/path2/subproject. Moreover, don't use the mv command; use move instead.

Final note: the one big difference from the base answer is the second line of the script, "git filter-branch..."

Note: This leaves all the historical references in the repository. See the Appendix in the original answer if you're actually concerned about having committed a password or you need to decrease the file size of your .git folder.

This worked for me with slight modification. Because my sub1 and sub2 folders didn't exist with the initial version, I had to modify my --tree-filter script as follows: "mkdir <name-of-folder>; if [ -d sub1 ]; then mv <sub1> <name-of-folder>/; fi". For the second filter-branch command I replaced <sub1> with <sub2>, omitted creation of <name-of-folder>, and included -f after filter-branch to override the warning of an existing backup.
– pglezen Feb 11 '16 at 19:38

This does not work if any of the subdirs have changed during the history in git. How can this be solved?
– nietras Mar 3 '16 at 12:06

@nietras see rogerdpack's answer. Took me a while to find it after reading and absorbing all the info in these other answers.
– Adam Oct 30 '17 at 17:18

The original question wants XYZ/ABC/(*files) to become ABC/ABC/(*files). After implementing the accepted answer for my own code, I noticed that it actually changes XYZ/ABC/(*files) into ABC/(*files). The filter-branch man page even says,

"The result will contain that directory (and only that) as its project root."

In other words, it promotes the top-level folder "up" one level. That's an important distinction because, for example, in my history I had renamed a top-level folder. By promoting folders "up" one level, git loses continuity at the commit where I did the rename.

My answer to the question then is to make 2 copies of the repository and manually delete the folder(s) you don't want to keep in each. The man page backs me up with this:

[...] avoid using [this command] if a simple single commit would suffice to fix your problem

I like the style of that graph. May I ask what tool you're using?
– Slipp D. Thompson Mar 30 '13 at 18:17

Tower for Mac. I really like it. It's almost worth switching to Mac for in itself.
– MM. Apr 2 '13 at 21:02

Yep, though in my case, my subfoldered targetdir had been renamed at some point and git filter-branch simply called it a day, deleting all commits made prior to the rename! Shocking, considering how adept Git is at tracking such things and even migration of individual content chunks!
– Jay Allen May 31 '13 at 9:25

It appears that most (all?) of the answers here rely on some form of git filter-branch --subdirectory-filter and its ilk. This may work most of the time, but not in every case, for instance when the folder was renamed at some point:

ABC/
/move_this_dir # did some work here, then renamed it to
ABC/
/move_this_dir_renamed

If you use a normal git filter-branch approach to extract "move_this_dir_renamed", you will lose the file change history from back when it was still called move_this_dir (ref).

It thus appears that the only way to really keep all change history (if yours is a case like this), is, in essence, to copy the repository (create a new repo, set that to be the origin), then nuke everything else and rename the subdirectory to the parent like this:

Clone the multi-module project locally

Branches - check what's there: git branch -a

Do a checkout to each branch to be included in the split to get a local copy on your workstation: git checkout --track origin/branchABC

Make a copy in a new directory: cp -r oldmultimod simple

Go into the new project copy: cd simple

Get rid of the other modules that aren't needed in this project:

git rm -r otherModule1 other2 other3

Now only the subdir of the target module remains

Get rid of the module subdir so that the module root becomes the new project root

git mv moduleSubdir1/* .

Delete the relic subdir: rmdir moduleSubdir1

Check changes at any point: git status

Create the new git repo and copy its URL to point this project into it:
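The whole list can be sketched end to end as follows. Everything here is a stand-in: the new repo is a local bare repository rather than a hosted one, the old remote path is never contacted, and the directory names simply mirror the ones used in the steps above.

```shell
# Throwaway identity and directory so the demo does not depend on local config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the locally cloned multi-module project.
git init -q "$work/oldmultimod"
cd "$work/oldmultimod"
git symbolic-ref HEAD refs/heads/main
mkdir moduleSubdir1 otherModule1
echo a > moduleSubdir1/a.txt
echo b > otherModule1/b.txt
git add .
git commit -q -m "initial multi-module commit"
git remote add origin "$work/old-remote.git"   # placeholder for the old remote

# Copy the project and work in the copy.
cp -r "$work/oldmultimod" "$work/simple"
cd "$work/simple"

# Remove the other modules, then promote the module's contents to the root.
git rm -q -r otherModule1
git mv moduleSubdir1/* .
rmdir moduleSubdir1 2>/dev/null || true   # relic directory, if still present
git commit -q -m "Keep only moduleSubdir1, promoted to the project root"

# Create the new (here: local bare) repo, point origin at it, and push.
git init -q --bare "$work/simple.git"
git remote set-url origin "$work/simple.git"
git push -q -u origin main
```

Because nothing is rewritten, the new repo carries both commits, including the one that still contains the other modules.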

This will not save you any space in your .git folder, but it will preserve all your change history for those files even across renames. And this may not be worth it if there isn't "a lot" of history lost, etc. But at least you are guaranteed not to lose older commits!

I had exactly this problem but all the standard solutions based on git filter-branch were extremely slow. If you have a small repository then this may not be a problem, but it was for me. I wrote another git filtering program based on libgit2 which as a first step creates branches for each filtering of the primary repository and then pushes these to clean repositories as the next step. On my repository (500 MB, 100,000 commits) the standard git filter-branch methods took days. My program takes minutes to do the same filtering.

For what it's worth, here is how to do it using GitHub on a Windows machine. Let's say you have a cloned repo residing in C:\dir1. The directory structure looks like this: C:\dir1\dir2\dir3. The dir3 directory is the one I want to turn into a new, separate repo.

As I mentioned above, I had to use the reverse solution (deleting all commits not touching my dir/subdir/targetdir) which seemed to work pretty well removing about 95% of the commits (as desired). There are, however, two small issues remaining.

FIRST, filter-branch did a bang up job of removing commits which introduce or modify code but apparently, merge commits are beneath its station in the Gitiverse.

This is a cosmetic issue which I can probably live with (he says...backing away slowly with eyes averted).

SECOND, the few commits that remain are pretty much ALL duplicated! I seem to have acquired a second, redundant timeline that spans just about the entire history of the project. The interesting thing (which you can see from the picture below) is that my three local branches are not all on the same timeline (which is certainly why it exists and isn't just garbage collected).

The only thing I can imagine is that one of the deleted commits was, perhaps, the single merge commit that filter-branch actually did delete, and that created the parallel timeline as each now-unmerged strand took its own copy of the commits. (shrug Where's my TARDiS?) I'm pretty sure I can fix this issue, though I'd really love to understand how it happened.

In the case of crazy mergefest-O-RAMA, I'll likely be leaving that one alone since it has so firmly entrenched itself in my commit history—menacing at me whenever I come near—, it doesn't seem to be actually causing any non-cosmetic problems and because it is quite pretty in Tower.app.

Create an empty repo somewhere. We'll assume we've created an empty repo called xyz on GitHub with the path git@github.com:simpliwp/xyz.git

Push to the new repo.
#add a new remote origin for the empty repo so we can push to the empty repo on GitHub
git remote add origin_xyz git@github.com:simpliwp/xyz.git
#push the branch to the empty repo's master branch
git push origin_xyz XYZ:master

Clone the newly created remote repo into a new local directory
#change current directory out of the old repo
cd /path/to/where/you/want/the/new/local/repo
#clone the remote repo you just pushed to
git clone git@github.com:simpliwp/xyz.git

An advantage of this method compared to "The Easy Way" is that the remote is already set up for the new repo, so you can immediately do a subtree add. In fact this way seems easier to me (even without git splits)
– M.M May 12 '15 at 5:58

Nice post, but I notice the first paragraph of the doc you linked says "If you create a new clone of the repository, you won't lose any of your Git history or changes when you split a folder into a separate repository." Yet according to comments on all the answers here, both filter-branch and the subtree script result in the loss of history wherever a subdirectory has been renamed. Is there anything that can be done to address this?
– Adam Oct 30 '17 at 11:53

Found the solution for preserving all commits, including those preceding directory renames/moves - it's rogerdpack's answer to this very question.
– Adam Oct 30 '17 at 17:42

The only problem is that I can't use the cloned repo any more
– Qiulang Apr 3 '18 at 13:17

You might need something like "git reflog expire --expire=now --all" before the garbage collection to actually clean the files out. git filter-branch just removes references in the history, but doesn't remove the reflog entries that hold the data. Of course, test this first.

Turn git directories into their very own repositories in their own location. No subtree funny business. This script will take an existing directory in your git repository and turn that directory into an independent repository of its own. Along the way, it will copy over the entire change history for the directory you provided.

I'm sure git subtree is all fine and wonderful, but the subdirectories of git-managed code that I wanted to move were all in Eclipse.
So if you're using egit, it's painfully easy.
Take the project you want to move and team->disconnect it, and then team->share it to the new location. It will default to trying to use the old repo location, but you can uncheck the use-existing selection and pick the new place to move it.
All hail egit.

The "fine and wonderful" part of subtree is that your subdirectory's history comes along for the ride. If you don't need the history, then your painfully easy method is the way to go.
– pglezen Feb 11 '16 at 19:55
