Moving Files from one Git Repository to Another, Preserving History

Posted May 17th, 2011 in Development by Greg Bayer
                                

If you use multiple git repositories, it’s only a matter of time until you’ll want to refactor some files from one project to another.  Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.

After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen. In the hopes of helping someone else avoid the same trouble, here’s the solution that ended up working best. The solution is primarily based on ebneter’s excellent question on Stack Overflow.

Another solution is Linus Torvalds’s “The coolest merge, EVER!” Unfortunately, his approach seems to require more manual fiddling than I would like and results in a repository with two roots. I don’t completely understand the implications of this, so I opted for something more like a standard merge.

Goal:

  • Move directory 1 from Git repository A to Git repository B.

Constraints:

  • Git repository A contains other directories that we don’t want to move.
  • We’d like to preserve the Git commit history for the directory we are moving.

Get files ready for the move:

Make a copy of repository A so you can mess with it without worrying too much about mistakes.  It’s also a good idea to delete the link to the original repository to avoid accidentally making any remote changes (line 3).  Line 4 is the critical step here.  It goes through your history and files, removing anything that is not in directory 1.  The result is the contents of directory 1 spewed out into the base of repository A.  You probably want to import these files into repository B within a directory, so move them into one now (lines 5/6). Commit your changes and we’re ready to merge these files into the new repository.

git clone <git repository A url>
cd <git repository A directory>
git remote rm origin
git filter-branch --subdirectory-filter <directory 1> -- --all
mkdir <directory 1>
mv * <directory 1>
git add .
git commit
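
To try these steps safely before touching a real project, here is a rough sketch that runs them against a throwaway repository. Everything named in it (repo-a, dir1, other, the file names and the demo identity) is made up for illustration. It also swaps the plain "mv *" for "git mv -k *", as suggested in the comments below, so that the attempt to move the directory into itself is skipped instead of reported as an error. FILTER_BRANCH_SQUELCH_WARNING only matters on newer Git versions, where filter-branch otherwise pauses to print a warning.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1    # skip the warning pause on newer Git

work=$(mktemp -d)                         # scratch area, safe to delete afterwards
cd "$work"

# Build a throwaway "repository A" with two directories and a little history.
git init -q repo-a
cd repo-a
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of local defaults
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"
mkdir dir1 other
echo one > dir1/file.txt
echo x > other/skip.txt
git add .
git commit -q -m "add dir1 and other"
echo two >> dir1/file.txt
git commit -q -am "edit dir1/file.txt"

# Keep only dir1's history; its contents end up at the repository root.
git filter-branch --subdirectory-filter dir1 -- --all

# Move the files back under dir1 so they arrive in repository B as a directory.
mkdir -p dir1
git mv -k * dir1
git commit -q -m "move files into dir1"

# Both original dir1 commits plus the move commit should be listed.
git log --oneline
```

Running "git log --oneline" in the scratch copy at the end is a quick way to confirm that the commits for the directory survived the rewrite before you go on to the merge.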

Merge files into new repository:

Make a copy of repository B if you don’t have one already.  On line 3, you’ll create a remote connection to repository A as a branch in repository B.  Then simply pull from this branch (containing only the directory you want to move) into repository B.  The pull copies both files and history.  Note: You can use a merge instead of a pull, but pull worked better for me. Finally, you probably want to clean up a bit by removing the remote connection to repository A. No separate commit is needed after the pull, so you’re all set.

git clone <git repository B url>
cd <git repository B directory>
git remote add repo-A-branch <git repository A directory>
git pull repo-A-branch master --allow-unrelated-histories
git remote rm repo-A-branch
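
The merge step can be rehearsed the same way. The sketch below is only an illustration under stated assumptions: repo-a stands in for the already prepared copy of repository A, repo-b for repository B, and new_repo is a hypothetical helper that is not part of Git. It adds two flags the listing above does not use: --no-edit, so the merge commit is created without opening an editor, and --no-rebase, which a commenter below recommends when rebase is configured and which newer Git versions may otherwise ask you to choose explicitly.

```shell
set -e

work=$(mktemp -d)                         # scratch area, safe to delete afterwards

# Hypothetical helper: create an empty repository on a branch named master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Demo User"
    git -C "$1" config user.email "demo@example.com"
}

# Stand-in for the prepared copy of repository A: two commits under dir1/.
new_repo "$work/repo-a"
cd "$work/repo-a"
mkdir dir1
echo one > dir1/file.txt
git add .
git commit -q -m "A: add dir1/file.txt"
echo two >> dir1/file.txt
git commit -q -am "A: edit dir1/file.txt"

# Stand-in for repository B, with its own unrelated history.
new_repo "$work/repo-b"
cd "$work/repo-b"
echo readme > README
git add .
git commit -q -m "B: initial commit"

# The steps from the listing above, plus --no-rebase and --no-edit.
git remote add repo-A-branch "$work/repo-a"
git pull -q --no-rebase --no-edit repo-A-branch master --allow-unrelated-histories
git remote rm repo-A-branch

# History of both repositories, joined by one merge commit.
git log --oneline
```

In the scratch copy, "git log --oneline -- dir1" is a convenient way to look at just the commits that came over from repository A.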

Update: Removed final commit thanks to Von’s comment.
Update 2: Added “--allow-unrelated-histories” thanks to several comments.

                                

94 Responses so far.

  1. Kate Ebneter says:

    Very nice writeup, and I’m happy to see that my question/answer was helpful!

  2. Von says:

    Just tried this on a couple quickly whipped up repos and it worked well. Only comment was there was nothing to commit after the pull, so I’m not sure you need the last two steps.

  3. Greg Bayer says:

    Great catch! The commit is not required with the pull approach. Will update accordingly.

  4. Anonymous says:

    Thank you, for explaining this!. Your post will be a big help. I have wanted to do this for some time, but kept putting it off!

  5. Joseph Chiu says:

    hi, based on what I read on http://stackoverflow.com/questions/1365541/how-to-move-files-from-one-git-repo-to-another-not-a-clone-preserving-history, I wonder if your line 5 should be “mkdir -p <directory 1>” and line 6 should be “git mv * <directory 1>” ?

  6. Greg Bayer says:

    Thanks for the feedback. In this case I believe the results should be similar either way.

    1) “mkdir -p” is only required if the new directory is more than one level deep.

    2) There shouldn’t be much difference between “mv *” and “git mv *” in this case.

  7. Mguyre says:

    Need to add a git fetch between steps 4 and 5 for the new repository to retrieve the tags and history

  8. Adam Monsen says:

    Great post, thank you. One suggestion: change the title as follows: s/file/one directory/.

  9. Adam Monsen says:

    Great post, thank you. One suggestion: change the title as follows: s/Files/one directory/.

  10. Adam Monsen says:

    Sorry, I meant: change “Files” to “one directory” or “a directory”.

  11. Greg Bayer says:

    Thanks for the suggestion. You can actually use this approach to move an arbitrary set of files by first moving them into a temporary directory. Because of this, the current title seems to be appropriate and more general.

  12. Navitf says:

    Hi, but when files are moved into a temporary directory, the command “git filter-branch --subdirectory-filter” extracts only the history that is relevant to the temporary directory, and thus the real history logs are not preserved. Any idea how to overcome this?

  13. Greg Bayer says:

    That’s an interesting point. This wasn’t a problem in my case, so I haven’t looked into it. Maybe another reader can suggest a solution?

  14. 123456 says:

    I get an error message when I run the git filter-branch command:
    $ git filter-branch --subdirectory-filter mt -- --all
    C:Program Files (x86)Git/libexec/git-core/git-filter-branch: line 289: /libexe
    c/git-core/git: Bad file number
    Could not get the commits

    In typical git fashion, the error message is incomprehensible to me. Any idea what’s going wrong?

  15. Greg Bayer says:

    I think that means git can’t access one of your files. Based on a few posts I see on stackoverflow.com, this could be caused by a bad network connection or proxy configuration.

  16. 123456 says:

    Don’t understand how this could be the case. I cloned the repo to local disk.. and to my understanding, “git remote rm origin” severs the link between my local repo and the remote one.. so I don’t see where networks/proxies would enter into it.

  17. Xavier MARTIN says:

    Really really helpful post…
    Thanks much for sharing 🙂

  18. SteveALee says:

    if you don’t use git mv * the deleted files will not be staged for a commit. Also use -k to skip error of moving the dir itself

  19. Robbie Van Gorkom says:

    We had the same issue at the office, so I wrote a script to do exactly this. https://github.com/vangorra/git_split

  20. Rasheed Barnes says:

    Good stuff. worked like a charm.

  21. Bernd says:

    Thaaaanks a lot. This post saved me a shit ton of hours. 🙂

  22. devguy says:

    this is cool, but it seems that you can only see the old history if you do git log --follow [file] , which is kinda inconvenient, especially in a large project. am i missing something? is there a way to modify this process so that the --follow is not required?

  23. Greg Bayer says:

    You should be able to see the history in all the normal ways. Personally I tend to use gitk or github to view the history for old files.

    How do you want to be able to view it?

  24. Bastian Krol says:

    Thanks a bunch for this guide.

    Here is my version:

    To move some directories from repository A to repository B without losing history:

    1. git clone tmp-repo
    2. cd tmp-repo
    3. git checkout
    4. git remote rm origin # not really needed
    5. git filter-branch --subdirectory-filter -- --all
    6. mkdir -p
    7. git mv -k *
    8. git commit
    9. cd # clone it, if you didn’t do already
    10. Create a new branch and check it out
    11. git remote add origin-tmp-repo
    12. git pull origin-tmp-repo
    13. rm -rf

    Repeat all steps with every directory that needs to be moved. You’ll need a new tmp-repo for every directory, because “git filter-branch --subdirectory …” can only take one directory as an argument and the repo is largely unusable after executing the command. That’s why there is a rm -rf in step 13. When transferring subsequent directories, steps 10 and 11 can be omitted.

    When you are done with all directories, you should do
    git remote rm origin-tmp-repo and git push in local repository B.

  25. webdevguy says:

    This was very helpful. I had a much simpler requirement. I needed to pull one directory and all its history out of one repo and create a new repo for just that directory. Here are the steps that worked for me:

    1. git clone <git repository A url>
    2. cd <new git repository>
    3. git remote rm origin
    4. git filter-branch --subdirectory-filter <directory 1> -- --all

  26. Thanks for putting this up, worked like a charm. I just set up git as a deployment mechanism, this helped me split out the bits I needed from my main repository.

  27. using “git mv *” worked better for me. Somehow the history is ‘better’ kept that way… Not sure why and how though.

  28. Arthur Taborda says:

    Easier: to merge files to new repository, simply do:

    git push :master

  29. zupeanut says:

    “It goes through your history and files, removing anything that is not in directory 1”

    The problem with this is that files that are *currently* in directory 1 but previously were not will lose their history from before the move.

  30. Cat Lookabaugh says:

    I was reading this post, and it doesn’t seem to quite be what I’m after. Caveat: I’m new to git, so I just may not understand.

    Here’s what I want:

    I’m using git to store documentation. We have different docbooks for each of our products and some content that fits more than one book. So, we have a common repository with that content (in a directory) and other repos for each product. The current process to get at the shared content involves some scripts and the git subtree command, with the shared content ending up in the product repo in a directory called shared-files. It’s a bit convoluted. Currently the shared content is only used for two products.

    I need to add another set of shared content that will be used by all product docbooks (stored in its own directory in common). I want to be able to pull one or both shared content directories into a product repo. I don’t need any history. I just want the current up-to-date common files in my product repo in their own directory. If I want to update the shared content, I will do so directly in the common repo.

    One way to accomplish this would be to clone the common repo and the product repo to my pc and then literally copy the directory I want from common repo into the product repo.

    Is there a relatively simple way to do this without cloning the common repo? I’d appreciate your advice!

  31. Greg Bayer says:

    Have you considered adding the common repo as a submodule of each of the product repos?

  32. Cat Lookabaugh says:

    My research thus far seems to indicate that subtree is preferred over submodule, though i’m not sure why. would submodule option allow me to pick and choose which directory or directories i want from the common repo?

  33. Greg Bayer says:

    It wouldn’t let you choose certain directories, but it might be simpler overall if that’s not a hard requirement. You could also combine the submodule approach with a simple script that deletes directories you don’t need from the local clone of the common submodule in each product repo.

    You could also consider creating separate common repos for each directory and only pulling in those you need for each product.

  34. Cat Lookabaugh says:

    food for thought…thanks 🙂

  35. gitnewbie says:

    Used the exact commands listed above and I see only the first 2 entries with git log although git log on the source shows me many more. Looks like partial history is moved.

  36. Eliyahu says:

    Thanks!

  37. Brandon Mintern says:

    Thanks! This was very helpful.

    Note that if your repository B branch has `rebase = true` set, then you will almost certainly want to use `git pull --no-rebase repo-A-branch master`.

  38. Dave says:

    This helped … a lot!!! Thank you Greg!!!

    Motivated by your *excellent* posting (I can’t stress that enough), I dug around some more, since I also had to go the other direction … that is, once I did this, and had my desired subdirectory now as the entire project, I then wanted to submerge it into a subdirectory, still keeping the history, of course (i.e. no --follow required).

    I found http://stackoverflow.com/questions/4042816/how-can-i-rewrite-history-so-that-all-files-are-in-a-subdirectory

    Which had:

    git filter-branch --prune-empty --tree-filter '
    if [[ ! -e foo/bar ]]; then
    mkdir -p foo/bar
    git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
    fi'

    This worked for me, pretty much literally.

    If this helps anyone, I dedicate the good will to Greg, who exemplifies what these postings should be all about imho.

    Thanks again Greg.

  39. romu says:

    Hi, bit late to come here now, but I’ve discovered this article because of a similar need.

    I ran these instructions on the code I needed to move, and everything was fine, except for one little problem: the history is not preserved at all. The files are moved correctly but with no history.

    Any idea? Thanks.

  40. Karthik T says:

    I am using this to extract some stuff out as a gem and it works like a … gem! I didn’t realize the first set of commands were destructive.. but no worries!

  41. Carlos Vinicius says:

    perfect <3

  42. efalk says:

    This is what I’ve always done, but that “mv * ” or “git mv * ” step results in one massive commit which looks like a zillion file deletions followed by a zillion file adds. Is there no way to get filter-branch to leave the directory structure alone?

  43. NW says:

    This solution appears to only preserve the master branch history for the folder/files moved, not the history of any other branches that involve them. Any thoughts on how to preserve that history as well?

  44. Guest123456 says:

    I hit the same problem, also on a local repo. The issue seemed to be that the $rev_args argument from the relevant git command is just too long (see https://github.com/github/windows-msysgit/blob/master/libexec/git-core/git-filter-branch) – the resulting shell command when that list is expanded is too long which results in the ‘Bad file number’ error.

    I tried the following horrible hack, which worked for me (but YMMV, so be careful!).
    – Find and edit git-filter-branch (this was at /libexec/git-core/git-filter-branch for me, using git-bash on Windows)
    – Comment out lines 287-9 (git rev-list …. || die “Could not get commits”)
    – Replace with this:
    rm ../revs
    for rev in $rev_args
    do
    git rev-list --reverse --topo-order --default HEAD --parents --simplify-merges $rev "$@" >> ../revs
    done

    (so basically just run the same command over and over with each subsequent commit, appending to the same output file).

    From that point the rest of the script worked ok (and this step ran quickly).

  45. Andhra says:

    thank you

  46. Alexandr Artemov says:

    Thank you! I had a problem with this script – it always deleted my directory with cloned repo and then failed. I fixed it, don’t remember how. But there’s another problem as well:

    if I specify not empty target repo, it fails with the following error:

    To git@:tools/utility.git

    ! [rejected] master -> master (fetch first)

    error: failed to push some refs to ‘git@:tools/utility.git’

    hint: Updates were rejected because the remote contains work that you do

    hint: not have locally. This is usually caused by another repository pushing

    hint: to the same ref. You may want to first integrate the remote changes

    hint: (e.g., ‘git pull …’) before pushing again.

    hint: See the ‘Note about fast-forwards’ in ‘git push --help’ for details.

  47. Gonzalo Casas says:

    Thanks!

  48. Robert Goldman says:

    Thank you very much for this post. It just saved my life.

  49. ChanderG says:

    Thanks a lot.

  50. Hyip says:

    Thanks for the script. I found it works to automate the process into an independent repository.