
Moving Files from one Git Repository to Another, Preserving History

Posted May 17th, 2011 in Development by Greg Bayer
                                

If you use multiple Git repositories, it’s only a matter of time until you want to refactor some files from one project to another. Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.

After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen. In the hopes of helping someone else avoid the same trouble, here’s the solution that ended up working best. The solution is primarily based on ebneter’s excellent question on Stack Overflow.

Another solution is Linus Torvalds’s “The coolest merge, EVER!” Unfortunately, his approach seems to require more manual fiddling than I would like and results in a repository with two roots. I don’t completely understand the implications of this, so I opted for something more like a standard merge.

Goal:

  • Move directory 1 from Git repository A to Git repository B.

Constraints:

  • Git repository A contains other directories that we don’t want to move.
  • We’d like to preserve the Git commit history for the directory we are moving.

Get files ready for the move:

Make a copy of repository A so you can mess with it without worrying about mistakes too much. It’s also a good idea to delete the link to the original repository to avoid accidentally making any remote changes (line 3). Line 4 is the critical step here. It goes through your history and files, removing anything that is not in directory 1. The result is the contents of directory 1 spewed out into the base of repository A. You probably want to import these files into repository B within a directory, so move them into one now (lines 5/6). Commit your changes and we’re ready to merge these files into the new repository.

git clone <git repository A url>
cd <git repository A directory>
git remote rm origin
git filter-branch --subdirectory-filter <directory 1> -- --all
mkdir <directory 1>
mv * <directory 1>
git add .
git commit
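
To try these steps somewhere safe before running them on a real project, here is a minimal sandbox sketch. Everything in it is an assumption made for illustration: the throwaway repository, the names repoA, dir1 and other, and the demo identity. It also replaces the plain mv with a git mv loop over the tracked top-level entries, so the script can run unattended.

```shell
#!/bin/sh
# Sandbox sketch only: repoA, dir1, other and the demo identity are made-up names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git prints a deprecation notice otherwise

work=$(mktemp -d)
git init -q "$work/repoA"
cd "$work/repoA"
git symbolic-ref HEAD refs/heads/master  # pin the branch name regardless of local defaults

# Three commits: two touch dir1, one touches an unrelated directory.
mkdir dir1 other
echo one > dir1/a.txt;    git add dir1;  git commit -q -m "add dir1/a.txt"
echo two > other/b.txt;   git add other; git commit -q -m "add other/b.txt"
echo three >> dir1/a.txt; git commit -q -am "edit dir1/a.txt"

# Line 4 of the listing above: keep only dir1 and hoist its contents to the root.
git filter-branch --subdirectory-filter dir1 -- --all >/dev/null

# Lines 5/6, done with git mv over the tracked top-level entries.
# (The unquoted expansion assumes no whitespace in file names.)
mkdir -p dir1
for f in $(git ls-tree --name-only HEAD); do
    git mv "$f" dir1/
done
git commit -q -m "move files into dir1"

commit_count=$(git rev-list --count HEAD)
echo "commits on master: $commit_count"
git log --oneline
```

In this toy setup the log should list three commits: the two that touched dir1 plus the final move. The commit that only touched the other directory is gone from the rewritten history.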

Merge files into new repository:

Make a copy of repository B if you don’t have one already. On line 3, you’ll create a remote connection to repository A as a branch in repository B. Then simply pull from this branch (containing only the directory you want to move) into repository B. The pull copies both files and history. Note: You can use a merge instead of a pull, but pull worked better for me. Finally, you probably want to clean up a bit by removing the remote connection to repository A. The pull already records the merge commit, so you’re all set.

git clone <git repository B url>
cd <git repository B directory>
git remote add repo-A-branch <git repository A directory>
git pull repo-A-branch master --allow-unrelated-histories
git remote rm repo-A-branch
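
A quick way to convince yourself that the history really came across is to rehearse the merge in a sandbox. The sketch below is only an illustration under stated assumptions: repoA and repoB are throwaway repositories built on the spot, dir1 and the demo identity are made-up names, and the --no-rebase and --no-edit flags are my additions so the pull runs unattended on Git versions that otherwise ask how to reconcile divergent branches or open an editor.

```shell
#!/bin/sh
# Sandbox sketch only: repoA, repoB, dir1 and the demo identity are made-up names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the prepared copy of repository A: only dir1, with two commits.
git init -q "$work/repoA"
cd "$work/repoA"
git symbolic-ref HEAD refs/heads/master
mkdir dir1
echo one > dir1/a.txt;  git add dir1; git commit -q -m "A: add dir1/a.txt"
echo two >> dir1/a.txt; git commit -q -am "A: edit dir1/a.txt"

# Stand-in for repository B, with its own unrelated history.
git init -q "$work/repoB"
cd "$work/repoB"
git symbolic-ref HEAD refs/heads/master
echo readme > README; git add README; git commit -q -m "B: initial commit"

# The three commands from the listing above, plus the two unattended-run flags.
git remote add repo-A-branch "$work/repoA"
git pull -q --no-rebase --no-edit repo-A-branch master --allow-unrelated-histories
git remote rm repo-A-branch

total=$(git rev-list --count HEAD)                # B's commit + A's two + the merge
dir_commits=$(git rev-list --count HEAD -- dir1)  # history that touches dir1
echo "total commits: $total, commits touching dir1: $dir_commits"
git log --oneline -- dir1
```

In this toy setup the path-limited log should show the two commits that originally came from repository A, which is the check to repeat on a real repository B with your own directory name.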

Update: Removed final commit thanks to Von’s comment.
Update 2: Added “--allow-unrelated-histories” thanks to several comments.

                                

94 Responses so far.

  1. zzamboni says:

    I managed to do what you say by replacing line 4 with “git filter-branch --tree-filter 'rm -rf $(ls | egrep -v )' -- --all”, which causes the filter to remove unneeded things instead of just selecting that one directory.

  2. Eve Weinberg says:

    Hello – i’m trying to follow your steps. I’m confused about step6. Am i supposed to use the quote marks? Do I use the words in= ?

    My target path is: Users/NeverOdd/Desktop/of_v0.8.4_osx_release/Rules-of-Art-School, and is repo B, just a new blank folder? or the URL. My repo B is here: https://github.com/evejweinberg/Rules-of-Art-School

  3. Daniel Kahlenberg says:

    regarding step6 etc. those are placeholders only replace them by real path names.

  4. Haley says:

    This was very useful. I made a couple of tweaks to use a branch before pushing back to the server, but otherwise it did just what I needed.

  5. Gaurav Negi says:

    Thanks This is useful. However GIT COMMIT ID will get changed. Is there anyway we keep the GIT commit id also the same ?

  6. balajinix says:

    This was useful. Thank you.

  7. Olga Maciaszek-Sharma says:

    This is a great tutorial. Thanks.

  8. Mike says:

    To quote Randy Moss, Straight Cash Homey!!! Real Nice. Thank you.

  9. SECURITY NOTE: this operation can be dangerous if you are trying to move isolated content from a private repository into a public one. filter-branch modifies history, but it seems that all original objects that were present in the source repository are left intact inside the .git folder in the resulting repository. At least `git gc --aggressive --force` shows the same number of objects in both repos. Be careful.

  10. Vishnu Viswanath says:

    no need to push after the git commit in the first section of commands(get files ready to move) ?

  11. Tiberiu Tanasa says:

    I’ve tried your solution, but I got stuck at step 4.

    C:aynmisc [development]> git filter-branch --subdirectory-filter -- --all

    usage: git filter-branch [--env-filter ] [--tree-filter ]
    [--index-filter ] [--parent-filter ]
    [--msg-filter ] [--commit-filter ]
    [--tag-name-filter ] [--subdirectory-filter ]
    [--original ] [-d ] [-f | --force]
    […]

    Any idea why I’m getting this?

  12. Tiberiu Tanasa says:

    I forgot to mention that I was using a git shell provided together with GitHub Desktop for Windows (version 3.0.14.0). It seems that it is a problem with this particular git application. I also tried to do the same steps with git installed from https://git-scm.com/ and it works perfect.

  13. Dani Church says:

    For anyone else that gets here and is confused: Bastian’s post seems to have gotten run through an XML filter, which has confused things. Imagine that every =”” disappears, and you get {target-path in repo-b} (replacing the angle brackets with curly braces), which is much easier to understand.

  14. Michal Plichta says:

    Gr8 post! Can you tell me how to remap the committer from repository A to another committer in repository B? In repoA I have commits as: 1stname.2ndname@company.com and in repoB as: userid@company.com

  15. Simon Greensmith says:

    Accepting this is an old post, for the benefit of anyone reading it now, Git have introduced a command “subtree” that will do this in a heartbeat, maintaining only the relevant commit history e.g.:
    From within old repo:
    $ git subtree split -P -b

    Then from within brand new directory:
    $ git init
    $ git pull
    $ git remote add origin
    … etc.

  16. neelima m says:

    Hi, thanks this is working for directories. Do you know how to move individual files similarly? --subdirectory-filter needs to be replaced with some other option?

  17. neelima m says:

    Hi Simon, This works fine. But my requirement is to add one more step, that is, I want the files in the new repo to be in the same directory structure as they were in the old repo. So after doing a git pull, if I move the files into the old/directory/structure, and do a commit, the previous history of the files is lost. Only my last commit is shown. Any idea of how I can achieve this and still retain all the history for the files that are copied? Thanks.

  18. Ben Warner says:

    If using git 2.9 (and later I assume), you will need to use the --allow-unrelated-histories flag on the git pull.

    e.g.

    git pull repo-A-branch master --allow-unrelated-histories

  19. smartester.com says:

    I got this done much easier way ..
    First I created a new repository.
    I went to git client(source tree) and changed the url of my existing remote repository to new repository and did a force push

  20. tej says:

    First I created a new repository.
    I went to git client(source tree) and changed the url of my existing remote repository to new repository and did a force push

  21. niquis7 says:

    Thanks for this post, it helped me a lot. Git surely has a plethora of tools!

  22. Rajpaul Bagga says:

    (So that someone who gets the error can find this comment by searching for the error, as I tried to do):

    If you don’t add the flag you get this error:
    fatal: refusing to merge unrelated histories

  23. Saurabh Jain says:

    I followed these steps but not able to see log history. When I run `git log .` I get only one commit.

  24. Piter Vergara says:

    Thanks!!

    I had to do some extra steps to keep my tags. Since the commits are rewritten when we do ‘filter-branch’, the tags will not point to the new commits. so, to also rewrite the tags I have exchanged:
    git filter-branch --subdirectory-filter -- --all
    to
    git filter-branch --subdirectory-filter --tag-name-filter cat -- --all

    Besides that, I have used a merge instead of a pull, because pull didn’t add my tags. So, instead of:

    git remote add repo-A-branch
    git pull repo-A-branch master

    I did

    git remote add repo-A-branch
    git fetch repo-A-branch
    git checkout -b master
    git merge repo-ambiente/master

  25. Fakabbir Amin says:

    Won’t mv * move everything to the directory instead of moving a particular folder ? In my case everything is moved to that directory..

  26. Fakabbir Amin says:

    Ah, got it. git filter-branch --subdirectory-filter can take a lot of time, and interrupting it partway through is of no use (in my case it took 2.5 hours). To get the console output, set “GIT_TRACE=1” and then run the commands.

  27. aousterh says:

    When I tried this approach, I found that step 4 in the merge portion dumped the contents of directory 1 into the root of repository B, instead of into /. Is that the expected behavior? How can I have the files end up in / instead? Thanks!

  28. Mohamed Ezz says:

    Thank you for this comment!

  29. Artem says:

    Thank you for this post. I translated it into Russian, changed a bit and posted into my own blog. Of course I mentioned that I based on your post and posted a link here 🙂

  30. dwec says:

    Can you elaborate on this for a git novice? I followed the steps and my “repository B” is 268KB whereas the original “repo A” was 2.4MB. If I copy the desired subdirectory from repo A somewhere using unix copy the result is 264KB worth of files. What extra data is being copied using the filter-branch technique described above?

  31. Johnney Darkness says:

    I had to use --allow-unrelated-histories to convince git that step 4 part 2 was okay. Also you probably need to clarify to cd back up between the steps … or maybe I got this wrong.

  32. Bart says:

    You may need the “--allow-unrelated-histories” switch in the pull command… I used “fetch” and “merge” instead of “pull”, and the “merge” needed that switch.

  33. Steve Terpe says:

    Hey Greg, you may want to update this to note that
    ```
    git pull repo-A-branch master master
    ```
    now requires the `--allow-unrelated-histories` flag.

    Still one of my most all-time useful bookmarks.

  34. Greg Bayer says:

    Thanks! Will make the edit.

  35. Steve Swinsburg says:

    Thanks for this. Worked well. Perhaps add the last command ‘git push origin master’ so that the new directory gets pushed up.

  36. Big thank you ! You saved my day!

  37. Jaimon says:

    I’m trying to move a folder from one repo to another without losing the history. In step1, history is intact until step 6. I lose history as soon as I move files from root of the repo to the newly created folder. I’m wondering how others have achieved this. It will great if one of you (who got it working in the recent times) could post the exact commands used. Thanks in advance.

  38. Jaimon says:

    Finally I got it working by replacing lines #5-8 with one more filter-branch operation.
    git filter-branch -f --index-filter \
    'git ls-files -s | /usr/local/bin/sed -e "s/\(\t\)\(.*\)$/\1\/\2/" |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
    git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

    Remember to use gnu-sed. I wasted almost 2 days trying to figure out why a simple sed regex doesn’t work. Replace in above sed regex with the folder name that you want to have. -f switch is needed after filter-branch as the previous command must have created a backup reference.

  39. Stanley says:

    Thank you for this! Saved heaps of wrestling. I couldn’t get line 4 working but eventually replaced it with this:

    [git filter-branch --tree-filter "find . -not -path './directory' -delete"]

    which also keeps the directory itself intact

  40. Chris S says:

    For what it’s worth, I’ve been using this tutorial when needed for the last couple of years. It’s withstood the test of time for all I’m concerned. Nice work!

  41. Mona says:

    Hi, I did this flow (took a long time) but history is not in the target repository :-/

  42. Rafael says:

    Hint about this command:
    git pull repo-A-branch master --allow-unrelated-histories

    In my first tries, I got:
    fatal: Couldn’t find remote ref heads/master
    fatal: The remote end hung up unexpectedly

    It took me a while to figure out that master refers to the branch in the “source” repository. As I wanted to move from the develop branch, the correct command is:
    git pull repo-A-branch develop --allow-unrelated-histories
