Moving Files from one Git Repository to Another, Preserving History

Posted May 17th, 2011 in Development by Greg Bayer
                                

If you use multiple git repositories, it’s only a matter of time until you’ll want to refactor some files from one project to another.  Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.

After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen. In the hopes of helping someone else avoid the same trouble, here’s the solution that ended up working best. The solution is primarily based on ebneter’s excellent question on Stack Overflow.

Another solution is Linus Torvalds’s “The coolest merge, EVER!” Unfortunately, his approach seems to require more manual fiddling than I would like and results in a repository with two roots. I don’t completely understand the implications of this, so I opted for something more like a standard merge.

Goal:

  • Move directory 1 from Git repository A to Git repository B.

Constraints:

  • Git repository A contains other directories that we don’t want to move.
  • We’d like to preserve the Git commit history for the directory we are moving.

Get files ready for the move:

Make a copy of repository A so you can mess with it without worrying about mistakes too much.  It’s also a good idea to delete the link to the original repository to avoid accidentally making any remote changes (line 3).  Line 4 is the critical step here.  It goes through your history and files, removing anything that is not in directory 1.  The result is the contents of directory 1 spewed out into the base of repository A.  You probably want to import these files into repository B within a directory, so move them into one now (lines 5/6).  Note that mv * skips dotfiles and complains that it cannot move <directory 1> into itself; the complaint is harmless, and the other files still move.  Commit your changes and we’re ready to merge these files into the new repository.

git clone <git repository A url>
cd <git repository A directory>
git remote rm origin
git filter-branch --subdirectory-filter <directory 1> -- --all
mkdir <directory 1>
mv * <directory 1>
git add .
git commit
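
If you’d like to try the filter step somewhere safe first, here is a minimal sketch that builds a throwaway repository and runs the same commands on it. Everything in it is an assumption made for the demo: the names repo-a, repo-a-copy, dir1 and dir2 are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING is only there to skip the notice that newer git versions print before filter-branch starts. Instead of mv *, it moves the tracked top-level entries with git mv, which also picks up dotfiles.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are hypothetical.
work=$(mktemp -d)
cd "$work"

# Demo identity, so the script does not depend on your global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Stand-in for repository A: two directories, three commits.
git init -q repo-a
cd repo-a
git symbolic-ref HEAD refs/heads/master   # name the first branch master
mkdir dir1 dir2
echo one > dir1/a.txt
echo other > dir2/b.txt
git add .
git commit -q -m "Add dir1 and dir2"
echo two >> dir1/a.txt
git commit -q -am "Edit dir1/a.txt"
echo more >> dir2/b.txt
git commit -q -am "Edit dir2/b.txt"

# The steps from the post, run on a clone.
cd "$work"
git clone -q repo-a repo-a-copy
cd repo-a-copy
git remote rm origin
git filter-branch --subdirectory-filter dir1 -- --all

# Move the filtered files back under dir1/ and commit.
mkdir -p dir1
git ls-tree --name-only HEAD | while read -r f; do
  git mv "$f" dir1/
done
git commit -q -m "Move files into dir1"

# Only commits that touched dir1 remain, plus the move commit.
git log --oneline
```

The commit that only touched dir2 should be missing from the final log, which is a quick way to confirm the filter did what you expected before you run it on a real repository.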

Merge files into new repository:

Make a copy of repository B if you don’t have one already.  On line 3, you’ll create a remote connection to repository A as a branch in repository B; point it at the local copy you filtered above, not at the original repository.  Then simply pull from this branch (containing only the directory you want to move) into repository B.  The pull copies both files and history.  Note: A pull is a fetch followed by a merge, so you can run those two steps yourself instead, but pull worked better for me. Finally, you probably want to clean up a bit by removing the remote connection to repository A. Unless it stops on a conflict, the pull records the merge commit itself, so there is nothing left to commit and you’re all set.

git clone <git repository B url>
cd <git repository B directory>
git remote add repo-A-branch <git repository A directory>
git pull repo-A-branch master --allow-unrelated-histories
git remote rm repo-A-branch
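
For anyone who wants to see the merge step in isolation, here is a rough sketch on two throwaway repositories. It spells the pull out as a fetch followed by a merge, which is an assumption on my part about what is easiest to follow; recent git versions may also ask how to reconcile divergent branches on a plain pull, and the explicit merge sidesteps that. The names repo-a-copy, repo-b and dir1 are hypothetical, and --allow-unrelated-histories needs git 2.9 or later.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the prepared copy of repository A: history touching only dir1/.
git init -q repo-a-copy
cd repo-a-copy
git symbolic-ref HEAD refs/heads/master
mkdir dir1
echo one > dir1/a.txt
git add .
git commit -q -m "A: add dir1/a.txt"
echo two >> dir1/a.txt
git commit -q -am "A: edit dir1/a.txt"

# Stand-in for repository B, with its own unrelated history.
cd "$work"
git init -q repo-b
cd repo-b
git symbolic-ref HEAD refs/heads/master
echo readme > README
git add .
git commit -q -m "B: initial commit"

# The merge step, written as fetch + merge.
git remote add repo-A-branch "$work/repo-a-copy"
git fetch -q repo-A-branch
git merge -q --allow-unrelated-histories \
  -m "Merge dir1 from repository A" repo-A-branch/master
git remote rm repo-A-branch

# The moved file keeps its original commits inside repository B.
git log --oneline -- dir1/a.txt
```

Running git log on a path, as in the last line, is the check I would use on a real repository to confirm that the history came along with the files.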

Update: Removed final commit thanks to Von’s comment.
Update 2: Added “--allow-unrelated-histories” thanks to several comments.
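
One caveat about the filter step: as far as I know, filter-branch keeps the pre-rewrite refs under refs/original/, and the reflog still points at the old commits, so the filtered copy can still contain objects from the directories you removed. The sketch below shows one way to drop them, assuming the usual cleanup of deleting refs/original/, expiring the reflog and pruning. The repository names and the file dir2/secret.txt are hypothetical, and this is a demonstration on a sandbox, not a guarantee that a real repository is clean.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Stand-in for repository A with one file we do not want to carry over.
git init -q repo-a
cd repo-a
git symbolic-ref HEAD refs/heads/master
mkdir dir1 dir2
echo public > dir1/a.txt
echo private > dir2/secret.txt
git add .
git commit -q -m "Add dir1 and dir2"

# Filter a clone down to dir1, as in the post.
cd "$work"
git clone -q repo-a repo-a-copy
cd repo-a-copy
secret=$(git rev-parse HEAD:dir2/secret.txt)   # blob id of the unwanted file
git remote rm origin
git filter-branch --subdirectory-filter dir1 -- --all

# The old blob is still stored in the filtered copy at this point.
if git cat-file -e "$secret" 2>/dev/null; then
  echo "before cleanup: old blob still present"
fi

# Drop the backup refs, expire the reflog, and prune unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git update-ref -d "$ref"
done
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$secret" 2>/dev/null; then
  echo "after cleanup: old blob still present"
else
  echo "after cleanup: old blob is gone"
fi
```

If the content you are leaving behind is sensitive, check for a known object the same way on your real copy before pushing it anywhere public.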

                                
  • I managed to do what you say by replacing line 4 with “git filter-branch --tree-filter ‘rm -rf $(ls | egrep -v )’ -- --all”, which causes the filter to remove unneeded things instead of just selecting that one directory.

  • Eve Weinberg

    Hello, I’m trying to follow your steps. I’m confused about step 6. Am I supposed to use the quote marks? Do I use the words in= ?

    My target path is: Users/NeverOdd/Desktop/of_v0.8.4_osx_release/Rules-of-Art-School, and is repo B, just a new blank folder? or the URL. My repo B is here: https://github.com/evejweinberg/Rules-of-Art-School

  • Daniel Kahlenberg

    Regarding step 6 etc.: those are only placeholders; replace them with real path names.

  • Haley

    This was very useful. I made a couple of tweaks to use a branch before pushing back to the server, but otherwise it did just what I needed.

  • Gaurav Negi

    Thanks, this is useful. However, the Git commit IDs will change. Is there any way to keep the Git commit IDs the same?

  • balajinix

    This was useful. Thank you.

  • Olga Maciaszek-Sharma

    This is a great tutorial. Thanks.

  • Mike

    To quote Randy Moss, Straight Cash Homey!!! Real Nice. Thank you.

  • SECURITY NOTE: This operation can be dangerous if you are trying to move isolated content from a private repository into a public one. filter-branch modifies history, but it seems that all original objects that were present in the source repository are left intact inside the .git folder of the resulting repository. At least `git gc --aggressive --force` shows the same number of objects in both repos. Be careful.

  • Vishnu Viswanath

    no need to push after the git commit in the first section of commands(get files ready to move) ?

  • Tiberiu Tanasa

    I’ve tried your solution, but I got stuck at step 4.

    C:aynmisc [development]> git filter-branch --subdirectory-filter -- --all

    usage: git filter-branch [--env-filter ] [--tree-filter ]
    [--index-filter ] [--parent-filter ]
    [--msg-filter ] [--commit-filter ]
    [--tag-name-filter ] [--subdirectory-filter ]
    [--original ] [-d ] [-f | --force]
    […]

    Any idea why I’m getting this?

  • Tiberiu Tanasa

    I forgot to mention that I was using the git shell that ships with GitHub Desktop for Windows (version 3.0.14.0). It seems to be a problem with that particular git application. I also tried the same steps with git installed from https://git-scm.com/ and it works perfectly.

  • Dani Church

    For anyone else that gets here and is confused: Bastian’s post seems to have gotten run through an XML filter, which has confused things. Imagine that every =”” disappears, and you get {target-path in repo-b} (replacing the angle brackets with curly braces), which is much easier to understand.

  • Michal Plichta

    Gr8 post! Can you tell me how to remap a committer from repository A to a different committer in repository B? In repoA I have commits as: 1stname.2ndname@company.com and in repoB as: userid@company.com

  • Simon Greensmith

    Accepting this is an old post, for the benefit of anyone reading it now, Git have introduced a command “subtree” that will do this in a heartbeat, maintaining only the relevant commit history e.g.:
    From within old repo:
    $ git subtree split -P -b

    Then from within brand new directory:
    $ git init
    $ git pull
    $ git remote add origin
    … etc.

  • neelima m

    Hi, thanks, this works for directories. Do you know how to move individual files similarly? Does --subdirectory-filter need to be replaced with some other option?

  • neelima m

    Hi Simon, This works fine. But my requirement is to add one more step, that is, I want the files in the new repo to be in the same directory structure as they were in the old repo. So after doing a git pull, if I move the files into the old/directory/structure, and do a commit, the previous history of the files is lost. Only my last commit is shown. Any idea of how I can achieve this and still retain all the history for the files that are copied? Thanks.

  • Ben Warner

    If using git 2.9 (and later I assume), you will need to use the --allow-unrelated-histories flag on the git pull.

    e.g.

    git pull repo-A-branch master --allow-unrelated-histories

  • smartester.com

    I got this done a much easier way.
    First I created a new repository.
    Then, in my git client (SourceTree), I changed the URL of my existing remote repository to the new repository and did a force push.

  • niquis7

    Thanks for this post, it helped me a lot. Git surely has a plethora of tools!

  • Rajpaul Bagga

    (So that someone who gets the error can find this comment by searching for the error, as I tried to do):

    If you don’t add the flag you get this error:
    fatal: refusing to merge unrelated histories

  • Saurabh Jain

    I followed these steps but not able to see log history. When I run `git log .` I get only one commit.

  • Piter Vergara

    Thanks!!

    I had to do some extra steps to keep my tags. Since the commits are rewritten when we do ‘filter-branch’, the tags will not point to the new commits. so, to also rewrite the tags I have exchanged:
    git filter-branch --subdirectory-filter -- --all
    to
    git filter-branch --subdirectory-filter --tag-name-filter cat -- --all

    Besides that, I have used a merge instead of a pull, because pull didn’t add my tags. So, instead of:

    git remote add repo-A-branch
    git pull repo-A-branch master

    I did

    git remote add repo-A-branch
    git fetch repo-A-branch
    git checkout -b master
    git merge repo-ambiente/master

  • Fakabbir Amin

    Won’t mv * move everything to the directory instead of moving a particular folder ? In my case everything is moved to that directory..

  • Fakabbir Amin

    Ah, got it. git filter-branch --subdirectory-filter can take a lot of time, and interrupting it partway through is of no use (in my case it took 2.5 hours). To get console output, set “GIT_TRACE=1” and then run the commands.

  • aousterh

    When I tried this approach, I found that step 4 in the merge portion dumped the contents of directory 1 into the root of repository B, instead of into /. Is that the expected behavior? How can I have the files end up in / instead? Thanks!

  • Mohamed Ezz

    Thank you for this comment!

  • Artem

    Thank you for this post. I translated it into Russian, changed a bit and posted into my own blog. Of course I mentioned that I based on your post and posted a link here 🙂

  • dwec

    Can you elaborate on this for a git novice? I followed the steps and my “repository B” is 268KB whereas the original “repo A” was 2.4MB. If I copy the desired subdirectory from repo A somewhere using unix copy the result is 264KB worth of files. What extra data is being copied using the filter-branch technique described above?

  • Awesome!

  • Johnney Darkness

    I had to use --allow-unrelated-histories to convince git that step 4 part 2 was okay. Also you probably need to clarify to cd back up between the steps … or maybe I got this wrong.

  • Bart

    You may need the “--allow-unrelated-histories” switch in the pull command… I used “fetch” and “merge” instead of “pull”, and the “merge” needed that switch.

  • Steve Terpe

    Hey Greg, you may want to update this to note that `git pull repo-A-branch master master` now requires the `--allow-unrelated-histories` flag.

    Still one of my most all-time useful bookmarks.

  • Thanks! Will make the edit.