A place for spare thoughts

16/06/2011

First git-tfs usage problems

Filed under: git, git-tfs — Ivan Danilov @ 21:45

[This post assumes the reader is at least somewhat familiar with git and the git-tfs bridge]

UPDATE: the bug described here has already been fixed in git-tfs, so if you have the latest version built from source, it does not affect you. v0.11 still has the bug; the next release probably won't.

TFS doesn't distinguish check-ins by the author's email. In fact, it doesn't even know the author's email. That leads to some problems when working with the git-tfs bridge, because for git the author's email is a crucial part of every commit and influences the commit's SHA-1 hash.
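To make the last point concrete, here is a minimal sketch that hashes a hand-built commit object the way git does (SHA-1 over `commit <size>\0<body>`). The helper name commit_sha, the timestamp and the message are made up for illustration, the well-known empty-tree hash stands in for a real tree, and the same identity is assumed for author and committer. It also assumes sha1sum from GNU coreutils is installed.

```shell
#!/usr/bin/env bash
# Sketch: compute the SHA-1 git would assign to a parentless commit whose
# author/committer e-mail is $1. Every other field is held constant.
commit_sha() {
  local email="$1" body size
  body=$(printf 'tree %s\nauthor %s <%s> %s\ncommitter %s <%s> %s\n\n%s' \
    "4b825dc642cb6eb9a060e54bf8d69288fbee4904" \
    "tfs-dev-1" "$email" "1308260700 +0000" \
    "tfs-dev-1" "$email" "1308260700 +0000" \
    "TFS changeset (illustrative message)")
  # The body ends with a newline; $(...) stripped it, so it is re-added here.
  size=$(( $(printf '%s\n' "$body" | wc -c) ))
  { printf 'commit %d\0' "$size"; printf '%s\n' "$body"; } | sha1sum | cut -d' ' -f1
}

# Same changeset, same author name, same date: only the e-mail differs.
echo "dev-1@email.com -> $(commit_sha dev-1@email.com)"
echo "dev-2@email.com -> $(commit_sha dev-2@email.com)"
```

The two printed hashes differ, although nothing but the email changed. Calling commit_sha twice with the same email gives the same hash both times.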

Goals

Suppose you have a remote TFS server behind a slow connection (or, more likely today, an unreliable connection that fails often) and you want to minimize network activity. The TFS server of course has some developers working against it directly. With a DVCS like git, this naturally leads to the following layout:

Dev1 \                                                     / TFS Dev1
      \        git         [   slow   ]                   /
Dev2 ------- Central ----- [  network ] ---- TFS Server ---- TFS Dev2
...   /     repository     [connection]                   \   ...
DevN /                                                     \ TFS DevN

So when changes need to be pulled from TFS, any developer on the left runs git tfs pull (or fetch) and pushes the tfs/default branch to Central, so that every other developer on the left can get it without going to TFS.

That is the goal. In an ideal world it would work this way from the beginning. Oh, wait! In an ideal world there wouldn't be a TFS in the diagram in the first place 🙂

First attempt

So let's return to reality. To test how things behave when not everything goes as expected, I set up a git repo called test-central:

test-central $ git tfs clone tfs_url test-central
# ...and then make it bare, like this:
test-central $ cd test-central
test-central $ mv .git ..    # save .git somewhere (not important where really)
test-central $ rm -fr *      # remove everything in the folder
test-central $ mv ../.git .  # take .git back
test-central $ mv .git/* .   # get everything from .git to the repo's root
test-central $ rmdir .git    # delete .git
test-central $ git config --bool core.bare true  # tell git it is bare repo
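The same conversion can be written as a small helper. This is only a sketch of an alternative, not the exact procedure above: the function name to_bare is hypothetical, and instead of emptying the folder in place it leaves the bare repository next to the original as "<name>.git". Because the .git directory is moved wholesale, all refs in it are kept.

```shell
#!/usr/bin/env bash
# Sketch: turn a non-bare repository into a bare one stored at "$1.git".
to_bare() {
  local repo="${1%/}"                    # strip a trailing slash, if any
  [ -d "$repo/.git" ] || return 1        # refuse anything that is not a work tree
  mv "$repo/.git" "$repo.git"            # keep only the repository data
  rm -rf "$repo"                         # drop the checked-out files
  git --git-dir="$repo.git" config --bool core.bare true   # tell git it is bare
}

# Usage (path is an example):
#   to_bare test-central
#   git --git-dir=test-central.git rev-parse --is-bare-repository
```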

Then I created test-dev-1 repository:

test-dev-1 $ git clone test-central test-dev-1
test-dev-1 $ cd test-dev-1
test-dev-1 $ git tfs bootstrap
test-dev-1 $ git config user.name dev-1
test-dev-1 $ git config user.email dev-1@email.com

And test-dev-2 repository:

test-dev-2 $ git clone test-central test-dev-2
test-dev-2 $ cd test-dev-2
test-dev-2 $ git tfs bootstrap
test-dev-2 $ git config user.name dev-2
test-dev-2 $ git config user.email dev-2@email.com

Both devs have the same TFS history cloned from test-central. Now tfs-dev-1 (a developer working directly against TFS) makes some changes and checks them in. test-dev-1 spots the new changeset in TFS and decides to pull it:

test-dev-1 $ git tfs pull  # suppose fast-forward merge for simplicity

Now this changeset is stored in his local repository with author name tfs-dev-1 and, as TFS doesn't have emails, author email dev-1@email.com. So he pushes this commit to test-central to share it with other developers:

test-dev-1 $ git push

At this time test-dev-2 also spots the new changeset. He doesn't know that dev-1 already got it (or just forgot to check), so he also decides to pull it from TFS:

test-dev-2 $ git tfs pull

His commit also has author name tfs-dev-1, but the author email is dev-2@email.com this time! So from git's point of view his commit is entirely different from dev-1's commit. And so…

test-dev-2 $ git push

…results in a conflict.
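The failure can be reproduced without a TFS server. The sketch below is a simulation under stated assumptions: a plain `git commit --allow-empty` stands in for `git tfs pull`, the helper simulate_tfs_pull, the seed repository, the commit messages and the fixed timestamp are all invented for illustration, and the email is applied to both author and committer. Only the repository names come from the steps above.

```shell
#!/usr/bin/env bash
# Simulation only: `git commit` plays the role of `git tfs pull`.
set -eu
work=$(mktemp -d)
cd "$work"

# Everything except the e-mail is identical for both developers.
export GIT_AUTHOR_NAME="tfs-dev-1" GIT_COMMITTER_NAME="tfs-dev-1"
export GIT_AUTHOR_DATE="1308260700 +0000" GIT_COMMITTER_DATE="1308260700 +0000"

# Shared bare repository with one commit of common history.
git init -q --bare test-central
git -C test-central symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
GIT_AUTHOR_EMAIL=seed@email.com GIT_COMMITTER_EMAIL=seed@email.com \
  git -C seed commit -q --allow-empty -m "common history"
git -C seed push -q "$work/test-central" master

git clone -q "$work/test-central" test-dev-1
git clone -q "$work/test-central" test-dev-2

simulate_tfs_pull() {   # $1 = repository, $2 = e-mail stamped on the commit
  GIT_AUTHOR_EMAIL="$2" GIT_COMMITTER_EMAIL="$2" \
    git -C "$1" commit -q --allow-empty -m "changeset from TFS"
}
simulate_tfs_pull test-dev-1 dev-1@email.com
simulate_tfs_pull test-dev-2 dev-2@email.com

sha_dev1=$(git -C test-dev-1 rev-parse HEAD)
sha_dev2=$(git -C test-dev-2 rev-parse HEAD)

git -C test-dev-1 push -q origin master          # first push goes through
if git -C test-dev-2 push -q origin master 2>/dev/null; then
  second_push=accepted
else
  second_push=rejected                           # non-fast-forward
fi

echo "test-dev-1: $sha_dev1"
echo "test-dev-2: $sha_dev2"
echo "second push: $second_push"
```

The two hashes differ, so test-central already holds a commit that test-dev-2 does not have, and his push is refused as a non-fast-forward.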

That seems pretty bad. So to make commits that originate in TFS shareable, they should all carry the same email, right? So the git-tfs bridge should probably set the email to some predefined value for every commit that originates from a TFS changeset.
This way test-dev-1's and test-dev-2's commits will both have some identical fake value like TFS@email.com, the SHA-1 hashes will be equal and everything will be great. Right?

Second attempt

Apparently it is not so easy (we’re already back to the real world, remember?)

Let me explain with an example a problem I faced an hour ago. The simplest scenario: a single dev, a single local git repository, just one new commit. As basic as possible.

At the start the git repo looks like this (tfs is just shorthand for tfs/default):

   A <---- B
        [master]
         [tfs]

Then I make some changes and commit them to git:

   A <---- B <------ C
         [tfs]    [master]

Commit C is a normal git commit, so it has author='dev' and email='dev@email.com'.

After that I want to check in my commit to TFS, so I execute 'git tfs checkin'. Nothing changes within my git repo. Then 'git tfs fetch' gets my commit back from TFS. And weird things start to happen…

The commit that came from TFS when we did 'git tfs fetch' (let's call it D for clarity) has author='dev-1-tfs-account-name' and email='TFS@email.com' (as we agreed above). You already know what the graph will look like, yeah? 🙂

   A <---- B <------ C
           \      [master]
            \
             \<----- D
                   [tfs]

That doesn't look like the fast-forward we wanted for the [tfs] branch… For the same reason as before, commit D differs from C. But we want them to be equal! What do we need for that to happen?
Yeah, an even more restrictive rule:

$ git config user.name dev-1-tfs-account-name
$ git config user.email TFS@email.com

Well, TFS@email.com was chosen completely arbitrarily, so you could set it to any fake value you like.
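Since a forgotten setting silently produces diverging commits, it may help to check the identity before talking to TFS. The following is a hypothetical guard, not part of git-tfs: the function name identity_matches is invented, and the expected values are simply the ones used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the effective git identity
# matches the values the team agreed on.
identity_matches() {   # $1 = expected user.name, $2 = expected user.email
  local name email
  name=$(git config --get user.name || true)     # empty if unset
  email=$(git config --get user.email || true)
  [ "$name" = "$1" ] && [ "$email" = "$2" ]
}

# Usage (values are the examples from this post):
#   identity_matches dev-1-tfs-account-name TFS@email.com \
#     || echo "fix user.name / user.email before checking in" >&2
```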

Conclusion

To work more or less comfortably with TFS, every developer should have git's user.name equal to his TFS account name, and all developers should share a single email.

P.S. In the last example you could merge C with D and get some commit E (without any conflicts, actually, as the B->C and B->D diffs are exactly the same)… but then even the simplest graph will look like a DNA molecule. That's not what I would call comfortable work.

13 Comments »

  1. Good catch. I had tried to make commits identical, no matter who `git tfs fetched`. I missed that detail, and never tested with multiple users. I think this worked right at some point, but I’m not sure why it’s not working right now.

    Comment by Matt — 16/06/2011 @ 22:28

  2. […] I described here previously my goal was to establish central git repository and avoid redundant round-trips to TFS […]

    Pingback by How to establish git central repository for working against TFS with git-tfs bridge « A place for spare thoughts — 18/07/2011 @ 17:40

  3. I’m trying to wrap my head around this and I’m having a little bit of trouble. According to the model you’ve set up, Dev1, Dev2, …, DevN pull from the central repository, pull any updates from TFS, then push those updates back into the central repository. So where is the benefit from doing it this way?

    Comment by Jeff C. — 18/07/2012 @ 23:35

    • I wanted to be able to get everything from TFS only once and then share it with others without going through the slow connection again and again – and of course integrity of repository is a requirement. How things are pushed back is not so important here. In the end we stopped with checking in to TFS from DevX machines directly.

      Comment by Ivan Danilov — 18/07/2012 @ 23:43

      • “In the end we stopped with checking in to TFS from DevX machines directly.” If you’re still updating TFS, this would really be ideal. How did you make this happen?

        Comment by Jeff C. — 19/07/2012 @ 20:54

  4. Moreover, our TFS goes down from time to time, so another benefit of this scheme is that we have a scheduled script on the central repo that gets the latest changes from TFS every 5 minutes. Thus, even if TFS is down, we have the latest changes. And everybody has them, not only those lucky developers who caught the time window when TFS was up and answering 🙂

    Comment by Ivan Danilov — 18/07/2012 @ 23:47

    • I tried to set up a similar scheme, but had problems pulling from TFS once I made the centralized repository into a bare one. If I run “git tfs pull” on a bare repository, it returns with error code 128, so I'm not sure if it's working correctly.

      Comment by Jeff C. — 19/07/2012 @ 21:03

  5. Jeff C. :

    “In the end we stopped with checking in to TFS from DevX machines directly.” If you’re still updating TFS, this would really be ideal. How did you make this happen?

    We’re forced to unfortunately. What problem do you have with updating TFS? Just set up git-tfs on every dev machine. I’ll try to describe our current environment in the next post because it would be too much for a comment. I’ll try to do it today.

    Comment by Ivan Danilov — 19/07/2012 @ 20:59

    • That would be great! I’d much rather deal with Git than with TFS. 🙂

      Comment by Jeff C. — 19/07/2012 @ 21:05

  6. Jeff C. :

    I tried to set up a similar scheme, but had problems pulling from TFS once I made the centralized repository into a bare one. If I run “git tfs pull” on a bare repository, it returns with error code 128, so i’m not sure if it’s working correctly.

    Oh, I see. For some reason git-tfs won't work with bare repositories. I believe that is simply because no one bothered to check whether everything works with bare repositories, but nevertheless.
    I ended up with one more repository beside the bare one, just to keep getting updates from TFS. So we have two dedicated repositories: the first is a sandbox that has an update from TFS scheduled every five minutes (and if everything is fine these updates are pushed to the main repo); the second is our main repository, which is accessible over HTTP. No one except the sandbox script pushes to the master branch of the main repository, so main/master always follows the TFS line. When someone needs to push to TFS, he runs rcheckin (well, we have a customized one, but that's for another time), the changes go to TFS and then the sandbox's script picks them up. This way we also ensure that git-tfs pushed everything correctly; otherwise main/master and HEAD will diverge and it will be immediately obvious that something went wrong.

    Comment by Ivan Danilov — 19/07/2012 @ 21:13

  7. Ah, so *that’s* the missing piece. Thanks!

    Comment by Jeff C. — 19/07/2012 @ 22:56

  8. Okay, one more question. I looked at your auto-update script and it looks like the sandbox both pushes to and pulls from the “main” repository (i.e. the bare one). So the sandbox would also have the bare one set up as its origin. Is this correct?

    Comment by Jeff C. — 17/08/2012 @ 23:25

  9. Yes. That is useful in case someone checked in something from git to TFS and synchronized the central git repository as well. Then this commit is already there and there's no need to fetch it from TFS once more.

    Comment by Ivan Danilov — 17/08/2012 @ 23:33

