A place for spare thoughts

23/08/2012

My environment for day-to-day work with git-tfs

Filed under: git, git-tfs, tools — Ivan Danilov @ 16:22

I’ve been asked many times to describe how things are organized on our project with git-tfs, so it’s time to write it down: repeating the same explanation over and over isn’t effective at all.

So, below I will describe how things are arranged to make TFS2010, msysgit and git-tfs work together.

First, the setup involves several physical boxes, so let’s name them up front:

  1. DevA, DevB – two developers’ boxes;
  2. GitServer – Windows Server 2008 R2 box with IIS – hosts the central git repository, exposes git repositories over HTTP and has a scheduled task that synchronizes the central git repository with TFS;
  3. TfsServer – remote host with TFS Server 2010 (it is our corporate beast and I don’t have administrative access there).

I will assume you have already installed git and git-tfs on your machines. So, let’s start.

 

1. Cloning TFS repository first time

First, you have to clone the repository from TFS somewhere. On your DevA box, open a bash console, go to the folder where you want the sources to reside and execute this:

git tfs clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

or, if you aren’t interested in the full history and just want the latest state of the TFS sources:

git tfs quick-clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

It may take up to several hours if you have a large repository and a slow connection. If everything is fine, the folder proj-git will appear there. Inside is your working copy with the latest TFS version and the .git folder with the configuration.

 

2. Setting up GitServer and auto-update from TFS

git-tfs doesn’t work with bare repositories (at least it was so at the time I followed its development), so we will need two repositories for each TFS one. The first is bare and accessible through HTTP. The second is private: there we will set up scheduled fetching from the TFS server, and each time new changes are found in TFS we will push them to the first repository. To distinguish the two, I will refer to the first as Central and the second as Working.

  1. Install IIS server role
  2. Install GitWebAccess. As I wrote before, the latest version seems broken, so you can download my working version. Unpack it to C:\inetpub\wwwroot\gwa, and follow this manual. I will suppose you created the folder C:\GitRepos for your git repos as suggested in this manual.
  3. Create a new folder for the Working repo somewhere. For this example it will be C:\working
  4. Copy the content of the proj-git folder (where you cloned the TFS repository in step 1) to C:\working and execute
    git checkout -B master tfs/default
    

    so that the master branch points at your last TFS commit

  5. Add C:\working\.gitignore file if it is not present with these lines:
    /update
    /log.txt
    

    If the file is present, just add these lines to it. update is the script we will write next and log.txt… well, it is the log of updating. As updating will be executed by the scheduler, the log is the only way to know what’s happening.

  6. Now we need a bare repository in C:\GitRepos\Project.git. Execute these commands from C:\GitRepos:
    git clone --bare /c/working Project.git
    git remote rm origin # we don't need origin here
    

    If it was done correctly, your repository should be accessible over HTTP through GitWebAccess.

  7. Now it’s time to set up the update in the Working repository. The convention I follow is that the master branch of my repository always mirrors the TFS line of history. Thus, no one except the update script should push to it directly, and it will never have changes conflicting with TFS data.
    You need to set up the origin remote in the Working repo for the script below to work:

    git remote add origin /c/GitRepos/Project.git
    

    Next, create file C:\Working\update and put these lines to it:

    #!/bin/sh
    
    check_err()
    {
    	# parameter 1 is last exit code
    	# parameter 2 is error message that should be shown if error code is not 0
    	if [ "${1}" -ne "0" ]; then
    		cat '~temp.log'
    		echo ${2}
    		rm -f '~temp.log' > /dev/null
    		exit ${1}
    	fi;
    	rm -f '~temp.log' > /dev/null
    }
    
    echo "$(date) Update launched"
    
    if [ ! -z "$(git status --porcelain)" ]; then
    	echo "$(date) Your status is not clean, can't update"
    	exit 1;
    fi;
    
    echo "$(date) Pulling from central repo first to avoid redundant round-trips to TFS..."
    git pull origin master:master > '~temp.log'
    check_err $? "Pulling from central repo failed"
    
    echo "$(date) Pulling from TFS..."
    git tfs pull -d > '~temp.log'
    check_err $? "Pulling from TFS resulted in error";
    
    local_commits_to_push="$(git rev-list master ^origin/master)"
    if [ -z "$local_commits_to_push" ]; then
    	echo "$(date) Central repo is up-to-date, nothing to push"
    else
    	echo "$(date) Pushing updates to central repo"
    	git push origin master > '~temp.log'
    	check_err $? "Push to central resulted in error";
    fi;
    

    You can check it: just execute update from the bash console. The latest updates should be fetched from TFS. If there were any changes, they should also be pushed to Central’s master branch.
    It is a simplified version of the script from this post. There you can also find how to create a scheduler task that executes this update script every 5 minutes. Do it now.
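One thing the script above does not handle is overlap: if a TFS pull takes longer than the scheduler interval, two runs could touch the Working repo at the same time. Below is a minimal sketch of a guard, which is not part of the original script. The names LOCK_DIR, acquire_lock, release_lock and run_locked are made up for illustration, and keeping the lock under .git is an assumption chosen so that git status --porcelain stays clean.

```shell
#!/bin/sh
# Hypothetical lock location; anything inside .git never shows up in `git status`.
LOCK_DIR="${LOCK_DIR:-.git/update.lock}"

# mkdir fails when the directory already exists,
# so only one process at a time can take the lock.
acquire_lock()
{
	mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock()
{
	rmdir "$LOCK_DIR" 2>/dev/null
}

# Run the given command under the lock; skip quietly if another run holds it.
run_locked()
{
	if ! acquire_lock; then
		echo "$(date) Previous update is still running, skipping this run"
		return 0
	fi
	"$@"
	status=$?
	release_lock
	return $status
}
```

With this in place, the body of update could be moved into a function and started as run_locked do_update (do_update being hypothetical too). If a run is killed midway, the lock directory stays behind and has to be removed by hand.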

 

3. Create rcheckin and reap the benefits

Now, back to the developer’s environment. We want to use that auto-updating central repository effectively, don’t we? So:

  • We don’t need git tfs fetch at all anymore: we can limit ourselves to git fetch because the latest code is always in the Central git repository.
  • When git tfs rcheckin is executed, it fetches the latest changes from TFS. We want to reuse as much as possible of the already fetched changes. So we need to fetch manually from Central before executing git tfs rcheckin and somehow make git-tfs recognize those commits as proper TFS-fetched commits.
  • And finally, after we have checked something in to TFS, we want to push it to Central, so that the auto-updater won’t fetch what we already have. That could have a significant performance impact in the case of large commits, or a long line of commits over a slow connection. In such a case we may even want to disable the auto-updater for a while until we push things to Central, and then re-enable it.

So, to achieve all these points, instead of using git tfs rcheckin directly (it doesn’t know anything about our environment) we will use a custom script named rcheckin (do not forget to add it to .gitignore if needed):

#!/bin/sh

check_err()
{
	# parameter 1 is last exit code
	# parameter 2 is error message that should be shown if error code is not 0
	if [ "${1}" -ne "0" ]; then
		cat '~temp.log'
		echo ${2}
		rm -f '~temp.log' > /dev/null
		exit ${1}
	fi;
	rm -f '~temp.log' > /dev/null
}

if [ ! -z "$(git status --porcelain)" ]; then
	echo "Your status is not clean, can't continue"
	exit 1;
fi;

echo "Fetching origin..."
git fetch origin

if [ -n "`git rev-list HEAD..origin/master`" ]; then
	echo "origin/master has some TFS checkins that conflict with your branch. Please rebase first, then try to rcheckin again"
	exit 1;
fi;

echo "Marking latest TFS commit with bootstrap..."
git tfs bootstrap -d

git tfs rcheckin
check_err $? "rcheckin exited with error"

git push origin HEAD:master
check_err $? "Can't push HEAD to origin"

Well, that’s almost all.

 

4. Setup another developer environment

So, suppose DevB also needs to work with your server. Here is how he can start his work easily:

  1. git clone http://GitServer:8080/project.git project
    cd project
    git tfs bootstrap -d # make git-tfs mark what is required
    
  2. Get the rcheckin script from the previous section.
  3. Work!

 

5. My common workflow

So, I’ll try to describe how I work in general, supposing I’m in my dev repository, set up as described above.

  1. First, let’s check that no changes appeared while I was writing all of this: git fetch origin. OK, no changes.
  2. Working hard, making changes… committing several times, squashing, rewriting etc.
  3. Ready to push to TFS? Great! Let’s check again, maybe someone pushed something while I was working: git fetch origin. Yeah, there are some changes, origin/master has updates. Let’s rebase my work onto it (I don’t like merging local work that nobody except me has seen, it just clutters the commit graph): git rebase origin/master, maybe with some conflict resolution. Hey, I’m ready to push: rcheckin.
  4. Some more work.
  5. Working time is over and I have not finished my current task. I want it backed up on the server, so that a failure of my local machine can’t destroy my work: git push origin HEAD:idanilov/temp. idanilov is my ‘private’ branch namespace; every developer has one. By convention these branches are just backups and are not considered published. For open-source projects you probably won’t do this, but corporate rules sometimes require some assurance that you won’t lose work that has already been paid for.

Sometimes I switch branches, stash changes and do other git stuff, but not that often. Most of the time my whole workflow is the fetch-rebase-rcheckin chain.
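Since the chain is always the same, it can be wrapped in a couple of shell functions. This is only a sketch under a few assumptions: the function names run, sync_and_checkin and backup_work are invented here, the rcheckin script is assumed to sit in the repository root as ./rcheckin, and the DRY_RUN switch exists purely so the commands can be inspected without touching TFS.

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN=1.
run()
{
	if [ "${DRY_RUN:-0}" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

# The fetch-rebase-rcheckin chain; stops at the first step that fails,
# e.g. when the rebase hits a conflict that has to be resolved by hand.
sync_and_checkin()
{
	run git fetch origin &&
	run git rebase origin/master &&
	run ./rcheckin
}

# Back up unfinished work to a private branch namespace on the central repo.
# $1 is the namespace, e.g. idanilov.
backup_work()
{
	run git push origin "HEAD:$1/temp"
}
```

For example, DRY_RUN=1 sync_and_checkin only lists the three commands, while a plain sync_and_checkin actually runs them.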

I wanted to explain a few more things, e.g. how to keep a bunch of bash scripts out of TFS (that’s easy, just use .gitignore) while still synchronizing them across all developers using git (that’s much harder if you don’t want to run to every developer and copy new versions of the scripts each time you correct one line in one of them). Or how to automate the git console, so you don’t need to type long commands every time, which is pretty boring. But… this post is already pretty long, so I will do this later.

Have fun moving away from TFS with git and git-tfs! 😉

02/11/2011

Git-tfs bridge v0.12.1 released

Filed under: git-tfs — Ivan Danilov @ 15:12

Just FYI, actually. Binaries can be downloaded here.

And the list of changes is here.

It is a bugfix release, hence only the revision number changed.

If you have issues, please report them to the issue tracker at github. Anything that is not an issue can be posted to the google group.

10/08/2011

Git-TFS bridge v0.12 released

Filed under: git-tfs — Ivan Danilov @ 16:00

New features in chronological order:

  • git tfs quick-clone -c 12345 option that allows you to clone not the latest but a specified changeset. If the project in TFS is large but you want to have part of the history, this will be handy;
  • git tfs rcheckin command implemented. See the wiki for details. In short, it allows you to check in a straight series of commits from git to TFS. It also supports simple merge preservation, but with significant limitations;
  • TFS authentication support. Now you can put auth info into the repository configuration, pass it on the command line, use your default Windows user account or enter the password each time;
  • The fetch and pull commands (git-tfs’ own, of course) now scan your local history up to HEAD to see whether some TFS changesets have already arrived through a normal git fetch from another git repository. Together with some minor changes, this allows you to have a central git repository and go to TFS only once per changeset, downloading everything else from the central repo;
  • git tfs unshelve and git tfs shelve-list commands that allow you to work with TFS shelvesets. Note that not everything is perfect there due to TFS crappiness. See the details here;
  • autotag option that controls whether a tag is created for each TFS changeset. To be precise: before 0.12, tag creation was the default and not configurable. Since 0.12 it is optional and turned off by default.

And several bugs were fixed, as always, of course.

My plans include TFS branch support, making git-tfs faster, and optionally moving to NDesk.Options (aka Mono.Options; the NDesk site is obsolete, as its author mentioned, but it still has useful documentation) and libgit2sharp.

It would also be great if someone could help us keep the wiki up to date. It is somewhat behind.

P.S. Please note that I still consider v0.12 a release candidate. So if you have problems with it, feel free to use the issue tracker or the Google Group to let us know. Feedback is always appreciated!

18/07/2011

How to establish git central repository for working against TFS with git-tfs bridge

Filed under: git, git-tfs — Ivan Danilov @ 17:40

As I described here previously, my goal was to establish a central git repository and avoid redundant round-trips to TFS whenever possible.

To achieve this with a released version, you probably want to follow my advice in the above-mentioned article about having user.name=<tfs account name without domain> and the same user.email for everyone fetching changes from TFS.
In the latest git-tfs sources this is already fixed, so you don’t need to set these things.

When I established the central repository I faced some inconveniences. First, the pattern “checkout master, check the central repository for changes that are already there but not in your local repository, pull new changes from TFS, push to the central repository if anything new was pulled, checkout the old working branch again, optionally rebase it onto master” was very boring and very repetitive. Second, this pattern has to be repeated each time something new appears in TFS. I want scheduled updating, so that I don’t have to think about when and how I should get the latest version.

Well, it is a perfect target for automation. So now I want to share my bash script that does these things and saves you several dozen keystrokes each time you need to get changes from TFS.

#!/bin/sh

check_err()
{
    # parameter 1 is last exit code
    # parameter 2 is error message that should be shown if error code is not 0
    if [ "${1}" -ne "0" ]; then
        cat '~temp.log'
        echo ${2}
        rm -f '~temp.log' > /dev/null
        exit ${1}
    fi;
    rm -f '~temp.log' > /dev/null
}

echo "$(date) Update launched"

if [ ! -z "$(git status --porcelain)" ]; then
    echo "$(date) Your status is not clean, can't update"
    exit 1;
fi;

branch_name="$(git symbolic-ref HEAD 2>/dev/null)"
branch_name=${branch_name##refs/heads/}

if [ ! "$branch_name" == "master" ]; then
    git checkout master
fi;

echo "$(date) Pulling from central repo first to avoid redundant round-trips to TFS..."
git pull > '~temp.log'
check_err $? "Pulling from central repo failed"

echo "$(date) Pulling from TFS..."
git tfs pull -d > '~temp.log'
check_err $? "Pulling from TFS resulted in error";

remote_conflicting_commits="$(git rev-list origin/master ^master)"
if [ ! -z "$remote_conflicting_commits" ]; then
    echo "origin/master has conflicting commits, can't push"
    exit 1;
fi;

local_commits_to_push="$(git rev-list master ^origin/master)"
if [ -z "$local_commits_to_push" ]; then
    echo "$(date) Central repo is up-to-date, nothing to push"
else
    echo "$(date) Pushing updates to central repo"
    git push --tags origin master > '~temp.log'
    check_err $? "Push to central resulted in error";
fi;

if [ ! "$branch_name" == "master" ]; then
	git checkout $branch_name
	if [ "$1" == "-r" ]; then
		echo "Rebasing $branch_name on master";
		git rebase master
	fi;
fi;

This script assumes you want the master branch to mirror TFS, and it rebases your working branch onto the new changes only if you specify the -r switch. As for the second inconvenience, you just have to set up a dedicated working copy and run this script in it every, say, five minutes. That is very easy to do with the built-in Windows Task Scheduler:

  1. Go to Start -> Administrative tools -> Task Scheduler
  2. Click Create Task on the right side
  3. Check radio button ‘Run whether user is logged on or not’
  4. Select the Triggers tab, click ‘New’. Set Start=’One time’ and ‘Repeat task every’ to whatever interval you like, check ‘Enabled’, click OK.
  5. Go to the Actions tab, click ‘New’. Set Program/script to ‘cmd‘, Add arguments=’/c "sh.exe update >> log.txt"‘ and in the ‘Start in’ field put the path to your working copy dedicated to central repository updating.

That’s all. Now you should get updates regularly without any effort. It is important to note, though, that the script assumes your remote is named origin (so the remote branch is origin/master) and that no one pushes conflicting changes to the central repo’s master branch.

And don’t forget to add /log.txt to .gitignore. Otherwise your updates will just fail to start because the working tree is not clean.
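Adding that line by hand is easy to forget, so here is a small sketch of a helper that appends a pattern to .gitignore only if it is not there yet. The function name ensure_ignored is invented for this example, and the helper is not part of the update script above.

```shell
#!/bin/sh
# Append a pattern to an ignore file unless an identical line already exists.
# $1 is the pattern, $2 is the ignore file (defaults to .gitignore).
ensure_ignored()
{
	pattern="$1"
	file="${2:-.gitignore}"
	touch "$file" || return 1
	# -x matches whole lines only, -F treats the pattern as a plain string
	if grep -qxF -e "$pattern" "$file"; then
		return 0
	fi
	# make sure the existing content ends with a newline before appending
	if [ -n "$(tail -c 1 "$file")" ]; then
		echo >> "$file"
	fi
	printf '%s\n' "$pattern" >> "$file"
}
```

Running ensure_ignored /log.txt in the dedicated working copy, and committing the resulting .gitignore, keeps the status check at the top of the script from tripping over the log file.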

21/06/2011

GitTfs rebasing workflow. Is it possible?

Filed under: git, git-tfs — Ivan Danilov @ 06:55

Here is the issue on GitTfs about some complexities in usage. The point is that GitTfs allows you to take a feature branch and put it into TFS as a single checkin. On the git side this action produces a merge commit with two parents: one is the previous commit fetched from TFS and the other comes from your feature branch. If you want to have fine-grained history on the TFS side, you have a problem.

So what could we do with it?

I’ve just finished a patch that makes a fine-grained workflow more or less painless. The idea is a new option for the checkin command, namely --rebase-workflow or just -r.

For simplicity, let’s call the commit being checked into TFS the ‘source’ and the commit fetched afterwards the ‘result’. The source belongs to the feature branch that we want to check into TFS commit by commit.

So what is the -r key supposed to do? First and most important, it suppresses marking the result as a merge commit: it just skips assigning the source as a parent. So after the checkin we’ll have two separate branches, and the result will contain all changes from the source, but that relation won’t be shown in the graph.
See the diagrams below:

  A
[tfs]
     \
      \<-- B <--- C
               [branch]

becomes

  A <----- B'
    \    [tfs]
     \
      \<-- B <--- C
               [branch]

Here B’ has all changes from B.

Just to compare with default behavior:

              [tfs]
  A <---------- M
    \         /
     \       /
      \<-- B <----- C
                [branch]

M is a merge commit and B is the original. Most of the time they are just the same change from A, and that makes the history very confusing.

So how are we going to turn two diverging branches into linear history? With a rebase, of course.
We can take the remainder of the local branch (here it is just C) and rebase it onto B’. It will go smoothly, as we are essentially just applying a clean patch. B then becomes an orphan commit (most likely, if no third branch was spawned from the feature branch) and will go away. That is OK, as B’ has all of its changes. And this is exactly what the -r key tries to do for you.
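To see the effect with plain git, here is a sketch that replays the diagrams above in a scratch repository. It does not call git-tfs at all: a local branch named tfs stands in for tfs/default, B’ is committed by hand instead of being fetched from TFS, and the function name demo_rebase_workflow is made up. It only illustrates the rebase step, not how the -r option is implemented.

```shell
#!/bin/sh
# Rebuild A, B, C and B' in a throwaway repository, then rebase C onto B'.
# Prints the commit subjects of the rebased branch, newest first.
demo_rebase_workflow()
{
	repo="$(mktemp -d)" || return 1
	(
		set -e
		cd "$repo"
		git init -q .
		git config user.name demo
		git config user.email demo@example.com

		# A: last commit known to TFS
		git symbolic-ref HEAD refs/heads/tfs
		echo a > a.txt; git add a.txt; git commit -q -m "A"

		# B and C: local work on the feature branch
		git checkout -q -b branch
		echo b > b.txt; git add b.txt; git commit -q -m "B"
		echo c > c.txt; git add c.txt; git commit -q -m "C"

		# B': the same change as B, as it would come back from TFS
		git checkout -q tfs
		echo b > b.txt; git add b.txt; git commit -q -m "B'"

		# replay everything after B (only C here) on top of B'
		git rebase -q --onto tfs branch~1 branch
		git log --format=%s branch
	)
	status=$?
	rm -rf "$repo"
	return $status
}
```

Calling demo_rebase_workflow lists C, B’ and A in that order: the branch is linear again and the original B is no longer reachable from it.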

I did some testing on my local TFS server (the simplest cases, plus some with conflicts and interference from the native TFS client) and it seems to work well and to produce a much more understandable history. But I’m not a git expert, so I could have missed some cases. Currently I only check with rev-list that the source doesn’t have parents which are not ancestors of HEAD, so that source..HEAD can be applied to the result smoothly.

Intended workflow is like that:

git checkout -b local
# make changes
git commit -m 'blah'
# make changes
git commit -m 'blah2'
git tfs fetch
git rebase tfs/default

git tfs checkin -r -m 'blah goes to tfs with rebase' HEAD^
# thus we are sending just first commit leaving history clean

git tfs checkin -r -m 'blah2 goes to tfs also'
# here source is HEAD so we can omit it

git checkout master
git merge tfs/default
git branch -D local

As a result you should have just a linear history in git consisting of TFS commits.

Comments, objections and suggestions are welcome.

The branch with the corresponding changes is here

UPDATE: the rebase-workflow branch has since been integrated into the mainline of the git-tfs project in the form of the rcheckin command, so the branch mentioned above no longer exists.

16/06/2011

First git-tfs usage problems

Filed under: git, git-tfs — Ivan Danilov @ 21:45

[This post assumes the reader is at least somewhat familiar with git and the git-tfs bridge]

UPDATE: the bug described here is already fixed in git-tfs, so if you have the latest version built from sources, it is not relevant for you. v0.11 still has the bug. The next version probably won’t.

TFS doesn’t distinguish checkins by the author’s email. In fact it doesn’t even know the author’s email. That leads to some problems when working with the git-tfs bridge, because for git the author’s email is a crucial part of every commit and influences the commit’s SHA-1 hash.

Goals

Suppose you have a remote TFS server behind a slow connection (or, more likely today, a connection that is not very reliable and fails often) and you want to minimize network activity. And the TFS server has some developers behind it, of course. Naturally, with a DVCS like git this leads to the following desired schema:

Dev1 \                                                     / TFS Dev1
      \        git         [   slow   ]                   /
Dev2 ------- Central ----- [  network ] ---- TFS Server ---- TFS Dev2
...   /     repository     [connection]                   \   ...
DevN /                                                     \ TFS DevN

So when pulling from TFS is required, any developer on the left executes git tfs pull (or fetch) and pushes the tfs/default branch to Central, so that every other developer on the left can get it without going to TFS.

That is the goal. In an ideal world it would work this way from the beginning. Oh, wait! In an ideal world there wouldn’t have been TFS on the schema in the first place 🙂

First attempt

So let’s return to reality. Just to test what happens when not everything goes as expected, I set up a git repo called test-central:

test-central$ git tfs clone tfs_url test-central
--- and make it bare, like that
test-central $ cd test-central
test-central $ mv .git ..    # save .git somewhere (not important where really)
test-central $ rm -fr *      # remove everything in the folder
test-central $ mv ../.git .  # take .git back
test-central $ mv .git/* .   # get everything from .git to the repo's root
test-central $ rmdir .git    # delete .git
test-central $ git config --bool core.bare true  # tell git it is bare repo

Then I created test-dev-1 repository:

test-dev-1 $ git clone test-central test-dev-1
test-dev-1 $ cd test-dev-1
test-dev-1 $ git tfs bootstrap
test-dev-1 $ git config user.name dev-1
test-dev-1 $ git config user.email dev-1@email.com

And test-dev-2 repository:

test-dev-2 $ git clone test-central test-dev-2
test-dev-2 $ cd test-dev-2
test-dev-2 $ git tfs bootstrap
test-dev-2 $ git config user.name dev-2
test-dev-2 $ git config user.email dev-2@email.com

Both devs have the same TFS history cloned from test-central. Now tfs-dev-1 (a developer working directly in TFS) makes some changes and checks them in to TFS. test-dev-1 spots the new changeset in TFS and decides to pull it:

test-dev-1 $ git tfs pull  # suppose fast-forward merge for simplicity

Now this changeset is stored in his local repository with the author’s name tfs-dev-1 and (as TFS doesn’t have emails) the author’s email dev-1@email.com. So he pushes this commit to test-central to share it with other developers:

test-dev-1 $ git push

At this point test-dev-2 also spots the new changeset. He doesn’t know that dev-1 already got it (or just forgot to check), so he also decides to pull it from TFS:

test-dev-2 $ git tfs pull

His commit also has the author’s name tfs-dev-1, but the author’s email is dev-2@email.com this time! So from git’s point of view his commit is entirely different from dev-1’s commit. And so…

test-dev-2 $ git push

…results in a conflict.

That seems pretty bad. So to make commits that originated in TFS ‘shareable’, they should have the same email, right? So probably the git-tfs bridge should set the email to some predefined value for every commit that originates from a TFS changeset.
This way test-dev-1’s and test-dev-2’s commits will both have some identical fake value like TFS@email.com, the SHA-1 hashes will be equal and everything will be great. Right?
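The effect of the email on the hash is easy to reproduce with plain git, without any TFS involved. The sketch below builds a commit object for an empty tree with a fixed name, date and message and varies only the email. The function name commit_id_for_email, the message text and the timestamp are all made up for the demonstration, and both author and committer get the same email here, which is a simplification.

```shell
#!/bin/sh
# Create a commit object that differs from run to run only in the email
# and print its id. Everything happens in a throwaway repository.
commit_id_for_email()
{
	email="$1"
	repo="$(mktemp -d)" || return 1
	(
		set -e
		cd "$repo"
		git init -q .
		# a fresh repository has an empty index, so this is the empty tree
		tree="$(git write-tree)"
		echo "changeset 42 from TFS" |
			GIT_AUTHOR_NAME="tfs-dev-1" GIT_AUTHOR_EMAIL="$email" \
			GIT_AUTHOR_DATE="1308260700 +0000" \
			GIT_COMMITTER_NAME="tfs-dev-1" GIT_COMMITTER_EMAIL="$email" \
			GIT_COMMITTER_DATE="1308260700 +0000" \
			git commit-tree "$tree"
	)
	status=$?
	rm -rf "$repo"
	return $status
}
```

commit_id_for_email dev-1@email.com and commit_id_for_email dev-2@email.com print two different ids, while calling it twice with TFS@email.com prints the same id both times, which is exactly the ‘shareability’ discussed above.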

Second attempt

Apparently it is not so easy (we’re already back in the real world, remember?)

Let me explain with an example a problem I faced an hour ago. The simplest scenario: a single dev, a single local git repository, just one new commit. As basic as possible.

At the start the git repo looks like this (tfs is just shorthand for tfs/default):

   A <---- B
        [master]
         [tfs]

Then I make some changes and commit them to git:

   A <---- B <------ C
         [tfs]    [master]

Commit C is a normal git commit, so it has author='dev' and email='dev@email.com'.

After that I want to check my commit in to TFS, so I execute 'git tfs checkin'. Nothing changes within my git repo. 'git tfs fetch' gets my commit back from TFS. And weird things start to happen…

The commit that came from TFS when we did 'git tfs fetch' (let’s call it D for clarity) has author='dev-1-tfs-account-name' and email='TFS@email.com' (as we agreed above). You already know what the graph will look like, yeah? 🙂

   A <---- B <------ C
           \      [master]
            \
             \<----- D
                   [tfs]

That doesn’t look like the fast-forward we wanted for the [tfs] branch… For the same reason as before, commit D differs from C. But we want them to be equal! What do we need to make that happen, then?
Yeah, an even more restrictive rule:

$ git config user.name dev-1-tfs-account-name
$ git config user.email TFS@email.com

Well, TFS@email.com was chosen absolutely arbitrarily, so you can set it to any fake value you like.

Conclusion

To work more or less comfortably with TFS, every developer should have git’s user.name equal to their TFS account name, and all developers should share a single email.

P.S. In the last example you could merge C with D and get some commit E (without any conflicts, as the B->C and B->D diffs are absolutely the same)… but then even the simplest graph will look like a DNA molecule. That’s not what I’d call comfortable work.
