A place for spare thoughts

30/08/2012

Visual Studio 2012: several rows of tabs

Filed under: VisualStudio — Ivan Danilov @ 16:50

There’s a very useful feature in Visual Studio 2012 that is disabled by default – I don’t know why. It can be enabled through Tools -> Options -> Environment -> Tabs and Windows -> Show pinned tabs in separate row.

For example, this is the VS window with default settings:

You can see that all tabs are placed in a single row, and some of them are even hidden – to reach those you have to click the drop-down button at the right side of the tab row and select the tab you’re looking for. When you need more than four or five tabs at once (maybe a few more depending on your display size), it quickly becomes a very inconvenient experience.

After you turn on the “Show pinned tabs in separate row” option… well, nothing changes. That’s because we haven’t pinned any tabs yet, of course. When you do, you’ll see something like this:

And even better: your pinned tabs will never hide under the drop-down button! When you pin more tabs than one row can hold, another row appears:

It is almost as if Visual Studio 6.0 is back! Do you remember that VS 6.0 had multi-row tabs from the start, and then in VS 7.0 this useful feature was removed for some reason? Well, now it is back. And in conjunction with the Preview Tab (you can find more info about it here) it shines even better than before.


29/08/2012

Bug in C#/IL optimizations logic? o_O

Filed under: c# — Ivan Danilov @ 16:59

To keep it short: after half a day of minimizing the bug case I ended up with this code:

    [TestFixture]
    public class Tests
    {
        private class Relay
        {
            public Action Do { get; set; }
        }
        
        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);
            
            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail();
            relayNew.Do();
            Assert.AreEqual(1, indicator);
        }
    }

Here is the StackOverflow question on this topic.

Compiled in Debug mode, the test passes. In Release mode it fails at the Assert.Fail() call: the relayNew variable is null there. But we still have the relay variable in scope and haven’t assigned null to it. We still have a strong reference! So, what’s happening?

I can only speculate, but my assumption is that the optimizer was designed to follow this logic:

  1. the relay variable is not accessed after the weak variable is assigned (this is easily proven by analyzing the IL);
  2. we are about to call GC.Collect() manually, so we probably want to reclaim as much memory as possible;
  3. the optimizer can “help” us by marking relay as no longer accessible, letting the GC collect it with the rest of the garbage.

But it changes the visible behavior of the compiled code, so I assume it is a bug.

I checked the generated IL and it is the same for Debug and Release modes (at least in regard to variable eligibility for GC). The only difference seems to be an assembly-level attribute:

  • Debug mode:
    [assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default
                        | DebuggableAttribute.DebuggingModes.DisableOptimizations
                        | DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints 
                        | DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
    
  • Release mode:
    [assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
    

According to the documentation this attribute controls JIT optimizations, so my refined assumption is that the JIT compiler performs optimizations it considers safe, but which are not.

Another indirect proof: if you change the code as below (only the final Assert.IsNotNull line added), the test passes in both Debug and Release modes:

        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);
            
            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail();
            relayNew.Do();
            Assert.AreEqual(1, indicator);
            Assert.IsNotNull(relay);
        }

Here the optimizer can’t deduce that relay‘s use is trivial, so it cannot optimize the reference away before the last line has executed.

The problem is reproducible on .NET 2.0, .NET 4.0 and .NET 4.5 equally.

P.S. Actually, I found the problem in real code, where I have weak event handlers that treated this dead weak reference as a sign that the event’s target had already been GCed, and so unsubscribed from the event source. On top of that sat an ObservableCollection<T> that used these weak events. It wasn’t an easy task to localize the problem. In fact, even reproducing the test failure was not trivial, because I usually build in Debug mode on my box, and for a good long time I wondered why the tests passed on my machine but failed on three different build agents.
Phew. At least it can be fixed without re-architecting half of the app. 🙂
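To illustrate, here is roughly the kind of weak-event logic that was affected – a simplified, hypothetical sketch, not the real code:

    public class WeakEventSubscription
    {
        private readonly WeakReference _target;   // the handler's target, held weakly
        private readonly Action<object> _handler; // what to invoke on the target

        public WeakEventSubscription(object target, Action<object> handler)
        {
            _target = new WeakReference(target);
            _handler = handler;
        }

        public void OnSourceEvent(object sender, EventArgs e)
        {
            var target = _target.Target;
            if (target == null)
            {
                // Looks like the target was GCed... but with aggressive JIT
                // lifetime tracking it may have been collected early, while
                // the calling code still 'holds' it in a local variable!
                Unsubscribe();
                return;
            }
            _handler(target);
        }

        private void Unsubscribe() { /* detach from the event source */ }
    }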

Update. Here is the comment I found in .NET Framework sources for GC.KeepAlive(object obj) method:

    // This method DOES NOT DO ANYTHING in and of itself.  It's used to
    // prevent a finalizable object from losing any outstanding references 
    // a touch too early.  The JIT is very aggressive about keeping an 
    // object's lifetime to as small a window as possible, to the point
    // where a 'this' pointer isn't considered live in an instance method 
    // unless you read a value from the instance.  So for finalizable
    // objects that store a handle or pointer and provide a finalizer that
    // cleans them up, this can cause subtle races with the finalizer
    // thread.  This isn't just about handles - it can happen with just 
    // about any finalizable resource.
    // 
    // Users should insert a call to this method near the end of a 
    // method where they must keep an object alive for the duration of that
    // method, up until this method is called.  Here is an example: 
    //
    // "...all you really need is one object with a Finalize method, and a
    // second object with a Close/Dispose/Done method.  Such as the following
    // contrived example: 
    //
    // class Foo { 
    //    Stream stream = ...; 
    //    protected void Finalize() { stream.Close(); }
    //    void Problem() { stream.MethodThatSpansGCs(); } 
    //    static void Main() { new Foo().Problem(); }
    // }
    //
    // 
    // In this code, Foo will be finalized in the middle of
    // stream.MethodThatSpansGCs, thus closing a stream still in use." 
    // 
    // If we insert a call to GC.KeepAlive(this) at the end of Problem(), then
    // Foo doesn't get finalized and the stream stays open.
    [System.Security.SecuritySafeCritical]  // auto-generated
    [ResourceExposure(ResourceScope.None)]
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)] 
    public static extern void KeepAlive(Object obj);

Also you may be interested in this excerpt from C# Language Specification, section 3.9:

If the object, or any part of it, cannot be accessed by any possible continuation of execution, other than the running of destructors, the object is considered no longer in use, and it becomes eligible for destruction. The C# compiler and the garbage collector may choose to analyze code to determine which references to an object may be used in the future. For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, the garbage collector may (but is not required to) treat the object as no longer in use.

So, now I’m convinced it is intended behavior. Good for performance, bad for clarity. Now I know that in such situations I have to remember to put GC.KeepAlive(relay) at the end (instead of the NotNull assertion, which is meaningless in production code, of course). I’d better not forget this 🙂
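For reference, here is the test with that fix applied – identical to the original, except for the last line:

        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);

            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail();
            relayNew.Do();
            Assert.AreEqual(1, indicator);
            GC.KeepAlive(relay); // tells the JIT that relay must stay reachable up to this point
        }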

Hail the unit-tests and hail the build server running them in Release mode!

23/08/2012

My environment for day-to-day work with git-tfs

Filed under: git, git-tfs, tools — Ivan Danilov @ 16:22

I’ve been asked many times to describe how things are organized on our project with git-tfs, so it’s time to write it down – repeating these things over and over isn’t effective at all.

So, below I will describe how things are arranged to make TFS2010, msysgit and git-tfs work together.

First, the setup involves several physical boxes, so let’s name them up front:

  1. DevA, DevB – two developers’ boxes;
  2. GitServer – a Windows Server 2008 R2 box with IIS – hosts the central git repository, exposes git repositories over HTTP and has a scheduled task to synchronize the central git repository with TFS;
  3. TfsServer – a remote host with TFS Server 2010 (it is our corporate beast and I don’t have administrative access there).

I will assume you have already installed git and git-tfs on your machines. So, let’s start.


1. Cloning the TFS repository for the first time

First, you have to clone the repository from TFS somewhere. On your DevA box open a bash console, go to the folder where you want the sources to reside and execute this:

git tfs clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

or, if you aren’t interested in the full history and just want the latest state of the TFS sources:

git tfs quick-clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

It may take up to several hours if you have a large repository and a slow connection. If everything goes fine, a proj-git folder will appear there. Inside is your working copy with the latest TFS version, and a .git folder with the configuration.


2. Setting up GitServer and auto-update from TFS

git-tfs doesn’t work with bare repositories (at least that was so at the time I followed its development), thus we will need two repositories for each TFS one: the first will be bare and accessible through HTTP; the second will be private – there we will set up scheduled fetching from the TFS server, and each time new changes are found in TFS we will push them to the first repository. To distinguish the two I will refer to the first as Central and the second as Working.

  1. Install the IIS server role
  2. Install GitWebAccess. As I wrote before, the latest version seems broken, so you can download my working version. Unpack it to C:\inetpub\wwwroot\gwa and follow this manual. I will suppose you created the C:\GitRepos folder for your git repos as suggested in that manual.
  3. Create a new folder for the Working repo somewhere. For this example it will be C:\working
  4. Copy the content of the proj-git folder (where you cloned the TFS repository in step 1 above) to C:\working and execute
    git checkout -B master tfs/default
    

    so that the master branch points at your last TFS commit

  5. Add a C:\working\.gitignore file, if it is not present, with these lines:
    /update
    /log.txt
    

    If the file is already present, just add these lines to it. update is the script we will write next, and log.txt… well, it is the log of the update process. Since updates will be run by the scheduler, this log is the only way to know what’s happening.

  6. Now we will need a bare repository at C:\GitRepos\Project.git. Execute these commands from C:\GitRepos:
    git clone --bare /c/working Project.git
    cd Project.git
    git remote rm origin # we don't need origin here
    

    If everything was done correctly, your repository should now be accessible over HTTP through GitWebAccess.

  7. Now it’s time to set up updating in the Working repository. The convention I embrace is that the master branch of my repository always follows the TFS repository line. Thus, no one except the update script should push to it directly, and it will never have changes conflicting with the TFS data.
    You need to set up the origin remote in the Working repo in order for the script below to work:

    git remote add origin /c/GitRepos/Project.git
    

    Next, create the file C:\working\update and put these lines in it:

    #!/bin/sh
    
    check_err()
    {
    	# parameter 1 is last exit code
    	# parameter 2 is error message that should be shown if error code is not 0
    	if [ "${1}" -ne "0" ]; then
    		cat '~temp.log'
    		echo ${2}
    		rm -f '~temp.log' > /dev/null
    		exit ${1}
    	fi;
    	rm -f '~temp.log' > /dev/null
    }
    
    echo "$(date) Update launched"
    
    if [ ! -z "$(git status --porcelain)" ]; then
    	echo "$(date) Your status is not clean, can\'t update"
    	exit 1;
    fi;
    
    echo "$(date) Pulling from central repo first to avoid redundant round-trips to TFS..."
    git pull origin master:master > '~temp.log'
    check_err $? "Pulling from central repo failed"
    
    echo "$(date) Pulling from TFS..."
    git tfs pull -d > '~temp.log'
    check_err $? "Pulling from TFS resulted in error";
    
    local_commits_to_push="$(git rev-list master ^origin/master)"
    if [ -z "$local_commits_to_push" ]; then
    	echo "$(date) Central repo is up-to-date, nothing to push"
    else
    	echo "$(date) Pushing updates to central repo"
    	git push origin master > '~temp.log'
    	check_err $? "Push to central resulted in error";
    fi;
    

    You may check it now – just execute update from a bash console. Your latest updates should be fetched from TFS. If there were any changes, they should also be pushed to Central‘s master branch.
    It is a simplified version of the script from this post. There you can also find how to create a scheduler task that executes this update script every 5 minutes – do it now (a possible one-liner is sketched below).
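    For reference, creating such a task boils down to something like this one-liner (a sketch only – adjust the sh.exe path to wherever msysgit is installed on your server):

    schtasks /Create /TN "GitTfsUpdate" /SC MINUTE /MO 5 /TR "\"C:\Program Files (x86)\Git\bin\sh.exe\" --login -c 'cd /c/working && ./update >> log.txt 2>&1'"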


3. Create rcheckin and reap the benefits

Now, back to the developer’s environment. We want to use that auto-updating central repository effectively, don’t we? So:

  • We don’t need git tfs fetch at all anymore: we can limit ourselves to plain git fetch, because the latest code is always in the Central git repository.
  • When git tfs rcheckin is executed, it fetches the latest changes from TFS. We want to reuse as much as possible of what has already been fetched. So we need to fetch manually from Central before executing git tfs rcheckin, and somehow make git-tfs mark those as proper TFS-fetched commits.
  • And finally, after we check something in to TFS, we want to push it to Central so the auto-updater won’t re-fetch what we already have – that could have a significant performance impact with large commits, or a long line of commits over a slow connection. In such a case we may even want to disable the auto-updater until we have pushed to Central, and then just re-enable it.

So, in order to achieve all these points, instead of using git tfs rcheckin – which doesn’t know anything about our environment – we will use a custom rcheckin script (do not forget to add it to .gitignore if needed):

#!/bin/sh

check_err()
{
	# parameter 1 is last exit code
	# parameter 2 is error message that should be shown if error code is not 0
	if [ "${1}" -ne "0" ]; then
		cat '~temp.log'
		echo ${2}
		rm -f '~temp.log' > /dev/null
		exit ${1}
	fi;
	rm -f '~temp.log' > /dev/null
}

if [ ! -z "$(git status --porcelain)" ]; then
	echo "Your status is not clean, can't continue"
	exit 1;
fi;

echo "Fetching origin..."
git fetch origin

if [ -n "`git rev-list HEAD..origin/master`" ]; then
	echo "origin/master has some TFS checkins that are conflicting with your branch. Please reabse first, then try to rcheckin again"
	exit 1;
fi;

echo "Marking latest TFS commit with bootstrap..."
git tfs bootstrap -d

git tfs rcheckin
check_err $? "rcheckin exited with error"

git push origin HEAD:master
check_err $? "Can't push HEAD to origin"

Well, that’s almost all.


4. Set up another developer environment

So, suppose DevB also needs to work with your server. Here is how he can easily start:

  1. git clone http://GitServer:8080/project.git project
    cd project
    git tfs bootstrap -d # make git-tfs mark what is required
    
  2. Get the rcheckin script from the previous section.
  3. Work!


5. My common workflow

So, I’ll try to describe how I work in general, supposing I’m in my dev repository, set up as described above.

  1. First, let’s check that no changes appeared while I was writing all of this: git fetch origin. OK, no changes.
  2. Working hard, making changes… committing several times, squashing, rewriting, etc.
  3. Ready to push to TFS? Great! Let’s check again – maybe someone pushed something while I was working: git fetch origin. Yeah, there are some changes; origin/master has updates. Let’s rebase my work onto them (I don’t like merging local work that no one but me has seen – it just clutters the commit graph): git rebase origin/master, maybe some conflict resolving. Hey, I’m ready to push: rcheckin.
  4. Some more work.
  5. Working time is over and I have not finished my current task. I want it backed up on the server, so a local machine failure can’t waste my work: git push origin HEAD:idanilov/temp. idanilov is my ‘private’ branch namespace; every developer has one. By convention it holds just backups and is not considered published. For open-source projects you probably won’t do this, but corporate rules sometimes require some assurance that you won’t lose work that is already paid for.

Sometimes I switch branches, stash changes and do other git stuff, but that’s not too frequent. Most often my whole workflow is the fetch-rebase-rcheckin chain.

I wanted to explain a few more things – e.g., how to keep a bunch of bash scripts so they don’t get into TFS (that’s easy, just use .gitignore) while still synchronizing them across all developers using git (much harder, if you don’t want to run to every developer and copy new versions of scripts each time you correct one line in one of them). Or how to automate the git console, so you don’t need to type long commands every time, which is pretty boring. But… this post is already pretty long, so I will cover that later.

Have fun moving away from TFS with git and git-tfs! 😉

17/08/2012

Implicit closures in C# lambdas, part 2

Filed under: c# — Ivan Danilov @ 17:59

Wow. I’ve just found that not everything I said in the first part is true (I’ve already marked the wrong place so readers won’t be confused). Shame on me.

My mistake was to rely totally on ReSharper’s warnings. At the end of the last post I had changed the code to make a copy of the variable for the closure, like this:

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            var receiverCopy = r;
            EventHandler action1 = (s, e) => receiverCopy.Method(a);
            EventHandler action2 = (s, e) => r.Method(b);
            observer.X += action1;
            observer.X += action2;
        }

And I assumed it was enough to convince the compiler to capture the variables for the two lambdas in two different compiler-generated classes – based on the fact that the ReSharper warning was gone. It turned out (first by experiencing a very real memory leak in a unit test, then by looking at the generated IL to understand it) that the absence of the warning only means ReSharper is unable to detect such an implicit closure – not that the implicit closure is gone. The compiler still generates a single class for all these closures. And this time it is probably an optimization on the compiler’s side – namely, decreasing the number of types in the assembly by generating one less.

What I ended up with is hand-writing an explicit class very similar to the one in the previous post. This solved the memory leak, and looking at the IL proved that it helped. But the code is now ugly. And worse – I don’t know how to spot such problems easily without thorough analysis of the code.
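For the curious, the hand-written version looks roughly like this (a sketch only – Closure1/Closure2 are my illustrative names; the point is one class per lambda, so each capture set lives and dies independently):

        private class Closure1
        {
            public Receiver r;
            public object a;
            public void Invoke(object s, EventArgs e) { r.Method(a); }
        }

        private class Closure2
        {
            public Receiver r;
            public object b;
            public void Invoke(object s, EventArgs e) { r.Method(b); }
        }

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            // Each handler now keeps alive only the objects it actually uses.
            observer.X += new Closure1 { r = r, a = a }.Invoke;
            observer.X += new Closure2 { r = r, b = b }.Invoke;
        }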

UPD: Pretty much the same thing in another blog post.

UPD2: As Eric answered on SO, this behavior was implemented because it was easier (for the compiler developers) and the part about lambda rewriting was already pretty complicated at the time. Hopefully, with Roslyn we may get something better.

A good alternative was offered as well.

Implicit closures in C# lambdas, part 1

Filed under: c# — Ivan Danilov @ 16:38

In my day-to-day work I was alerted once by an R# warning saying ‘Implicitly captured closure: variableName’. After checking the actual IL generated by the compiler, I found the R# warning to be pretty much correct. So I did some groundwork and wrote a small sample demonstrating it:

    public class Observer
    {
        public event EventHandler X = delegate { };
    }

    public class Receiver
    {
        public void Method(object o) {}
    }

    public class Program
    {
        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            EventHandler action1 = (s, e) => r.Method(a); //Implicitly captured closure: b
            EventHandler action2 = (s, e) => r.Method(b); //Implicitly captured closure: a
            observer.X += action1;
            observer.X += action2;
        }

        public static void Main(string[] args)
        {
            var observer = new Observer();
            var receiver = new Receiver();
            DoSomething(new object(), new object(), observer, receiver);
        }
    }

It seems the cause is simple: when several lambdas share variables that need to be captured (in this case r is present in both), the compiler emits only one class with an unspeakable name that contains both methods and all of their variables.

Below is the generated IL code. It is here just for completeness – afterwards I explain everything in terms of equivalent C#, so you don’t need to understand IL to get the idea. It is fairly long, so you most probably don’t want to read it all.

.class public auto ansi beforefieldinit ConsoleApplication1.Program
	extends [mscorlib]System.Object
{
	// Nested Types
	.class nested private auto ansi sealed beforefieldinit '<>c__DisplayClass2'
		extends [mscorlib]System.Object
	{
		.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
			01 00 00 00
		)
		// Fields
		.field public object a
		.field public object b
		.field public class ConsoleApplication1.Receiver r

		// Methods
		.method public hidebysig specialname rtspecialname 
			instance void .ctor () cil managed 
		{
			// Method begins at RVA 0x2104
			// Code size 7 (0x7)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: call instance void [mscorlib]System.Object::.ctor()
			IL_0006: ret
		} // end of method '<>c__DisplayClass2'::.ctor

		.method public hidebysig 
			instance void '<DoSomething>b__0' (
				object s,
				class [mscorlib]System.EventArgs e
			) cil managed 
		{
			// Method begins at RVA 0x210c
			// Code size 19 (0x13)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: ldfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
			IL_0006: ldarg.0
			IL_0007: ldfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::a
			IL_000c: callvirt instance void ConsoleApplication1.Receiver::Method(object)
			IL_0011: nop
			IL_0012: ret
		} // end of method '<>c__DisplayClass2'::'<DoSomething>b__0'

		.method public hidebysig 
			instance void '<DoSomething>b__1' (
				object s,
				class [mscorlib]System.EventArgs e
			) cil managed 
		{
			// Method begins at RVA 0x2120
			// Code size 19 (0x13)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: ldfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
			IL_0006: ldarg.0
			IL_0007: ldfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::b
			IL_000c: callvirt instance void ConsoleApplication1.Receiver::Method(object)
			IL_0011: nop
			IL_0012: ret
		} // end of method '<>c__DisplayClass2'::'<DoSomething>b__1'

	} // end of class <>c__DisplayClass2


	// Methods
	.method public hidebysig static 
		void DoSomething (
			object a,
			object b,
			class ConsoleApplication1.Observer observer,
			class ConsoleApplication1.Receiver r
		) cil managed 
	{
		// Method begins at RVA 0x2134
		// Code size 72 (0x48)
		.maxstack 2
		.locals init (
			[0] class [mscorlib]System.EventHandler action1,
			[1] class [mscorlib]System.EventHandler action2,
			[2] class ConsoleApplication1.Program/'<>c__DisplayClass2' 'CS$<>8__locals3'
		)

		IL_0000: newobj instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::.ctor()
		IL_0005: stloc.2
		IL_0006: ldloc.2
		IL_0007: ldarg.0
		IL_0008: stfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::a
		IL_000d: ldloc.2
		IL_000e: ldarg.1
		IL_000f: stfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::b
		IL_0014: ldloc.2
		IL_0015: ldarg.3
		IL_0016: stfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
		IL_001b: nop
		IL_001c: ldloc.2
		IL_001d: ldftn instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::'<DoSomething>b__0'(object, class [mscorlib]System.EventArgs)
		IL_0023: newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
		IL_0028: stloc.0
		IL_0029: ldloc.2
		IL_002a: ldftn instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::'<DoSomething>b__1'(object, class [mscorlib]System.EventArgs)
		IL_0030: newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
		IL_0035: stloc.1
		IL_0036: ldarg.2
		IL_0037: ldloc.0
		IL_0038: callvirt instance void ConsoleApplication1.Observer::add_X(class [mscorlib]System.EventHandler)
		IL_003d: nop
		IL_003e: ldarg.2
		IL_003f: ldloc.1
		IL_0040: callvirt instance void ConsoleApplication1.Observer::add_X(class [mscorlib]System.EventHandler)
		IL_0045: nop
		IL_0046: nop
		IL_0047: ret
	} // end of method Program::DoSomething

	.method public hidebysig static 
		void Main (
			string[] args
		) cil managed 
	{
		// Method begins at RVA 0x2188
		// Code size 32 (0x20)
		.maxstack 4
		.entrypoint
		.locals init (
			[0] class ConsoleApplication1.Observer observer,
			[1] class ConsoleApplication1.Receiver receiver
		)

		IL_0000: nop
		IL_0001: newobj instance void ConsoleApplication1.Observer::.ctor()
		IL_0006: stloc.0
		IL_0007: newobj instance void ConsoleApplication1.Receiver::.ctor()
		IL_000c: stloc.1
		IL_000d: newobj instance void [mscorlib]System.Object::.ctor()
		IL_0012: newobj instance void [mscorlib]System.Object::.ctor()
		IL_0017: ldloc.0
		IL_0018: ldloc.1
		IL_0019: call void ConsoleApplication1.Program::DoSomething(object, object, class ConsoleApplication1.Observer, class ConsoleApplication1.Receiver)
		IL_001e: nop
		IL_001f: ret
	} // end of method Program::Main

	.method public hidebysig specialname rtspecialname 
		instance void .ctor () cil managed 
	{
		// Method begins at RVA 0x21b4
		// Code size 7 (0x7)
		.maxstack 8

		IL_0000: ldarg.0
		IL_0001: call instance void [mscorlib]System.Object::.ctor()
		IL_0006: ret
	} // end of method Program::.ctor

} // end of class ConsoleApplication1.Program

So, basically, what the C# compiler did overall is something like this:

        // actually named <>c__DisplayClass2 in IL
        private class CompilerGeneratedClass 
        {
            public object a;
            public object b;
            public Receiver r;

            // actually named <DoSomething>b__0 in IL
            public void Method1(object s, EventArgs e)
            {
                 r.Method(a);
            }
            // actually named <DoSomething>b__1 in IL
            public void Method2(object s, EventArgs e)
            {
                r.Method(b);
            }
        }

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            // actually named CS$<>8__locals3 in IL
            CompilerGeneratedClass helper = new CompilerGeneratedClass();
            helper.a = a;
            helper.b = b;
            helper.r = r;
            EventHandler action1 = helper.Method1;
            EventHandler action2 = helper.Method2;
            observer.X += action1;
            observer.X += action2;
        }

So, you see, no lambdas here. The compiler replaced every variable used in the closure with a field of the nested class. That’s fine, but “what about the subject and these implicit closures?” you may ask. Well, that’s simple from this point. In the original code you have two unconnected lambdas and might assume that as soon as the first one, along with the a object, is no longer used, they will be GCed regardless of the current state of the second lambda, right? But now it is clear that as long as CompilerGeneratedClass is alive, both a and b will live. And the generated class will live at least as long as someone needs either of the two lambdas.

Essentially, we can count it as if both lambdas had closed over both the a and b variables. Hence ReSharper’s warning.

What problems can this bring to you as a developer? Suppose a is a cheap object and action1 is passed to a long-living service class; meanwhile b is a costly object and action2 is used here once and forgotten. One assumes the latter will be called and then GCed along with the costly b object almost immediately. But it will live as long as the service holds the former lambda, causing a hard-to-find memory leak.

Why can’t the compiler do better? I can’t say for certain, so these are only my speculations on the subject. Nevertheless: why can’t the compiler create two generated classes, so that one captures r and a, while the second captures r and b? It could, and in this simple case it would be fine. But in more complicated cases it would lead to problems more severe than implicit captures. For one example, suppose the method with the lambdas not only captures essentially read-only arguments, but also writes to them. How would you capture a variable that is written to in two different generated classes? And even if you managed that, how would you synchronize the two values? And what about multi-threaded code? You see, a bunch of hard questions arises the very moment you allow one variable to be ‘distributed’ between several generated classes.
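To make the ‘writes’ problem concrete, here is a contrived sketch: both lambdas must share one storage location for count, so splitting them into two generated classes is simply not an option:

        public static Action[] MakeCounters()
        {
            var count = 0; // written by one lambda, read by the other
            Action increment = () => count++;
            Action report = () => Console.WriteLine(count);
            // If increment and report each got their own copy of count,
            // increment would bump one copy while report read the other.
            return new[] { increment, report };
        }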

So, what can I do to avoid leaks and problems? Well, ReSharper has a pretty good eye for these things, so watch out for its warnings. And if one arises – try to resolve it. [The following advice is not correct! See part 2 for details] The general solution is to copy the captured variable to another variable manually – that is what the compiler can’t do safely for you, for the reasons above, but you know the context better and in most cases can do it easily. For example, for my code it would be this:

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            var receiverCopy = r;
            EventHandler action1 = (s, e) => receiverCopy.Method(a);
            EventHandler action2 = (s, e) => r.Method(b);
            observer.X += action1;
            observer.X += action2;
        }

That’s it. In some cases, though, you can eliminate some captures without much effort at all. For example, if the sender argument of the event already references what you need – use it instead of a capture!
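For instance, something like this (a hypothetical sketch – it assumes the event source itself exposes what the handler needs, which is not the case in the sample above):

        public class Button
        {
            public event EventHandler Click = delegate { };
            public string Name { get; set; }
        }

        public static void Subscribe(Button button)
        {
            // No capture at all: the instance arrives as the sender argument.
            button.Click += (s, e) => Console.WriteLine(((Button)s).Name);
        }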

16/08/2012

T4 and FxCop: ignoring auto-generated code checks

Filed under: t4, tools — Tags: — Ivan Danilov @ 19:39

Just a quick note: if you write T4 templates and do not want FxCop to complain about Hungarian notation, underscores in names and other such stuff, be sure to mark your generated classes with something like this:

[global::System.CodeDom.Compiler.GeneratedCode("MyGeneratorTool", "1.0.0.0")]
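In practice that means emitting the attribute right above each class in the template’s text block, so a generated file ends up looking something like this (a sketch – MyEntity stands for whatever name your template produces; it also relies on the code analysis option that suppresses results from generated code, which is enabled by default):

[global::System.CodeDom.Compiler.GeneratedCode("MyGeneratorTool", "1.0.0.0")]
public partial class MyEntity
{
    // Naming-rule violations below (Hungarian notation, underscores) are
    // now ignored by FxCop because of the GeneratedCode attribute above.
    public string m_rawValue;
}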

Git-Web-Access: access git through IIS WebApp

Filed under: git — Ivan Danilov @ 05:00

I’m using the subject of this post to host my repositories, and in general it works well. It was the only really working git HTTP server on Windows at the time (at least I hadn’t found others then).

Today I was installing GWA on another server and was suddenly struck by errors. After a quick check I found that the binaries from CodePlex do not correspond to the latest sources (namely, ILSpy showed that GitHandler.cs was missing the line context.Response.Charset = null; which is critical for working with JGit-based clients, including TeamCity and Eclipse’s egit plugin).

And after I downloaded the sources and built them… well, it wasn’t working at all:

$ git fetch
fatal: protocol error: bad line length character:
00a

Same in TeamCity. So, I reverted to my own last commit and rebuilt. At that point I finally got a working tool.

It seems GWA is abandoned – undeservedly so, IMO. In any case, if you want a Git HTTP server on Windows – either build it from working sources or download it here.

Weakly referenced Decorator: meet the ephemerons in .NET

Filed under: patterns — Ivan Danilov @ 01:07

Sometimes we have a long-living class that should notify other classes with a potentially short lifetime about some event. In such a case WeakReference is the obvious solution to prevent memory leaks. Something like this, supposing the long-living class is named Service and the short-living one is Client (for simplicity’s sake I’ve limited Service‘s handling abilities to just one client):

    public interface IClient {
        void Notify();
    }

    public class Service {
        private WeakReference _clientRef;

        public void AddClient(IClient client) {
            _clientRef = new WeakReference(client);
        }

        public void NotifyClientIfAlive() {
            var client = (IClient)_clientRef.Target;
            if (client != null) client.Notify();
        }
    }

    public class Client : IClient {
        public void Notify() { /* not important */ }
    }

So far, so good. You’ve written such code and it’s working beautifully. But now one more task comes along: suppose it is logging, or something similar, and you decide to implement it via the Decorator pattern. Well, it is very handy that we have that IClient interface, isn’t it? Just a piece of cake:

    public class ClientDecorator : IClient {
        private readonly IClient _realClient;

        public ClientDecorator(IClient realClient) {
            _realClient = realClient;
        }

        public void Notify() {
            // do what is needed
            _realClient.Notify();
        }
    }

Now we can just pass an instance of ClientDecorator to the Service and be off.
Well, not really. If you construct your IClient with the decorator from the start and work with the decorated instance all the time – it is OK.
But what if you want to decorate calls from this service only? Or you receive a ready-made instance of IClient from somewhere? Then you have a problem: if you wrap your instance and ‘forget’ the strong reference to the newly created decorator, it will be garbage collected on the very next GC pass. Moreover, that will break not only your logging – the message will never reach the real client in the first place. Let me illustrate the problem with a small unit test:

        [Test]
        public void DecoratorIsNotGCedWhenRefsToRealClientExist() {
            var client = new Client();
            var decorator = new ClientDecorator(client);
            var service = new Service();
            service.AddClient(decorator);

            // create weak refs to watch if these are alive
            var weakClient = new WeakReference(client);
            var weakDecorator = new WeakReference(decorator);

            // nullify strong ref to decorator, but leave to client
            decorator = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsTrue(weakClient.IsAlive);
            Assert.IsTrue(weakDecorator.IsAlive);
        }

Oops. Red test. Not good. Well, of course, we could have made the Service‘s refs strong, but then both the client and the decorator would live at least as long as the Service, i.e. we would introduce a memory leak this way. Let’s write another unit test and requirement:

        [Test]
        public void ClientDecoratorIsGCedWhenNoClientRefsAreAlive() {
            var client = new Client();
            var decorator = new ClientDecorator(client);
            var service = new Service();
            service.AddClient(decorator);

            // create weak refs to watch if these are alive
            var weakClient = new WeakReference(client);
            var weakDecorator = new WeakReference(decorator);

            // nullify strong refererences
            client = null;
            decorator = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsFalse(weakClient.IsAlive);
            Assert.IsFalse(weakDecorator.IsAlive);
        }

It is almost the same, except that both strong refs – to the decorator and to the client – die, and after GC both objects should be swept. This simulates the client dying while the service is still alive, and demonstrates that no memory leak is present.

Another option is to make the service aware of the decorator’s presence: the service holds a strong ref to the decorator and a weak one to the real client, the decorator holds a weak reference to the client, and on each notification the service checks the real client and removes the decorator if needed. That is complex even to read, let alone to implement. It totally defeats the purpose of the Decorator pattern. And if you ever have to add two, three or four decorators to a single client… oops, you have a real problem with such an approach.

So, can we do better? Yes, we can. What we need is a way to keep the decorator instance alive not as long as someone holds strong refs to the decorator itself, but as long as someone holds strong refs to another instance – namely, the real client. That is precisely the description of an ephemeron. Since .NET 4.0, ephemerons are supported by the CLR in the form of the ConditionalWeakTable<TKey, TValue> class.

Here is the code:

    public class ClientDecorator : IClient {
        private static readonly ConditionalWeakTable<IClient, ClientDecorator> _keepaliveMap = new ConditionalWeakTable<IClient, ClientDecorator>();
        private readonly IClient _realClient;

        public ClientDecorator(IClient realClient) {
            _realClient = realClient;
            _keepaliveMap.Add(realClient, this);
        }

        public void Notify() {
            // do what is needed
            _realClient.Notify();
        }
    }

Basically, adding the pair to _keepaliveMap makes this instance ineligible for GC for as long as realClient itself is alive.

Both tests are green, hooray! But we can do even better: we can move the required logic out to a base class and let any decorator use it without code duplication, like this:

    public class WeaklyReferencedDecorator<T> where T : class {
        private static readonly ConditionalWeakTable<T, WeaklyReferencedDecorator<T>> _keepaliveMap = new ConditionalWeakTable<T, WeaklyReferencedDecorator<T>>();
        protected readonly T Instance;

        public WeaklyReferencedDecorator(T instance) {
            Instance = instance;
            _keepaliveMap.Add(instance, this);
        }
    }

    public class ClientDecorator : WeaklyReferencedDecorator<IClient>, IClient {
        public ClientDecorator(IClient realClient) : base(realClient) {}

        public void Notify() {
            // do what is needed
            Instance.Notify();
        }
    }

With the base class the amount of code… huh, the LOC count has even decreased, since the Instance field now lives in the base class. Looks good, doesn’t it?

Just to be sure everything is OK, let’s write tests that simulate several decorators stacked one on top of another:

        [Test]
        public void StackingTwoDecoratorsGCedCorrectly() {
            var client = new Client();
            var service = new Service();
            var clientDecorator1 = new ClientDecorator(client);
            var clientDecorator2 = new ClientDecorator(clientDecorator1);
            service.AddClient(clientDecorator2);

            var weakDecorator1 = new WeakReference(clientDecorator1);
            var weakDecorator2 = new WeakReference(clientDecorator2);
            var weakClient = new WeakReference(client);
            clientDecorator1 = clientDecorator2 = null;
            client = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsFalse(weakClient.IsAlive);
            Assert.IsFalse(weakDecorator1.IsAlive);
            Assert.IsFalse(weakDecorator2.IsAlive);
        }

        [Test]
        public void StackingTwoDecoratorsNotGCedIfRefToRealClientExist() {
            var client = new Client();
            var service = new Service();
            var clientDecorator1 = new ClientDecorator(client);
            var clientDecorator2 = new ClientDecorator(clientDecorator1);
            service.AddClient(clientDecorator2);

            var weakDecorator1 = new WeakReference(clientDecorator1);
            var weakDecorator2 = new WeakReference(clientDecorator2);
            var weakClient = new WeakReference(client);
            clientDecorator1 = clientDecorator2 = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsTrue(weakClient.IsAlive);
            Assert.IsTrue(weakDecorator1.IsAlive);
            Assert.IsTrue(weakDecorator2.IsAlive);
        }

All of the tests pass, so it seems we’re done.

And a last word about where this could be useful besides the mentioned scenario with a notifying service (which in the real world would probably hold a collection of WeakReferences). Here is my answer to a related question. In short, with the addition of collectible assemblies for dynamic type generation in .NET 4.0, you can’t safely cache reflection data in something like Dictionary<Type, MyReflectionResult>, because doing so prevents the GC from collecting such dynamic types even when no one needs them anymore – your cache keeps them alive. A bad thing, especially if you’re writing a general-purpose lib or framework.
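A minimal sketch of such a cache done safely with ConditionalWeakTable (MyReflectionResult here is the hypothetical per-type data holder from the paragraph above):

    public static class ReflectionCache
    {
        private static readonly ConditionalWeakTable<Type, MyReflectionResult> _cache =
            new ConditionalWeakTable<Type, MyReflectionResult>();

        public static MyReflectionResult For(Type type)
        {
            // The callback runs only on a cache miss; entries do not keep
            // their Type key (or its collectible assembly) alive.
            return _cache.GetValue(type, t => new MyReflectionResult(t));
        }
    }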

Hope this post was interesting to someone 🙂
