A place for spare thoughts

29/08/2012

Bug in C#/IL optimizations logic? o_O

Filed under: c# — Ivan Danilov @ 16:59

To be brief: after half a day of minimizing the bug case I ended up with this code:

    [TestFixture]
    public class Tests
    {
        private class Relay
        {
            public Action Do { get; set; }
        }
        
        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);
            
            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail(); // fails here in Release mode
            relayNew.Do();
            Assert.AreEqual(1, indicator);
        }
    }

Here is the StackOverflow question on this topic.

Compiled in Debug mode, the test passes. In Release mode it fails on the marked line: the relayNew variable is null there. But we still have the relay variable in scope and haven’t assigned null to it. We still hold a strong reference! So, what’s happening?

I can only speculate, but my assumption is that optimizer was designed to follow such logic:

  1. the relay variable is not accessed after the weak variable is assigned (this is easily proven by analyzing the IL);
  2. we are about to call GC.Collect() manually, so we probably want to reclaim as much memory as possible;
  3. the optimizer can “help” us by marking relay as unreachable and letting it be GCed with the rest of the garbage.

But it changes the visible behavior of the compiled code, so I assume it is a bug.

I checked the generated IL and it is the same for Debug and Release modes (at least with regard to variable eligibility for GC). The only difference seems to be an assembly-level attribute:

  • Debug mode:
    [assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default
                        | DebuggableAttribute.DebuggingModes.DisableOptimizations
                        | DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints 
                        | DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
    
  • Release mode:
    [assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
    

According to the documentation this attribute governs JIT optimizations, so my final assumption is that the JIT compiler does things it considers safe, but which are not.

Another indirect proof: if you change the code as below (only the final assertion added), the test passes in both Debug and Release modes:

        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);
            
            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail();
            relayNew.Do();
            Assert.AreEqual(1, indicator);
            Assert.IsNotNull(relay); // the added line
        }

Here the optimizer can’t deduce that relay’s use is trivial, and so cannot optimize it away before the last line has passed.

The problem is reproducible on .NET 2.0, .NET 4.0 and .NET 4.5 equally.

P.S. Actually, I found the problem in real code, where I have weak event handlers that treated this dead weak reference as a sign that the event’s target had already been GCed, and so unsubscribed from the event source. On top of that, some ObservableCollection<T> instances were using these weak events. It wasn’t an easy task to localize the problem. In fact, even reproducing the test failure was not trivial, because I usually build in Debug mode on my box, and for a good long time I wondered why the tests passed on my machine but failed on three different build agents.
Phew. At least it can be fixed without re-architecting half of the app. 🙂

Update. Here is the comment I found in .NET Framework sources for GC.KeepAlive(object obj) method:

    // This method DOES NOT DO ANYTHING in and of itself.  It's used to
    // prevent a finalizable object from losing any outstanding references 
    // a touch too early.  The JIT is very aggressive about keeping an 
    // object's lifetime to as small a window as possible, to the point
    // where a 'this' pointer isn't considered live in an instance method 
    // unless you read a value from the instance.  So for finalizable
    // objects that store a handle or pointer and provide a finalizer that
    // cleans them up, this can cause subtle races with the finalizer
    // thread.  This isn't just about handles - it can happen with just 
    // about any finalizable resource.
    // 
    // Users should insert a call to this method near the end of a 
    // method where they must keep an object alive for the duration of that
    // method, up until this method is called.  Here is an example: 
    //
    // "...all you really need is one object with a Finalize method, and a
    // second object with a Close/Dispose/Done method.  Such as the following
    // contrived example: 
    //
    // class Foo { 
    //    Stream stream = ...; 
    //    protected void Finalize() { stream.Close(); }
    //    void Problem() { stream.MethodThatSpansGCs(); } 
    //    static void Main() { new Foo().Problem(); }
    // }
    //
    // 
    // In this code, Foo will be finalized in the middle of
    // stream.MethodThatSpansGCs, thus closing a stream still in use." 
    // 
    // If we insert a call to GC.KeepAlive(this) at the end of Problem(), then
    // Foo doesn't get finalized and the stream says open. 
    [System.Security.SecuritySafeCritical]  // auto-generated
    [ResourceExposure(ResourceScope.None)]
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)] 
    public static extern void KeepAlive(Object obj);

Also you may be interested in this excerpt from C# Language Specification, section 3.9:

If the object, or any part of it, cannot be accessed by any possible continuation of execution, other than the running of destructors, the object is considered no longer in use, and it becomes eligible for destruction. The C# compiler and the garbage collector may choose to analyze code to determine which references to an object may be used in the future. For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, the garbage collector may (but is not required to) treat the object as no longer in use.

So, now I’m convinced that it is intended behavior. Good for performance, bad for clarity. Well, now I know that in such situations I have to remember to put GC.KeepAlive(relay) at the end (instead of the NotNull assertion, which is meaningless in production code, of course). I’d better not forget this 🙂
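
For reference, here is how the test ends up with the fix applied – the same code as above, with the assertion swapped for GC.KeepAlive:

        [Test]
        public void OptimizerStrangeness()
        {
            var relay = new Relay();
            var indicator = 0;
            relay.Do = () => indicator++;
            var weak = new WeakReference(relay);

            GC.Collect();

            var relayNew = weak.Target as Relay;
            if (relayNew == null) Assert.Fail();
            relayNew.Do();
            Assert.AreEqual(1, indicator);
            GC.KeepAlive(relay); // keeps relay reachable up to this point, across GC.Collect()
        }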

Hail the unit-tests and hail the build server running them in Release mode!

23/08/2012

My environment for day-to-day work with git-tfs

Filed under: git, git-tfs, tools — Ivan Danilov @ 16:22

I’ve been asked many times to describe how things are organized on our project with git-tfs, so it’s time to write it down – repeating these things over and over isn’t effective at all.

So, below I will describe how things are arranged to make TFS2010, msysgit and git-tfs work together.

First, the setup involves several physical boxes, so let’s name them at once:

  1. DevA, DevB – two developers’ boxes;
  2. GitServer – a Windows Server 2008 R2 box with IIS – hosts the central git repository, exposes git repositories over HTTP and has a scheduled task to synchronize the central git repository with TFS;
  3. TfsServer – a remote host with TFS Server 2010 (it is our corporate beast and I don’t have administrative access there).

I will assume you have already installed git and git-tfs on your machines. So, let’s start.


1. Cloning TFS repository first time

First, you have to clone the repository from TFS somewhere. On your DevA box open a bash console, go to the folder where you want the sources to reside, and execute this:

git tfs clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

or, if you aren’t interested in the full history and just want the latest TFS source state:

git tfs quick-clone -d http://TfsServer:8080/tfs $/ProjectA/Src proj-git

It may take up to several hours if you have a large repository and a slow connection. If everything goes fine, a proj-git folder will appear there. Inside is your working copy with the latest TFS version, plus a .git folder with the configuration.


2. Setting up GitServer and auto-update from TFS

git-tfs doesn’t work with bare repositories (at least it was so at the time I followed its development), thus we will need two repositories for each TFS one: the first will be bare and accessible through HTTP; the second will be private – there we will set up scheduled fetching from the TFS server, and each time new changes are found in TFS we will push them to the first repository. To distinguish the two I will refer to the first as Central and the second as Working.

  1. Install IIS server role
  2. Install GitWebAccess. As I wrote before, the last version seems broken, so you can download my working version. Unpack it to C:\inetpub\wwwroot\gwa and follow this manual. I will suppose you created the folder C:\GitRepos for your git repos as suggested in that manual.
  3. Create new folder for Working repo somewhere. For this example it would be C:\working
  4. Copy the content of the proj-git folder (where you cloned the TFS repository in step 1) to C:\working and execute
    git checkout -B master tfs/default
    

    so the master branch will point at your last TFS commit

  5. Add a C:\working\.gitignore file with these lines, if it is not present:
    /update
    /log.txt
    

    If the file is present – just add these lines to it. update is the script we will write next, and log.txt… well, it is the log of the updating. As the update will be executed by the scheduler, this log is the only way to know what’s happening.

  6. Now we will need a bare repository in C:\GitRepos\Project.git. Execute these commands from C:\GitRepos:
    git clone --bare /c/working Project.git
    cd Project.git
    git remote rm origin # we don't need origin here
    

    If it was done correctly, you should have your repository accessible over HTTP through GitWebAccess.

  7. Now it’s time to set up updating in the Working repository. The convention I embrace is that the master branch of my repository always follows the TFS repository line. Thus, no one except the update script should push to it directly, and it will never have changes conflicting with TFS data.
    You need to set up the origin remote in the Working repo in order for the script below to work:

    git remote add origin /c/GitRepos/Project.git
    

    Next, create the file C:\working\update and put these lines into it:

    #!/bin/sh
    
    check_err()
    {
    	# parameter 1 is last exit code
    	# parameter 2 is error message that should be shown if error code is not 0
    	if [ "${1}" -ne "0" ]; then
    		cat '~temp.log'
    		echo ${2}
    		rm -f '~temp.log' > /dev/null
    		exit ${1}
    	fi;
    	rm -f '~temp.log' > /dev/null
    }
    
    echo "$(date) Update launched"
    
    if [ ! -z "$(git status --porcelain)" ]; then
    	echo "$(date) Your status is not clean, can't update"
    	exit 1;
    fi;
    
    echo "$(date) Pulling from central repo first to avoid redundant round-trips to TFS..."
    git pull origin master:master > '~temp.log'
    check_err $? "Pulling from central repo failed"
    
    echo "$(date) Pulling from TFS..."
    git tfs pull -d > '~temp.log'
    check_err $? "Pulling from TFS resulted in error";
    
    local_commits_to_push="$(git rev-list master ^origin/master)"
    if [ -z "$local_commits_to_push" ]; then
    	echo "$(date) Central repo is up-to-date, nothing to push"
    else
    	echo "$(date) Pushing updates to central repo"
    	git push origin master > '~temp.log'
    	check_err $? "Push to central resulted in error";
    fi;
    

    You may check it – just execute update from a bash console. Your latest updates should be fetched from TFS. If there were changes, they should also be pushed to Central’s master branch.
    It is a simplified version of the script from this post. There you can also find how to create a scheduler task that executes this update script every 5 minutes – do it now (see the sketch right below).
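
For instance, a task along these lines would run it every 5 minutes (a hedged sketch – the sh.exe path is an assumption, adjust it to your msysgit installation):

    schtasks /Create /SC MINUTE /MO 5 /TN "GitTfsUpdate" /TR "\"C:\Program Files (x86)\Git\bin\sh.exe\" --login -c 'cd /c/working && ./update >> log.txt 2>&1'"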


3. Create rcheckin and reap the benefits

Now, back to the developer’s environment. We want to use that auto-updating central repository effectively, don’t we? So:

  • We don’t need git tfs fetch at all anymore: we can limit ourselves to git fetch, because the latest code is always in the Central git repository.
  • When git tfs rcheckin is executed, it fetches the latest changes from TFS. We want to reuse as much as possible of the already-fetched changes. So we need to fetch manually from Central before executing git tfs rcheckin, and somehow make git-tfs mark those as correct TFS-fetched commits.
  • And finally, after we check something in to TFS, we want to push it to Central so the auto-updater won’t fetch what we already have – that could have a significant performance impact in the case of large commits, or a long line of commits over a slow connection. In such a case we may even want to disable the auto-updater until we push things to Central, and then just re-enable it.

So, in order to achieve all these points – instead of using git tfs rcheckin directly, which doesn’t know anything about our environment – we will use a custom rcheckin script (do not forget to add it to .gitignore if needed):

#!/bin/sh

check_err()
{
	# parameter 1 is last exit code
	# parameter 2 is error message that should be shown if error code is not 0
	if [ "${1}" -ne "0" ]; then
		cat '~temp.log'
		echo ${2}
		rm -f '~temp.log' > /dev/null
		exit ${1}
	fi;
	rm -f '~temp.log' > /dev/null
}

if [ ! -z "$(git status --porcelain)" ]; then
	echo "Your status is not clean, can't continue"
	exit 1;
fi;

echo "Fetching origin..."
git fetch origin

if [ -n "`git rev-list HEAD..origin/master`" ]; then
	echo "origin/master has some TFS checkins that are conflicting with your branch. Please rebase first, then try to rcheckin again"
	exit 1;
fi;

echo "Marking latest TFS commit with bootstrap..."
git tfs bootstrap -d

git tfs rcheckin
check_err $? "rcheckin exited with error"

git push origin HEAD:master
check_err $? "Can't push HEAD to origin"

Well, that’s almost all.


4. Setup another developer environment

So, suppose DevB also needs to work with your server. Here is how he can start his work easily:

  1. git clone http://GitServer:8080/project.git project
    cd project
    git tfs bootstrap -d # make git-tfs mark what is required
    
  2. Get rcheckin script from previous section.
  3. Work!


5. My common workflow

So, I’ll try to describe how I work in general, supposing I’m in my dev repository, set up as described above.

  1. First, let’s check that no changes appeared while I was writing all of this: git fetch origin. OK, no changes.
  2. Working, making changes hard… committing several times, squashing, rewriting etc.
  3. Ready to push to TFS? Great! Let’s check again – maybe someone pushed something while I was working: git fetch origin. Yeah, there are some changes, origin/master has updates. Let’s rebase my work onto them (I don’t like merging local work which is not seen by anyone except me – it just clutters the commit graph): git rebase origin/master, maybe some conflict resolving. Hey, I’m ready to push: rcheckin.
  4. Yet some more work.
  5. Working time is over and I haven’t finished my current task. I want it backed up on the server, so a local machine failure can’t waste my work: git push origin HEAD:idanilov/temp. idanilov is my ‘private’ branch namespace; every developer has one. By convention it is just backups and is not considered published. For open-source projects you probably won’t do this, but corporate rules sometimes require some assurance that you won’t lose work that is already paid for.

Sometimes I may switch branches, stash changes and do other git stuff, but not that frequently. Most often my whole workflow is the fetch-rebase-rcheckin chain.

I wanted to explain a few more things – e.g., how to keep a bunch of bash scripts in a way that they don’t get into TFS (that’s easy, just use .gitignore) while still being able to synchronize them across all developers using git (much harder if you don’t want to bother running to every developer and copying new versions of the scripts each time you correct one line in one of them). Or how to automate the git console, so you don’t need to type long commands every time, which is pretty boring. But… this post is already pretty long, so I will do that later.

Have fun moving away from TFS with git and git-tfs! 😉

17/08/2012

Implicit closures in C# lambdas, part 2

Filed under: c# — Ivan Danilov @ 17:59

Wow. I’ve just found that not everything I said in the first part is true (I’ve already marked the wrong place there so readers won’t be confused). Shame on me.

My mistake was relying totally on ReSharper’s warnings. So, at the end of the last post I changed the code to make a copy of the variable for the closure, like this:

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            var receiverCopy = r;
            EventHandler action1 = (s, e) => receiverCopy.Method(a);
            EventHandler action2 = (s, e) => r.Method(b);
            observer.X += action1;
            observer.X += action2;
        }

And I assumed that this is enough to convince the compiler to capture the variables for the two lambdas in two different compiler-generated classes – based on the fact that the ReSharper warning was gone. It turned out (by experiencing a very real memory leak in a unit test first, and then looking at the generated IL to understand) that the absence of the warning only means ReSharper is unable to find such an implicit closure, not that the implicit closure is gone. The compiler still generates a single class for all these closures. And this time it is probably an optimization on the compiler’s side – namely, decreasing the number of types in the assembly by generating one less.

What I ended up with is hand-writing an explicit class very similar to the one in the previous post. This solved the memory leak, and looking at the IL proved that it helped. But the code is now ugly. And worse – I don’t know how to spot such problems easily without thorough analysis of the code.
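
For illustration, here is roughly what that hand-written class looks like – a sketch with names of my own, not the actual code from the project:

        // explicit 'closure' for action1 – it captures only what it really needs
        private class Action1Closure
        {
            private readonly Receiver _receiver;
            private readonly object _a;

            public Action1Closure(Receiver receiver, object a)
            {
                _receiver = receiver;
                _a = a;
            }

            public void Handle(object s, EventArgs e)
            {
                _receiver.Method(_a);
            }
        }

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            EventHandler action1 = new Action1Closure(r, a).Handle;
            // the compiler-generated class for the remaining lambda captures only r and b
            EventHandler action2 = (s, e) => r.Method(b);
            observer.X += action1;
            observer.X += action2;
        }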

UPD: Pretty much the same thing in another blog post.

UPD2: As Eric answered at SO, this behavior was implemented because it was easier (for the compiler developers), and the part about lambda rewriting was already pretty complicated at the time. Hopefully, with Roslyn we may get something better.

Also, good alternative was offered as well.

Implicit closures in C# lambdas, part 1

Filed under: c# — Ivan Danilov @ 16:38

In my day-to-day work I was once alerted by an R# warning saying ‘Implicitly captured closure: variableName’. After checking the actual IL generated by the compiler I found that the R# warning is pretty much correct. So I did some ground-work and wrote a small sample demonstrating this:

    public class Observer
    {
        public event EventHandler X = delegate { };
    }

    public class Receiver
    {
        public void Method(object o) {}
    }

    public class Program
    {
        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            EventHandler action1 = (s, e) => r.Method(a); //Implicitly captured closure: b
            EventHandler action2 = (s, e) => r.Method(b); //Implicitly captured closure: a
            observer.X += action1;
            observer.X += action2;
        }

        public static void Main(string[] args)
        {
            var observer = new Observer();
            var receiver = new Receiver();
            DoSomething(new object(), new object(), observer, receiver);
        }
    }

It seems the cause is simple: when several lambdas share variables that need to be captured (in this case r is present in both), the compiler generates only one class with an unspeakable name that contains both methods and all of their captured variables.

Below is the generated IL code. It is there just for completeness, because further down I explain it in terms of equivalent C#, so you don’t need to understand IL to get the idea. It is fairly long, so you most probably don’t want to read it all.

.class public auto ansi beforefieldinit ConsoleApplication1.Program
	extends [mscorlib]System.Object
{
	// Nested Types
	.class nested private auto ansi sealed beforefieldinit '<>c__DisplayClass2'
		extends [mscorlib]System.Object
	{
		.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
			01 00 00 00
		)
		// Fields
		.field public object a
		.field public object b
		.field public class ConsoleApplication1.Receiver r

		// Methods
		.method public hidebysig specialname rtspecialname 
			instance void .ctor () cil managed 
		{
			// Method begins at RVA 0x2104
			// Code size 7 (0x7)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: call instance void [mscorlib]System.Object::.ctor()
			IL_0006: ret
		} // end of method '<>c__DisplayClass2'::.ctor

		.method public hidebysig 
			instance void '<DoSomething>b__0' (
				object s,
				class [mscorlib]System.EventArgs e
			) cil managed 
		{
			// Method begins at RVA 0x210c
			// Code size 19 (0x13)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: ldfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
			IL_0006: ldarg.0
			IL_0007: ldfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::a
			IL_000c: callvirt instance void ConsoleApplication1.Receiver::Method(object)
			IL_0011: nop
			IL_0012: ret
		} // end of method '<>c__DisplayClass2'::'<DoSomething>b__0'

		.method public hidebysig 
			instance void '<DoSomething>b__1' (
				object s,
				class [mscorlib]System.EventArgs e
			) cil managed 
		{
			// Method begins at RVA 0x2120
			// Code size 19 (0x13)
			.maxstack 8

			IL_0000: ldarg.0
			IL_0001: ldfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
			IL_0006: ldarg.0
			IL_0007: ldfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::b
			IL_000c: callvirt instance void ConsoleApplication1.Receiver::Method(object)
			IL_0011: nop
			IL_0012: ret
		} // end of method '<>c__DisplayClass2'::'<DoSomething>b__1'

	} // end of class <>c__DisplayClass2


	// Methods
	.method public hidebysig static 
		void DoSomething (
			object a,
			object b,
			class ConsoleApplication1.Observer observer,
			class ConsoleApplication1.Receiver r
		) cil managed 
	{
		// Method begins at RVA 0x2134
		// Code size 72 (0x48)
		.maxstack 2
		.locals init (
			[0] class [mscorlib]System.EventHandler action1,
			[1] class [mscorlib]System.EventHandler action2,
			[2] class ConsoleApplication1.Program/'<>c__DisplayClass2' 'CS$<>8__locals3'
		)

		IL_0000: newobj instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::.ctor()
		IL_0005: stloc.2
		IL_0006: ldloc.2
		IL_0007: ldarg.0
		IL_0008: stfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::a
		IL_000d: ldloc.2
		IL_000e: ldarg.1
		IL_000f: stfld object ConsoleApplication1.Program/'<>c__DisplayClass2'::b
		IL_0014: ldloc.2
		IL_0015: ldarg.3
		IL_0016: stfld class ConsoleApplication1.Receiver ConsoleApplication1.Program/'<>c__DisplayClass2'::r
		IL_001b: nop
		IL_001c: ldloc.2
		IL_001d: ldftn instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::'<DoSomething>b__0'(object, class [mscorlib]System.EventArgs)
		IL_0023: newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
		IL_0028: stloc.0
		IL_0029: ldloc.2
		IL_002a: ldftn instance void ConsoleApplication1.Program/'<>c__DisplayClass2'::'<DoSomething>b__1'(object, class [mscorlib]System.EventArgs)
		IL_0030: newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
		IL_0035: stloc.1
		IL_0036: ldarg.2
		IL_0037: ldloc.0
		IL_0038: callvirt instance void ConsoleApplication1.Observer::add_X(class [mscorlib]System.EventHandler)
		IL_003d: nop
		IL_003e: ldarg.2
		IL_003f: ldloc.1
		IL_0040: callvirt instance void ConsoleApplication1.Observer::add_X(class [mscorlib]System.EventHandler)
		IL_0045: nop
		IL_0046: nop
		IL_0047: ret
	} // end of method Program::DoSomething

	.method public hidebysig static 
		void Main (
			string[] args
		) cil managed 
	{
		// Method begins at RVA 0x2188
		// Code size 32 (0x20)
		.maxstack 4
		.entrypoint
		.locals init (
			[0] class ConsoleApplication1.Observer observer,
			[1] class ConsoleApplication1.Receiver receiver
		)

		IL_0000: nop
		IL_0001: newobj instance void ConsoleApplication1.Observer::.ctor()
		IL_0006: stloc.0
		IL_0007: newobj instance void ConsoleApplication1.Receiver::.ctor()
		IL_000c: stloc.1
		IL_000d: newobj instance void [mscorlib]System.Object::.ctor()
		IL_0012: newobj instance void [mscorlib]System.Object::.ctor()
		IL_0017: ldloc.0
		IL_0018: ldloc.1
		IL_0019: call void ConsoleApplication1.Program::DoSomething(object, object, class ConsoleApplication1.Observer, class ConsoleApplication1.Receiver)
		IL_001e: nop
		IL_001f: ret
	} // end of method Program::Main

	.method public hidebysig specialname rtspecialname 
		instance void .ctor () cil managed 
	{
		// Method begins at RVA 0x21b4
		// Code size 7 (0x7)
		.maxstack 8

		IL_0000: ldarg.0
		IL_0001: call instance void [mscorlib]System.Object::.ctor()
		IL_0006: ret
	} // end of method Program::.ctor

} // end of class ConsoleApplication1.Program

So, basically, what the C# compiler did overall is something like this:

        // actually named <>c__DisplayClass2 in IL
        private class CompilerGeneratedClass 
        {
            public object a;
            public object b;
            public Receiver r;

            // actually named <DoSomething>b__0 in IL
            public void Method1(object s, EventArgs e)
            {
                 r.Method(a);
            }
            // actually named <DoSomething>b__1 in IL
            public void Method2(object s, EventArgs e)
            {
                r.Method(b);
            }
        }

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            // actually named CS$<>8__locals3 in IL
            CompilerGeneratedClass helper = new CompilerGeneratedClass();
            helper.a = a;
            helper.b = b;
            helper.r = r;
            EventHandler action1 = helper.Method1;
            EventHandler action2 = helper.Method2;
            observer.X += action1;
            observer.X += action2;
        }

So, you see, no lambdas here. The compiler replaced every variable used in a closure with a field on the nested class. That’s fine, but “what about the subject and these implicit closures?” you may ask. Well, that’s simple from this point. In the original code you have two unconnected lambdas and might assume that as soon as the first one, along with the a object, is no longer used, they will be GCed regardless of the current state of the second lambda, right? But now it is clear that as long as CompilerGeneratedClass is alive – both a and b will live. And the generated class will live at least as long as someone needs either of the two lambdas.

Essentially, we can treat it as if both lambdas have closed over both the a and b variables. Thus ReSharper’s warning.

What problems can this bring to you as a developer? Suppose a is a cheap object and action1 is passed to a long-living service class; meanwhile b is a costly object and action2 is used here once and forgotten. One assumes the latter will be called and then GCed along with the costly b object almost immediately. But it will live as long as the service holds the former lambda, causing a hard-to-find memory leak.

Why can’t the compiler do better? I can’t say for certain, so these are only my speculations on the subject. Nevertheless: why can’t the compiler create two generated classes, so that one captures r and a, while the second captures r and b? It could, and in this simple case it would be OK. But in more complicated cases it would lead to problems more severe than implicit captures. For one example, suppose the method with lambdas not only captures essentially read-only arguments, but also writes to them. How would you capture a variable that is written to in two different generated classes? And even if you managed to do that – how would you synchronize the two values? And what about multi-threaded code? You see, a bunch of hard questions arises the very moment you allow one variable to be ‘distributed’ between several generated classes.
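
A trivial sketch of the write problem (my example, not from the original code):

        public static void Counter()
        {
            var count = 0; // captured and mutated by both lambdas below
            Action increment = () => count++;
            Action report = () => Console.WriteLine(count);
            increment();
            report(); // must print 1, so both lambdas have to share one storage location for count
        }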

So, what can I do to avoid leaks and problems? Well, ReSharper has a pretty good eye for these things, so watch out for its warnings. And if one arises – try to resolve it. [The following advice is not correct! See part 2 for details] The general solution is to copy the captured variable to another variable manually – it is what the compiler can’t do safely for you, for the reasons above, but you know the context better and in most cases can do it easily. For example, for my code it would be this:

        public static void DoSomething(object a, object b, Observer observer, Receiver r)
        {
            var receiverCopy = r;
            EventHandler action1 = (s, e) => receiverCopy.Method(a);
            EventHandler action2 = (s, e) => r.Method(b);
            observer.X += action1;
            observer.X += action2;
        }

That’s it. In some cases, though, you can eliminate some captures without much effort at all. For example, the sender argument in events references what you need – use it instead of a capture if it helps!

16/08/2012

T4 and FxCop: ignoring auto-generated code checks

Filed under: t4, tools — Ivan Danilov @ 19:39

Just a quick note: if you write T4 templates and do not want FxCop to complain about Hungarian notation, underscores in names and other stuff in the generated code – be sure to mark your generated classes with something like this:

[global::System.CodeDom.Compiler.GeneratedCode("MyGeneratorTool", "1.0.0.0")]
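
For example, inside a .tt template it could look like this (className is a hypothetical template variable):

[global::System.CodeDom.Compiler.GeneratedCode("MyGeneratorTool", "1.0.0.0")]
public partial class <#= className #>
{
    // generated members go here
}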

Git-Web-Access: access git through IIS WebApp

Filed under: git — Ivan Danilov @ 05:00

I’m using the subject of this post to host my repositories and in general it works well. It was the only really working git HTTP server on Windows at the time (at least I hadn’t found others then).

Today I was installing GWA on another server and was suddenly struck by errors. After a quick check I found that the binaries from CodePlex do not correspond to the latest sources (namely, ILSpy showed that GitHandler.cs was missing the line context.Response.Charset = null; which is critical for working with JGit-based clients, including TeamCity or Eclipse’s egit plugin).

And after I downloaded sources and built them… well, it wasn’t working at all:

$ git fetch
fatal: protocol error: bad line length character:
00a

Same in TeamCity. So, I reverted to my own last commit and rebuilt. At that point I got a working tool, at last.

It seems GWA is abandoned – undeservedly so, IMO. In any case, if you want a Git HTTP server on Windows – either build it from working sources or download it here.

Weakly referenced Decorator: meet the ephemerons in .NET

Filed under: patterns — Ivan Danilov @ 01:07

Sometimes we have a long-living class that should notify other classes, with potentially short lifetimes, about some event. In such a case WeakReference is the obvious solution to prevent memory leaks. Something like this, with the long-lived class named Service and the short-living one Client (for simplicity’s sake I’ve limited Service’s handling abilities to just one client):

    public interface IClient {
        void Notify();
    }

    public class Service {
        private WeakReference _clientRef;

        public void AddClient(IClient client) {
            _clientRef = new WeakReference(client);
        }

        public void NotifyClientIfAlive() {
            var client = (IClient)_clientRef.Target;
            if (client != null) client.Notify();
        }
    }

    public class Client : IClient {
        public void Notify() { /* not important */ }
    }

So far, so good. You’ve written such code and it’s working beautifully. But now yet another task comes: suppose it is logging, or something similar, and you decide to implement it via the Decorator pattern. Well, it is very handy that we have that IClient interface, isn’t it? Just a piece of cake:

    public class ClientDecorator : IClient {
        private readonly IClient _realClient;

        public ClientDecorator(IClient realClient) {
            _realClient = realClient;
        }

        public void Notify() {
            // do what is needed
            _realClient.Notify();
        }
    }

Now we can just pass an instance of ClientDecorator to the Service and be off.
Well, not really. If you construct the IClient with the decorator from the start and work with the decorated instance all the time – it is OK.
But what if you want to decorate calls from this service only? Or you receive an already-made instance of IClient from somewhere. Then you have a problem: if you wrap your instance and ‘forget’ the strong reference to the newly created decorator – it will be garbage collected at the very next GC pass. Moreover, it will break not only your logging: the message will never reach the real client in the first place. Let me illustrate the problem with a small unit test:

        [Test]
        public void DecoratorIsNotGCedWhenRefsToRealClientExist() {
            var client = new Client();
            var decorator = new ClientDecorator(client);
            var service = new Service();
            service.AddClient(decorator);

            // create weak refs to watch if these are alive
            var weakClient = new WeakReference(client);
            var weakDecorator = new WeakReference(decorator);

            // nullify strong ref to decorator, but leave to client
            decorator = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsTrue(weakClient.IsAlive);
            Assert.IsTrue(weakDecorator.IsAlive);
        }

Oops. Red test. Not good. Well, of course, we could have made the Service’s refs strong, but then both client and decorator would live at least as long as the Service, i.e. we would introduce a memory leak this way. Let’s write another unit test and requirement:

        [Test]
        public void ClientDecoratorIsGCedWhenNoClientRefsAreAlive() {
            var client = new Client();
            var decorator = new ClientDecorator(client);
            var service = new Service();
            service.AddClient(decorator);

            // create weak refs to watch if these are alive
            var weakClient = new WeakReference(client);
            var weakDecorator = new WeakReference(decorator);

            // nullify strong refererences
            client = null;
            decorator = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsFalse(weakClient.IsAlive);
            Assert.IsFalse(weakDecorator.IsAlive);
        }

It is almost the same, except that the strong refs to both decorator and client die, and after GC they should be swept – simulating client death while the service is still alive and demonstrating that no memory leak is present.

Another option is to make the service aware of the decorator’s presence: the service then holds a strong ref to the decorator and a weak one to the real client; the decorator holds a weak reference to the client; and when a notification arrives, the service can check the real client and remove the decorator as needed. It is complex even to read, let alone to implement. And it totally defeats the purpose of a decorator. And if you have to add two, three or four decorators to a single client… oops, you have a real problem with such an approach.

So, can we do better? Yes, we can. What we need is a way to keep the decorator’s instance alive not as long as someone has strong refs to it, but as long as someone has strong refs to another instance, namely the real client instance. That task is the precise description of an ephemeron. Since .NET 4.0 they are supported by the CLR in the form of the ConditionalWeakTable<TKey, TValue> class.

Here is the code:

    public class ClientDecorator : IClient {
        private static readonly ConditionalWeakTable<IClient, ClientDecorator> _keepaliveMap = new ConditionalWeakTable<IClient, ClientDecorator>();
        private readonly IClient _realClient;

        public ClientDecorator(IClient realClient) {
            _realClient = realClient;
            _keepaliveMap.Add(realClient, this);
        }

        public void Notify() {
            // do what is needed
            _realClient.Notify();
        }
    }

Basically, adding to _keepaliveMap is equivalent to making this instance ineligible for GC for as long as realClient itself is not going to be GCed.

Both tests are green, hooray! But we can do even better: we can move the required logic to a base class and let any decorator use it without code duplication, like this:

    public class WeaklyReferencedDecorator<T> where T : class {
        private static readonly ConditionalWeakTable<T, WeaklyReferencedDecorator<T>> _keepaliveMap = new ConditionalWeakTable<T, WeaklyReferencedDecorator<T>>();
        protected readonly T Instance;

        public WeaklyReferencedDecorator(T instance) {
            Instance = instance;
            _keepaliveMap.Add(instance, this);
        }
    }

    public class ClientDecorator : WeaklyReferencedDecorator<IClient>, IClient {
        public ClientDecorator(IClient realClient) : base(realClient) {}

        public void Notify() {
            // do what is needed
            Instance.Notify();
        }
    }

With the base class the amount of code… huh, the LOC count even decreased, as we now have the Instance field on the base class. Looks good, doesn’t it?

Just to be sure everything is OK, let’s write tests simulating several decorators stacked one on top of another:

        [Test]
        public void StackingTwoDecoratorsGCedCorrectly() {
            var client = new Client();
            var service = new Service();
            var clientDecorator1 = new ClientDecorator(client);
            var clientDecorator2 = new ClientDecorator(clientDecorator1);
            service.AddClient(clientDecorator2);

            var weakDecorator1 = new WeakReference(clientDecorator1);
            var weakDecorator2 = new WeakReference(clientDecorator2);
            var weakClient = new WeakReference(client);
            clientDecorator1 = clientDecorator2 = null;
            client = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsFalse(weakClient.IsAlive);
            Assert.IsFalse(weakDecorator1.IsAlive);
            Assert.IsFalse(weakDecorator2.IsAlive);
        }

        [Test]
        public void StackingTwoDecoratorsNotGCedIfRefToRealClientExist() {
            var client = new Client();
            var service = new Service();
            var clientDecorator1 = new ClientDecorator(client);
            var clientDecorator2 = new ClientDecorator(clientDecorator1);
            service.AddClient(clientDecorator2);

            var weakDecorator1 = new WeakReference(clientDecorator1);
            var weakDecorator2 = new WeakReference(clientDecorator2);
            var weakClient = new WeakReference(client);
            clientDecorator1 = clientDecorator2 = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Assert.IsTrue(weakClient.IsAlive);
            Assert.IsTrue(weakDecorator1.IsAlive);
            Assert.IsTrue(weakDecorator2.IsAlive);
        }

All of the tests pass, so it seems we’re done.

And a last word about where this could be useful besides the mentioned scenario of a notifying service (which would probably hold a collection of WeakReferences in the real world). Here is my answer to a related question. In short, with the addition of collectible assemblies for dynamic type generation in .NET 4.0, you can’t safely cache reflection data in something like Dictionary<Type, MyReflectionResult>, because doing so prevents the GC from collecting such dynamic types even when no one needs them anymore – your cache keeps them alive. A bad thing, especially if you’re writing a general-purpose lib or framework.
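
For instance, a cache shaped like this would not pin the types it describes (MyReflectionResult stands for a hypothetical class holding the cached data):

    public static class ReflectionCache {
        private static readonly ConditionalWeakTable<Type, MyReflectionResult> _cache =
            new ConditionalWeakTable<Type, MyReflectionResult>();

        public static MyReflectionResult GetFor(Type type) {
            // the table references its Type keys weakly, so collectible
            // dynamic assemblies remain collectible even while cached
            return _cache.GetValue(type, t => new MyReflectionResult(t));
        }
    }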

Hope this post was interesting to someone 🙂

17/04/2012

Binding to WPF TextBox seems to be broken in .NET 4.5 Beta

Filed under: wpf — Ivan Danilov @ 15:18

Recently I found that updating from .NET 4.0 to .NET 4.5 Beta brings one nasty bug with it to WPF. As Scott mentioned, .NET 4.5 is an in-place upgrade to .NET 4.0, so now I experience the same problem in any app targeting .NET 4.0 as well.

OK, let’s go closer to the actual problem.

Create a new WPF application and put the following in MainWindow.xaml:

<Window x:Class="WpfApplication1.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525"
        DataContext="{Binding RelativeSource={RelativeSource Self}}">
  <Grid>
    <TextBox Text="{Binding Path=Prop, UpdateSourceTrigger=PropertyChanged}" />
  </Grid>
</Window>

And here is MainWindow.xaml.cs:

using System.Windows;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        public decimal Prop { get; set; }
    }
}

That’s all. Now run the application. You will have a TextBox stretched to the entire window with ‘0’ inside. Try to enter a decimal point (or whatever your current culture uses for separating the integer part from the fractional part). Nothing happens.

Now go to project properties and switch the target framework to .NET 3.5 (you have to delete some references that do not exist in that framework, but I didn’t use them here anyway, so it is OK). Run the app – everything works fine, you can enter the decimal point.

I suppose the problem is somewhere in the property changing/coercion mechanism: WPF tries to parse the value ‘0.’ as decimal, succeeds, and the TextBox’s content is changed back to ‘0’, which is the decimal value converted to string. This is also indirectly proven by the fact that if you have ‘0’ and press ‘1’ – you get ‘1’ instead of the ’01’ one could expect. Similarly, you can’t put a ‘+’ sign before a number, but can put a ‘-‘ sign before a non-zero number. And you can’t put any sign before zero. An even more vivid proof: if you have any invalid value (i.e. one that can’t be parsed as decimal) – you can do any edits you like and they are handled correctly.

Here is the feedback at the MS Connect site about the problem.

UPD: At Connect MS replied that the 4.0 behavior was actually a bug and they fixed it in 4.5. In fact, the fix seems to me worse from a usability standpoint than the bug was… see the discussion at the MS Connect site (link above) for details.
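
For what it’s worth, the released .NET 4.5 has a compatibility switch that restores the 4.0 behavior (assuming your build already includes it); it must be set before any bindings are created, e.g. in the App constructor – a minimal sketch:

using System.Windows;

namespace WpfApplication1
{
    public partial class App : Application
    {
        public App()
        {
            // revert to the .NET 4.0 TextBox binding behavior
            FrameworkCompatibilityPreferences.KeepTextBoxDisplaySynchronizedWithTextProperty = false;
        }
    }
}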

27/02/2012

Weak delegates for non-standard event handlers (e.g. WPF)

Filed under: wpf — Ivan Danilov @ 14:04

There are many articles about weak delegates lurking around the Internet. The most beautiful IMO is this one. So I assume that you have already read it and won’t repeat that info.

So, some time ago (pretty much time, actually – I just hadn’t time to write earlier), inspired by the beauty of the article, I happily copied the MakeWeak extension method to our codebase… just to find out that the most frequently used WPF event is declared as

event PropertyChangedEventHandler PropertyChanged;

public delegate void PropertyChangedEventHandler(object sender,
                                                 PropertyChangedEventArgs e);

Remember Dustin’s complaints about MS not following their own recommendations? That’s it. Problem. Why is it not declared like this?

event EventHandler<PropertyChangedEventArgs> PropertyChanged;

Who knows? I’m interested too. I can’t see any problems, and it would be one less type for the CLR to load. But anyway, if I want to work with WPF – I have to invent something to deal with the problem.

And so the saga begins… oops, it is not from this story, sorry 🙂

So, let’s first define what our weak events MUST HAVE (i.e. without which they wouldn’t be useful at all). For me these requirements include at least:

  • It should be weak and should be working! (surprise, eh?)
  • I shouldn’t specify type parameters explicitly; they should be fully inferred from usage;
  • I want a static check that the actual unsubscription is correctly typed and that I don’t unsubscribe a delegate of the wrong type (so, basically, I want to be at least minimally defended against an InvalidCastException at runtime);
  • Performance should be of at least the same order as with normal delegates, i.e. no hardcore reflection on every event invocation.

The very first problem is actually obtaining the original event handler. Originally MakeWeak was defined as

public static EventHandler<TEventArgs> MakeWeak<TEventArgs>(
        this EventHandler<TEventArgs> eventHandler,
        UnregisterCallback<TEventArgs> unregister)
    where TEventArgs : EventArgs

See the problem? With non-standard delegates we don’t have EventHandler<TArgs>. And we have to save that delegate later, after ‘casting’ it to OpenEventHandler in the WeakEventHandler class – there we need the MethodInfo instance that is originally obtained via the eventHandler.Method property. So the idea: let’s parametrize this method with one more type parameter, like this:

public static EventHandler<TEventArgs> MyMakeWeak<THandler, TEventArgs>(
        this EventHandler<TEventArgs> eventHandler,
        UnregisterCallback<TEventArgs> unregister)
    where THandler : Delegate
    where TEventArgs : EventArgs

So we pass eventHandler, and Delegate has a Method property as well, so it should be fine… nope, it’s not. If you try to compile that, you get “Constraint cannot be special class ‘System.Delegate'” from the compiler. Eric Lippert said that there’s no technical reason not to support such syntax; it is just not needed often, so it is pretty low on the priority list. No luck! I had to think out something else.

OK, let’s sacrifice type safety here and make the extension method accept any type TEventHandler:

TEventHandler MyMakeWeak<TEventHandler>(this TEventHandler eventHandler, 
                                        Action<TEventHandler> unregister)

It is not a big deal and doesn’t break our requirements – remember, we will use this method only with this syntax:

obj.PropertyChanged += new PropertyChangedEventHandler(MyMethod)
    .MyMakeWeak(h => obj.PropertyChanged -= h);

In other words, if we try to pass not a PropertyChangedEventHandler but some wrong class there, it will violate the types at the += operator, because we return exactly the type that was passed to our extension method.

So, what do we have then? We can rely on the fact that TEventHandler in reality derives from the Delegate class and can be cast to it. Yes, it is possible to misuse the method and get an exception only at runtime, but it is hard to do unintentionally, so I’m fine with that.

The next problem: we don’t know the argument type from the start. Before, it was TEventArgs : EventArgs, but now we have no such thing. Well, if we can cast our parameter to Delegate – we can learn the list of its parameters via Delegate.Method.GetParameters()[index].ParameterType. Not very pretty, and we don’t know the concrete signature of the delegate – not even the number of parameters… Wait, we do know the number of parameters almost always – fortunately, the WPF team didn’t abandon the guidelines totally, and their delegates have the normal void (object sender, TEventArgs e) signature everywhere. That’s something. Yep, again, it won’t work absolutely everywhere, but in the worst case I can extend this approach in the same ugly way as Func has various numbers of type arguments (no, I will not do that here, because I haven’t needed it – at least so far).

So, MyWeakEventHandler will have one more generic type parameter, as it needs to know not only the receiver type and the event args type, but the exact delegate type as well.

So far, so good. It seems everything is pretty much clear with the MyMakeWeak method – at least if it works in the end (remember the first requirement?). And one last stroke: instead of the strange MyMakeWeak I finally named it MakeWeakSpecial. While I could name it MakeWeak like the original one (which stays untouched), that would bring into play C#’s overload resolution rules, which are insanely complex and sometimes very confusing in the presence of generics. So I decided to name it differently, just to be sure what I’m using. Here’s what we have:

public static TEventHandler MakeWeakSpecial<TEventHandler>(this TEventHandler eventHandler, Action<TEventHandler> unregister)
{
    if (eventHandler == null)
        throw new ArgumentNullException("eventHandler");

    var ehDelegate = (Delegate) (object) eventHandler;
    var eventArgsType = ehDelegate.Method.GetParameters()[1].ParameterType;
    Type wehType = typeof(WeakEventHandlerSpecial<,,>)
        .MakeGenericType(ehDelegate.Method.DeclaringType, 
                         typeof(TEventHandler), 
                         eventArgsType);

    ConstructorInfo wehConstructor = wehType.GetConstructor(new[]
                                                            {
                                                                typeof(Delegate), 
                                                                typeof(Action<object>) 
                                                            });

    Debug.Assert(wehConstructor != null, "Something went wrong. There should be constructor with these types");

    var weh = (IWeakEventHandlerSpecial<TEventHandler>)wehConstructor
        .Invoke(new object[] 
                {
                    eventHandler, 
                    (Action<object>)(o => unregister((TEventHandler) o)) 
                });

    return weh.Handler;
}

Well, let’s go to WeakEventHandlerSpecial<T, TEventHandler, TEventArgs>. The only problem we have there is that TEventHandler has no relation to TEventArgs. So the original concise _handler = Invoke; is not applicable, as _handler has type TEventHandler and Invoke is a method group with an (object, TEventArgs) signature. It is not a hard problem: we just need to obtain the current instance’s type, find its "Invoke" method, and dynamically create a delegate of type TEventHandler with Delegate.CreateDelegate. And what about performance, you might ask? Well, it is another costly operation. But we’re still in the code region that is executed once per subscription, NOT once per event occurrence. So I deem it reasonable. (Again, in the worst case, if I need to optimize at some point – nobody prevents me from some caching strategy, a compiled Expression, or any other mechanism – thus, from ‘once per subscription’ it would become ‘once per given triple of types’.)
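
To give an idea of the shape of that class, here is a rough sketch of WeakEventHandlerSpecial – my reconstruction from the description above, not the actual code from the zip/gist (it also assumes the subscriber’s handler is an instance method):

public interface IWeakEventHandlerSpecial<TEventHandler>
{
    TEventHandler Handler { get; }
}

public class WeakEventHandlerSpecial<T, TEventHandler, TEventArgs> : IWeakEventHandlerSpecial<TEventHandler>
    where T : class
    where TEventArgs : EventArgs
{
    private delegate void OpenEventHandler(T @this, object sender, TEventArgs e);

    private readonly WeakReference _targetRef;
    private readonly OpenEventHandler _openHandler;
    private readonly TEventHandler _handler;
    private Action<object> _unregister;

    public WeakEventHandlerSpecial(Delegate eventHandler, Action<object> unregister)
    {
        _targetRef = new WeakReference(eventHandler.Target);
        // open-instance delegate: the original target's 'this' becomes an explicit
        // parameter, so we don't root the target itself
        _openHandler = (OpenEventHandler)Delegate.CreateDelegate(
            typeof(OpenEventHandler), null, eventHandler.Method);
        // bind our own Invoke to the exact delegate type the event expects
        _handler = (TEventHandler)(object)Delegate.CreateDelegate(
            typeof(TEventHandler), this, GetType().GetMethod("Invoke"));
        _unregister = unregister;
    }

    public void Invoke(object sender, TEventArgs e)
    {
        var target = (T)_targetRef.Target;
        if (target != null)
        {
            _openHandler(target, sender, e);
        }
        else if (_unregister != null)
        {
            _unregister(_handler);
            _unregister = null;
        }
    }

    public TEventHandler Handler
    {
        get { return _handler; }
    }
}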

And so, finally, here you can get the results (WeakDelegates.zip) with several unit tests: link.

And in case you don’t want to download anything and just want a quick look at the result – there is this gist.

And regarding requirements:

1. It’s working. The unit tests swear that it is so.
2. While using it I’m not forced to specify anything.
3. In the only real-world usage the types are checked. Other cases are not, but that is OK.
4. There’s somewhat more reflection than in the original weak events, but only slightly.

I think my own requirements are pretty much fulfilled. 🙂

22/02/2012

Visual Studio usability gem – Perspectives extension

Filed under: VisualStudio — Ivan Danilov @ 03:31

Back in the VS 6.0 days there was a feature to manage environments of the Studio: which tool windows you have opened, their layout, toolbar configuration and position, etc. It was very handy, because when you write code you don’t need the bunch of windows dedicated to the debugging process, like threads, watches, call stacks and the like. After VS 6.0 this feature was silently buried. Instead, what you have now is just two layout configurations: one for debug and one for non-debug times. And VS switches between them itself, without your intervention.

While these two could sometimes be enough, it is apparently not always the case. In these multi-monitor days it becomes even more true. If you’re actively writing code and constantly looking at connected parts of the codebase – you want as much space for your editor windows as possible, even across several monitors at once. If you have MSDN or stackoverflow opened in your browser and you’re trying to figure out how they could apply to your situation – you probably want to see the browser on the second screen, don’t you? But VS’s tool window will readily go on top of the browser the instant you activate the IDE. Arghhh! What can you do? Close the tool window or dock it somewhere else (just to return it back manually when you no longer need the browser). And there are several such scenarios that force you to shuffle windows back and forth constantly.

Today I found Perspectives. Basically it is what VS 6.0 had a decade ago – a layout manager. It is somewhat buggy, but it can still apply and save layouts wonderfully. With that functionality I can definitely tolerate several minor UI issues.

