Friday, February 27, 2009

There is no more free lunch! Moore's law is over.

Presentation on Parallel Extensions @ Quebec user group

My presentation on Parallel Extensions is announced on their web site.

http://www.cunq.org/Evenements+de+la+communaute/378.aspx

The presentation will be in French.

If we want more power, we need to cross the processor barrier and do work in parallel. In June 2008, Microsoft released the second CTP of its Parallel Extensions library. Come with me to see how easy it will be to make that leap of faith into the world of parallel processing. We will see how Task, concurrent collections, lazy initialization, Parallel LINQ and other tools can help us in this endeavour.

You can find more information about that library in my other posts:

http://blog.decarufel.net/2009/02/refactoring-single-threaded-code-to.html

http://blog.decarufel.net/2009/02/why-operator-is-not-thread-safe.html

http://blog.decarufel.net/2009/01/managed-parallel-computing-with.html

Monday, February 23, 2009

Extension methods series: Extension points

As mentioned earlier (the basics, managing the scope, use interfaces), it is important to manage the scope of your extension methods. Another way to do that is to use the extension point concept. An extension point is itself an extension method whose only purpose is to transform your object into another type on which you have defined plenty of extensions.

The following sample comes from an open source project called Umbrella.

Let’s first try to wrap this concept into an interface:

public interface IExtensionPoint
{
  object ExtendedValue { get; }
  Type ExtendedType { get; }
}

This interface defines the basis of an extension point. “ExtendedValue” will hold the source object reference and “ExtendedType” the type of the extended object. Now here is its generic base implementation:

public class ExtensionPoint<T> : IExtensionPoint<T>
{
  private readonly Type type;
  private readonly T value;

  public ExtensionPoint(T value)
  {
      this.value = value;
  }

  public ExtensionPoint(Type type)
  {
      this.type = type;
  }

  #region IExtensionPoint<T> Members

  public T ExtendedValue
  {
      get { return value; }
  }

  object IExtensionPoint.ExtendedValue
  {
      get { return value; }
  }

  public Type ExtendedType
  {
      get { return type ?? (value == null ? typeof (T) : value.GetType()); }
  }

  #endregion
}

This generic class will be used as a base class for all extension points. It contains all the logic to store the value and some read-only properties to get information about it.
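The base class implements IExtensionPoint<T>, which the listing does not show. Here is a minimal sketch of what that generic interface presumably looks like, inferred from the explicit IExtensionPoint.ExtendedValue implementation; the exact declaration in Umbrella may differ, and the trimmed-down ExtensionPoint<T> below only keeps the value constructor.

```csharp
using System;

public interface IExtensionPoint
{
    object ExtendedValue { get; }
    Type ExtendedType { get; }
}

// Assumed shape: the generic interface narrows ExtendedValue to T and
// hides the object-typed property of the non-generic interface.
public interface IExtensionPoint<T> : IExtensionPoint
{
    new T ExtendedValue { get; }
}

// Trimmed-down version of the article's base class.
public class ExtensionPoint<T> : IExtensionPoint<T>
{
    private readonly T value;

    public ExtensionPoint(T value) { this.value = value; }

    public T ExtendedValue { get { return value; } }

    object IExtensionPoint.ExtendedValue { get { return value; } }

    public Type ExtendedType
    {
        get { return value == null ? typeof(T) : value.GetType(); }
    }
}

public static class InterfaceDemo
{
    public static void Main()
    {
        var point = new ExtensionPoint<string>("abc");
        IExtensionPoint untyped = point;

        Console.WriteLine(point.ExtendedValue.Length); // typed access: 3
        Console.WriteLine(untyped.ExtendedType);       // System.String
    }
}
```

The "new" modifier is what lets callers holding the generic interface get a T back without a cast, while code that only knows the non-generic interface still sees an object.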

Let’s say we want to build some XML serialization extensions. We can start by creating our XML serialization extension point:

public class SerializationExtensionPoint<T> : ExtensionPoint<T>
{
  public SerializationExtensionPoint(T value)
      : base(value)
  {
  }

  public SerializationExtensionPoint(Type type)
      : base(type)
  {
  }
}

As I said earlier, this class doesn’t do a lot. Its purpose is only to convert an extension point of T into a serialization extension point of T. To use it we must have a converter extension method in scope:

public static class SerializationExtensions
{
  public static SerializationExtensionPoint<T> Serialize<T>(this T value)
  {
      return new SerializationExtensionPoint<T>(value);
  }
}

The “Serialize” extension method is called to get access to all other serialization extension methods.

Somewhere in your code you will have this method. It can be applied to any type because it takes T as a source. The last step is to define an extension method on SerializationExtensionPoint:

public static string ToXml<T>(this SerializationExtensionPoint<T> extensionPoint)
{
  using (var stream = new MemoryStream())
  {
      Xml(extensionPoint, stream, extensionPoint.ExtendedValue);

      stream.Position = 0;
      StreamReader reader = new StreamReader(stream);

      return reader.ReadToEnd();
  }
}

This method will convert any object into XML. Look how easy it is to read this: “source serialize to xml”.

var source = new List<string>();
source.Add("Test1");
source.Add("Test2");
source.Add("Test3");
var xml = source.Serialize().ToXml();

You will get:

<?xml version="1.0" ?>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>Test1</string>
<string>Test2</string>
<string>Test3</string>
</ArrayOfString>

As with any API development, one good practice is to start by writing how you will use it. For example, in the last sample we could have started by writing “source.Serialize().ToXml();” and then implemented everything needed to make it work. The result is improved readability and reusability, two things that help a lot when someone else (or even you) has to modify your code later.
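The ToXml listing relies on an Xml(...) helper that is not shown in this post. Here is a minimal self-contained sketch of the same “source serialize to xml” chain, assuming the helper can be replaced by System.Xml.Serialization.XmlSerializer (that is an assumption; the real Umbrella helper may do more). The extension point class is a simplified stand-in for the one above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-in for the article's SerializationExtensionPoint<T>.
public class SerializationExtensionPoint<T>
{
    private readonly T value;

    public SerializationExtensionPoint(T value) { this.value = value; }

    public T ExtendedValue { get { return value; } }
}

public static class SerializationExtensions
{
    // Entry point: gives access to the serialization-only extensions.
    public static SerializationExtensionPoint<T> Serialize<T>(this T value)
    {
        return new SerializationExtensionPoint<T>(value);
    }

    // Assumption: XmlSerializer stands in for the missing Xml(...) helper.
    public static string ToXml<T>(this SerializationExtensionPoint<T> point)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, point.ExtendedValue);
            return writer.ToString();
        }
    }
}

public static class SerializationDemo
{
    public static void Main()
    {
        var source = new List<string> { "Test1", "Test2", "Test3" };
        Console.WriteLine(source.Serialize().ToXml());
    }
}
```

Because this sketch writes to a StringWriter, the XML declaration will typically report utf-16 rather than match the output shown earlier; the ArrayOfString element and its string children are the same.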

Thursday, February 19, 2009

Extension methods series: Managing the scope

Ok, now that we know how to use extension methods we can go a bit further than that.

The biggest problem with extension methods is their scope. As soon as you have an extension class in scope, all its methods are available. We saw previously how to use base types and interfaces to reduce the scope. Another way to limit the scope is to use namespaces. Your namespace must clearly state what type of extension it contains. Usually extensions will be grouped by cross-cutting concerns like Collections, Equality, Reflection, Security, Validation, Threading, and so on. So your namespace should contain one of those words.

namespace Extensions.Validation
{
    public static class RangeValidation
    {
        public static bool IsInRange<T>(this T source, T minIncl, T maxIncl)
            where T : IComparable<T>
        {
            // Inclusive on both ends: minIncl <= source <= maxIncl
            return minIncl.CompareTo(source) <= 0
                && source.CompareTo(maxIncl) <= 0;
        }
    }
}

You can put all your extension methods in the same assembly but in different namespaces. This will help reduce the number of extension methods you will see in IntelliSense.
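To make the namespace idea concrete, here is a small sketch with two extension namespaces in one file. Only the one named in a using directive is visible to the calling code. Extensions.Collections and its IsEmpty method are hypothetical names added for illustration, and IsInRange is written so that both bounds are inclusive.

```csharp
using System;
using System.Collections.Generic;
using Extensions.Validation; // opt in to the validation extensions only

namespace Extensions.Validation
{
    public static class RangeValidation
    {
        public static bool IsInRange<T>(this T source, T minIncl, T maxIncl)
            where T : IComparable<T>
        {
            return minIncl.CompareTo(source) <= 0
                && source.CompareTo(maxIncl) <= 0;
        }
    }
}

namespace Extensions.Collections
{
    // Hypothetical second group; invisible to files that do not import it.
    public static class CollectionChecks
    {
        public static bool IsEmpty<T>(this ICollection<T> source)
        {
            return source.Count == 0;
        }
    }
}

namespace Demo
{
    public static class ScopeDemo
    {
        public static void Main()
        {
            Console.WriteLine(5.IsInRange(1, 10));  // True
            Console.WriteLine(11.IsInRange(1, 10)); // False

            // new int[0].IsEmpty() would not compile here because
            // Extensions.Collections has not been imported.
        }
    }
}
```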

Wednesday, February 18, 2009

Bloggerize your life

First of all I’d like to thank Max for giving me the motivation to really start blogging again, and more than ever.

I always thought the best way to understand something is to explain it to someone else. How often have I found the solution to a problem just by walking someone through my code to show them what goes wrong? I think blogging is almost the same. When you try to explain something you have to try it for yourself. When you do that, you usually come to understand something you were just assuming before. So blogging is a good learning process.

That being said, I hope you learn as much as I do, because there’s always more to learn.

If I know only one thing, it is that I know just enough to know that I don’t know everything.

Tuesday, February 17, 2009

Extension methods series: use interfaces

In my previous post I talked about the basics of extension methods. Obviously you can define an extension method on any type, but be careful. Because extension methods are public members of public classes, their scope is very wide. After a while you can end up with so many methods added to a base type that IntelliSense becomes unusable.

One way to control the scope is to target the type or interface that best suits the need. In fact it is usually better to target an interface than a concrete class whenever possible. Of course this will probably increase the number of types to which the extension applies, but it will be based on their capabilities instead of their implementation. For example, LINQ uses this concept a lot by extending the IEnumerable<T> interface. Take for example the “Where” extension method:

public static class Enumerable
{
	// ...
	public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
	{
		// ...
	}
	// ...
}

The power of this concept is the possibility to build a chain of simple methods to solve a complex problem. For example, look at this code:

var processes = System.Diagnostics.Process.GetProcesses();

var query = processes
    .Where(proc => proc.Modules.Count > 3)
    .OrderByDescending(proc => proc.VirtualMemorySize64)
    .Take(10)
    .Select(proc => proc);

Just by reading the code it’s easy to understand what it does. The order of the calls is important: the result would not be the same if the OrderByDescending and Take calls were swapped.

All that to say that it is better to use interfaces as the source type for extension methods. We can also use an extension method to transform an interface into another one. For example, the “ToLookup” extension method takes an “IEnumerable<TSource>” and converts it to an “ILookup<TKey, TSource>”. This transformation adds an indexer based on a key to any IEnumerable type.

var processes = Process.GetProcesses();

var idxProcesses = processes.ToLookup(proc => proc.Id);

using (var proc = idxProcesses[3119].SingleOrDefault())
{
    // SingleOrDefault returns null when no process has that Id
    if (proc != null)
    {
        Console.WriteLine(proc.ProcessName);
    }
}

In this sample I get the list of all running processes, promote the Id as the key and use it with an indexer to find process number 3119.

As you can see, extension methods provide endless possibilities if used carefully.
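Here is a small sketch of the same idea on plain data instead of processes, so it can run anywhere. EveryOther is a hypothetical extension written against IEnumerable<T>, which means it chains with the LINQ operators on lists, arrays or query results alike.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper: targets the interface, not a concrete collection.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (var item in source)
        {
            if (take) yield return item;
            take = !take;
        }
    }
}

public static class ChainDemo
{
    public static void Main()
    {
        var words = new List<string> { "a", "bb", "ccc", "dddd", "eeeee" };

        var result = words
            .EveryOther()                      // a, ccc, eeeee
            .OrderByDescending(w => w.Length)  // eeeee, ccc, a
            .Take(2);                          // eeeee, ccc

        Console.WriteLine(string.Join(", ", result.ToArray()));

        // ToLookup adds key-based access on top of any IEnumerable<T>.
        var byLength = words.ToLookup(w => w.Length);
        Console.WriteLine(byLength[3].Single()); // ccc
    }
}
```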

How to find Windows path

Once again Microsoft forgot something in the .NET Framework API. When it comes time to find the path of any special folder you can rely on the “Environment.SpecialFolder” enumeration. Any? Not quite. There is one missing: the Windows directory, usually located at “C:\Windows”. See for yourself:

public enum SpecialFolder
{
    ApplicationData = 0x1a,
    CommonApplicationData = 0x23,
    CommonProgramFiles = 0x2b,
    Cookies = 0x21,
    Desktop = 0,
    DesktopDirectory = 0x10,
    Favorites = 6,
    History = 0x22,
    InternetCache = 0x20,
    LocalApplicationData = 0x1c,
    MyComputer = 0x11,
    MyDocuments = 5,
    MyMusic = 13,
    MyPictures = 0x27,
    Personal = 5,
    ProgramFiles = 0x26,
    Programs = 2,
    Recent = 8,
    SendTo = 9,
    StartMenu = 11,
    Startup = 7,
    System = 0x25,
    Templates = 0x15
}

Of course we can use the environment variable “windir” to get it.

string windowsPath = Environment.GetEnvironmentVariable("windir");

But I don’t like to rely on hardcoded strings in my code. So, looking with Reflector at how Microsoft does it in the framework itself, I found:

[MethodImpl(MethodImplOptions.InternalCall)]
internal static extern string nativeGetWindowsDirectory();

static void Main(string[] args)
{
    string windowsPath = nativeGetWindowsDirectory();
	// ..
}

Hope this can help you avoid using strings in your code to get something as simple as the Windows folder.
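One caveat: a method marked InternalCall is implemented inside the runtime itself, so copying that declaration into application code is not expected to work outside the framework's own assembly. Here is a hedged alternative sketch, assuming .NET Framework 4 or later, where a Windows member was added to Environment.SpecialFolder. WindowsPathHelper and GetWindowsPath are hypothetical names, and the “windir” variable is kept only as a fallback.

```csharp
using System;

public static class WindowsPathHelper
{
    public static string GetWindowsPath()
    {
        // Environment.SpecialFolder.Windows is available from .NET Framework 4 on.
        string path = Environment.GetFolderPath(Environment.SpecialFolder.Windows);
        if (!string.IsNullOrEmpty(path))
        {
            return path;
        }

        // Fallback to the environment variable; null when it is not defined
        // (for example on a non-Windows system).
        return Environment.GetEnvironmentVariable("windir");
    }

    public static void Main()
    {
        Console.WriteLine(GetWindowsPath() ?? "(no Windows directory found)");
    }
}
```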

Extension methods series: the basics

One of the new things in C# is extension methods. Extension methods are static methods that work pretty much like helper functions, but instead of passing the instance to act on as an argument, the instance is a prefix to the method call. It looks exactly as if the method were part of the type itself.

Of course, because C# is a statically typed language, we are not really adding a new method to a type. Here is the trick. The following example is a classic case of a helper function:

if (String.IsNullOrEmpty(instance))
{
	// ...
}

The method “IsNullOrEmpty” is a static member of type “String”. To use it we must call it through its type and then pass it an instance of that type. Here is an easy way to transform this helper function into an extension method.

public static class StringExtensions
{
	public static bool IsNullOrEmpty(this string instance)
	{
	    return String.IsNullOrEmpty(instance);
	}	
}

One of the requirements for extension methods is that the class must be static. Notice the “this” keyword used on the first argument of the method. It tells the compiler which type this method is extending.

Now if we use this code it will look like this:

if (instance.IsNullOrEmpty())
{
	// ...
}

At compile time the condition will be replaced by “StringExtensions.IsNullOrEmpty(instance)”. So an extension method is just a compiler trick to facilitate the use of helper functions. Besides being easier to use, it is also more readable. That is the starting point of fluent interfaces.
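One consequence of this compiler trick is worth seeing in code: because the call is really a static call, it works even when the “instance” is null. A minimal sketch using the same StringExtensions class:

```csharp
using System;

public static class StringExtensions
{
    public static bool IsNullOrEmpty(this string instance)
    {
        return String.IsNullOrEmpty(instance);
    }
}

public static class BasicsDemo
{
    public static void Main()
    {
        string missing = null;

        // Both lines compile to the same static call, so a null
        // "instance" does not throw a NullReferenceException.
        Console.WriteLine(missing.IsNullOrEmpty());                 // True
        Console.WriteLine(StringExtensions.IsNullOrEmpty(missing)); // True
        Console.WriteLine("text".IsNullOrEmpty());                  // False
    }
}
```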

Monday, February 16, 2009

Thursday, February 12, 2009

How many layers does it take to abstract everything, and should we?

Architecture is about building a structure just strong enough to fulfill the requirements, but no stronger, to keep the cost as low as possible. But in every project there are two kinds of requirements: functional and non-functional. Functional requirements are the ones explicitly expressed by the client.

I want a car that can take me from home to work and back.

The architect’s job is to make sure the car will be powerful enough to do the job, big enough to hold at least one person and able to carry enough fuel for the whole trip. If we stick to just those requirements, we will end up with a single-passenger car mostly made of plastic with a 1 L gas tank. Is this enough? It meets the initial requirements, but of course we have to take care of more than that. That’s where we have to think about non-functional requirements like security, standards, extensibility, maintainability, durability, usability and so on. Moreover, when a client asks for a car to go from home to work and back, he actually means he wants to be able to go on a vacation trip with it too, but you won’t know that at first.

To find a solution to all these requirements we do what we always do when we face a complex problem: add abstraction layers. For a car, that means we have to respect some industry standards. For example, we have to make sure we can buy wheels, wipers, oil filters or a CD player knowing they will fit our car, even if the combination of two specific models has never been tried before. As long as a part respects standards such as dimensions, it will work.

In software development we have to set those standards for databases, components, user interfaces, logging, error handling, and so on. We usually do that with interfaces and abstract classes. Those abstractions act as specifications anybody can use to build compatible pieces that will work in our system. If a car doesn’t expect to have more than 6 speakers, it doesn’t make sense to build a CD player that can use 7 speakers. Likewise, in software, once we publish an interface, it doesn’t make sense for an implementation to add methods other than the ones the interface declares; they won’t be called anyway.

Now let’s get back to the initial question. How many layers does it take to abstract everything, and should we? If you are a purist you will try to add a layer of abstraction between your code and every other code base you don’t control. But how far will you go? Is it a good thing to build your own controls by deriving from the framework ones? Will you push the abstraction to the point where any layer of your code can live independently of all the others?

Remember what I said at the beginning? Just enough but not too much. So my answer is: it depends. But I like to leave doors open so that if I later want to split two layers it won’t be too difficult. Today, with refactoring tools and test frameworks, you can make big changes to your code without worrying too much about breaking something.

Wednesday, February 11, 2009

Refactoring single threaded code to multithreaded and to Tasks (with Parallel Extension Framework)

Source Code

Introduction

Someone asked me to post a simple sample on how to build a multithreaded application, so I put together this blog post. Of course, to keep everything readable and easy to follow, there’s no error handling and no validation.

Starting point

A-SingleThreadSimpleModel in the source code

For this sample I used a person as my model object. Someone could say that I should not use stupid objects, but it’s more understandable this way.

So we have a person that can talk and walk. Obviously most of us can do these two things at the same time, but we must tell the computer how to do it. First we will start with a single-threaded model to show you all the objects involved. Then we will refactor it a little to make it more manageable, and then we will make it multithreaded, first using threads and later tasks. Task is a new concept that will be introduced with Visual Studio 2010.

Let’s look at the Person class:

public class Person
{
    private int _position;
    private readonly List<string> _log = new List<string>();

    public void Say(string message)
    {
        var words = message.Split(' ');
        foreach (var word in words)
        {
            Thread.Sleep(400);
            SomethingSaid(this, new TalkEventArgs(word));
        }
        Log.Add(message); // not thread safe
    }

    public void Walk(int steps)
    {
        for (int i = 0; i < steps; i++)
        {
            Thread.Sleep(100);
            _position++; // not thread safe
            MovedForward(this, new WalkEventArgs(1));
        }
    }

    public int Position { get { return _position; } }

    public List<string> Log { get { return _log; } }

    public event EventHandler<TalkEventArgs> SomethingSaid = delegate { };
    public event EventHandler<WalkEventArgs> MovedForward = delegate { };
}

We have a very simple class here with two properties, two methods and two events. After instantiating this class we can call Say(…) or Walk(…) as demonstrated in this console application:

public class Program
{
    static void Main(string[] args)
    {
        var p = new Person();
        p.SomethingSaid += (sender, e) => Console.WriteLine("I'm saying: {0}", e.Message);
        p.MovedForward += (sender, e) => Console.WriteLine("I walked: {0} steps", e.Steps);

        Stopwatch stopwatch = Stopwatch.StartNew();
        p.Walk(20);
        p.Say("Let me tell you something");

        stopwatch.Stop();
        Console.WriteLine("It took me {0} ms\nto walk {1} steps\nand say \"{2}\"",
            stopwatch.ElapsedMilliseconds,
            p.Position,
            String.Join("; ", p.Log.ToArray()));
    }
}

If you run this sample you should get something like this:

I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: Let
I'm saying: me
I'm saying: tell
I'm saying: you
I'm saying: something
It took me 5511 ms
to walk 20 steps
and say "Let me tell you something"
Press any key to continue . . .

We can clearly see that only one task can be done at any time. We either walk or talk, but not both.

Refactoring

B-SingleThreadRefactored in the source code

Now let’s take this sample to the next level. In this refactoring phase we will:

  • Extract an interface
  • Build a base class
  • Add a little thread safety

First the interface:

public interface IPerson
{
    void Say(string message);
    void Walk(int steps);

    int Position { get; }

    IEnumerable<string> Log { get; }

    event EventHandler<TalkEventArgs> SomethingSaid;
    event EventHandler<WalkEventArgs> MovedForward;
}

This interface will then be implemented by a base class:

public abstract class PersonBase : IPerson
{
    private int _position;
    private readonly List<string> _log = new List<string>();

    #region Implementation of IPerson

    public abstract void Say(string message);

    public abstract void Walk(int steps);

    public int Position { get { return _position; } }

    public IEnumerable<string> Log { get { return _log; } }

    public event EventHandler<TalkEventArgs> SomethingSaid = delegate { };
    public event EventHandler<WalkEventArgs> MovedForward = delegate { };

    #endregion

    protected void Moving(int steps)
    {
        for (int i = 0; i < steps; i++)
        {
            Thread.Sleep(100);
            Interlocked.Increment(ref _position); // thread safe
            MovedForward(this, new WalkEventArgs(1));
        }
    }

    protected void Speaking(string message)
    {
        var words = message.Split(' ');
        foreach (var word in words)
        {
            Thread.Sleep(400);
            SomethingSaid(this, new TalkEventArgs(word));
        }
        _log.Add(message); // not thread safe yet!
    }
}

Executing the code after this refactoring should give you the exact same output as before.
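The base class still flags _log.Add(message) as not thread safe. One possible way to close that gap, sketched under the assumption that a plain lock is acceptable here: guard the list with a private lock object and hand out snapshots instead of the live list. SafeLog is a hypothetical helper name, not part of the sample's source code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class SafeLog
{
    private readonly object _sync = new object();
    private readonly List<string> _log = new List<string>();

    public void Add(string message)
    {
        lock (_sync) { _log.Add(message); }
    }

    // Return a copy so callers never enumerate the list while it changes.
    public string[] Snapshot()
    {
        lock (_sync) { return _log.ToArray(); }
    }
}

public static class SafeLogDemo
{
    // Starts several threads that all write to the same log,
    // then returns the number of entries recorded.
    public static int Run(int threadCount, int perThread)
    {
        var log = new SafeLog();
        var threads = new List<Thread>();
        for (int t = 0; t < threadCount; t++)
        {
            var thread = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++) log.Add("message");
            });
            threads.Add(thread);
            thread.Start();
        }
        foreach (var thread in threads) thread.Join();
        return log.Snapshot().Length;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 1000)); // 4000
    }
}
```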

Using Thread

C-MultiThreadUsingThread in the source code

Now let’s try to multi-thread this sample a little. We will define a new interface for that. This interface declares 4 new members: BeginSay, EndSay, BeginWalk and EndWalk. Each “begin” method starts a background thread to do the work. Each “end” method acts as a synchronization mechanism, and it is the natural place to deal with errors from the background work. Here is the IThreadPerson interface:

public interface IThreadPerson : IPerson
{
    Thread BeginSay(string message);
    void EndSay(Thread callBack);
    Thread BeginWalk(int steps);
    void EndWalk(Thread callBack);
}

And an implementation of it:

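public class Person : PersonBase, IThreadPerson
{
    // The synchronous members delegate to the helpers inherited from PersonBase.
    public override void Say(string message) { Speaking(message); }

    public override void Walk(int steps) { Moving(steps); }
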
    #region IThreadPerson Members

    public Thread BeginSay(string message)
    {
        var starter = new ThreadStart(() =>
        {
            Say(message);
        });
        var thread = new Thread(starter);
        thread.Start();
        return thread;
    }

    public void EndSay(Thread talk)
    {
        talk.Join();
    }

    public Thread BeginWalk(int steps)
    {
        var starter = new ThreadStart(() =>
        {
            Walk(steps);
        });
        var thread = new Thread(starter);
        thread.Start();
        return thread;
    }

    public void EndWalk(Thread walk)
    {
        walk.Join();
    }

    #endregion
}

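Each “begin” method returns a new thread which we can use to synchronize the process. By calling the “end” method we explicitly wait for the work to finish. Keep in mind that Thread.Join does not rethrow an exception raised on the worker thread, so wrapping these calls in a try/catch block is only useful if the worker captures its exception and hands it back to the caller.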

If you run this you will get something like this:

I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: Let
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: me
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: tell
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: you
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I walked: 1 steps
I'm saying: something
It took me 2017 ms
to walk 20 steps
and say "Let me tell you something"
Press any key to continue . . .

As you can see, it took less than half the time to complete and we are doing two things at the same time.
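A word on error handling in the “end” methods: Thread.Join only waits, it does not rethrow what went wrong on the other thread. Here is a minimal sketch of one way to surface a failure, assuming we are free to wrap the thread in a small helper. Worker is a hypothetical class, not part of the sample's source code.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: pairs a thread with the exception it may have raised.
public class Worker
{
    private readonly Thread _thread;
    private Exception _error;

    public Worker(Action work)
    {
        _thread = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { _error = ex; } // keep it for the caller
        });
    }

    public void Begin() { _thread.Start(); }

    // Plays the role of an "end" method: wait, then surface any failure.
    public void End()
    {
        _thread.Join();
        if (_error != null)
        {
            throw new InvalidOperationException("Background work failed.", _error);
        }
    }
}

public static class WorkerDemo
{
    public static void Main()
    {
        var worker = new Worker(() => { throw new ArgumentException("boom"); });
        worker.Begin();
        try
        {
            worker.End();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.InnerException.Message); // boom
        }
    }
}
```

Task.Wait and Task.WaitAll do this bookkeeping for you, which is one more argument for the Task version.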

Using Task

D-MultiThreadUsingTask in the source code

Using Task from the Parallel Extensions framework is almost the same as using Thread, except that some types and the way the background work is created are different.

Here is the ITaskPerson interface:

public interface ITaskPerson : IPerson
{
    Task BeginSay(string message);
    void EndSay(Task callBack);
    Task BeginWalk(int steps);
    void EndWalk(Task callBack);
}

And the task implementation of Person:

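public class Person : PersonBase, ITaskPerson
{
    // The synchronous members delegate to the helpers inherited from PersonBase.
    public override void Say(string message) { Speaking(message); }

    public override void Walk(int steps) { Moving(steps); }
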
    #region ITaskPerson Members

    public Task BeginSay(string message)
    {
        var task = Task.Create(x => { Say(message); });
        return task;
    }

    public void EndSay(Task talk)
    {
        talk.Wait();
    }

    public Task BeginWalk(int steps)
    {
        var task = Task.Create(x => { Walk(steps); });
        return task;
    }

    public void EndWalk(Task walk)
    {
        walk.Wait();
    }

    #endregion
}

As you can see, it’s a little simpler to create a Task than a Thread. The usage is also simpler. Here is how to call it:

Task walk = p.BeginWalk(20);
Task talk = p.BeginSay("Let me tell you something");
p.EndWalk(walk);
p.EndSay(talk);

But this way of doing things is somewhat dated. Here is a more “Parallel Extensions” way:

Task[] tasks = new Task[]
{
    p.BeginWalk(20),
    p.BeginSay("Let me tell you something")
};
Task.WaitAll(tasks);

Both versions should give you almost the same result as the Thread version.
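Task.Create belongs to the June 2008 CTP. In the Task Parallel Library as it later shipped, the equivalent calls are Task.Factory.StartNew (.NET Framework 4) and Task.Run (.NET Framework 4.5). Here is a minimal sketch of the WaitAll pattern with Task.Run; the static Walk and Say methods are simplified stand-ins for the Person members, with shorter sleeps.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskDemo
{
    private static int _position;

    // Stand-in for Person.Walk from the article.
    public static void Walk(int steps)
    {
        for (int i = 0; i < steps; i++)
        {
            Thread.Sleep(10);
            Interlocked.Increment(ref _position);
        }
    }

    // Stand-in for Person.Say from the article.
    public static void Say(string message)
    {
        foreach (var word in message.Split(' '))
        {
            Thread.Sleep(40);
            Console.WriteLine("I'm saying: {0}", word);
        }
    }

    public static int Run()
    {
        _position = 0;
        Task[] tasks =
        {
            Task.Run(() => Walk(20)),
            Task.Run(() => Say("Let me tell you something"))
        };

        // WaitAll rethrows task failures wrapped in an AggregateException.
        Task.WaitAll(tasks);
        return _position;
    }

    public static void Main()
    {
        Console.WriteLine("Walked {0} steps", Run());
    }
}
```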

Conclusion

I hope this helps you build multithreaded, thread-safe applications.

Why the ++ operator is not thread safe

Here is a quick hint on how to make your software thread safe. If you want to increment a member of your class you would probably do something like this:

public void NotSafe()
{
    val++;
}

Where val is a member of your class. But this is not thread safe. Doing this involves 4 operations:

  1. Loading the field onto the stack
  2. Pushing the constant 1 onto the stack
  3. Calling add on the two stack values
  4. Storing the result back in the field

Here is the corresponding IL:

.method public hidebysig instance void NotSafe() cil managed
{
    .maxstack 8
    L_0000: nop
    L_0001: ldarg.0
    L_0002: dup
    L_0003: ldfld int32 ClassLibrary1.Class1::val
    L_0008: ldc.i4.1
    L_0009: add
    L_000a: stfld int32 ClassLibrary1.Class1::val
    L_000f: ret
}

The problem is that another thread can try to do the same thing anywhere between any of the 4 steps. For example, if a second thread pops in while the first one is between steps 1 and 2, both threads load the same value, both add 1 to it and both store the same result, so one of the two increments is lost. To resolve this problem you can use a lock like this:

public void SafeLock()
{
    lock (valLock)
    {
        val++;
    }
}

But this will generate the following IL:

.method public hidebysig instance void SafeLock() cil managed
{
    .maxstack 3
    .locals init (
        [0] object CS$2$0000)
    L_0000: nop
    L_0001: ldarg.0
    L_0002: ldfld object ClassLibrary1.Class1::valLock
    L_0007: dup
    L_0008: stloc.0
    L_0009: call void [mscorlib]System.Threading.Monitor::Enter(object)
    L_000e: nop
    L_000f: nop
    L_0010: ldarg.0
    L_0011: dup
    L_0012: ldfld int32 ClassLibrary1.Class1::val
    L_0017: ldc.i4.1
    L_0018: add
    L_0019: stfld int32 ClassLibrary1.Class1::val
    L_001e: nop
    L_001f: leave.s L_0029
    L_0021: ldloc.0
    L_0022: call void [mscorlib]System.Threading.Monitor::Exit(object)
    L_0027: nop
    L_0028: endfinally
    L_0029: nop
    L_002a: ret
    .try L_000f to L_0021 finally handler L_0021 to L_0029
}

As you can see, there is a lot more code involved to ensure thread safety. A quicker and easier way to do this is to use the Interlocked class. This class relies on an atomic processor instruction rather than a lock to modify the member. Here’s how to use it:

public void Safe()
{
    Interlocked.Increment(ref val);
}

This will be rendered as two major IL steps:

  1. Load the address of the field onto the stack
  2. Call Increment

Here is the IL representation of this:

.method public hidebysig instance void Safe() cil managed
{
    .maxstack 8
    L_0000: nop
    L_0001: ldarg.0
    L_0002: ldflda int32 ClassLibrary1.Class1::val
    L_0007: call int32 [mscorlib]System.Threading.Interlocked::Increment(int32&)
    L_000c: pop
    L_000d: ret
}

Now every time you see something++ on a shared field you will know that it is not thread safe, and how to fix it.
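To see the difference rather than take it on faith, here is a small sketch that hammers both kinds of increment from several threads. Counter and CounterDemo are hypothetical names. The Interlocked total is always exact; the plain ++ total can come out lower, though how often depends on the machine and the run.

```csharp
using System;
using System.Threading;

public class Counter
{
    private int _unsafe;
    private int _safe;

    public int UnsafeValue { get { return _unsafe; } }
    public int SafeValue { get { return _safe; } }

    public void Hit()
    {
        _unsafe++;                        // load, add, store: updates can be lost
        Interlocked.Increment(ref _safe); // single atomic operation
    }
}

public static class CounterDemo
{
    public static Counter Run(int threadCount, int hitsPerThread)
    {
        var counter = new Counter();
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < hitsPerThread; i++) counter.Hit();
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return counter;
    }

    public static void Main()
    {
        var counter = Run(4, 1000000);
        Console.WriteLine("Interlocked: {0}", counter.SafeValue);   // always 4000000
        Console.WriteLine("Plain ++:    {0}", counter.UnsafeValue); // may be lower
    }
}
```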

Tuesday, February 10, 2009

The complete history of the Internet in 8 minutes

The complete, comprehensive history of the Internet from 1957 to 2009, in just 8 minutes.

[from Gizmodo]

Saturday, February 7, 2009

XML Serialization Tip: Hiding default constructor

Here is a quick tip. You all know that to serialize and deserialize an object in XML you need a default (parameterless) constructor. But sometimes you don’t want anybody other than the serializer itself to use it.

By marking the default constructor obsolete you can make sure no code will call it directly.

public class MyClass
{
	[Obsolete("For XML Serialization Only", true)]
	public MyClass()
	{
		// Needed for XML serialization
	}

	public MyClass(string initialValue)
	{
		// ...
	}

	// ...
}

Don’t forget to specify true as the second argument to the Obsolete attribute. This will raise a compilation error if this constructor is called directly. I insist on “directly” because nothing prevents someone from using reflection to call it. XML serialization can still occur because it uses reflection to do it.
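Here is a minimal sketch of the “directly” distinction. It only demonstrates the reflection side, through Activator.CreateInstance; the Value property and the ObsoleteDemo class are hypothetical additions for illustration.

```csharp
using System;

public class MyClass
{
    public string Value { get; set; }

    [Obsolete("For XML Serialization Only", true)]
    public MyClass()
    {
        Value = "created without arguments";
    }

    public MyClass(string initialValue)
    {
        Value = initialValue;
    }
}

public static class ObsoleteDemo
{
    public static MyClass CreateViaReflection()
    {
        // "new MyClass()" would be a compile error here, but reflection
        // does not look at the Obsolete attribute.
        return (MyClass)Activator.CreateInstance(typeof(MyClass));
    }

    public static void Main()
    {
        Console.WriteLine(CreateViaReflection().Value);
        Console.WriteLine(new MyClass("explicit").Value);
    }
}
```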