Saturday, November 5, 2011

Threading in C#

How Threading Works

[Joseph Albahari, O’Reilly Media, Inc. All rights reserved. www.albahari.com/threading/]


Multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time. On a single-processor computer, a thread scheduler performs timeslicing— rapidly switching execution between each of the active threads. Under Windows, a time-slice is typically in the tens of milliseconds region—much larger than the CPU overhead in actually switching context between one thread and another (which is typically in the few-microseconds region). On a multi-processor computer, multithreading is implemented with a mixture of time-slicing and genuine concurrency, where different threads run code simultaneously on different CPUs. It’s almost certain there will still be some time-slicing, because of the operating system’s need to service its own threads—as well as those of other applications. A thread is said to be preempted when its execution is interrupted due to an external factor such as time-slicing. In most situations, a thread has no control over when and where it’s preempted.
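The effect of time-slicing is easy to observe. Here's a minimal sketch (assuming the System.Threading namespace is imported, as in the later examples): two threads write different characters to the console, and the output typically comes out in alternating blocks as the scheduler switches between them:

class Interleaving
{
  static void Main()
  {
    Thread t = new Thread (WriteY);   // Start a second thread...
    t.Start();

    // ...while the main thread does its own work.
    for (int i = 0; i < 1000; i++) Console.Write ("x");
  }

  static void WriteY()
  {
    for (int i = 0; i < 1000; i++) Console.Write ("y");
  }
}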


Threads vs Processes


A thread is analogous to the operating system process in which your application runs. Just as processes run in parallel on a computer, threads run in parallel within a single process. Processes are fully isolated from each other; threads have just a limited degree of isolation. In particular, threads share (heap) memory with other threads running in the same application. This, in part, is why threading is useful: one thread can fetch data in the background, for instance, while another thread can display the data as it arrives.
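Here's a minimal sketch of that shared-memory point: a static field written by a worker thread is visible to the main thread, because both threads see the same heap. (The crude Sleep used as a wait is only for illustration; proper coordination between threads is a later topic.)

class SharedMemory
{
  static string _data;                // shared by every thread in the process

  static void Main()
  {
    new Thread (Fetch).Start();       // the "background" thread writes the field...
    Thread.Sleep (1000);              // crude wait, purely for illustration
    Console.WriteLine (_data);        // ...and the main thread reads it: "fetched"
  }

  static void Fetch() { _data = "fetched"; }
}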




Threading’s Uses and Misuses


Multithreading has many uses; here are the most common:


Maintaining a responsive user interface

By running time-consuming tasks on a parallel “worker” thread, the main UI thread is free to continue processing keyboard and mouse events.


Making efficient use of an otherwise blocked CPU

Multithreading is useful when a thread is awaiting a response from another computer or piece of hardware. While one thread is blocked while performing the task, other threads can take advantage of the otherwise unburdened computer.
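For illustration, here's a sketch of that idea, with Thread.Sleep standing in for a blocking call such as a request to another computer; while the worker waits, the main thread keeps the CPU busy:

class BlockedCpu
{
  static void Main()
  {
    Thread worker = new Thread (() =>
    {
      Thread.Sleep (2000);                        // blocked: consumes no CPU
      Console.WriteLine ("Response received");
    });
    worker.Start();

    // Meanwhile the main thread gets on with other work.
    for (int i = 0; i < 5; i++) Console.WriteLine ("Working... " + i);

    worker.Join();
  }
}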


Parallel programming

Code that performs intensive calculations can execute faster on multicore or multiprocessor computers if the workload is shared among multiple threads in a “divide-and-conquer” strategy.
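As a minimal sketch of the divide-and-conquer idea (not a tuned implementation), two threads can each sum half of an array, with the results combined once both have finished:

class ParallelSum
{
  static void Main()
  {
    int[] data = new int[1000000];
    for (int i = 0; i < data.Length; i++) data[i] = 1;

    long firstHalf = 0, secondHalf = 0;
    int mid = data.Length / 2;

    // Each thread works on its own half, so they never write to shared state.
    Thread t1 = new Thread (() => { for (int i = 0; i < mid; i++) firstHalf += data[i]; });
    Thread t2 = new Thread (() => { for (int i = mid; i < data.Length; i++) secondHalf += data[i]; });

    t1.Start(); t2.Start();
    t1.Join(); t2.Join();

    Console.WriteLine (firstHalf + secondHalf);   // 1000000
  }
}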


Speculative execution

On multicore machines, you can sometimes improve performance by predicting something that might need to be done, and then doing it ahead of time. LINQPad uses this technique to speed up the creation of new queries. A variation is to run a number of different algorithms in parallel that all solve the same task. Whichever one finishes first “wins”—this is effective when you can’t know ahead of time which algorithm will execute fastest.
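Here's a sketch of the "whichever finishes first wins" variation. SolverA and SolverB are hypothetical stand-ins for two real algorithms that compute the same answer; Interlocked.CompareExchange ensures only the first result is recorded. (In a real speculative setup, you'd also want to cancel the losing computation rather than let it run to completion.)

class SpeculativeRace
{
  static int _answer;
  static int _haveAnswer;              // 0 = no result yet, 1 = winner recorded

  static void Main()
  {
    Thread a = new Thread (() => Publish (SolverA()));
    Thread b = new Thread (() => Publish (SolverB()));
    a.Start(); b.Start();
    a.Join(); b.Join();
    Console.WriteLine ("Winning answer: " + _answer);
  }

  static void Publish (int result)
  {
    // Only the first thread to get here flips _haveAnswer from 0 to 1.
    if (Interlocked.CompareExchange (ref _haveAnswer, 1, 0) == 0)
      _answer = result;
  }

  // Stand-ins for two real algorithms with unpredictable relative speed.
  static int SolverA() { Thread.Sleep (300); return 42; }
  static int SolverB() { Thread.Sleep (100); return 42; }
}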


Allowing requests to be processed simultaneously

On a server, client requests can arrive concurrently and so need to be handled in parallel (the .NET Framework creates threads for this automatically if you use ASP.NET, WCF, Web Services, or Remoting). This can also be useful on a client (e.g., handling peer-to-peer networking—or even multiple requests from the user). With technologies such as ASP.NET and WCF, you may be unaware that multithreading is even taking place—unless you access shared data (perhaps via static fields) without appropriate locking, running afoul of thread safety.

Threads also come with strings attached. The biggest is that multithreading can increase complexity. Having lots of threads does not in and of itself create much complexity; it’s the interaction between threads (typically via shared data) that does. This applies whether or not the interaction is intentional, and can cause long development cycles and an ongoing susceptibility to intermittent and nonreproducible bugs. For this reason, it pays to keep interaction to a minimum, and to stick to simple and proven designs wherever possible. This article focuses largely on dealing with just these complexities; remove the interaction and there’s much less to say!
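To make the shared-data point concrete, here's a minimal sketch of guarding a static field with C#'s lock statement, so that concurrent "requests" don't lose updates:

class SafeCounter
{
  static int _requestCount;                        // shared static state
  static readonly object _locker = new object();

  static void HandleRequest()
  {
    lock (_locker) _requestCount++;                // one thread at a time
  }

  static void Main()
  {
    Thread[] workers = new Thread[10];
    for (int i = 0; i < workers.Length; i++)
    {
      workers[i] = new Thread (() => { for (int j = 0; j < 1000; j++) HandleRequest(); });
      workers[i].Start();
    }
    foreach (Thread w in workers) w.Join();

    Console.WriteLine (_requestCount);             // Reliably 10000 with the lock
  }
}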




Threading also incurs a resource and CPU cost in scheduling and switching threads (when there are more active threads than CPU cores)—and there’s also a creation/tear-down cost. Multithreading will not always speed up your application—it can even slow it down if used excessively or inappropriately. For example, when heavy disk I/O is involved, it can be faster to have a couple of worker threads run tasks in sequence than to have 10 threads executing at once. (In Signaling with Wait and Pulse, we describe how to implement a producer/consumer queue, which provides just this functionality.)




Creating and Starting Threads


As we saw in the introduction, threads are created using the Thread class’s constructor, passing in a ThreadStart delegate which indicates where execution should begin. Here’s how the ThreadStart delegate is defined:

public delegate void ThreadStart();


  • All examples assume the following namespaces are imported:

using System;
using System.Threading;


    Calling Start on the thread then sets it running. The thread continues until its method returns, at which point the thread ends. Here’s an example, using the expanded C# syntax for creating a ThreadStart delegate:



    class ThreadTest
    {
      static void Main()
      {
        Thread t = new Thread (new ThreadStart (Go));
        t.Start();   // Run Go() on the new thread.
        Go();        // Simultaneously run Go() in the main thread.
      }

      static void Go()
      {
        Console.WriteLine ("hello!");
      }
    }

    In this example, thread t executes Go() at (much) the same time as the main thread calls Go(). The result is two near-instant hellos.

    A thread can be created more conveniently by specifying just a method group, allowing C# to infer the ThreadStart delegate:


    Thread t = new Thread (Go); // No need to explicitly use ThreadStart


    Another shortcut is to use a lambda expression or anonymous method:


    static void Main()
    {
      Thread t = new Thread ( () => Console.WriteLine ("Hello!") );
      t.Start();
    }



    Passing Data to a Thread


    The easiest way to pass arguments to a thread’s target method is to execute a lambda expression that calls the method with the desired arguments:


    static void Main()
    {
      Thread t = new Thread ( () => Print ("Hello from t!") );
      t.Start();
    }

    static void Print (string message)
    {
      Console.WriteLine (message);
    }


    With this approach, you can pass in any number of arguments to the method. You can even wrap the entire implementation in a multi-statement lambda:

    new Thread (() =>
    {
      Console.WriteLine ("I'm running on another thread!");
      Console.WriteLine ("This is so easy!");
    }).Start();


    You can do the same thing almost as easily in C# 2.0 with anonymous methods:

    new Thread (delegate()
    {
      ...
    }).Start();


    Another technique is to pass an argument into Thread’s Start method:


    static void Main()
    {
      Thread t = new Thread (Print);
      t.Start ("Hello from t!");
    }

    static void Print (object messageObj)
    {
      string message = (string) messageObj;   // We need to cast here
      Console.WriteLine (message);
    }


    This works because Thread’s constructor is overloaded to accept either of two delegates:


    public delegate void ThreadStart();
    public delegate void ParameterizedThreadStart (object obj);


    The limitation of ParameterizedThreadStart is that it accepts only one argument. And because it’s of type object, it usually needs to be cast.
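    If you need to pass more than one value this way, one workaround is to bundle the values into a single object; the PrintJob class below is a made-up carrier type purely for illustration. In practice, the lambda technique shown earlier is usually simpler:

    class MultiArgExample
    {
      // Hypothetical carrier type: bundles several values into the one
      // object that ParameterizedThreadStart allows.
      class PrintJob { public string Message; public int Repeat; }

      static void Main()
      {
        Thread t = new Thread (Print);
        t.Start (new PrintJob { Message = "Hello", Repeat = 3 });
      }

      static void Print (object obj)
      {
        PrintJob job = (PrintJob) obj;             // the cast is still needed
        for (int i = 0; i < job.Repeat; i++)
          Console.WriteLine (job.Message);
      }
    }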


    Lambda expressions and captured variables


    As we saw, a lambda expression is the most powerful way to pass data to a thread. However, you must be careful about accidentally modifying captured variables after starting the thread, because these variables are shared. For instance, consider the following:



    for (int i = 0; i < 10; i++)
      new Thread (() => Console.Write (i)).Start();


    The output is nondeterministic! Here’s a typical result:


    0223557799


    The problem is that the i variable refers to the same memory location throughout the loop’s lifetime. Therefore, each thread calls Console.Write on a variable whose value may change as it is running!

    • This is analogous to the problem we describe in “Captured Variables” in Chapter 8 of C# 4.0 in a Nutshell. The problem is less about multithreading and more about C#'s rules for capturing variables (which are somewhat undesirable in the case of for and foreach loops).

    The solution is to use a temporary variable as follows:


    for (int i = 0; i < 10; i++)
    {
      int temp = i;
      new Thread (() => Console.Write (temp)).Start();
    }


    Variable temp is now local to each loop iteration. Therefore, each thread captures a different memory location and there’s no problem. We can illustrate the problem in the earlier code more simply with the following example:


    string text = "t1";

    Thread t1 = new Thread ( () => Console.WriteLine (text) );

    text = "t2";

    Thread t2 = new Thread ( () => Console.WriteLine (text) );

    t1.Start();

    t2.Start();


    Because both lambda expressions capture the same text variable, t2 is printed twice:

    t2

    t2
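    One way to avoid this (a sketch following the same temp-variable idea as above) is to give each lambda its own variable to capture:

    string text1 = "t1";
    Thread t1 = new Thread ( () => Console.WriteLine (text1) );

    string text2 = "t2";
    Thread t2 = new Thread ( () => Console.WriteLine (text2) );

    t1.Start();
    t2.Start();    // now prints t1 and t2 (in either order)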


    Join and Sleep


    You can wait for another thread to end by calling its Join method. For example:


    static void Main()
    {
      Thread t = new Thread (Go);
      t.Start();
      t.Join();
      Console.WriteLine ("Thread t has ended!");
    }

    static void Go()
    {
      for (int i = 0; i < 1000; i++) Console.Write ("y");
    }


    This prints “y” 1,000 times, followed by “Thread t has ended!” immediately afterward. You can include a timeout when calling Join, either in milliseconds or as a TimeSpan. It then returns true if the thread ended or false if it timed out.
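    For example (a small sketch of the timeout overload):

    Thread t = new Thread (Go);
    t.Start();

    bool ended = t.Join (1000);        // wait up to 1,000 ms (a TimeSpan also works)
    Console.WriteLine (ended ? "Thread t has ended!" : "Timed out waiting for t");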


    Thread.Sleep pauses the current thread for a specified period:


    Thread.Sleep (TimeSpan.FromHours (1));   // sleep for 1 hour
    Thread.Sleep (500);                      // sleep for 500 milliseconds


    While waiting on a Sleep or Join, a thread is blocked and so does not consume CPU resources.

    • Thread.Sleep(0) relinquishes the thread’s current time slice immediately, voluntarily handing over the CPU to other threads. Framework 4.0’s new Thread.Yield() method does the same thing—except that it relinquishes only to threads running on the same processor.
    • Sleep(0) or Yield is occasionally useful in production code for advanced performance tweaks. It’s also an excellent diagnostic tool for helping to uncover thread safety issues: if inserting Thread.Yield() anywhere in your code makes or breaks the program, you almost certainly have a bug.

    Naming Threads


    Each thread has a Name property that you can set for the benefit of debugging. This is particularly useful in Visual Studio, since the thread’s name is displayed in the Threads Window and Debug Location toolbar. You can set a thread’s name just once; attempts to change it later will throw an exception. The static Thread.CurrentThread property gives you the currently executing thread. In the following example, we set the main thread’s name:


    class ThreadNaming
    {
      static void Main()
      {
        Thread.CurrentThread.Name = "main";
        Thread worker = new Thread (Go);
        worker.Name = "worker";
        worker.Start();
        Go();
      }

      static void Go()
      {
        Console.WriteLine ("Hello from " + Thread.CurrentThread.Name);
      }
    }

