Multithreaded Design

2002-12-07: I was thinking about discussions I've had with co-workers about multithreading, and thinking about the people whose lectures I've been attending recently. I decided that I've learned some techniques that I'd like to try to share with people as well. Why not try to write some of what I've learned as a paper or tutorial? Well, here goes. What you see here is a far-from-complete beginning of a first draft. Comment if you like, of course.

Revised: 2002-12-07, 2002-12-08


Introduction

I heard a lecture by Rebecca Wirfs-Brock where she mentioned that it is possible to teach people design. It can be described as being able to "see" the problem, and it is possible to teach people to see better. I know a lot of people have problems with multithreaded design, so I thought I'd try to describe how I "see" multithreaded problems. Hopefully, I'll be able to show how I analyze, break down, and solve multithreaded problems.

Multithreaded computing and parallel computing are very similar. I think the difference is primarily in the granularity. In parallel computing, communication and synchronization are much harder. Not harder as in difficult from a programming perspective, but harder as in more time consuming and computationally expensive. Active centers of execution operate in greatly separated memory spaces, often on completely different machines. A lot of thought is given to minimizing the amount of data transferred and the number of synchronization operations required, since they take so much time that would be better spent calculating. Thus it is almost impossible to have an accidental data conflict, and the amount of thought that is put into communication means that deadlock is not as common.

In multithreading, everything is almost too easy. All the threads live in the same process space, so they can communicate freely and even inadvertently. Synchronization is cheap and frequent, and it's easy to not think things all the way through, so deadlocks are more common.

A friend and co-worker (who was interested in improving his multithreading skills) once asked me to call him over if I was ever working on another multithreading problem. My reply was that I am always working on a multithreading problem. In everything I write, I am aware of the issues that would come up in a multithreaded environment. Even when I know there will never be more than one thread, I can quickly evaluate where the multithreaded issues would be, because I know how to "see" multithreaded design.

It's hard to be categorical when talking about programming, considering that so many general terms (like class, type, and object) have specific technical meanings. Please bear with me.

I usually think of programming entities as being divided into three main categories. There are stores, whose main job is to hold data. There are also coordinators, whose main job is to get things done. In object-oriented parlance, you could say stores are fields and coordinators are methods. But the ideas apply even to whole objects and collections of objects. A store object will hold the data and know how to do appropriate manipulations, but it won't know what to do with the data. A coordinator object will know how to get something done, and will direct the stores to move data around. Stores and coordinators are good programming practice - division of labor, data hiding, breaking things into smaller tasks and deferring the actual work until later. Stores are the "what". Coordinators are the "how".

However, there's still something missing. A program will have lots of coordinators, often ones that do opposite things. Which coordinators should "do their thing" at any given time? The third entity that needs to be considered in a design is what I call a "thrust" or intention. You could think of a thrust as an actor or activity. You might think of it as the missing "when". When designing a system, you need to think not only of the objects, but also of the activities that these objects are participating in.

[Revise: coordinators represent information that is not raw data. Connections, associations, state. They are the structure or framework that holds the stores together. Warehouses and railway lines(?). They are not really methods! The methods of a coordinator object know how to manipulate state, just as methods of a store object know how to manipulate data. Perhaps I need a new name.]
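Here is a rough sketch of the store/coordinator split in Java. The class names (OrderRecord, Billing, OrderProcessor) are made up purely for illustration:

// OrderRecord is a store: it holds data and knows how to manipulate it,
// but it doesn't know what should be done with the data.
class OrderRecord {
    private final String customer;
    private double total;

    OrderRecord(String customer) { this.customer = customer; }

    void addItem(double price) { total += price; }
    double getTotal() { return total; }
    String getCustomer() { return customer; }
}

// Billing is another collaborator the coordinator will direct.
interface Billing {
    void charge(String customer, double amount);
}

// OrderProcessor is a coordinator: it knows how to get something done,
// and it directs the stores to move their data around.
class OrderProcessor {
    void checkout(OrderRecord order, Billing billing) {
        // The coordinator decides what happens and when; the store just
        // supplies the data.
        billing.charge(order.getCustomer(), order.getTotal());
    }
}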

What is the difference between a coordinator and a thrust? Scope and axis. A coordinator has a scope limited by encapsulation, data hiding, and functional decomposition. Like stores, coordinators tend to live on the space axis or data plane (or however you think of it). By that I mean you can draw a box representing an object, and place little boxes representing all its methods and data inside it. You can (hopefully) draw lines bounding what a method or object knows about. You can even draw lines connecting a method to all the other methods it calls and that call it.

Unfortunately, you can't really draw a thrust well on such a diagram (though you may be obliged to try). A thrust lives in the time axis. It is what happens as time progresses. A coordinator will carry out actions on behalf of a thrust, but a thrust cuts through many coordinators at once. A thrust is a thread. It is a thread's call stack. It is a thread's intent. A thrust lives perpendicular to objects. And thrusts are damn cool and very important.

The best portrayal I've ever seen of the life of a thread was a certain car commercial. The commercial starts with this guy working in a boring office. He walks through a door and finds himself in a wild west saloon during a bar fight, gets thrown through the front window and falls from the sky into a marriage ceremony with an orangutan, runs through another door and ends up in an empty white room with just his dream car. Similarly, every time a thread enters a method call, it leaves its old world behind and suddenly lives in a whole new world. The thread suddenly "knows" about a whole new set of objects. What is happening where the thread came from is now shrouded in mystery and suspicion. But somehow, the thread still has an intent, an activity it is trying to accomplish. That is what I am calling the thrust.

The Anti-Pattern

The first thing most programmers learn about multithreaded programming is that you can protect an object from concurrent access by locking it. This is generally true. This leads to the belief that you can protect many objects from concurrent access by using many locks. This is generally not true. Systems using this design pattern often have deadlocks. Another symptom of this pattern is that it is usually applied after the code has been written. The object is usually not fully protected by the lock because there are many places where the programmer has forgotten to synchronize. The code is very likely to have race conditions. Race conditions are more common than deadlocks, because they are much harder to detect. It is usually pretty obvious when a deadlock occurs while using an app, so common deadlocks will be fixed pretty quickly. Race conditions can be hard to observe even if they occur frequently. For example, one race condition I saw recently caused a memory leak. It was only noticed after an all-day stress test ran out of memory. Unfortunately, many race conditions occur rarely and are hard to reproduce. Another symptom is that the same area of code will have persistent threading bugs, which re-appear whenever something changes in the code.

This anti-pattern happens because not enough consideration has been given to multithreading issues. The model is usually that there are "many threads" trying to access "this object". Little consideration is given to what different things the threads might be trying to do, and little consideration is given to how this object's threading behavior will affect other objects.
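To make the deadlock symptom concrete, here is a sketch of the classic failure mode, using a made-up Account class. Each account is protected by its own lock, and two threads acquire the same two locks in opposite orders:

class Account {
    private int balance;
    Account(int balance) { this.balance = balance; }
    void deposit(int amount)  { balance += amount; }
    void withdraw(int amount) { balance -= amount; }
}

class TransferDemo {
    static void transfer(Account from, Account to, int amount) {
        synchronized (from) {        // thread A locks a, then waits for b
            synchronized (to) {      // thread B locks b, then waits for a
                from.withdraw(amount);
                to.deposit(amount);
            }
        }
    }

    public static void main(String[] args) {
        final Account a = new Account(100);
        final Account b = new Account(100);
        // Two threads transferring in opposite directions will eventually deadlock.
        new Thread(new Runnable() {
            public void run() { for (int i = 0; i < 100000; i++) transfer(a, b, 1); }
        }).start();
        new Thread(new Runnable() {
            public void run() { for (int i = 0; i < 100000; i++) transfer(b, a, 1); }
        }).start();
    }
}

Neither thread is doing anything wrong in isolation; the deadlock only appears when you consider what the different threads are trying to do at the same time.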

* * *

Extra Material

Programmers are familiar with maintaining encapsulation for objects. They often don't know how to maintain encapsulation for thrusts. In fact, many programmers don't even realize it's something they should be thinking about!

* * *

The first place programmers learn to place locks is on the stores themselves. It turns out that the locks often have to be managed by the coordinators, not the stores. The stores don't know enough about what is going on to provide useful locking. You can see this in the Java API. The first-generation collection objects, Vector and Hashtable, were fully synchronized. However, the second-generation collection objects, ArrayList and HashMap, are not synchronized at all! Why? Because they are store objects. Any synchronization they perform is wasteful overhead when the synchronization must be performed by the containing coordinator objects.
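For example, here is a sketch of a coordinator doing the locking around an unsynchronized store. The WorkQueue class and its names are made up for illustration; the point is that the whole "check, then act" sequence happens under one lock held by the coordinator:

import java.util.ArrayList;
import java.util.List;

class WorkQueue {
    private final List<String> pending = new ArrayList<String>();  // the store: not synchronized
    private final Object lock = new Object();                      // the coordinator's lock

    void add(String task) {
        synchronized (lock) {
            pending.add(task);
        }
    }

    // Returns the next task, or null if there is none. The emptiness check
    // and the removal happen under the same lock, so no other thread can
    // slip in between them.
    String next() {
        synchronized (lock) {
            return pending.isEmpty() ? null : pending.remove(0);
        }
    }
}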

Programmers often forget that data becomes "useless" once they release its protecting lock. The value is now "the value the object used to have but doesn't have any more". In an appropriately designed system, these values are not really useless, but can be put to work so long as the caveats are understood. However, if a programmer uses the value as if it were still valid, a race condition is bound to ensue. Consider the following:

if (!stack.isEmpty()) {
    print(stack.pop());
}
This must be a fragment from a coordinator, and stack is a store. Even if stack is "thread-safe" and synchronizes both its isEmpty and pop methods, this fragment has a race condition. As soon as isEmpty returns, the return value is "the value the object used to have but doesn't have any more". Even assuming isEmpty returns false, it is possible for a different thread to slip in and remove the last element from the stack before the current thread can evaluate the condition and call pop itself. As you can see, making isEmpty a synchronized method gains us nothing. The information returned is still unreliable, so time spent synchronizing was wasted.

Here's how we can fix this:

synchronized (stackProtectingLock) {
    if (!stack.isEmpty()) {
        print(stack.pop());
    }
}
Now we know that, so long as all access to stack is protected by the stackProtectingLock, our fragment does not have a race condition. The value returned by isEmpty is still correct by the time we reach pop because we have not released the lock.

There is still one more thing we should do, however. As written, this fragment violates our principle of holding a lock for the shortest amount of time possible. We don't know what print is going to do. It might take a long time. It might need to acquire other locks. Assuming the original fragment was correct, it looks like we don't need to protect the stack any more once the object has been removed from it and we start printing the object. Assuming we will never pop null from the stack, it would be even better to write the following:

Object obj = null;
synchronized (stackProtectingLock) {
    if (!stack.isEmpty()) {
        obj = stack.pop();
    }
}
if (null != obj) {
    print(obj);
}
Wow! We went from three lines to nine! The final code is nowhere near as clear as the original. Isn't that bad? Well, yes, it is unfortunate. However, the last one always works while the first one does not. That's what's important.

* * *

hard to talk about (i=5; if 5==i...)

* * *

2004-06-03 : Tip: Once your multi-threaded object starts having states, don't use boolean flags to keep track of the current state! Use a state variable and state enumeration instead. (Why: once you have multiple flags, it becomes difficult to determine the current state since you have to check all the flags to be sure. Also, the number of possible states doubles every time you add a flag, even if you just need one more state. Named states are easier to understand and easier to draw and modify the state transition diagram for.)
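Here's a rough sketch of that tip; the Connection class and its states are made up for illustration:

class Connection {
    enum State { IDLE, CONNECTING, CONNECTED, CLOSED }

    private final Object lock = new Object();
    private State state = State.IDLE;   // one named state instead of several booleans

    void connect() {
        synchronized (lock) {
            if (state != State.IDLE) {
                throw new IllegalStateException("cannot connect while " + state);
            }
            state = State.CONNECTING;
        }
        // ... do the slow connection work outside the lock, then:
        synchronized (lock) {
            state = State.CONNECTED;
        }
    }

    void close() {
        synchronized (lock) {
            state = State.CLOSED;
        }
    }
}

Adding another state means adding one more name to the enum, instead of doubling the number of flag combinations you have to reason about.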
Comments: updated 2009-06-09

May 26, 2004: Just wanted to say: wicked nice int. Rock On.
-- James Piechota
2004-06-03 : Thanks for the appreciative feedback. I sure hope I get a chance to flesh this out some more one of these days.
-- Louis
2006-02-16 : It helps to also think about narrowing the data to be synchronized. Server and persistent data is inherently multi-threaded.
-- Nobody
2006-02-17 : I'm not sure what you mean by inherently multi-threaded. In some sense, the concept of threaded-ness doesn't even apply to data. The number 5, the date February 16, the name of some customer - there are no threads implicit in these concepts. It's not until you want to modify the fields of some object that threading becomes an issue. And in fact, if there was just one field it still wouldn't be an issue. The concern is how to maintain consistent views of your data when there are multiple view points (threads) in your application. Sometimes threading problems remind me of relativity thought experiments. :) It is quite possible to write servers and persistent data stores that are not multi-threaded and don't need to be, though obviously making them thread aware allows you much more flexibility in how they are used.

"Narrowing the data to be synchronized" is also a little bit misleading. Synchronization is about speed, not memory. The goal is to minimize time wasted while maintaining correctness. It is certainly true that accessing less data takes less time, so in that sense increasing the granularity of your locking by protecting less data with each lock will tend to help. Structuring your data so that more of it is immutable and can be accessed without locking can also help a lot, and that could be considered narrowing the data to be synchronized. This still involves a trade off of memory for speed.
-- Louis


2007-11-28 : I have read a quarter through but the colors on this page makes the document very hard to keep up with.
-- Kerrigangster
2008-02-17 : Thanks - that was a good read, but agree with the posters above, you should definately sort out the colour scheme as it's hard to read.
-- Big Jimmy McTaggart