The Cambrian Period of Concurrency

Back in July, I gave an OSCON talk that was a survey of language constructs for concurrency. That talk has been making the rounds lately. Jacob Kaplan-Moss referred to it in a major section of his excellent keynote Snakes on the Web, and Tim Bray has cited it as a reference in his Concur.next series. It seems like a good time for me to explain some of the talk content in writing and add my perspective on the current conversations.

The Cambrian

The Cambrian period was marked by a rapid diversification of lifeforms. I think that we are in a similar situation with concurrency today. Although many of the ideas being tossed around for concurrency have been around for some time, I don’t think that we really have a broad body of experience with any of them. So I’m less optimistic than Tim and Bruce Tate, at least on the time frame. We have a lot of interesting languages, embodying a number of interesting ideas for working with concurrency, and some of those languages have gained enough interest and adoption that we are now in a position to accumulate a credible amount of experience and start evaluating these ideas on their merits. But I think that the window for doing that is pretty large, on the order of 5 to 10 years.

What kinds of problems

The kinds of problems I am interested in are general purpose programming problems. I’m specifically not interested in scientific, numeric, highly regular kinds of computations, or in data parallel computations. Unlike Tim, I do think that web systems are a valid problem domain. I see this being driven by the need to reduce latency in order to provide good user response time, not by the need for additional scalability (although it probably will help there too).

It’s not like Java

Erik Engbrecht, one of Tim’s commenters, said:

To get Java, you basically take Smalltalk and remove all of the powerful concepts from it while leaving in the benign ones that everyday developers use.

I think there’s something to be learned from that.

This presupposes that you know which concepts are powerful and which are benign. It doesn’t seem like we are at that point. When Java was created, both Lisp and Smalltalk had existed for quite some time, and it was possible to do this kind of surgery. I don’t have a clear sense of what actually works well, much less what is powerful or benign.

The hardware made me do it

From where I sit, this is all about exploiting multicore hardware, and when I say this I mean machines with more than 4 or 8 hardware threads (I say threads, not cores – actual parallelism is what is important). The Sun T5440 is a 256 thread box. Intel’s Nehalem EX will let you build a 128 thread box later this year. Those are multicore boxes. If you look at experiments, you see that systems that seem to work well at 2 or 4 threads don’t work well at 16 or 64 threads. Since there’s not a huge amount of that kind of hardware around yet, it’s hard for people to run experiments at larger sizes. Experiments being run on 2 thread MacBook Pros are probably not good indicators of what happens at even 8 threads. This is partially because dealing with more hardware threads requires more administrative overhead, and as the functional programming people found out, that overhead is very non-trivial. The point is, you have to run on actual hardware to have believable numbers. This makes it hard for me to take certain kinds of systems seriously, like concurrency solutions running on language implementations with Global Interpreter Locks. See David Beazley’s presentation on Python’s Global Interpreter Lock for an example.
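Since Beazley’s talk is about CPython specifically, here is a minimal illustrative sketch (my own, not from his presentation) of why a Global Interpreter Lock caps thread-level parallelism for CPU-bound work:

```python
# Illustrative sketch: under CPython's GIL, only one thread executes Python
# bytecode at a time, so a pure-Python CPU-bound job gains nothing from
# extra threads; a process pool sidesteps the GIL and can use every core.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_down(n):
    # CPU-bound loop that never releases the GIL.
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    jobs = [2_000_000] * 4

    with ThreadPoolExecutor(max_workers=4) as pool:
        threaded = list(pool.map(count_down, jobs))    # serialized by the GIL

    with ProcessPoolExecutor(max_workers=4) as pool:
        processes = list(pool.map(count_down, jobs))   # true parallelism

    assert threaded == processes == [0, 0, 0, 0]
```

Timing the two pools on a multicore box shows the thread pool pinned to roughly one core’s worth of throughput, which is the effect Beazley measures in detail.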

Comments on specific languages

At this point I am more interested in paradigms and constructs as opposed to particular languages. However, the only way to get real data on those is for them to be realized in language designs and implementations.

  • Haskell – Functional laziness aside, the big concurrency feature in Haskell is Software Transactional Memory (STM). There are other concurrency features in Haskell, but STM is the big one. STM is an active research field in computer science, and I’ve read a decent number of papers trying to make heads or tails of it. Among the stack that I have read, it seems to be running about even between the papers touting the benefits of STM and the papers saying that STM cannot scale and will not work in practice. The jury is very much out on this one, at least in my mind.
  • Erlang – I like Erlang. It’s been in production use for a long time, and real systems have been built using it. In addition to writing some small programs and reviewing some papers by Erlang’s designers, I spent a few days at the Erlang Factory earlier this year trying to get a better sense of what was really happening in the Erlang community. While there’s lots of cool stuff happening in Erlang, I observed two things. First, the biggest Erlang systems I heard described (outside of Facebook’s) are pretty small compared to a big system of today. Second, and more importantly, SMP support in Erlang is still relatively new. Ulf Wiger’s DAMP09 presentation has a lot of useful information in it. On the other hand, BEAM, the Erlang VM, is architected specifically for the Erlang process/actor model. This feels important to me, but we need some experimental evidence.
  • Clojure – Clojure has a ton of interesting ideas in it. Rich Hickey has really done his homework, and I have a lot of respect for the work that he is doing. Still, it’s early days for Clojure, and I want to see more data. I know Rich has run some stuff on one of those several-hundred-core Azul boxes, but as far as I know, there’s not a lot of other data.
  • Scala – The big thing in Scala for concurrency is Actors, but if you compare to Erlang, Actors are the equivalent of Erlang processes. A lot of the leverage that you get in Erlang comes from OTP, and to get that in Scala, you need to look at Jonas Boner’s highly interesting Akka Actor Kernel project. Akka also includes an implementation of dataflow variables, so Akka would give you a system with Actors, supervision, STM, and Dataflow (when it’s done).   
  • libdispatch/Grand Central Dispatch – Several of Tim’s commenters brought up Apple’s Grand Central Dispatch, now open sourced as libdispatch. This is a key technology for taking advantage of multicore in Snow Leopard. GCD relies on programmers to create dispatch queues, which are then managed by the operating system. Programmers can send computations to these queues via blocks (closures), which are a new extension to Objective-C. When I look at Apple’s guide to migrating to GCD from threads, I do see a model that I prefer to threads, but it is not as high level as some of the others. Also, the design seems oriented towards very loosely coupled computations. It will be several years before we can really know how well GCD is working. I am typing this post on a 16 thread Nehalem Mac Pro, and I rarely see even half of the CPU meters really light up, even when I am running multiple compute intensive tasks. Clearly more software needs to take advantage of this technology before we have a verdict on its effectiveness in production.
  • .Net stuff like F#/Axum, etc – There is some concurrency work happening over on the CLR, most notably in F# and Axum. I spent some time at Lang.NET earlier this year, and got a chance to learn a bit about these two technologies. If you look at paradigms, the concurrency stuff looks very much like Erlang or Scala, with the notable exception of join patterns, which are on Martin Odersky’s list for Scala. I will admit to not being very up to speed on these, mostly for lack of Windows and the appropriate tools.
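Paradigm-wise, the common thread running through the Erlang, Scala, and Akka entries above is the mailbox: an actor is just a loop that owns its state and drains a queue of messages one at a time. A minimal sketch of that shape (Python used purely for illustration; the class and names are hypothetical, not any real actor library’s API):

```python
# Mailbox-style actor in the spirit of Erlang processes / Scala Actors:
# one thread owns the state and handles one message at a time, so the
# handler needs no locks.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self._mailbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, msg):              # asynchronous send, like Erlang's !
        self._mailbox.put(msg)

    def stop(self):                   # poison pill, then wait for the drain
        self._mailbox.put(None)
        self._thread.join()

    def _loop(self):
        while True:
            msg = self._mailbox.get()
            if msg is None:
                return
            self._handler(msg)

counts = {}
tally = Actor(lambda word: counts.__setitem__(word, counts.get(word, 0) + 1))
for w in ["spam", "eggs", "spam"]:
    tally.send(w)
tally.stop()
print(counts)  # {'spam': 2, 'eggs': 1}
```

Because only the actor’s own thread ever touches its state, user code contains no locks – that freedom is the property the Erlang and Scala models are selling; OTP and Akka then layer supervision on top of it.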
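The dataflow variables mentioned for Akka above (and treated at length by van Roy and Haridi) are single-assignment cells: a read blocks until some thread binds the value, and binding twice is an error. A toy sketch of the idea, again in illustrative Python with hypothetical names:

```python
# Single-assignment dataflow variable: readers block until bound.
import threading

class DataflowVar:
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        # Sketch only: this double-bind check is not atomic; a real
        # implementation would guard it with a lock.
        if self._bound.is_set():
            raise ValueError("dataflow variables are single-assignment")
        self._value = value
        self._bound.set()

    def read(self):
        self._bound.wait()   # blocks until bound, then always succeeds
        return self._value

x = DataflowVar()
results = []
reader = threading.Thread(target=lambda: results.append(x.read() + 1))
reader.start()               # blocks inside read() until x is bound
x.bind(41)
reader.join()
print(results)  # [42]
```

The appeal is that programs written against such variables are deterministic regardless of scheduling: the reader can start before or after the bind and the result is the same.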

Other thoughts

Jacob’s takeaway from my talk at OSCON was “we’re screwed”. That’s not what I wanted to convey. I don’t see a clear winner at the moment, and we have a lot of careful experimentation and measuring to do. We are quite firmly in the Cambrian, and I’m not in a hurry to get out – these ideas need to bake a bit longer, and we need more experimentation.

In addition to my talk, and Tim’s wiki page, if you are really interested in this space, I think that you should read Concepts, Techniques, and Models of Computer Programming by Peter van Roy and Seif Haridi. No book can be up to date with the absolute latest developments, but this book has the best treatment that I’ve seen in terms of trying to stratify the expressiveness of sequential and concurrent programming models.

11 Responses to “The Cambrian Period of Concurrency”


  • What do you think of the MultiKernel?

    http://www.barrelfish.org

    The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on OS Principles, Big Sky, MT, USA, October 2009
    http://www.barrelfish.org/barrelfish_sosp09.pdf

    Reid

  • The point is, you have to run on actual hardware to have believable numbers.

    So what’s a poor developer to do?

    If you’ll indulge me for a moment, I think it would be real neat if I could sign up with a cloud computing service and fire up a 256 thread machine for a few hours to play with some of these concurrency paradigms in the Real World.

    A million tiny Wide Finders. Exciting!

  • What about Stackless Python, or Fibers + assorted gems in Ruby 1.9?

  • Reading ‘Concepts, Techniques and Models of Computer Programming’ at the moment (next to ‘Real World Haskell’), and it indeed is a fascinating book.

    Nice overview, by the way. I’ve been working with dataflow variable based systems in several languages lately, and it’s definitely a useful abstraction, even though it doesn’t get as much (enough?) press as actor or STM based systems.

  • Rich Hickey gave a recent talk at QCon that discusses in more depth how Clojure does it and what Clojure’s primary focus on concurrency is, namely single-system concurrency.

    http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

  • Thanks for the update and the nuance! I’m sure people will be paying attention.

    Less nuance seems ok to me in places…

    “Experiments being run on 2 thread MacBook Pros are probably not good indicators of what happens at even 8 threads.”

    I sort-of wonder why you feel the need for the “probably”?

    I have not yet found an application in my current problem domains of choice (dynamic websites and analytics processing) where dual-core helps predict the performance of many-core.

    But then some things deserve more nuance…

    “This makes it hard for me to take certain kinds of systems seriously, like concurrency solutions running on language implementations with Global Interpreter Locks.”

    Hmm. I’ve found the multiprocessing module rather easy to use and I’ve found the resulting process/threads hybrids (a bit over one process per CPU core seems to work well) sometimes work better in practice than single-process multi-hardware-thread things on the JVM. I’d argue you should take seriously any setup that solves real-world problems with decent real-world performance :-)

  • Reid,
    Barrelfish is one layer down from the layer that I am interested in. It looks promising in terms of improving the performance of message oriented models.

    Travis,
    Keep in mind that most cloud environments are provisioned via some form of virtualization. That’s likely to impact any numbers that you might generate there.

    Ronen,
    Global interpreter locks.

    Nicolas,
    Yep, CTM style dataflow concurrency is definitely underrated.

    Mac,
    Yes, I’ve seen the talk and several earlier variants, and have talked to Rich in person.

    Leo,
    If you are happy using multiprocessing, that’s great. It doesn’t really help raise the level of concurrent programming, it’s just an implementation detail for getting parallelism.

  • typo
    Jacob Kaplan-Moss made referred -> … made references

  • Clojure as a ton of interesting ideas in it -> Clojure has a ton…

  • Ted,

    When you say that most Erlang-based applications you’ve heard of are small, on what dimension do you measure? Number of cores? Number of computers? Number of concurrent processes? LOC?

    In terms of LOC, there are several commercial systems built in Erlang with hundreds of thousands of lines of code. The biggest I know of is closer to 2 million. It’s not uncommon for these systems to have tens of thousands of concurrent processes on each node. These systems were not described at the Erlang Factory, since they are more or less ‘legacy’ in the Erlang community.

    In terms of number of cores, I know of prototype systems using 24 cores, but this is largely uncharted territory. The long-running telecom systems are mainly using quad-cores AFAIK. They tend to lag behind, since they need to wait for NEBS-compliant versions of the processor boards.

    Since you mention Facebook chat, I’m tempted to think that you mean ‘big’ in terms of number of computers. Otherwise, in terms of functional content, the Kreditor system mentioned at the Factory is far more complex.

    Commercial, high-availability products have been using SMP Erlang in the field since 2007. Granted, this is not a terribly long time, but I think there are certainly enough commercial users out there to label SMP Erlang as battle tested.

  • Ulf,

    Sorry to be vague on size. I mostly meant number of machines / number of cores. Certainly the code bases of Erlang systems are sizable.

    As far as SMP goes, I’m looking at the high core count machines, and pretty much everybody is on the same footing there, simply because the hardware is pretty recent.
