No monadic headaches: multi-core concurrency is easy in Haskell

Over on Good Math, Bad Math, MarkCC writes about Erlang and Haskell:

Haskell systems don’t, in general, parallelize well. They’re particularly bad for the kind of very coarse thread-based concurrency that we need to program for on multi-core computers, or distributed systems.

I suspect Mark hasn’t written much multicore concurrent Haskell code, as he resorts to the usual “monads are hard” argument, rather than offering example code. However, here in the real world, concurrent programming in Haskell is pretty much identical to Erlang, though with nicer syntax, in a more modern language.

Importantly, there are no monadic headaches involved (monads don’t even enter the picture). And the results often out-perform Erlang (and most other languages), due to native code compilation with type erasure, and better data structure support, provided by Haskell libraries (particular for strings) simply not available in Erlang.

So let’s look at the basic tools at our disposal here are:

Starting first with a look at the Erlang code. Mark starts with an introduction to syntax:

    -module(map).
    -export(map/2, two/1).

    map(F, []) -> [];
    map(F,[Car|Cdr]) ->
       NewCar = apply(F, [Car]),
       NewCdr = map(F,Cdr),
       [NewCar|NewCdr].

    two(X) -> 2*X.

Which looks a bit like an antique Haskell, with funny numbers instead of a type system (in the export list). Rewriting in Haskell:

    module Map (map, two) where

    map _ []     = []
    map f (x:xs) = f x : map f xs

    two = (2*)

Ok, so simpler and cleaner (I mean, really, using `apply’ for function application!?)

Now, let’s look at the Erlang for forking a simple process, that receives messages in a loop:

    -module(echo).
    -export([start/0, loop/0]).

    start() ->
      spawn(echo, loop, []).

    loop() ->
       receive
          {From, Message} ->
             From ! Message,
             loop()
       end.

This translates directly to Haskell’s forkIO and MVars for communication: We just launch the main thread here, which forks a second Haskell thread, which in turn sits receiving messages and printing them:

    main = do
        box <- newEmptyMVar
        forkIO (echo box)

    echo c = forever $ do
        msg <- takeMVar c
        print msg

‘spawn’ maps to ‘forkIO’, and Erlang’s builtin message passing maps to a communication mechanism of your choice in Haskell. We chose MVars here, but you could also have gone with

To run such a threaded Haskell program across multiple cores (this coarse gained, multicore concurrency Mark talks about), we need to nothing more than compile in the SMP runtime:

    $ ghc -threaded -O Concurrent.hs

Now that’s compiled to real native code, on an smp parallel runtime! We can run this over as many cores as we have, and GHC will schedule threads across the cores:

    $ ./a.out +RTS -N2

There are no monadic headaches. Concurrent and parallel multicore programming in Haskell is simple, efficient and easy!

Since its so easy, and has such little impact on the structure of your Haskell programs, you can start speculatively supporting multicore hardware: your Haskell program will utilise as many real cores as it needs, without needing to recompile and modify the code. Just change the value of `N’, and throw around forkIO liberally, much as you would in Erlang.

So let’s do something useful with this, how about a little program that computes primes and fibonacci numbers? We’ll just fork processes to compute prime numbers and fibonacci numbers, and have the main thread lazily print results as they’re found:

    import Control.Concurrent.Chan
    import Control.Concurrent

    -- Fork some computation processes, print their results
    main = do
        primes <- run primes
        fibs   <- run fibonacci
        mapM_ print $ zip primes fibs

      -- fork a process, return any messages it produces as a list
      where
        run f = do
            c <- newChan
            l <- getChanContents c
            forkIO (writeList2Chan c f)
            return l

    -- A function to compute primes
    primes = sieve [2..]
        where
           sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]

    -- A function to compute fibonacci numbers
    fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)

Very importantly, we see our computational processes are just pure Haskell code, and our main function is some IO skin, as usual. This is almost identical to code you would write ignoring concurrency: nothing scary is needed to write parallel Haskell!

Compiling this with the parallel runtime:

        $ ghc -threaded -O Server.hs

And running it on across 2 cores:

    $ time ./Z +RTS -N2
    (2,0)
    (3,1)
    (5,1)
    (7,2)
    (11,3)
    (13,5)
    (17,8)
    (19,13)
    (23,21)
    (29,34)
    (31,55)
    (37,89)
    (41,144)
    (43,233)
    (47,377)
    (53,610)
    (59,987)
    (61,1597)
    (67,2584)
    ^C interrupted

Easy! The key points to take away:

  • parallel and concurrent code is easy in Haskell, much as in Erlang
  • multi-core concurrency is well supported, out of the box
  • concurrent Haskell is damn fast

And most importantly, there are no monadic headaches!

No more exceptions: debugging Haskell code with GHCi

One of the true gems of the new GHC 6.8.1 release (besides the ubiquitous 20% speedup in your compiled code) is that we finally have a robust solution to tracking down unknown exceptions in Haskell code! Using the shiny new GHCi debugger, courtesy of Simon Marlow, José Iborra, Bernie Pope and Andy Gill it is possible to obtain a perfect backtrace to source location, and display the source, of your failing code — the kind you might find if you’re a bit sloppy when reasoning about totality:

    $ ./a.out
    *** Exception: Prelude.head: empty list

Pretty useless error message. Tracking this down has been a hard problem using GHC in the past. You had to either:

  • Convince the profiler to give you a stack trace (i.e. the little known +RTS -xc option) (buried in the “interested souls” section of the GHC manual
  • Replace calls to head with calls that will fail with a better error message (maybe using a library such as loch)
  • Get your code into Haskell 98 shape, and use Catch totality checker to statically analyse your code for partial functions, or maybe use Hat to trace the code.

All three solutions are quite obscure, the first one works but is tedious, the second is used by no one, and the third only works in theory. Yet the problem of partial functions and unknown exceptions persists.

Until now.

Imagine you develop a lovely Haskell program called HsColour, and deep down inside is a call to ‘head’:

    head       :: [a] -> a
    head (x:_) = x
    head []    = error "Prelude.head: empty list"

that relies on some invariant in your code you forgot to write a QuickCheck property for. As long as the invariant holds, the call to head is safe. When run, your program happily marks up Haskell code as html:

    $ runhaskell HsColour.hs -css < source.hs
    -- The stream data type
    data Stream a = forall s. Unlifted s =>
                              Stream !(s -> Step a s)  -- ^ a stepper function
                                     !s                -- ^ an initial state

    map :: (a -> b) -> Stream a -> Stream b
    map f (Stream next0 s0) = Stream next s0
      where
        next !s = case next0 s of
            Done       -> Done
            Skip    s' -> Skip        s'
            Yield x s' -> Yield (f x) s'

Lovely stuff. Years go by, and someone joins your project, and starts refactoring. In the process, they break this crucial invariant. Now your programs says:

    $ runhaskell HsColour.hs -css < source.hs
    HsColour.hs: Prelude.head: empty list

Where do you begin to debug this? Not much information. Now, some people, at this point would mumble something about monads being too hard, and rewrite the project in Python, but there’s no need. Fire up ghci 6.8.1, and set a breakpoint on exceptions, and set up some command line args for your program:

    $ ghci HsColour.hs
    *Main> :set -fbreak-on-exception
    *Main> :set args "source.hs"

Now run your program with tracing on, and GHCi will stop your program at the call to error:

    *Main> :trace main
    Stopped at (exception thrown)

Ok, good. We had an exception… Let’s just back up a bit and see where we are. Watch now as we travel backwards in time through our program, using the (bizarre, I know) “:back” command:

    [(exception thrown)] *Main> :back
    Logged breakpoint at Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)
    _result :: [String]

This tells us that immediately before hitting error, we were in the file Language/Haskell/HsColour/Classify.hs, at line 19. We’re in pretty good shape now. Let’s see where exactly:

    [-1: Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)] *Main> :list
    18  chunk :: String -> [String]
        vv
    19  chunk []    = head []
    20  chunk ('\r':s) = chunk s -- get rid of DOS newline stuff
    21  chunk ('\n':s) = "\n": chunk s
                                       ^^

Ok, that’s awesome! Our bogus call to head, deep down in the program is somewhere on those two lines: can you spot it? :) Fixing that line, and we’re back in business. The days of head scratching at head [] failures are over. Now go and write some QuickCheck properties to ensure this never happens again.

The GHCi debugger can do a lot more than just travel back in time and find your bugs for you, and it is just one point in the world of declarative debugging. For more reading have a look at:

Haskell has a debugger, it’s there if you need it.