Haskell is a strict language

A common misconception among non-Haskellers is that compiled Haskell pays an ongoing penalty for supporting laziness by default. The idea is that in a lazy language, every expression is suspended as a heap-allocated, garbage-collected thunk, even if you use the value immediately.

That sounds scarily expensive.

In the real world, however, Haskell is as often as not a strict language. Compiled Haskell undergoes strictness analysis, which separates the “needs to be lazy” code from the “just do it already” strict stuff. The end result is purely functional code, written in a lazy style, that gets compiled down to strict assembly, without any thunks, heaps, or GC getting in the way of raw, all-out speed.

As with type inference, this means we usually don’t have to annotate code with its strictness (or laziness) properties by hand; the compiler figures them out from how we use the values.
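
To make that concrete, here is a minimal sketch (the function names are just for illustration) of the kind of code strictness analysis handles for us:

    -- An accumulator loop written with no strictness annotations at all.
    -- Compiled with -O, GHC's strictness analysis sees that 'acc' is always
    -- eventually demanded, so it can be passed as a raw machine integer
    -- rather than as a chain of thunks.
    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go acc i
          | i > n     = acc
          | otherwise = go (acc + i) (i + 1)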

Here’s a nice example, from a new arrays library, based on the larger data parallel arrays project.

We’ll write some code to generate 3 arrays, add them together, then bit shift all the elements, and finally, take the sum.

In a language like C, this would mean first allocating a bunch of arrays, writing loops to fill them, and then writing separate loops for the rest of the pipeline. If we were looking for speed, we’d perhaps try to manually fuse some of the loop bodies together to avoid extra traversals, and to deforest the intermediate data structures.

In Haskell, though, we tend to build up loops by composing separate combinators, as if building up a pipeline in the shell. For list operations these then get fused away (using build/foldr fusion), meaning only a single traversal need be done.
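
For instance, here is the kind of list pipeline that fusion collapses (a small sketch, with names chosen purely for illustration):

    -- Three combinators composed into one pipeline. Compiled with -O,
    -- GHC's fusion rules should rewrite this into a single loop that
    -- never builds the intermediate lists.
    doubledEvens :: Int -> Int
    doubledEvens n = sum (map (* 2) (filter even [1 .. n]))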

Similarly, for array programming, we can use stream fusion to eliminate traversals over arrays, making array loops written in a compositional style as efficient as manually fused loops — assuming all the mystical laziness doesn’t get in the way.

The data parallel and uvector array libraries are implemented using stream fusion for strict arrays, so we can try this out.

Here’s an encoding of the task (the sum of shifts of sums of arrays), written compositionally:

    import Data.Array.Vector
    import Data.Bits

    main = print . sumU
                 . mapU ((2::Int) `shift`)
                 . zipWith3U plus3
                        (replicateU 100000000 (2::Int))
                        (replicateU 100000001 (3::Int)) $
                        (replicateU 100000002 (5::Int))
        where
            plus3 x y z = x + y + z

The nice aspect of this is the reduction in cognitive load: you build the loop up piecewise, and let the compiler do the (aggressive) thinking for you. As an exercise, try writing a C program that allocates these 3 arrays, and then uses a single loop to initialise them, add their elements, bit shift them, and sum the result.
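
Before reaching for C, here is roughly what that single hand-fused loop boils down to, sketched in Haskell (manualLoop and the constant element stand-ins are illustrative only, not what the library generates):

    import Data.Bits (shift)

    -- One pass over the index range that reads an element from each
    -- conceptual array, adds them, shifts, and accumulates the sum.
    manualLoop :: Int
    manualLoop = go 0 0
      where
        n = 100000000                -- length of the shortest array
        xs, ys, zs :: Int -> Int     -- stand-ins for the three arrays
        xs _ = 2
        ys _ = 3
        zs _ = 5
        go i acc
          | i >= n    = acc
          | otherwise = go (i + 1) (acc + (2 `shift` (xs i + ys i + zs i)))

In C you would also have to allocate and initialise the three arrays yourself; here the constant element functions merely stand in for them.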

And how well does it work? Our program above undergoes aggressive inlining by GHC (safe, since the code is pure and lazy), and because we’ve used combinators rather than manual loops, GHC can see how to combine them, yielding this single loop at the end:

    fold :: Int# -> Int# -> Int# -> Int# -> Int#
    fold x y z acc =
        case z of
            100000002 -> acc
            _ ->
                case y of
                    100000001 -> acc
                    _ ->
                        case x of
                            100000000 -> acc
                            _ -> fold (x +# 1)
                                      (y +# 1)
                                      (z +# 1)
                                      (acc +# 2048)

Look at that constant folding! The compiler saw that the arrays were initialised with every element set to 2, 3 and 5 respectively, then added and shifted. It spotted this and realised we’re just describing a loop that adds 2 << 10 (that is, 2048) to an accumulator on each iteration.

The other thing it noticed was that the values were always going to be demanded: there’s no laziness left in this program. So GHC removed all the lazy Int allocations and replaced them with unboxed Int# values, raw machine integers living in registers.
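
For readers who have not met unboxed types before, here is a minimal sketch of the difference (the function name is made up for illustration):

    {-# LANGUAGE MagicHash #-}

    import GHC.Exts (Int (I#), (+#))

    -- A boxed Int is a pointer to a heap object, which might still be an
    -- unevaluated thunk; an Int# is the raw machine word itself. Strictness
    -- analysis is what lets GHC switch from the former to the latter.
    addUnboxed :: Int -> Int -> Int
    addUnboxed (I# x) (I# y) = I# (x +# y)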

There are no `let`s left in that fused loop, so *nothing* is going to end up on the heap, and the garbage collector isn’t going to get a chance to run!

With strictness analysis and aggressive loop fusion, we end up with the code we’d write in a low level, non-GC’d language like C — we just got to write it in a very declarative, descriptive way.

This tight loop gets passed off to the code generator, yielding the final assembly,

    fold:
          cmpq        $100000000, %r8
          je  .L11
        .L5:
          cmpq        $100000001, %rdi
          je  .L11
        .L8:
          cmpq        $100000002, %rsi
          je  .L11
        .L10:
          leaq        1(%rsi), %rsi
          leaq        1(%rdi), %rdi
          leaq        1(%r8), %r8
          addq        $2048, %r9
          jmp fold

        .L11:
          movq        %r9, %rbx
          jmp *(%rbp)

Which looks pretty good. Just a tight loop, no allocation, everything in registers.

    $ time ./zipwith3
    204800000000
    ./zipwith3  0.20s user 0.00s system 95% cpu 0.209 total

It fairly zips along.

This kind of very aggressive optimisation only makes sense if you’ve already statically typed your program, inferred that the loops are referentially transparent, and decided what algorithm each loop is actually doing. And GHC is doing this by default — which is pretty sweet.

Strictness analysis and loop fusion should make the new array libraries a competitive solution.

No more exceptions: debugging Haskell code with GHCi

One of the true gems of the new GHC 6.8.1 release (besides the ubiquitous 20% speedup in your compiled code) is that we finally have a robust solution for tracking down unknown exceptions in Haskell code! Using the shiny new GHCi debugger, courtesy of Simon Marlow, José Iborra, Bernie Pope and Andy Gill, it is possible to obtain a perfect backtrace to the source location of your failing code, and to display that source. This is the kind of failure you might hit if you’re a bit sloppy when reasoning about totality:

    $ ./a.out
    *** Exception: Prelude.head: empty list

Pretty useless error message. Tracking this down has been a hard problem with GHC in the past. You had to do one of the following:

  • Convince the profiler to give you a stack trace (via the little-known +RTS -xc option, buried in the “interested souls” section of the GHC manual)
  • Replace calls to head with calls that fail with a better error message (perhaps using a library such as loch); see the sketch just after this list
  • Get your code into Haskell 98 shape and use the Catch totality checker to statically analyse your code for partial functions, or maybe use Hat to trace the code
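
The second option usually amounts to a hand-rolled wrapper along these lines (a hypothetical sketch; headNote and its location argument are made up for illustration, not taken from any particular library):

    -- A drop-in replacement for head that at least tells you which
    -- call site blew up.
    headNote :: String -> [a] -> a
    headNote _   (x:_) = x
    headNote loc []    = error ("head: empty list at " ++ loc)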

All three solutions are quite obscure: the first works but is tedious, the second is used by no one, and the third only works in theory. Yet the problem of partial functions and unknown exceptions persists.

Until now.

Imagine you develop a lovely Haskell program called HsColour, and deep down inside is a call to ‘head’:

    head       :: [a] -> a
    head (x:_) = x
    head []    = error "Prelude.head: empty list"

that relies on some invariant in your code you forgot to write a QuickCheck property for. As long as the invariant holds, the call to head is safe. When run, your program happily marks up Haskell code as html:

    $ runhaskell HsColour.hs -css < source.hs
    -- The stream data type
    data Stream a = forall s. Unlifted s =>
                              Stream !(s -> Step a s)  -- ^ a stepper function
                                     !s                -- ^ an initial state

    map :: (a -> b) -> Stream a -> Stream b
    map f (Stream next0 s0) = Stream next s0
      where
        next !s = case next0 s of
            Done       -> Done
            Skip    s' -> Skip        s'
            Yield x s' -> Yield (f x) s'

Lovely stuff. Years go by, someone joins your project and starts refactoring, and in the process they break this crucial invariant. Now your program says:

    $ runhaskell HsColour.hs -css < source.hs
    HsColour.hs: Prelude.head: empty list

Where do you begin to debug this? Not much information to go on. Now, some people at this point would mumble something about monads being too hard and rewrite the project in Python, but there’s no need. Fire up ghci 6.8.1, set a breakpoint on exceptions, and set up some command line args for your program:

    $ ghci HsColour.hs
    *Main> :set -fbreak-on-exception
    *Main> :set args "source.hs"

Now run your program with tracing on, and GHCi will stop your program at the call to error:

    *Main> :trace main
    Stopped at (exception thrown)

Ok, good. We had an exception… Let’s just back up a bit and see where we are. Watch now as we travel backwards in time through our program, using the (bizarre, I know) “:back” command:

    [(exception thrown)] *Main> :back
    Logged breakpoint at Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)
    _result :: [String]

This tells us that immediately before hitting error, we were in the file Language/Haskell/HsColour/Classify.hs, at line 19. We’re in pretty good shape now. Let’s see where exactly:

    [-1: Language/Haskell/HsColour/Classify.hs:(19,0)-(31,46)] *Main> :list
    18  chunk :: String -> [String]
        vv
    19  chunk []    = head []
    20  chunk ('\r':s) = chunk s -- get rid of DOS newline stuff
    21  chunk ('\n':s) = "\n": chunk s
                                       ^^

Ok, that’s awesome! Our bogus call to head, deep down in the program, is somewhere in the lines marked above: can you spot it? :) Fix that line, and we’re back in business. The days of head scratching at head [] failures are over. Now go and write some QuickCheck properties to ensure this never happens again.
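
For example, one property to start with (a sketch that assumes a modern QuickCheck, the deepseq package, and that chunk is exported where the test can see it) is simply that chunk never blows up on any input:

    import Control.DeepSeq (deepseq)
    import Test.QuickCheck (quickCheck)

    import Language.Haskell.HsColour.Classify (chunk)

    -- Totality check: fully evaluating chunk's output on arbitrary input
    -- should never throw, so the bogus head [] above would be caught here.
    prop_chunkTotal :: String -> Bool
    prop_chunkTotal s = chunk s `deepseq` True

    main :: IO ()
    main = quickCheck prop_chunkTotal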

The GHCi debugger can do a lot more than just travel back in time and find your bugs for you, and it is just one point in the wider world of declarative debugging; the GHCi debugger section of the GHC User’s Guide covers the rest of its commands if you want to read more.

Haskell has a debugger; it’s there if you need it.