The 500 Packages: Haskell, Distros and Maintainership

Monday was something of a landmark for the Haskell community, as the 500th Haskell package was added to the Arch Linux distribution. You can see all the Haskell packages here (excepting some packages in the core system, like xmonad and ghc). All these packages are built from source hosted on http://hackage.haskell.org. Similar efforts to comprehensively package Haskell natively are underway on Debian/Ubuntu, Gentoo and Fedora.

And the benefit of automation is stark. Instead of a developer maintaining, say, 5 or even 50 packages, with automation and a declarative build specification such as Cabal provides, one person can construct and maintain 500 packages with relative ease. That’s an order of magnitude improvement in productivity!

Hackage is kicking along. In total, 257 Haskell developers have uploaded 723 Haskell applications, tools and libraries to Hackage since January 2007 when Hackage went live. That’s 36 new packages a month on average. In July 2008, we had 160 package updates, and there have been 120 already in August.

Some choice statistics about Hackage:

  • 723 unique packages
  • Over a million lines of Haskell code
  • 2600 pages of API documentation

(And to get a sense of what an achievement it is to have 500 Haskell packages in Arch Linux native format, Arch has, in contrast, only 6 Erlang packages, 21 OCaml packages, 28 Lisp packages, 101 Ruby packages, and 495 Python packages.)

Open source works.

In this post, I want to give a bit of an overview of the variety of Haskell software now available in the native package system, how the process of native packaging was automated, and future directions for the Haskell platform.

So what do we get to play with?

Here are ten cool Haskell packages, in no particular order, that I like (and clearly I’m biased towards developer-oriented stuff), that you might not have known about, all now available in the Arch package system. Later, we’ll look at how all this was produced, and how we connect Cabal and Hackage to a distro package system.

0. WWW: feed-cli

feed-cli is a great little command line tool, written in Haskell, for generating, or appending to, RSS feeds. I use it to generate RSS feeds of cron job events (such as package uploads). It uses the comprehensive RSS and Atom feed generation and parsing library.

1. Development: ghc-core

ghc-core is a pager-like program for displaying GHC Haskell intermediate structures, and generated assembly, in a human readable way. If you’re working on low level numerics libraries, or high performance code, ghc-core makes it easy to get a sense for the generated Haskell code. It uses the PCRE regex bindings and hscolour, a syntax colouriser for Haskell, to generate nice output.

2. Science: haskell-blas, haskell-hmatrix and haskell-fftw

Numerics libraries have long been a missing link for Haskell, and in the past few months we’ve seen several emerge, in particular haskell-blas, a binding to BLAS and LAPACK, and haskell-hmatrix, efficient bindings to the GSL (Fortran!) code. There’s also a haskell-fftw binding now. Cabal takes care of all the tedium of linking against C and Fortran, so you can just use the nice high level interface.

3. Data structures: haskell-bloomfilter

Bloom filters are unusual data structures. They’re set-like, and are highly efficient in their use of space, and they only support two operations: insert and membership querying. And unlike normal data structures, bloomfilters can give incorrect answers! However, they have a low rate of false positives for membership, which is tolerable for some applications (say, traffic shaping). The haskell-bloomfilter package gives us fast and efficient bloomfilters for Haskell, with both pure and impure interfaces.
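To make the two operations concrete, here is a toy Bloom filter using only the base library. This is a minimal sketch, not the haskell-bloomfilter API: the names Bloom, emptyBloom, insert and member, the Integer-backed bit array, and the three polynomial hash multipliers are all assumptions chosen for illustration.

```haskell
import Data.Bits (setBit, testBit)
import Data.Char (ord)
import Data.List (foldl')

-- | A toy Bloom filter: a fixed number of bits packed into an Integer.
-- (Hypothetical type for illustration, not the haskell-bloomfilter one.)
data Bloom = Bloom
  { bloomSize :: Int      -- number of bits in the filter
  , bloomBits :: Integer  -- the bit array itself
  }

-- | An empty filter with the given number of bits.
emptyBloom :: Int -> Bloom
emptyBloom n = Bloom n 0

-- | A simple polynomial string hash, parameterised by its multiplier.
-- Illustrative only; a real filter would use stronger hash functions.
hashWith :: Int -> String -> Int
hashWith m = foldl' (\h c -> h * m + ord c) 7

-- | The bit positions an element maps to, one per hash function.
-- `mod` with a positive divisor is never negative, even if the hash overflowed.
positions :: Bloom -> String -> [Int]
positions b s = [ hashWith m s `mod` bloomSize b | m <- [31, 131, 1313] ]

-- | Insert: set every bit the element hashes to.
insert :: String -> Bloom -> Bloom
insert s b = b { bloomBits = foldl' setBit (bloomBits b) (positions b s) }

-- | Membership: True only if every bit the element hashes to is set.
-- This can be a false positive, but never a false negative.
member :: String -> Bloom -> Bool
member s b = all (testBit (bloomBits b)) (positions b s)
```

Anything that was inserted is always reported as a member. Something that was never inserted is usually rejected, but may be accepted if other elements happened to set all of its bits, which is where the false positives come from.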

4. Types: haskell-dimensional

No more mixing up yards and metres, or cubic centimetres and millilitres. The dimensional library encodes the standard units of measurement in the Haskell type system, so that it becomes a compile time error to mix up units. The library encourages best practices for unit usage, and shows how an entire class of bugs can be eliminated via an expressive type system.
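The underlying idea can be sketched with phantom types. This is not the dimensional library's interface (which tracks full physical dimensions); Quantity, Metres, Yards and the helper functions below are hypothetical names, used only to show how a unit tag in the type turns a unit mix-up into a type error.

```haskell
-- | A number tagged with a unit. The tag is a phantom type parameter:
-- it exists only at compile time and costs nothing at run time.
newtype Quantity unit = Quantity Double
  deriving (Show, Eq)

-- Empty types used purely as unit tags (hypothetical names).
data Metres
data Yards

metres :: Double -> Quantity Metres
metres = Quantity

yards :: Double -> Quantity Yards
yards = Quantity

-- | Addition is only defined for two quantities with the same unit tag.
add :: Quantity u -> Quantity u -> Quantity u
add (Quantity a) (Quantity b) = Quantity (a + b)

-- | Moving between units has to be an explicit conversion.
yardsToMetres :: Quantity Yards -> Quantity Metres
yardsToMetres (Quantity y) = Quantity (y * 0.9144)

-- | Type checks, because both arguments are in metres.
total :: Quantity Metres
total = add (metres 100) (yardsToMetres (yards 100))

-- Rejected by the type checker, because the unit tags differ:
-- bad = add (metres 100) (yards 100)
```

Uncommenting the last line gives a compile time error rather than a silently wrong number, which is the whole point.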

5. Network: haskell-download

For a long time there was no convenient way to use arbitrary network resources from Haskell. While Ruby had convenient openURI functions, the Haskell community had no such equivalent. haskell-download-curl and haskell-download answer that, providing a single function, openURI, for opening network resources, getting the result back as a bytestring. It also provides convenient wrappers for lazy downloading, or treating the content as XML, Atom or RSS, or unstructured HTML tags.

6. Graphics: haskell-googlechart

Everyone likes charts and graphs, right? Now there’s a Haskell interface to Google’s charts API, so you can construct those pretty graphs from Haskell, like this one, for the proportion of packages in each category:

7. Physics: haskell-hipmunk

2D physics engines are just plain awesome. A new Haskell interface, hipmunk, gives you high level, efficient access to the C chipmunk 2D physics engine. Hours of fun building engines, gears and levers, and watching the physics at work.

8. Languages: haskell-lua, haskell-perl5, haskell-language-c

Comprehensive support for interacting with other languages is another sign of a robust community. Haskell should play well with others, and there’s now, alongside the standard C API, new bindings to Lua and Perl, as well as the rather stunning Language.C, a library for parsing, analysis and generation of C with GCC extensions. (Want to hunt for bugs in the Linux kernel? This is how you’d do it). You can even generate Flash code, if you like. Or if you love assembly, script LLVM from Haskell, or just generate x86 code directly.

9. Games: hback

Finally, and I’m not sure why, but there’s been a bit of small game development in Haskell recently (possibly due to having good OpenGL tutorials?). Anyway, one of the cooler games, in my opinion, is hback, a “dual N-back” memory training game, complete with audio and bells. A recent research paper claimed that following the puzzle protocol implemented in this game would improve your fluid intelligence. It’s fun, but hard.

And there are 490 other Haskell packages in Arch (and on Hackage!) to play with. So how did we get all these into the system?

Arch Linux + Haskell

The effort to modernise Haskell on Arch began in early June, when half a dozen Haskell/Arch people formed an IRC channel, set up a mailing list, and started a wiki page to plan and coordinate efforts. A key decision was made to automate as much as possible, after watching other distros struggle to keep up with the pace of change.

Several factors came into play, making it actually feasible to automatically package Haskell source:

  • Central package hosting on Hackage.
  • A single build system, Cabal.
  • And the use of a declarative dependency specification.

Central hosting means that all packages can be found in one place, making it easy to track lots of packages. Having a single shared build system means that tools can rely on a common API for bundling software (no need to teach the tool about lots of crazy build strategies). All that helps.

The most important factor by far though, is that Cabal declares dependency information in a purely declarative way. Dependencies are stated explicitly, and can be analysed statically. This is in stark contrast to tools like autoconf, which require initialisation on the target machine to determine the actual build dependencies.

All an automated package tool needs to do for Cabal is translate the names of Haskell and C dependencies into the names of native packages on the system, and then spit those results into the native package format. For Arch, we wrote cabal2arch to do just this.
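The name translation step can be sketched in a few lines. This is a rough illustration under stated assumptions, not cabal2arch's actual code: the Dep type, the function names, the partial list of core packages and the one-entry C library table are all hypothetical, modelled on the hmp3 example in this post.

```haskell
import Data.Maybe (fromMaybe, mapMaybe)

-- | A dependency as it might appear in a .cabal file: a package name plus
-- a version constraint such as ">=0.4" (empty if unconstrained).
data Dep = Dep { depName :: String, depConstraint :: String }

-- | Haskell libraries assumed to ship with the compiler, which therefore
-- need no separate native package. (A partial list, for illustration.)
corePackages :: [String]
corePackages =
  [ "base", "unix", "bytestring", "containers"
  , "array", "old-time", "directory", "process" ]

-- | Assumed lookup table from C library names to Arch package names.
cLibraryTable :: [(String, String)]
cLibraryTable = [("curses", "ncurses")]

-- | Translate one Haskell dependency to an Arch package name,
-- or Nothing if it is a core package.
archHaskellDep :: Dep -> Maybe String
archHaskellDep (Dep name constraint)
  | name `elem` corePackages = Nothing
  | otherwise                = Just ("haskell-" ++ name ++ constraint)

-- | Translate a whole build-depends list, dropping core packages.
translateDeps :: [Dep] -> [String]
translateDeps = mapMaybe archHaskellDep

-- | Translate a C library name, falling back to the name itself
-- when the table has no entry for it.
archCLibrary :: String -> String
archCLibrary lib = fromMaybe lib (lookup lib cLibraryTable)
```

Because the dependencies are plain data, the whole step is a pure function from names to names, with no need to run anything on the target machine.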

cabal2arch

Given the URL of a Haskell package’s .cabal file, cabal2arch spits out a ready-to-use Arch package for it. Like so:

    $ cabal2arch http://hackage.haskell.org/packages/archive/hmp3/1.5.2.1/hmp3.cabal
    Using /tmp/tmp.YyRxcmyxTX/hmp3.cabal
    Fetching http://hackage.haskell.org/packages/archive/hmp3/1.5.2.1/hmp3-1.5.2.1.tar.gz
    Created /tmp/hmp3.tar.gz

And that’s it. The result, hmp3.tar.gz, is a package we can then upload into the Arch Linux repository.

The input .cabal file contains the following relevant information:

    executable hmp3
        build-depends:     unix,
                           zlib >= 0.4,
                           binary >= 0.4,
                           pcre-light >= 0.3,
                           mersenne-random >= 0.1
        if flag(small_base)
            build-depends: base >= 3,
                           bytestring >= 0.9,
                           containers,
                           array,
                           old-time,
                           directory,
                           process
        else
            build-depends: base < 3

        extra-libraries:     curses

Which cabal2arch analyses, translating the name of each Haskell package to its Arch equivalent, and looking up the correct names for the C dependencies, yielding a native package specification of the following form:

    # Contributor: Arch Haskell Team
    # Package generated by cabal2arch 0.3.8.2
    pkgname=hmp3
    pkgrel=1
    pkgver=1.5.2.1
    pkgdesc="An ncurses mp3 player written in Haskell"
    url="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hmp3"
    license=('GPL')
    arch=('i686' 'x86_64')
    makedepends=('ghc'
                 'haskell-cabal'
                 'haskell-binary>=0.4'
                 'haskell-mersenne-random>=0.1'
                 'haskell-pcre-light>=0.3'
                 'haskell-zlib>=0.4'
                 'ncurses')
    depends=('gmp' 'ncurses')
    options=('strip')
    source=(http://hackage.haskell.org/packages/archive/hmp3/1.5.2.1/hmp3-1.5.2.1.tar.gz)
    md5sums=('4f72ab118929a9137ae1339c740b4581')
    build() {
        cd $startdir/src/hmp3-1.5.2.1
        runhaskell Setup configure --prefix=/usr || return 1
        runhaskell Setup build                   || return 1
        runhaskell Setup copy --destdir=$startdir/pkg || return 1
    }

All the hard work is done for us, and the translation itself was pretty straightforward to write (a few hundred lines of Haskell, to download cabal files, parse them, resolve dependencies, construct a valid Arch package spec, write that to disk, and tar up the results into a bundle).
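To give a flavour of the "construct a valid Arch package spec" step, here is a small sketch that renders a few fields in the format shown above. It is an illustration only, not taken from cabal2arch: the Pkg record, shellArray and renderPkg are hypothetical names, and only three fields are covered.

```haskell
-- | The handful of fields this sketch knows about (hypothetical record).
data Pkg = Pkg
  { pkgName    :: String
  , pkgVersion :: String
  , pkgDepends :: [String]
  }

-- | Render a shell array assignment with single-quoted items,
-- in the style used by the depends field of the package spec.
shellArray :: String -> [String] -> String
shellArray name items = name ++ "=(" ++ unwords (map quote items) ++ ")"
  where
    quote s = "'" ++ s ++ "'"

-- | Render the package fields, one assignment per line.
renderPkg :: Pkg -> String
renderPkg p = unlines
  [ "pkgname=" ++ pkgName p
  , "pkgver="  ++ pkgVersion p
  , shellArray "depends" (pkgDepends p)
  ]
```

For example, putStr (renderPkg (Pkg "hmp3" "1.5.2.1" ["gmp", "ncurses"])) prints the pkgname, pkgver and depends lines as they appear in the generated spec.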

“The same thing we do every day, Pinky: try to take over the world”

Not really, but there are two clear steps forward from here. The first is automation tools for other distributions (cabal-debian, for example, will be crucial for Debian). The other big step is to standardise a set of these packages, to give us a comprehensive, trusted, high quality base for future applications. That is, Haskell: Batteries Included.

If we do this right, and the large Haskell community continues to work efficiently, scaling up the benefits of pure, polymorphic components to larger and larger collections of systems, who knows? An open source, purely functional, well-typed lambda for every child? :-)