Popular Haskell Packages: Q2 2010 report

Here is some data on downloads of Haskell libraries and apps on Hackage, for the first half of 2010.

[Figure: The Hackage dependency graph]

Hackage is the central repository of open source Haskell libraries and tools. After installing the Haskell Platform, users get additional libraries from Hackage via “cabal install”.

Headlines

May was the most popular month for Hackage ever, breaking 150k downloads in a single month for the first time.

The 2000th Haskell package was released on April 16.

Total downloads on Hackage since 2006 have passed 2.4 million, with 780 thousand downloads in 2010 so far (double the total from the same time in 2009).

Totals

Total Cabal packages: 2182 (+208 in Q2).

Total contributing developers: 575 (42 new developers in Q2)

90-day moving average: 12 package uploads per day.

Total downloads from Hackage 2007-present: 2.42 million

Average monthly downloads in 2010: 130 thousand.

Top of the Pops

The 15 most popular libraries in the first half of 2010 (change in rank in parentheses) were:

  1. HTTP
  2. parsec (+1)
  3. zlib (-1)
  4. binary (+1)
  5. network (+2)
  6. utf8-string (-2)
  7. Cabal (+1)
  8. QuickCheck (-2)
  9. mtl (+1)
  10. haskell-src-exts (-1)
  11. regex-base
  12. deepseq (+6)
  13. ghc-paths (+2)
  14. hslogger (+6)
  15. regex-posix (-2)

The 15 most popular applications in the first half of 2010:

  1. cabal-install
  2. xmonad
  3. haddock (+1)
  4. cpphs (-1)
  5. happy
  6. darcs (+1)
  7. alex (+1)
  8. hscolour (-2)
  9. pandoc
  10. hlint
  11. leksah
  12. xmobar
  13. yi
  14. hint
  15. agda

Honorable Mentions

  • The Galois xml library was more popular than HaXml in the first half of 2010, dethroning HaXml for the first time.
  • text has made it into the top 30 libraries
  • HDBC continues to be the most popular database library
  • vector has almost overtaken array in downloads (though array ships with the Haskell Platform, so many users never download it from Hackage)
  • wxHaskell is still more popular than gtk2hs on Hackage, though gtk2hs has almost caught up.

You can read all the 2010 data for your favorite packages, including a ranking by 2010 popularity.

Top Libraries by Category

  • Networking: HTTP, network, network-bytestring, curl
  • Parsing: parsec, polyparse, attoparsec
  • Compression: zlib, zip-archive
  • Binary formats: binary, cereal
  • Text formats: utf8-string, text, dataenc
  • Markup: pandoc, xhtml, tagsoup, html
  • JSON: json
  • Atom/RSS: feed
  • XML: xml, HaXml, hexpat
  • Web services: happstack, snap
  • GUIs: wxHaskell, gtk2hs
  • Graphics: SDL, cairo, gd
  • Templates: HStringTemplate
  • Testing: QuickCheck, HUnit, testpack, hpc
  • Control: mtl, transformers, monads-fd
  • Languages: haskell-src-exts, haskell-src, HJavaScript
  • Regexes: regex-{base,posix,compat,tdfa}, pcre-light
  • Logging: hslogger
  • Generics: uniplate, syb-with-class, syb
  • 3D: OpenGL
  • Edit history: haskeline
  • Concurrency and parallelism: parallel, stm
  • Databases: HDBC
  • Arrays: array, vector, hmatrix
  • Hashing: pureMD5, SHA
  • Data structures: containers, fingertree, dlist
  • Science: statistics
  • Benchmarking: criterion
  • Storage: hs3

Do you see anything else in the data?

There are a hell of a lot of Haskell libraries now. What are we going to do about it?

The Haskell community has reached a bit of a milestone: there are now more than 2000 open source libraries for Haskell on Hackage! With this, however, comes a problem: how do you work out which library to use (without learning one Haskell library a day for the next 6 years)? Which ones are robust and supported, and which ones aren’t? This isn’t a new problem in open source: the Perl community has faced it with CPAN for a decade or more. Now Haskell is in the same situation.

In fact, it’s kind of startling to look back: in 2006, there were only a handful of open source Haskell libraries for developers to use in their projects (just HDBC, zlib, libxml, Crypto…). Today, there are 2121 libraries for Haskell (more by the time you read this), available as source on http://hackage.haskell.org (only a “cabal install” away), and often hundreds of Haskell libraries are available in binary form from your favorite distro. You can even follow the package flood on Twitter.

Here’s what the growth in available Haskell libraries over the last 4 years looks like:

We passed 1000 libraries in early 2009, and doubled that a year later.

So this is great for the Haskell dev community. In some areas, like database interfacing, we’ve gone from a single option (HDBC) to a full range, including new libraries for Cassandra, CouchDB, Amazon SimpleDB, MongoDB and Tokyo Cabinet, as well as pure Haskell libs like TCache and safe, high-level libs like HaskellDB.

We’re rapidly running into CPAN-like problems just managing the weight of so much Haskell code. How do you know which library to use? Should you use, say, Galois’ xml library, or Lemmih’s xml library? As someone recently put it: “It is bewildering trying to figure out which ones are actively supported and which ones are zombie projects that stopped working years ago.”

So what are we doing about it?

There are four efforts underway to help Haskellers manage this growth, and you can contribute!

  1. The Haskell Platform: an easy, one-click installer for the core system, including a blessed set of libraries with a commercially friendly BSD license (like most of Hackage). At the moment the blessed set covers only a small core of libraries, and we need developers to propose new additions to it.
  2. Google Summer of Code, Hackage 2.0: Matt Gruen is working this summer to finish the implementation of Hackage 2.0, an improved Hackage with many new features to help sort the wheat from the chaff in Haskell packages: build reports, wiki commenting, and social voting.
  3. Google Summer of Code, Cabal Test: Thomas Tuegel is also working on “cabal test”, to allow automated testing and reporting of cabalized packages (and thus of all of Hackage). This is the second plank in solidifying the quality assurance story for Hackage.
  4. Regular regression testing of Hackage: having all that code is great, because it lets us regularly regression-test compilers and tools against a multi-million-line Haskell codebase. For the GHC 6.10 release, for example, we were able to narrow breakages across all known open source Haskell code to just 5% of Hackage, and post detailed instructions on how to address those changes. This gives us significant stability.

So: the Haskell Platform, to make it simpler to install Haskell and get started with a good set of libraries (several hundred thousand downloads of the installers so far!); a better Hackage, to help us rate and rank packages; regression testing against Hackage, to keep things stable; and, in particular, test reporting support, to make it easier to assess package quality.

What would you like to see changed in the Haskell library world? What libraries do you love? What do you hate? How do you find the packages you need?

And you don’t have to wait for others to solve this. Write tools to pick the best libs. Do your own quality ratings and share them. Write reviews of packages, compare them, and let everyone know.

This is open source – it is up to you to help make things happen.