The Haskell community has reached a bit of a milestone: there are now more than 2000 open source libraries for Haskell on Hackage! However, with this also comes a problem: how do you work out which library to use? (Without learning one Haskell library a day for the next 6 years?) Which ones are robust, and supported, and which ones aren’t? This isn’t a new problem in open source: the Perl community has faced it with CPAN for a decade or more. Now Haskell is in the same situation.
In fact, it’s kind of startling to look back: in 2006, there were only a handful of open source Haskell libraries for developers to use in their projects (just HDBC, zlib, libxml, Crypto…). Today, there are 2121 (more by the time you read this) libraries for Haskell, available as source on http://hackage.haskell.org (only a “cabal install” away), and often 100s of Haskell libraries in binary form on your favorite distro. You can even follow the package flood on Twitter.
Here’s what the growth in available Haskell libraries over the last 4 years looks like:
We passed 1000 libraries in early 2009, and doubled that a year later.
So this is great for the Haskell dev community. In some areas, like database interfacing, we’ve gone from a single option (HDBC) to a full range, including new stuff like, uh, well, Cassandra, CouchDB, Amazon SimpleDB, MongoDB, Tokyo Cabinet, and pure Haskell libs like TCache, or safe, high level libs like HaskellDB.
We’re rapidly running into CPAN-like problems of just managing the weight of so much Haskell code. How do you know which one to use? Should you use, say, Galois’ xml library, or Lemmih’s xml library? Someone recently said “It is bewildering trying to figure out which ones are actively supported and which ones are zombie projects that stopped working years ago.”
So what are we doing about it?
There are four efforts underway to help Haskellers manage this work, and you can contribute!
- The Haskell Platform – an easy, one-click installer for the core system, including a blessed set of libraries, with a commercially friendly BSD license (like most of Hackage). At the moment, this means just these libraries, and we need developers to propose new additions to the blessed set.
- Google Summer of Code: Hackage 2.0 – we have Matt Gruen working this summer to finish the implementation of Hackage 2.0 – an improved Hackage that will allow for many new features to help sort out the wheat from the chaff in Haskell packages: build reports, wiki commenting, and social voting.
- Google Summer of Code: Cabal Test – we also have Thomas Tuegel working on “cabal test”, to allow automated testing and reporting of cabalized packages (and thus all of Hackage). This is the second plank in solidifying the quality assurance story for Hackage.
- Regular regression testing of Hackage: having all that code is great – it means we can do regular regression testing of compilers and tools on a multi-million line Haskell codebase. For the 6.10 GHC release, for example, we were able to narrow breakages of all known open source Haskell to just 5% of Hackage, and post detailed instructions on how to address those changes. This gives us significant stability.
So, the HP to make it simpler to install Haskell and get started with a good set of libraries (several hundred thousand downloads of the installers so far!), a better Hackage to help us rate and rank packages, regression testing against Hackage to keep things stable, and in particular, test reporting support to make it easier to do quality assurance estimates.
What would you like to see changed in the Haskell library world? What libraries do you love? What do you hate? How do you find the packages you need?
And you don’t have to wait for others to solve this. Write tools to pick the best libs. Do your own quality ratings and share them. Write reviews of packages, and compare them, then let everyone know.
This is open source – it is up to you to help make things happen.
24 thoughts on “There are a hell of a lot of Haskell libraries now. What are we going to do about it?”
With the Platform, I was under the impression that not only would it come with a set of default libraries, but would also have a sub-section of “recommended” libraries on the website where a particular xml library for example is recommended by the community as being the best one available for most purposes. Is this still the plan?
There are no plans for a separate “recommended” set. The intent has always been that the HP will grow the set of default libraries to include XML, database bindings, GUIs, etc.
So does this mean that the HP will grow to need more C bindings, etc. to be able to install it?
1) To remove dead packages it’s important to allow authors to deprecate packages easily (part of Hackage 2.0).
2) Many of the packages are true duplicates or should be collapsed into one package*. This is just a matter of time and encouraging some activism in the community, but could yield a sizeable reduction in the number of packages while increasing the overall value of Hackage.
* Some examples:
SHA, pureMD5, SHA2, pbkdf2, aes, simpleAES, twofish, tigerhash, etc ==> Crypto
prettyclass, pretty ==> pretty
nano-hmac, nano-md5, openssl-createkey, hopenssl, hsopenssl ==> openssl
I would like to crowdsource data gathering on library quality. Some sort of reddit-like voting mechanism and the ability to comment on a library’s Hackage page could eventually be a huge leap forward in terms of making it simpler for developers to choose which libraries to use.
I like the smoke-testing features of CPAN-Testers.
They just finished a toolchain overhaul there that you might want to check out.
CPAN is the gold standard for this sort of stuff AFAIK.
TAP + embedded smoke-testing in the toolchain = win.
Speaking as someone who’s put a not insignificant amount of thought into an “Everything and the Kitchen Sink” install base for Perl: you may want to think about a way to migrate packages out of The Haskell Platform as well. As time moves on, “best practices” change, and new ideas and methods of work replace older ones. An example from the Perl world is the ubiquitous CGI.pm, which was perfect for the period in which it was written, but is now something many would rather see moved out of core into CPAN for its retirement.
: I maintain, for some sense of the word, Task::Kensho, which is an attempt to cull an “industrial standard Perl” out of CPAN, external to what ships in our Core.
A couple of things come to mind for me.
It has been stated many times, but we really need a better Windows story for some of the key libraries which are currently either Unix only, or inordinately difficult to install on Windows. TakeoffGW looks like it could be a real win for us.
Another is that we probably (whether through crowdsourcing, ratings or whatever) do need to focus on getting at least one really ‘commercial grade’ library in each of a few core areas.
For me, commercial grade means:
– Installs easily and works out of the box on at least Linux, Windows and Mac
– Is actively maintained with bugs being tracked and fixed. In most cases this really requires a team of people to get the work done (particularly for bindings to significant external libraries)
– Provides at least 90% of the functionality that any developer experienced in such a library on some other language platform would expect.
– Provides extensive documentation: good API documents, a ‘beginners guide’ and ideally a blog or wiki aimed at helping those who have passed the ‘beginner’ stage, but haven’t yet reached enlightenment.
– Is licensed in a way which provides unambiguously for commercial closed source development. This is likely mainly a problem for bindings to external libraries, as it makes no sense for e.g. wxHaskell or Gtk2HS to be licensed differently from the libraries they bind (since in practice the most restrictive license in a piece of software is the one which applies to anything based off of it).
– Is licensed in a way which is unambiguously compatible with the GPL (at least V2), to enable pure GPL open source development.
As an example which perhaps reflects poorly on my own project: wxHaskell is essentially a one-man project most of the time, and this makes it almost impossible to meet that ‘commercial quality’ bar, despite the amazing help provided from time to time by others.
I suspect that the same may be at least somewhat true of all of the Haskell GUI bindings.
Another large category of packages is extensions to “base”: predicates, higherorder, bimap, checked, digits, data-ordlist, list-extras, split, … These could probably be merged into one package.
I guess Thomas DuBuisson is right: package maintainers should talk more to each other and try to patch up existing packages with new functionality instead of always releasing new packages (there are 30+ packages extending bytestring).
Maybe a mailing list or something where people can send their package proposals before they upload them, so others can spot ways to merge them into existing ones or stop duplicates in functionality?
If there is a genuine overlap between libraries, choose the one with the best documentation. ** It’s that simple ** There’s no need to turn Hackage into Reddit.
(It would be nice if Hackage gave authors a free-text field on the package entry page so they could point to documentation outside the package – especially if the documentation is advancing but the package is stable).
Immature projects rarely get documented, as the priority is on design and creation; there is little value to an author in documenting (and then re-documenting) code in flux. So the quality of documentation doubles as a good stability metric as well.
There are some secondary metrics I use to choose: prefer pure Haskell libs to FFI bindings; favour the incumbent – if someone has written an alternative to Parsec but hasn’t explained why I might choose it over Parsec, I’m always going to choose Parsec.
Also, it’s easy to figure out “which ones are actively supported and which ones are zombie projects” – just look at Versions, the top line of the table on each package’s Hackage entry page. Maybe a sparkline could re-iterate the activity if people were bothered. This would be useful for a metric where you favour stable projects rather than ones that are being tweaked all the time.
By the way, what’s the procedure for authors to mark their own projects as zombies?
I can’t wait to see those 2 Google Summer of Code projects completed and released! I think both will prove to be very valuable additions.
A dependency graph of the libraries with release dates would help you to see which packages are no longer being used. For example, if all the packages that depend on a particular XML library are themselves rather old and newer packages tend to use a different one, that’s valuable info right there.
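As a rough illustration of this idea, here is a minimal Haskell sketch. It assumes we already have, for each package, the year of its latest upload and its list of dependencies. The `Release` record, the package names and the years are all made up for the example; none of them come from Hackage or Cabal.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical record for illustration; not a Hackage or Cabal type.
data Release = Release
  { relName :: String    -- package name
  , relYear :: Int       -- year of the package's latest upload
  , relDeps :: [String]  -- names of the packages it depends on
  }

-- | For each library, the latest upload year among the packages
-- that depend on it directly.
newestDependent :: [Release] -> Map.Map String Int
newestDependent rels = Map.fromListWith max
  [ (dep, relYear r) | r <- rels, dep <- relDeps r ]

-- Made-up data: "xml-old" is only used by older packages.
sampleReleases :: [Release]
sampleReleases =
  [ Release "feed-reader" 2007 ["xml-old"]
  , Release "site-gen"    2008 ["xml-old"]
  , Release "rss-tool"    2009 ["xml-new"]
  , Release "web-client"  2010 ["xml-new"]
  ]
```

A library whose newest dependent is several years old may have fallen out of favour, although a stable, finished library can look exactly the same by this measure.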
Us Perl-ers welcome you to the club of those that have over 1,000 libraries. We have over 20,000. In order to solve your problem, Enlightened Perl created something called Task::Kensho.
If people using Windows want better Haskell support, then they should help remedy the situation.
From what I know, most of the people involved with the Platform, etc. don’t use Windows, so how are they meant to fix it for you?
I assume you mean pick the better-documented one if all other aspects are the same?
Release dates only tell you when it was released, not how useful it may still be. What was the release date of “base”? :-)
A handful of things could be helpful here:
1) The ability for the author to add comments to existing packages. This could be done by releasing with a new README, but something that’s more easily visible to Hackage would be good. The author can then add comments like “intended to be an alternative to …” and “deprecated in favor of …”.
2) Comments/feedback. Of course, this would have to be moderated. Perhaps a wiki organized around category and subcategory.
3) Auto-generated “connectedness” information. Hackage already shows dependencies, but inverting this and extending it would be useful:
Foo.Blah is used by 2: Bar.Door (4/18) Cow.Moo (1/3)
Notation here indicates that Bar.Door is used directly by 4 packages, and that the subgraph of Bar.Door contains 18 packages. This indicates that Bar.Door is pretty popular itself, so its use of Foo.Blah is a good recommendation. Conversely, Cow.Moo is only used by one other package, and the subgraph only contains 3 elements, so it’s a weaker recommendation of Foo.Blah.
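One way this “connectedness” information might be computed is sketched below in Haskell. It is only an illustration under stated assumptions: the forward dependencies are taken to be available as a plain map from package name to dependency names, “subgraph” is read as the set of all direct and transitive users, and the `Deps` type, the function names and the sample data are hypothetical rather than part of Hackage or Cabal.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Package = String

-- | Forward dependencies: each package mapped to the packages it depends on.
type Deps = Map.Map Package [Package]

-- | Invert the map: each package mapped to the set of its direct users.
reverseDeps :: Deps -> Map.Map Package (Set.Set Package)
reverseDeps deps = Map.fromListWith Set.union
  [ (dep, Set.singleton pkg) | (pkg, ds) <- Map.toList deps, dep <- ds ]

-- | Every package that uses the given one, directly or transitively.
transitiveUsers :: Map.Map Package (Set.Set Package) -> Package -> Set.Set Package
transitiveUsers rev start = go Set.empty (directUsers start)
  where
    directUsers p = Map.findWithDefault Set.empty p rev
    go seen frontier
      | Set.null frontier = seen
      | otherwise =
          let seen' = Set.union seen frontier
              next  = Set.unions (map directUsers (Set.toList frontier))
          in  go seen' (next `Set.difference` seen')

-- | (number of direct users, number of direct and transitive users).
usageScore :: Deps -> Package -> (Int, Int)
usageScore deps p =
  ( Set.size (Map.findWithDefault Set.empty p rev)
  , Set.size (transitiveUsers rev p) )
  where rev = reverseDeps deps

-- Made-up example data; the App.* packages are invented.
sampleDeps :: Deps
sampleDeps = Map.fromList
  [ ("Bar.Door",  ["Foo.Blah"])
  , ("Cow.Moo",   ["Foo.Blah"])
  , ("App.One",   ["Bar.Door"])
  , ("App.Two",   ["Bar.Door", "Cow.Moo"])
  , ("App.Three", ["App.One"])
  ]
```

With this sample data, `usageScore sampleDeps "Bar.Door"` gives `(2,3)` and `usageScore sampleDeps "Cow.Moo"` gives `(1,1)`, which corresponds to the (direct/subgraph) pairs in the notation above.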
@ Ivan Lazar Miljenovic
It’s Open Source – no-one *has* to do anything.
However, solid cross-platform support in key libraries is undoubtedly going to help to promote Haskell – a key advantage of Python is that pretty much all of its key libraries work essentially identically across all supported platforms (I understood that we are trying to fail to avoid success here :-)
The download stats say that something like 75% of Haskell Platform downloads are for Windows, so clearly the demand is there for solid support.
On wxHaskell, I try hard to ensure that everything works smoothly for Windows, Linux and Mac users because I want people to use the library. For my own purposes, I don’t need to care about Linux at all, and I use Mac only rarely. Many other library owners have the same attitude (e.g. the Gtk2HS team – and Gtk is one of the trickier libraries to get running on Windows, so it’s certainly possible).
In any case, I think that a large part of a better Windows story will be to have a standardized place where Cabal can look for libraries and headers, and use an external package manager like TakeoffGW to provide it.
At the very least, it would be helpful to have a ‘supported-platforms’ stanza in Cabal so that I could cabal list only those libraries which are supported on Windows, something like:
cabal list --supported-platforms="msw"
Rather than discover that the library doesn’t work part way through compilation.
I think the problem here is that packages that are likely to be tricky for cross-platform scenarios (i.e. do a lot of specialised IO stuff) don’t always have a large enough team with people on a range of platforms.
For example, with my graphviz library, I have no idea whether it works on Windows or Mac; however, as long as dot, neato, etc. are in the $PATH (i.e. so that runInteractiveProcess can find it) I see no reason why it shouldn’t. There might be a better, more cross-platform approach to dealing with external commands, but I don’t know of it.
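One portable way to check for the external commands up front is `findExecutable` from `System.Directory`, which searches the path using the platform's own conventions. The sketch below is a minimal illustration and not code from the graphviz library; `missingCommands` and `graphvizCommands` are names invented for the example.

```haskell
import System.Directory (findExecutable)

-- | The external commands mentioned above; extend the list as needed.
graphvizCommands :: [String]
graphvizCommands = ["dot", "neato"]

-- | Return those commands from the list that cannot be found on the
-- search path. findExecutable yields Nothing for a missing command.
missingCommands :: [String] -> IO [String]
missingCommands names = do
  found <- mapM findExecutable names
  return [ name | (name, Nothing) <- zip names found ]
```

Calling `missingCommands graphvizCommands` before the first `runInteractiveProcess` would let a library report a clear error naming the missing tool, instead of failing part way through.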
My previous comments weren’t aimed explicitly at you; it’s just that every now and then I see complaints from people on #haskell, haskell-cafe, etc. that such-and-such library, tool, etc. doesn’t work properly on Windows but rarely do they ever seem to indicate that they are willing to put actions to words and help _make_ it work.
I have a live clone of Hackage that shows reverse dependencies:
It is updated daily (cron job). I think it comes pretty close to what Kevin Quick proposed.
a wiki? with introduction to each package’s philosophy, with example code, use cases, comments? with links to community pages? and books? or per package use counts crawled and computed with a refined ohloh.net (clone)?
i’d say it’s not solely a technical problem.
I miss two relatively simple things on Hackage:
1) ability to deprecate (and hide) a project
2) visible indication of the license in the pkg-list.html, better color-coded
I would like to see download of the generated documentation for each library as some “zip” for offline study.
anyway, it is already a pleasure to use hackage and cabal.
How do new folks to Haskell know which libraries are *good* or *recommended*?
I agree it’s up to the Windows users (like me) to fix some of these issues, but the root is farther down than the haskell developers themselves. For instance, hopenssl depends on openssl headers that don’t exist in Windows by default.
As long as we take shortcuts through the GPL C bindings the situation will never improve for Windows users.