Heuristics for Blessing Software Packages

In the next branch of the Haskell Platform we’ll be adding and removing packages from the specification for the first time. The Haskell Platform steering committee will make recommendations based on individual proposals to add or remove packages.

It is hard to come up with “notability” criteria for why a package should be added or removed. There are many competing reasons why people use the Haskell Platform, and many views on which packages they need.

The goal, though, should be an almost fully automated set of criteria for determining when a package should be added, based on objective data. Then, combined with strategic and other concerns, packages will be added or, sometimes, removed.

Possible Criteria for Notability

A quick list of possible criteria by which to evaluate whether a package is “blessed”:

  • How popular is the package in Hackage downloads?
  • How many packages depend on it?
  • Do any applications of note depend on it?
  • Does it meet a stated end-user need?
  • Do similar systems include such a library (e.g. Python)?
  • Is it portable?
  • Does it add additional C libraries?
  • Does it follow the package versioning system?
  • Is the code of good quality?
  • Does it have a good development history?
  • Is it on Hackage?
  • Does it provide Haddock documentation?
  • Does it come with examples?
  • Does it have a test suite?
  • Does it have a maintainer?
  • Does it in turn require new Haskell dependencies?
  • Does it have a simple/configure-based Cabal build?
  • Does it conflict/compete with existing functionality?
  • Does it reuse existing types?
  • Does it follow the hierarchical naming conventions?
  • Is it -Wall clean?
  • Does it have declared correctness or performance statements?
  • Is it BSD licensed?
  • Is it thread-safe?

A Point System

One way of determining notability for a package would be to use a points system against an agreed-upon set of such criteria.

Does anyone know of similar examples, or would like to code up some programs to experiment with these ratings?
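
As a starting point for such experiments, here is a minimal Haskell sketch of a points system. The `Criterion` constructors encode a handful of the yes/no questions from the list above; the type names, the weights and the `Package` record are all made up for illustration, and real weights would have to be agreed upon first.

```haskell
import Data.List (nub, sortBy)
import Data.Ord (Down (..), comparing)

-- | A few of the yes/no criteria listed above (hypothetical encoding).
data Criterion
  = OnHackage
  | HasHaddock
  | HasTestSuite
  | HasMaintainer
  | FollowsPVP
  | WallClean
  | BSDLicensed
  deriving (Eq, Show, Enum, Bounded)

-- | Illustrative weights only; these numbers are assumptions.
weight :: Criterion -> Int
weight HasMaintainer = 3
weight HasTestSuite  = 2
weight FollowsPVP    = 2
weight _             = 1

-- | A package together with the criteria it satisfies.
data Package = Package
  { pkgName  :: String
  , pkgMeets :: [Criterion]
  } deriving (Eq, Show)

-- | Total points for a package; 'nub' stops a criterion counting twice.
score :: Package -> Int
score = sum . map weight . nub . pkgMeets

-- | Package names with their scores, highest score first.
rank :: [Package] -> [(String, Int)]
rank pkgs = sortBy (comparing (Down . snd)) [(pkgName p, score p) | p <- pkgs]
```

With these weights, a made-up package meeting `OnHackage`, `HasMaintainer` and `HasTestSuite` scores 6 points and ranks ahead of one meeting only `OnHackage` and `HasHaddock`, which scores 2.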

Distro Page Rank

Another source of raw data may well be a sort of “Page Rank” across Unix distros for how often a package is used. On the Arch Linux distribution, we have three levels of support for Haskell. In the core system some Haskell apps and tools are provided in binary form. In the “community” binary repo there are yet more packages. Finally, in the user-contributed repository are around 1300 other packages (~90% of Hackage).

Does your distro have popularity statistics? Could you determine the top 100 Haskell packages by vote?

Most Popular Packages in Arch Linux

Some users install packages with the ‘yaourt’ tool, and some of those users opt in to voting when they install. Here are the top 100 packages in Arch Linux sorted by votes, with those already in the Haskell Platform marked YES in the HP column:

HP Repository Category Library/Program Version Votes Synopsis Notes
Extra darcs Decentralized replacement for CVS with roots in quantum mechanics
Extra haskell-extensible-exceptions Extensible exceptions darcs dep
Extra haskell-hashed-storage Hashed file storage support code. darcs dep
Extra haskell-haskeline A command-line interface for user input, written in Haskell. darcs dep
Extra haskell-mmap Memory mapped files for POSIX and Windows darcs dep
Extra haskell-terminfo Haskell bindings to the terminfo library. darcs dep
Extra haskell-utf8-string Support for reading and writing UTF8 Strings darcs dep
YES Extra ghc The Glasgow Haskell Compiler
Extra hugs98 Haskell 98 interpreter
YES Extra happy The Parser Generator for Haskell
YES Community alex a lexical analyser generator for Haskell
Community gtk2hs A GTK+2 binding for Haskell
YES Community haskell-http A library for client-side HTTP cabal dep
YES Community cabal-install The command-line interface for Cabal and Hackage.
Community haskell-x11 A Haskell binding to the X11 graphics library. xmonad dep
Community haskell-x11-xft Bindings to the Xft, X Free Type interface library, and some Xrender parts xmonad dep
YES Community haskell-zlib Compression and decompression in the gzip and zlib formats cabal dep
Community pandoc Haskell library and program to convert one markup format to another
Community xmonad A lightweight X11 tiled window manager written in Haskell
Community xmonad-contrib Add-ons for xmonad xmonad dep
lib haskell-binary 98 Binary serialisation for Haskell values using lazy ByteStrings
YES lib haskell-opengl 56 A binding for the OpenGL graphics system
lib haskell-hslogger 1.0.7-2 51 Versatile logging framework
lib haskell-puremd5 48 MD5 implementations that should become part of a ByteString Crypto package.
YES lib haskell-syb 48 Scrap Your Boilerplate
YES devel haddock 2.4.2-1 46 A documentation-generation tool for Haskell libraries
devel haskell-xft 0.2-2 46 Bindings to the Xft library, and some Xrender parts
lib haskell-ghc-paths 45 Knowledge of GHC’s installation directories
lib haskell-haxml 1.13.3-1 42 Utilities for manipulating XML documents
lib haskell-missingh 1.1.0-1 40 Large utility library
lib haskell-testpack 1.0.2-1 36 Test Utility Pack for HUnit and QuickCheck
YES lib haskell-time 36 A time library
lib haskell-uniplate 36 Uniform type generic traversals.
lib haskell-diff 0.1.2-1 35 O(ND) diff algorithm in haskell.
YES lib haskell-mtl 35 Monad transformer library
YES lib haskell-regex-base 0.93.1-1 33 Replaces/Enhances Text.Regex
YES lib haskell-parsec 3.0.0-1 32 Monadic parser combinators
devel cpphs 1.7-1 31 A liberalised re-implementation of cpp, the C pre-processor.
lib haskell-curl 1.3.5-1 31 Haskell binding to libcurl
lib haskell-hinotify 0.2-1 31 Haskell binding to INotify
lib haskell-transformers 31 Concrete monad transformers
lib haskell-unix-compat 31 Portable POSIX-compatibility layer.
devel cabal2arch 0.5.3-1 30 Create Arch Linux packages from Cabal packages
lib haskell-fingertree 30 Generic finger-tree structure, with example instances
lib haskell-haskell-src-exts 1.0.1-1 30 Manipulating Haskell source: abstract syntax, lexer, parser, and pretty-printer
YES lib haskell-glut 29 A binding for the OpenGL Utility Toolkit
lib haskell-pcre-light 0.3.1-2 29 A small, efficient and portable regex library for Perl 5 compatible regular expressions
lib haskell-rosezipper 0.1-1 29 Generic zipper implementation for Data.Tree
devel hscolour 1.13-1 28 Colourise Haskell code.
lib haskell-data-accessor 26 Utilities for accessing and manipulating fields of records
lib haskell-data-accessor-template 26 Utilities for accessing and manipulating fields of records
lib haskell-regex-tdfa 1.1.2-2 26 Replaces/Enhances Text.Regex
lib haskell-xml 1.3.4-1 26 A simple XML library.
lib haskell-hsh 2.0.2-1 25 Library to mix shell scripting with Haskell programs
lib haskell-split 0.1.1-1 25 Combinator library for splitting lists.
lib haskell-utility-ht 25 Various small helper functions for Lists, Maybes, Tuples, Functions
lib haskell-vty 25 A simple terminal access library
lib haskell-syb-with-class 0.5.1-1 24 Scrap Your Boilerplate With Class
YES lib haskell-cgi 3001.1.7.1-1 23 A library for writing CGI programs
YES lib haskell-fgl 23 Martin Erwig’s Functional Graph Library
devel derive 0.1.4-1 22 A program and library to derive instances for data types
lib haskell-monads-fd 21 Monad classes, using functional dependencies
devel haskell-pandoc 1.2.1-1 21 Conversion between markup formats
lib haskell-safe 0.2-1 21 Library for safe (pattern match free) functions
lib haskell-zip-archive 21 Library for creating and modifying zip archives.
YES lib haskell-bytestring 20 Fast, packed, strict and lazy byte arrays with a list interface
lib haskell-configfile 1.0.4-2 20 Configuration file reading & writing
lib haskell-data-accessor-monads-fd 0.2-1 20 Use Accessor to access state in monads-fd State monad class
lib haskell-hstringtemplate 0.6-1 20 StringTemplate implementation in Haskell.
lib haskell-pointedlist 0.3.5-1 20 A zipper-like comonad which works as a list, tracking a position.
YES lib haskell-quickcheck 20 Automatic testing of Haskell programs
lib haskell-convertible 1.0.5-1 19 Typeclasses and instances for converting between types
lib haskell-digest 19 Various cryptographic hashes for bytestrings; CRC32 and Adler32 for now.
lib haskell-hdbc 2.1.1-1 19 Haskell Database Connectivity
network twidge 0.99.3-1 19 Unix Command-Line Twitter and Identica Client
lib haskell-hspread 0.3.3-1 18 A client library for the spread toolkit
lib haskell-readline 17 An interface to the GNU readline library
lib haskell-strict 0.3.2-2 17 Strict data types and String IO.
lib haskell-happs-util 0.9.3-1 16 Web framework
devel hoogle 4.0.7-1 16 Haskell API Search
editors yi 0.6.1-1 16 The Haskell-Scriptable Editor
lib haskell-findbin 0.0.2-1 15 Locate directory of original program
lib haskell-glfw 0.3-1 15 A binding for GLFW, An OpenGL Framework
lib haskell-json 0.4.3-1 15 Support for serialising Haskell to and from JSON
YES lib haskell-network 15 Networking-related facilities
lib haskell-stream 0.3.2-1 15 A library for manipulating infinite lists.
lib haskell-tagsoup 0.6-2 15 Parsing and extracting information from (possibly malformed) HTML documents
YES lib haskell-editline 14 Bindings to the editline library (libedit).
lib haskell-sdl 0.5.5-1 14 Binding to libSDL
editors leksah 0.6.1-1 14 Haskell IDE written in Haskell
devel c2hs 0.16.0-1 13 C->Haskell FFI tool that gives some cross-language type safety
lib haskell-hsx 0.5.6-1 13 HSX (Haskell Source with XML) allows literal XML syntax to be used in Haskell source code.
devel hlint 1.6.4-1 13 Source code suggestions
lib haskell-crypto 4.2.0-1 12 Collects together existing Haskell cryptographic functions into a package
lib haskell-hdbc-sqlite3 12 Sqlite v3 driver for HDBC
lib haskell-highlighting-kate 0.2.4-1 12 Syntax highlighting
lib haskell-hjavascript 0.4.4-1 12 HJavaScript is an abstract syntax for a typed subset of JavaScript.
lib haskell-hjscript 0.4.4-1 12 HJScript is a Haskell EDSL for writing JavaScript programs.
devel mkcabal 0.4.2-2 12 Generate cabal files for a Haskell project
lib haskell-arrows 11 Arrow classes and transformers
lib haskell-filemanip 0.3.2-1 11 Expressive file and directory manipulation for Haskell.
lib haskell-happs-data 0.9.3-1 11 HAppS data manipulation libraries
lib haskell-happs-ixset 0.9.3-1 11
lib haskell-happs-state 0.9.3-1 11 Event-based distributed state.
lib haskell-harp 0.4-1 11 HaRP allows pattern-matching with regular expressions
lib haskell-lazysmallcheck 0.3-2 11 A library for demand-driven testing of Haskell programs
lib haskell-typecompose 0.6.4-1 11 Type composition classes & instances
lib haskell-dataenc 10 Data encoding library
lib haskell-happstack-util 0.3.2-1 10 Web framework
lib haskell-hxt 8.3.1-1 10 A collection of tools for processing XML with Haskell.
lib haskell-maybet 0.1.2-1 10 MaybeT monad transformer
lib haskell-platform 2009.2.0.2-1 10 The Haskell Platform
office pdf2line 0.0.1-1 10 Simple command-line utility to convert PDF into text
lib haskell-category-extras 0.53.5-1 9 Various modules and constructs inspired by category theory
lib haskell-colour 2.2.1-1 9 A model for human colour/color perception
lib haskell-datetime 0.1-1 9 Utilities to make Data.Time.* easier to use.
lib haskell-happs-server 0.9.3-1 9 Web related tools and services.

Now, one of the other constraints on the Haskell Platform is sustainable growth. We can’t add 1000 packages tomorrow and hope to maintain quality. Instead, something like 10-20% growth per release cycle seems plausible. This would mean adding 4 to 9 new packages.
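
To make that arithmetic concrete, here is a small Haskell sketch. The function name and the rounding (down at the low end, up at the high end) are assumptions of mine, not anything the platform specifies.

```haskell
-- | Fewest and most additions for 10-20% growth of a platform that
-- currently has @size@ packages.
-- Assumption: round down at 10% and round up at 20%.
growthRange :: Int -> (Int, Int)
growthRange size = (lo, hi)
  where
    lo = (size * 10) `div` 100        -- floor of 10%
    hi = (size * 20 + 99) `div` 100   -- ceiling of 20%
```

For example, `growthRange 44` gives `(4, 9)`. The 44 is an assumed package count, picked only because it reproduces the 4-to-9 figure.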

If we were to judge only on download popularity, our first 5 new packages would be:

These rank highly merely because one killer app, darcs, depends on them, and so they are widely built (they may also fail to satisfy many of the other criteria noted above).

If we ignore those packages popular for being dependencies, we get a different top 5:

Now we’re getting there. pandoc is both a library and a popular app, so we might treat it specially. gtk2hs is very popular, but not cabalised, so we might also set that aside, leaving (and I’ll ignore ghc-paths as it is used by ghc):

Which is starting to look like a plausible list. In turn however, you can find fault with all these packages in various dimensions (utf8-string may be obsoleted by Data.Text, haxml is LGPL licensed).

Coming up with an obvious list is non-trivial!
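
The kind of filtering described above (drop what is already in the platform, set some packages aside by hand, then take the top few by votes) can be sketched in Haskell. The `Entry` type and `topNew` function are hypothetical names; `sample` holds only the first ten rows of the table that list a vote count, so the result is an illustration of the mechanics rather than the list this post arrives at.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- | One row of the vote table (hypothetical record).
data Entry = Entry
  { name       :: String
  , votes      :: Int
  , inPlatform :: Bool  -- the "HP" column
  } deriving (Eq, Show)

-- | The first ten rows of the table above that list a vote count.
sample :: [Entry]
sample =
  [ Entry "haskell-binary"    98 False
  , Entry "haskell-opengl"    56 True
  , Entry "haskell-hslogger"  51 False
  , Entry "haskell-puremd5"   48 False
  , Entry "haskell-syb"       48 True
  , Entry "haddock"           46 True
  , Entry "haskell-xft"       46 False
  , Entry "haskell-ghc-paths" 45 False
  , Entry "haskell-haxml"     42 False
  , Entry "haskell-missingh"  40 False
  ]

-- | Top @n@ packages by votes that are not yet in the platform and
-- not in the hand-picked @excluded@ list.
topNew :: Int -> [String] -> [Entry] -> [Entry]
topNew n excluded =
  take n . sortBy (comparing (Down . votes)) . filter candidate
  where
    candidate e = not (inPlatform e) && name e `notElem` excluded
```

Setting aside ghc-paths as above, `map name (topNew 5 ["haskell-ghc-paths"] sample)` yields haskell-binary, haskell-hslogger, haskell-puremd5, haskell-xft and haskell-haxml.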

Finally, this is clearly only one very small data set, which should only have a small influence. If we step over and look at the Hackage download statistics, sorted by popularity, our top 5 new packages would be:

Popularity by Category

If instead we thought that having a comprehensive library set was the key goal, we might choose to include libraries by category, no matter how they rank in the global list. This would yield, according to Hackage,

For example.

What Is The Decision Model?

So how do we decide what goes in? One model would be:

  1. Have people propose packages
  2. Sort them by category need
  3. Identify the top-ranked package in each category using a points system or page rank
  4. Add or remove packages based on this?
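
Steps 2 and 3 of this model can be sketched in a few lines of Haskell. This is only one possible reading: the `Proposal` record, its field names and the tie-breaking rule are assumptions, and the score could come from a points system, distro votes or Hackage downloads.

```haskell
import qualified Data.Map.Strict as Map

-- | A proposed package (hypothetical record).
data Proposal = Proposal
  { propName     :: String
  , propCategory :: String
  , propScore    :: Int  -- from a points system or a popularity rank
  } deriving (Eq, Show)

-- | Keep the higher-scoring proposal.
-- Assumption: on a tie, the one proposed first wins.
better :: Proposal -> Proposal -> Proposal
better new old
  | propScore new > propScore old = new
  | otherwise                     = old

-- | Group proposals by category and keep the best one in each.
-- 'Map.fromListWith' passes the later list element as the first argument.
bestPerCategory :: [Proposal] -> Map.Map String Proposal
bestPerCategory ps =
  Map.fromListWith better [(propCategory p, p) | p <- ps]
```

`Map.elems (bestPerCategory proposals)` then gives one candidate per category. Step 4 would still need a human decision, plus whatever cap on growth the release allows.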

What do you think? What is a good way to decide when a package is sufficiently notable to add to the Haskell Platform?

What criteria would you use to determine when a package is blessed?

13 thoughts on “Heuristics for Blessing Software Packages”

  1. A “best in category” seems simple. But how do we define which of the competing packages is best? Well, IMO, performance is very important, so each package would have to be run through a typical user scenario benchmark, or something. Next, functionality: compare the capabilities of both, including their future direction. Lastly, how well are they written? If it looks like messy code that is hard to maintain and extend then possibly performance and functionality don’t matter as much. I agree, it’s not a simple process. But it would be great (for Haskell in general) to have a highly elegant, fast, and rich platform to build upon without having to hunt down the good stuff.

  2. Remember that GHC statically links libraries, which removes many of the reasons to use an LGPL license.

  3. “Library longevity”, while not a common issue is worth considering. For example: SHA, pureMD5, RSA, etc should all disappear in the long run and be replaced with an improved Crypto library.

  4. Don, I don’t think you should have a point system or anything of the sort. While community input should weigh into the decisions, I think you and the other platform maintainers should feel authorized to use your own subjective judgement as appropriate.

    What I think matters the most about something in the platform is the quality of the code. I was flattered when you once encouraged me to contribute stuff to Hackage, but at the same time that told me that Hackage is populated by code written by relatively unwashed implementers like me–and therefore I don’t want to use Hackage code for anything serious (except for some specific packages that I know are well-regarded). Platform code should be:

    1) reviewed by experts, who wouldn’t be expected to take responsibility for it themselves, but who would at least make sure that the implementation is reasonably complete and reasonably sane; and

    2) well-documented, which not much of it is right now;

    3) One significant priority should be on having enough functionality in the platform to enable easy and well-performing implementation of anything that can be done in competing languages like Python or Java.

    Example of #3: bos’s Text module makes it possible for the first time to write competitive Unicode text-processing applications. So it should be treated as important.

    I agree with the idea of an eventual crypto module, but functions like sha are important enough to include immediately. The module contents can later be replaced with calls to the TBD crypto module so that the existing API will keep working.

    Also, a code signing capability should be added to cabal, similar to signatures on .jar files.

  5. There should also be some refactoring of stuff between packages. For example, the MaybeT module should go away, and MaybeT should be included in Data.Maybe or mtl or wherever the standard place for such things is.

  6. It may be that there simply don’t currently exist as many as 4 libraries that are high-quality enough to go into the Platform currently, too, depending how picky we want to be.

  7. If it’s decided that a package should go in *but* — i.e. but it needs better test coverage, or but it needs to clean up its dependencies, or etc, then it seems that the right thing would be to make it a point to encourage package maintainers to bring it up to par. Additionally, I suspect package maintainers might not *want* to be blessed at a given stage. E.g., haxml might want to wait on resolving the two-version issue, etc?

    Data.Binary seems to be a gimme, but on the other hand, something like HDBC is more complicated — even though it should be the standard, I think, it needs a backend to do anything, and the backends in turn need to have libraries installed to bind to, which raises the question of why install the frontend if it can’t do anything on its own?

  8. Also it would be particularly nice to bless hslogger, and a few other related packages designed for parsing config files and command line options, etc — the sort of basic machinery to help users get up and running quickly. I don’t know the space well, but I’m sure there’s a few people who can put the right thought and discussion into it.

  9. ..so it looks like any package will only be accepted if it has a maintainer who is willing to maintain it for Haskell Platform. Generally an existing maintainer would take on the rather light responsibilities (stick to PVP, make announcements when there are new releases, anything else?)… but if the maintainer thinks it’s not ready for HP, then it’s probably not, but theoretically someone could sort of fork it and maintain it for HP.

    maybe that’s a sketch of the relationship we need to have with maintainers?

    well, except some of the core packages are just maintained by “libraries@” which means they make decisions and various people do the busywork of applying patches when appropriate. And often fix them in regards to GHC changes also

  10. Ubuntu and Debian have package statistics for users who have installed the popularity-contest package (Ubuntu, at least, offers to install it when setting up). Statistics are available at http://popcon.debian.org and http://popcon.ubuntu.com.

    On these pages “inst” means number of installations, “vote” means number of installations *which have been used in the past n days* (I think n=30).

  11. What you folks are doing with the Haskell Platform is absolutely awesome.

    I fully agree with solrize that a set of “benevolent dictators for life” taking input from the community would trump any point system.

    Haskell is an absolutely awesome language but it strikes me that rounding out the full functionality for a “batteries included” set of “blessed” libraries to offer something competitive with Python and/or Java or .Net languages is what’s really needed (solrize’s point 3).
