Very fast, scalable mutable maps and hashes for Haskell

I’m at HacPDX working on judy, a finite-map-like interface to the classic judy arrays library, which provides fast and scalable mutable collection types for Haskell. While developing this library, where performance is the primary concern, I’m making heavy use of Bryan O’Sullivan’s new criterion benchmarking and statistics suite, announced recently at the Haskell Implementors Workshop.

Criterion is awesome. It is going to change how we design high performance libraries in Haskell. Read on to see why.

The Judy bindings for Haskell store Haskell keys and values in judy arrays, with resources tracked by a ForeignPtr on the Haskell heap. They provide a mutable, imperative collection type, similar to the old Data.HashTable (now slated for removal), but with an interface closer to Data.Map.

The key benefit over previous hashtable implementations for Haskell is that the judy bindings scale very well, as we shall see. On the other hand, unlike, say, Data.IntMap, judy arrays are mutable, which makes them less useful for some applications. They are not a solution for all container problems. However, if you need large-scale collections, the judy binding might be appropriate.

The library is under active development, and currently approximates a mutable IntMap structure, with more work planned to add optimized hashes, type-family based representation switching, and more. It has a straightforward interface:

new    :: JA a => IO (JudyL a)
insert :: JA a => Key -> a -> JudyL a -> IO ()
lookup :: JA a => Key      -> JudyL a -> IO (Maybe a)
delete ::         Key      -> JudyL a -> IO ()
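To get a feel for how an interface of this shape is used, here is a minimal stand-in with the same four operations. It is not the judy binding: the hypothetical MutMap type below is backed by an IORef holding a pure Data.IntMap rather than a judy array behind a ForeignPtr, it drops the JA class constraint, and it uses Int keys because that is what IntMap expects. Only the calling convention (key first, collection last, results in IO) is taken from the signatures above.

```haskell
import Prelude hiding (lookup)
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-in: a mutable reference to a pure IntMap.
-- The real JudyL wraps a judy array instead.
newtype MutMap a = MutMap (IORef (IM.IntMap a))

-- Int keys here because IntMap is keyed by Int (an assumption of this sketch).
type Key = Int

new :: IO (MutMap a)
new = MutMap <$> newIORef IM.empty

insert :: Key -> a -> MutMap a -> IO ()
insert k v (MutMap ref) = modifyIORef' ref (IM.insert k v)

lookup :: Key -> MutMap a -> IO (Maybe a)
lookup k (MutMap ref) = IM.lookup k <$> readIORef ref

delete :: Key -> MutMap a -> IO ()
delete k (MutMap ref) = modifyIORef' ref (IM.delete k)

-- Insert two keys, delete the first, then look both up.
demo :: IO (Maybe Int, Maybe Int)
demo = do
  m <- new
  insert 1 10 m
  insert 2 20 m
  delete 1 m
  a <- lookup 1 m
  b <- lookup 2 m
  return (a, b)
```

Every operation mutates the collection in place and returns in IO, which is what distinguishes this style from Data.Map, where each update returns a new map.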

Let’s look at the performance profile.

Insertion

Here we measure the cost for inserting 1k, 100k and 10M consecutive word-sized keys and values, over repeated runs. Criterion takes care of the measurement and rendering. First 1k values:

[Figure: timings for inserting 1k elements]

Above, we see the timings for 100 runs of inserting one thousand values into an empty judy array. The fastest times were around 0.22ms (220 microseconds) to build the table of 1000 values. The slowest was around 230 microseconds. A tight cluster.

Criterion can then compute the probability density curve, showing a good clumping of times.

[Figure: probability density of times for inserting 1k elements]
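The measurement itself is simple to describe: insert N consecutive keys into an empty mutable collection, repeat, and look at the spread of times. Here is a rough sketch of that loop using only base and containers. It is not criterion and it does not use the judy binding: the collection is an IORef over Data.IntMap, the clock is GHC.Clock.getMonotonicTime, and the names insertN, timeRun and timings are made up for illustration. Criterion does the repetition, the statistics and the rendering for you.

```haskell
import Control.Monad (forM_, replicateM)
import Data.IORef (newIORef, readIORef, modifyIORef')
import qualified Data.IntMap.Strict as IM
import GHC.Clock (getMonotonicTime)

-- Workload: insert n consecutive keys into a fresh mutable map.
-- Returns the final size so the work has an observable result.
insertN :: Int -> IO Int
insertN n = do
  ref <- newIORef IM.empty
  forM_ [1 .. n] $ \k -> modifyIORef' ref (IM.insert k k)
  IM.size <$> readIORef ref

-- Wall-clock time of one run, in seconds.
timeRun :: Int -> IO Double
timeRun n = do
  start <- getMonotonicTime
  _ <- insertN n
  end <- getMonotonicTime
  return (end - start)

-- Repeat the run and report the (fastest, slowest) times.
timings :: Int -> Int -> IO (Double, Double)
timings runs n = do
  ts <- replicateM runs (timeRun n)
  return (minimum ts, maximum ts)
```

Calling timings 100 1000 would correspond to the 100 runs of 1k inserts plotted above, though the absolute numbers will differ from the judy figures.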

Next we see that inserting one hundred thousand elements takes about 100x longer. That is, it scales linearly, as the docs suggest it should. We go from 1k elements in 0.2ms to 100k in 20ms.

[Figure: timings for inserting 100k elements]

[Figure: probability density of times for inserting 100k elements]

Here we see a good clustering of values again. There’s very little variance. I can reliably insert 100k elements in 20ms.

Now the bigger test, 10 million elements. Again, the performance of the Judy bindings scales linearly with input size. 0.2ms for 1k, 20ms for 100k, ~2.2s for 10M, though there are two distinct performance bands at 10M elements.

[Figure: timings for inserting 10M elements]

The density function shows the performance clusters quite clearly. A peak around 2.2s and a broader peak around 2.4s.

[Figure: probability density of times for inserting 10M elements]

Judy arrays scale very, very well. I was able to insert 100 million elements in 22s (10x slower again), using < 1G of memory. IntMap at equivalent N exhausted memory.
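One way to check the linear-scaling claim is to divide each total by the number of elements and see whether the per-insert cost stays flat. The sketch below does that arithmetic with the figures quoted in this post; the names measurements and nsPerInsert are illustrative only.

```haskell
-- (number of elements, reported total time in seconds), as quoted in the text.
measurements :: [(Double, Double)]
measurements =
  [ (1e3, 0.22e-3)  -- 1k elements in about 0.22ms
  , (1e5, 20e-3)    -- 100k elements in about 20ms
  , (1e7, 2.2)      -- 10M elements in about 2.2s
  , (1e8, 22)       -- 100M elements in about 22s
  ]

-- Average cost of a single insert, in nanoseconds.
nsPerInsert :: (Double, Double) -> Double
nsPerInsert (n, secs) = secs / n * 1e9

perInsert :: [Double]
perInsert = map nsPerInsert measurements
```

All four sizes come out at roughly 200 to 220 nanoseconds per insert, which is what linear scaling looks like across five orders of magnitude.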

Data.HashTable

The main problem with Data.HashTable is its reliance on the garbage collector to not get in the way. As the hash sizes grow, heap pressure becomes more of an issue, and the GC runs more often, swamping performance. However, for smaller workloads, and with decent default heap sizes Data.HashTable outperforms the Judy bindings:

[Figure: Data.HashTable, probability density of times for inserting 100k elements]

At larger sizes, however, Data.HashTable performance degrades, taking on average 5.6s to insert 10M elements (using +RTS -H1000M -A500M).

[Figure: Data.HashTable, probability density of times for inserting 10M elements]

With default heap settings Data.HashTable has very poor performance, as GC time dominates the cost.

[Figure: Data.HashTable, probability density of times for inserting 10M elements]

That is, with default heap settings, Data.HashTable at N=10M is 20x slower than judy arrays, and with optimized heap settings it is 2.5x slower.

Data.IntMap and Data.Map

We can also measure the imperative lookup structures against persistent, pure structures: Data.IntMap (a big-endian patricia tree) and Data.Map (a size-balanced tree).
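For readers less familiar with the distinction, here is a small illustration of what a persistent, pure structure means in practice, using Data.IntMap from the containers package. The names original, updated and sizes are just for this example.

```haskell
import qualified Data.IntMap.Strict as IM

original :: IM.IntMap String
original = IM.fromList [(1, "one"), (2, "two")]

-- insert returns a new map; `original` is left untouched and stays usable.
updated :: IM.IntMap String
updated = IM.insert 3 "three" original

-- Sizes of the old and new maps, to show that both versions coexist.
sizes :: (Int, Int)
sizes = (IM.size original, IM.size updated)
```

Both versions remain valid after the insert, which is convenient for pure code but means every update allocates new nodes on the Haskell heap. A mutable structure such as a judy array updates in place instead.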

Data.IntMap at N=10M, with default heap settings takes on average 7.3s to insert 10M elements, or about 3.3x slower than judy arrays (and slower than the optimized HashTable).

Data.Map at N=10M, with default heap settings, takes on average 24.8s to insert 10M elements, or about 11x slower than judy arrays.
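The slowdown factors quoted in this post are simply each structure's average time for 10M inserts divided by the judy time of about 2.2s. A short sketch of that arithmetic, with illustrative names and the timings taken from the text:

```haskell
-- Average time for judy to insert 10M elements, in seconds.
judySecs :: Double
judySecs = 2.2

-- Average times for the other structures at N=10M, in seconds.
others :: [(String, Double)]
others =
  [ ("Data.HashTable (+RTS -H1000M -A500M)", 5.6)
  , ("Data.IntMap (default heap)", 7.3)
  , ("Data.Map (default heap)", 24.8)
  ]

-- Slowdown of each structure relative to judy.
slowdowns :: [(String, Double)]
slowdowns = [ (name, secs / judySecs) | (name, secs) <- others ]
```

This gives roughly 2.5x, 3.3x and 11x respectively.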

Conclusions

At small scale (under 1M elements), when storing simple atomic types, there are a variety of container types available on Hackage which do the job well: IntMap is a good choice, as it is both flexible and fast. At scale, however, judy arrays seem to be the best thing we have at the moment, and make an excellent choice for associative arrays for large scale data. For very large N, they may be the only in-memory option.

You can get judy arrays for Haskell on Hackage now.

I’ll follow up soon with benchmarks for lookup and deletion, and how to generalize the interface.

8 thoughts on “Very fast, scalable mutable maps and hashes for Haskell”

augustss says:

September 26, 2009 at 8:27 pm

I wish the bindings were not in IO, but in some monad with a run. Is there something fundamental about Judy that forces IO? Like global variables?
dons00 says:

September 26, 2009 at 8:38 pm

There’s no reason for it to be in IO. I’m using the IO as a base layer, but the plan is to add a freeze(), and an ST mode.

How does that sound?
Max Bolingbroke says:

September 26, 2009 at 10:39 pm

I’m also here to plead for a ST interface. I’ve actually ported Data.HashTable to ST before because I needed a halfway decent mutable collection type – having a properly tested mutable collection that actually performs well would be a godsend.
Jason says:

September 26, 2009 at 11:30 pm

Interesting. Hopefully you’ll be at HacPDX again tomorrow. I’ve been profiling a bit of darcs code that uses Data.Map but seems to generate a lot of garbage and take a long time. Perhaps with judy arrays we can see significant savings in both dimensions.
Ben Marsh says:

September 27, 2009 at 6:10 am

Can you please tell me what you used for doing the graphs in this post?
dons00 says:

September 27, 2009 at 8:22 am

Ben Marsh: the graphs are generated by criterion using the Haskell chart library: http://hackage.haskell.org/package/Chart