Bryan and Johan have been working hard on replacing the GHC runtime’s select-based concurrency mechanism with a new one based on epoll. This should improve the scalability of Haskell server code in a number of ways: more connections per second, more concurrent connections, and so on.
You can read about their work in a series of posts:
- Making GHC’s I/O manager more scalable – Dec 17, 2009
- Progress on GHC’s I/O manager – Jan 11, 2010
And you can find the current code for haskell-event on github: http://github.com/tibbe/event
After reading this post on how epoll-based userland callbacks can outperform general-purpose preemptive threads for non-compute-bound IO work, I wanted to see what improvements we can hope for from an epoll-based version in Haskell.
The full code, which shows how to set up simple socket event notification using the event lib, is on the Haskell wiki, and will be useful for getting a flavor of the event lib use.
So while a simple forkIO/accept version, doing Handle and String IO, peaked out around 7k req/sec, the epoll-based version reached over 15k req/sec.
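For reference, here is a minimal sketch (not the exact benchmark code) of what such a simple forkIO/accept server doing Handle and String IO might look like. The port number and response body are arbitrary choices for illustration:

```haskell
-- Sketch of a one-thread-per-connection server: accept in a loop and
-- spawn a lightweight GHC thread for each client.
import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network.Socket
import System.IO

main :: IO ()
main = do
  sock <- socket AF_INET Stream defaultProtocol
  setSocketOption sock ReuseAddr 1
  bind sock (SockAddrInet 5002 0)       -- 5002 is arbitrary; 0 means INADDR_ANY
  listen sock 1024
  forever $ do
    (conn, _) <- accept sock            -- block until the next client connects
    forkIO $ do                         -- one lightweight thread per connection
      h <- socketToHandle conn WriteMode
      hPutStr h "HTTP/1.0 200 OK\r\nContent-Length: 6\r\n\r\npong\r\n"
      hClose h
```

Each blocked accept/IO call here goes through the runtime’s IO manager, which is exactly the select-vs-epoll machinery under discussion.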
So that’s useful: good epoll-based event code for Haskell. The next step will be to see the epoll-based redesign of Control.Concurrent merged into GHC.
16 thoughts on “Playing with the new Haskell epoll event library”
It would be interesting to know the hardware configuration for these numbers. Thanks!
This was on an out-of-the-box Linux 2.6.31-ARCH x86_64 ThinkPad laptop, with an Intel Core2 Duo CPU (2 cores) @ 2.00GHz and 4G of RAM.
It took a while to figure out that forkIO currently uses select, so this is basically a select vs. epoll comparison with a large number of channels. select is of course O(n) in the number of channels (since it has to scan the return vector) while epoll (at least the userspace part) is constant-time.
http://www.kegel.com/c10k.html (a famous article, though now slightly dated) has a bunch of discussion about other approaches to this issue.
Anyway, cool stuff!!!
Hello Don, I’m having trouble with the epoll example on one of my machines (it works on one of my slower machines). It says:
epollCreate: unsupported operation (Function not implemented)
One thing that bothers me about bytestring, though, is that converting to it is an O(n) operation. This means that even though you’ve hidden away the explicit conversion with “OverloadedStrings”, the operation is still there whenever you want to do something to the input.
Obviously in the hello examples we have here the string is static, but it is almost certain that any other server will need to manipulate the data. Is there any solution to that?
And when do you think epoll support will be available in GHC?
> the operation is still there whenever you want to do something to the input.
That’s not true if you’re using the network-bytestring package, which uses zero-copy IO to keep the input in bytestring form. For *literals*, the rewrite rule fires at compile time to do the conversion, so there should be no runtime overhead. That’s the whole point of using bytestrings in the first place: to keep the data on and off the wire in unpacked form. And that’s precisely what my examples do.
So I don’t really understand your bytestring concern — you do 100% pure bytestring IO if you use it, and pay zero conversion costs, since all data is already in bytestring form (on disk, on the wire, and in the source)
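To make the literal point concrete, here is a small sketch. With OverloadedStrings a string literal at a ByteString type is packed once by bytestring’s IsString instance (with a rewrite rule handling the common literal case), not converted per use; the response text is just an example:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- The literal below is a ByteString from the start; no [Char] appears at runtime.
import qualified Data.ByteString.Char8 as B

reply :: B.ByteString
reply = "HTTP/1.0 200 OK\r\n\r\nPong!\r\n"   -- a ByteString literal, not a String

main :: IO ()
main = do
  B.putStr reply
  print (B.length reply)   -- 26 bytes, computed on the packed representation
```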
For your first issue, looking at the code for epollCreate, it might fail if headers aren’t found when building the library (I’m assuming you’re on Linux).
You should try to get it running, because on my box, it blows away the Python epoll version.
Thanks for the reply Don.
About the bytestring concern, it mentions that converting from a string always takes O(n) operations. I assume this is because the structure of [Char] is that of a linked list? So any time you need to operate on the data, it needs to be converted to and from a byte string costing 2*O(n) operations.
And about the epoll benchmarks, those were the performance levels I was expecting. However to be fair, you should note that both the Python and Erlang versions also receive data even though they don’t do anything with it, which is something the Haskell version doesn’t seem to do.
Overall I am interested in seeing epoll support come to GHC.
You’re on track, up until:
> So any time you need to operate on the data, it needs to be converted to and from a byte string costing 2*O(n) operations
Which isn’t true. You just operate on the data directly as a bytestring. The penalty applies only if you convert it to a [Char], which would be silly. Just leave the data in bytestring form, and enjoy the performance.
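A quick sketch of what “operate on the data directly” means in practice; the request line here is a made-up example:

```haskell
-- Data.ByteString.Char8 mirrors the list API, so the usual operations
-- run directly on packed bytes with no [Char] round trip.
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  let line   = B.pack "GET /index.html HTTP/1.0"  -- pretend this came off the wire
      method = B.takeWhile (/= ' ') line          -- operate on the ByteString itself
  B.putStrLn method                               -- prints "GET"
  print (B.length line)                           -- prints 24
```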
Were you able to update the benchmark examples?
I haven’t been able to fix the “unsupported operation” error. I’m not experienced enough in Haskell to debug the problem. I want to wait until epoll support is merged into GHC (which I hope is soon) before writing the server I have planned.
> Just leave the data in bytestring form, and enjoy the performance.
I’m assuming the functions available for bytestring are as numerous and useful as those for strings? What about Unicode?
> functions available for bytestring are as numerous and useful as they are for strings?
Indeed. They provide the full list API, along with many other supporting libraries, such as UTF-8 encodings, parser combinators, and so on. That’s why we use them for everything performance-oriented.
For unicode packed strings, I’d use the text library, http://hackage.haskell.org/package/text
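A small sketch of the text library handling Unicode data (the sample string is just an illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Data.Text stores packed Unicode text; operations work on characters,
-- not raw bytes.
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  let greeting = "héllo, wörld" :: T.Text
  TIO.putStrLn (T.toUpper greeting)  -- Unicode-aware case mapping: HÉLLO, WÖRLD
  print (T.length greeting)          -- 12 characters, regardless of byte encoding
```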
Regarding epoll_create failing on your machine — could you take it offline, and send me the config.log from your event lib build? You’re not building the app with the -threaded flag are you? That won’t work.
@codexon, if you are having problems, you’ll have to give us some details. Please file a report with detailed info about your platform and distribution at http://github.com/tibbe/event/issues
@dons00, what are the Python performance numbers like on your box?
@bos I get just over 10k req/sec for the Python epoll version, and 15k req/sec for the Haskell one.
I’ve checked the configure logs for both machines, the one where it works and the one where it doesn’t. They are about the same, except that the one that works is running Ubuntu 9.10 instead of 9.04.
It should probably be noted that compiling works just fine; the error happens at runtime. Does this still mean it may be a configuration problem?
Tried to reproduce this but got quite different numbers:
The T-ended things are compiled with -threaded.
bs2 is your bytestring example with the forkIO removed, since the epoll version does not forkIO either.
Linux nar.taruti.net 2.6.32-ARCH #1 SMP PREEMPT Thu Jan 7 22:28:29 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux
The Glorious Glasgow Haskell Compilation System, version 6.12.1
str: Request rate: 11555.5 req/s (0.1 ms/req)
str: Request rate: 11437.5 req/s (0.1 ms/req)
str: Request rate: 11595.1 req/s (0.1 ms/req)
strT: Request rate: 8480.5 req/s (0.1 ms/req)
strT: Request rate: 8777.3 req/s (0.1 ms/req)
strT: Request rate: 8762.9 req/s (0.1 ms/req)
bs: Request rate: 17266.1 req/s (0.1 ms/req)
bs: Request rate: 17349.4 req/s (0.1 ms/req)
bs: Request rate: 17227.9 req/s (0.1 ms/req)
bsT: Request rate: 14416.9 req/s (0.1 ms/req)
bsT: Request rate: 13607.0 req/s (0.1 ms/req)
bsT: Request rate: 14386.1 req/s (0.1 ms/req)
epoll: Request rate: 16242.7 req/s (0.1 ms/req)
epoll: Request rate: 17910.9 req/s (0.1 ms/req)
epoll: Request rate: 16119.1 req/s (0.1 ms/req)
epollT: epollWait: interrupted (Interrupted system call) BROKEN!
bs2: Request rate: 18584.1 req/s (0.1 ms/req)
bs2: Request rate: 18580.1 req/s (0.1 ms/req)
bs2: Request rate: 18753.0 req/s (0.1 ms/req)
bs2T: Request rate: 17244.9 req/s (0.1 ms/req)
bs2T: Request rate: 16940.1 req/s (0.1 ms/req)
bs2T: Request rate: 16890.5 req/s (0.1 ms/req)
Don, you asked for applications other than servers that want such high concurrency. One example is a network router, e.g. a corporate firewall that handles a few thousand client connections and routes them to the internet or various VPNs depending on the endpoints. It might even implement the VPN itself, so it would do IPSEC or SSL (de-)encapsulation. If it is handling a lot of VOIP traffic there could be as many as 60 packets per second per client (the frame duration of Speex is about 30 msec and it is full duplex, so 2 packets per frame). Of course we might even want to use something like that in a data center instead of a big Cisco box, so the amount of traffic increases again.
Kernel mode Haskell anyone? ;-)
Btw, I wonder how the timings would change if you eliminated the sendAll (just accept the connection and close the socket immediately). I know we’re in a different world when 15k connections/sec sounds slow, but a lot of the time must be spent in the kernel rather than in GHC’s concurrency machinery.