Haskell for Anything

Jan 21, 2010 at 11:18PM
Caleb Doxsey

Lately I've been playing around with Haskell a lot... particularly trying to get it to do things it's not ideally suited for. For example high performance binary deserialization.

Here's the problem I've been running into:

No Dictionary

As far as I can tell there's nothing like C#'s dictionary class available in Haskell. There are types like Data.Map, and there are some Trie types, but I've run into serious performance and memory issues.

Say you get data from a network stream as a lazy byte string... so far so good we have a very efficient lazy list of byte arrays. The format is something like this on the wire: Key ValueType Value where the Key is a null terminated string, the value type is one of several various types, and the values are things like ints, longs, bools,... and also other ByteStrings.

I wanted to build an interface to this set of data in two ways: First, and the most straightforward would be to return some sort of mapping object which closely resembles what the actual data is. In C# this would be a Dictionary (which is a Hash Table), Data.Map is a binary tree and there is a more efficient Trie class specifically for ByteString -> Value collections.

So I wrote some functions to build that object and all seemed to go ok. Except building these things seemed kind of slow. For example making a collection of 3,000 items took about a second... which I felt was unacceptable.

My initial thought was that perhaps a Trie is sometimes overkill for the kind of retrieval you may want to do. Maybe you are interested in only one of the key value pairs.

So I refactored the code to hand back a List of ByteString, Value pairs. The idea being that if you wanted the Trie you could build it off of these guys. Or you could just work directly with it (and use something like lookup). Sure finding something in a list is far less efficient then using a dictionary type, but you have to pay for that O(n) cost anyway (and much worse in fact to build the initial dictionary).

I thought this would be ideal, but I was quite wrong. The memory usage went nuts, sometimes as much as 1GB or more being used. This on about 50 3000 item collections. There's just no way its that much data, which meant there was a whole lot of orphaned objects, unnecessary memory allocations, maybe some unneeded boxing...

I've yet to solve this problem. I toyed with the ST Monad, and thought about using one of the various Vector types out there... That turned out to be really confusing since there are so many available.