
Originally Posted by willvarfar
As large dictionaries are typically accessed randomly, and random access is precisely the worst case for hard disks (for SSDs, of course, it isn't), once I overflow my 3GB of real RAM the performance of compressors drops like a stone. In fact, like a very heavy stone in very thin air from a great height. Hmm, a very sharp, needle-shaped stone, in fact. Now I'm beginning to wonder what I remember from school about terminal velocities and things. It's always that way when you try to draw comparisons between computing and the real world. But swap really is that bad, believe me: a rotating disk manages on the order of a hundred random seeks per second, where RAM manages millions.
So I'd recommend avoiding disk for random-access structures, e.g. dictionaries.
There might be data structures that are more disk-friendly than the hashmaps and trees I've used, just as there are more cache-conscious versions of those structures; I'm not aware of anyone trying to use them. In fact, putting disk aside for a bit, cache-conscious data-structure decisions might also help people chasing speed (sketched below).
But disk? Naw, avoid it.
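Not willvarfar's code, just a minimal sketch of what "cache-conscious" can mean here, with all names and sizes made up for illustration: an open-addressing hash table with linear probing keeps colliding entries in adjacent slots, so a lookup walks one or two neighbouring cache lines instead of chasing pointers the way a chained hashmap or a binary tree does.

[code]
// Sketch only: open addressing with linear probing, no resizing or deletion.
// Keep the load factor well below 1 or put() will probe for a long time.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Slot {
    std::uint64_t key   = 0;   // 0 is reserved to mean "empty"
    std::uint64_t value = 0;
};

class LinearProbeMap {
public:
    explicit LinearProbeMap(std::size_t pow2_capacity)
        : slots_(pow2_capacity), mask_(pow2_capacity - 1) {}

    void put(std::uint64_t key, std::uint64_t value) {
        std::size_t i = hash(key) & mask_;
        while (slots_[i].key != 0 && slots_[i].key != key)
            i = (i + 1) & mask_;   // next slot is adjacent in memory: prefetch-friendly
        slots_[i].key = key;
        slots_[i].value = value;
    }

    bool get(std::uint64_t key, std::uint64_t& value) const {
        std::size_t i = hash(key) & mask_;
        while (slots_[i].key != 0) {
            if (slots_[i].key == key) { value = slots_[i].value; return true; }
            i = (i + 1) & mask_;
        }
        return false;              // hit an empty slot: key is absent
    }

private:
    static std::size_t hash(std::uint64_t k) {
        // MurmurHash3-style finaliser; any decent mixing function will do
        k ^= k >> 33; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33;
        return static_cast<std::size_t>(k);
    }

    std::vector<Slot> slots_;
    std::size_t mask_;
};

int main() {
    LinearProbeMap m(1u << 20);    // capacity must be a power of two
    m.put(42, 1337);
    std::uint64_t v = 0;
    if (m.get(42, v))
        std::printf("42 -> %llu\n", static_cast<unsigned long long>(v));
    return 0;
}
[/code]

The same idea scales up one level: B-tree-style nodes that pack many keys into a single block are to disk what linear probing is to cache lines, which is roughly what "disk-friendly structures" means above.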
The current LTCB king, durilca, used 13GB of real RAM to reach the summit. I've seen machines with considerably more real RAM than that: 96GB and beyond is available in ordinary single-system-image boxes these days, and you can always back it with some SSD instead of disk.