
Originally Posted by
Piotr Tarsa
I'm no expert and haven't tried memory mapped files for implementing allocations beyond RAM size. Create a proof of concept code and you'll know the truth :]
I appreciate that background information. I'm not sure when I might get around to giving it a full try. Here's a rough sketch of what that might look like (using primarily syscalls):
1. get a temporary filename: e.g., mktemp(3)
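2. create and open the file for reading and writing: e.g., open(2) with O_RDWR | O_CREAT | O_EXCL (creat(2) opens the file write-only, and mmap(2) rejects a descriptor that is not open for reading)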
3. expand the file to the required size: e.g., ftruncate(2)
4. memory-map the file read/write: e.g., mmap(2)
5. optionally close the file descriptor: e.g., close(2)
6. if #4 returned a valid mapping (i.e., not MAP_FAILED), pass the pointer returned by mmap(2) as argument 2 (the suffix array) to divsufsort()
7. before the process exits, delete the temp file: e.g., unlink(2)
That's how you might do it on a POSIX system, e.g. Linux or OS X. Those are obviously not native Win32 functions. My background is *nix, and as far as Windows goes, I'm pretty much tone-deaf.
As is fairly obvious, that's a good bit more complex than just calling malloc(3). As with anything complex, there will probably be more than one way to do it, with accompanying nuances and trade-offs. For instance, mkstemp(3) combines #1 and #2, and tmpfile(3) appears to go further, since its file is removed automatically when it is closed (#5 and #7).
What annoys me about something like this is all the choices and trade-offs. If the process dies in some unexpected way, what's the risk of leaving behind a potentially enormous and hard-to-find file as an artifact? Are races and security holes possible? It just opens up a whole new can of worms versus a simple malloc.
Edit:
I went ahead and wrote the code that I outlined in this post: since I had already taken the time to think it through, there was little excuse not to (aside from being stubborn and quarrelsome). It's up on Google Code in the devel-test-map-writable branch.
It worked on enwik8 and a test input of 100M of zeroes (as expected). When I ran it on enwik9, the result was as follows:
Code:
[devel-test-map-writable]~/git/mk-bwts$ ./mk_bwts enwik9
Attempting to map temporary file: Cannot allocate memory
Failed to allocate 4000000000 bytes for suffix array. Abort.
[devel-test-map-writable]~/git/mk-bwts$
As far as I can tell, that's the equivalent of malloc failing, so it appears that mmap'ing a disk file doesn't substantially change the picture. Of course, there are now more knobs and levers to tinker with, and it's possible the experiment somehow failed to create the conditions it was intended to test. If anything changes, I'll post.