* osman has joined #compression
<osman> hi everyone!
<osman> Hi "the respective bot" and Shelwien :)
* PAQer has joined #compression
* PAQer has quit IRC (Quit: Page closed)
* toffer has joined #compression
* Lasse has joined #compression
<Lasse> heyo
<Lasse> quicklz author here :)
<Lasse> gonna bbl
* Lasse has quit IRC (Client closed connection)
* Lasse has joined #compression
<osman> see you all
* osman has quit IRC (Quit: Page closed)
<Shelwien> damn, and I was sleeping...
<Shelwien> Somebody should have asked what osman is doing now ;)
* toffer has quit IRC (Quit: Page closed)
<Shelwien> ...
<Shelwien> as to qpress btw
<Shelwien> do you have any specific experience as to optimizing file i/o on windows?
<Shelwien> Lasse?
<Shelwien> qpress seems to be a rather rare example of an open source compressor with async i/o
<Shelwien> but its very "general"
<Shelwien> and I think it might be possible to further improve the speed under windows
<Shelwien> couldn't find any informative posts on that though
<Lasse> yeah, have a deal of trial-and-error experience
<Lasse> how would you optimize it?
<Shelwien> well, windows has some special i/o modes
<Shelwien> overlapped i/o for example
<Shelwien> and actually its a very complicated task
<Shelwien> as there're different optimal solutions for different cases
<Shelwien> one example is that when you call fwrite()
<Shelwien> it executes some standard library wrapper
<Shelwien> then goes to kernel32.dll and executes one more wrapper
<Shelwien> then to ntdll.dll and one more wrapper there, with int 2E/syscall/sysenter in it
<Shelwien> then stack arguments get copied to ring0 stack
<Shelwien> and some more weird stuff happens, until execution reaches the filesystem driver and then hdd driver
<Shelwien> and that's only one direction - i/o via a custom driver might be really noticeably faster
<Shelwien> (of course not on hdd level though, as completely ignoring windows caches is bad too)
<Shelwien> ...
<Lasse> FILE_FLAG_OVERLAPPED? tried that one, it has problems depending on security settings, such as on NT and Vista Home, where Windows starts a lazy writer that mucks up everything
<Lasse> i'm not gonna touch FILE_FLAG_OVERLAPPED again :)
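The goal behind overlapped i/o (fetching the next block while the current one is still being processed) can be sketched portably with a helper thread. This is only an illustration of the overlap itself, not of FILE_FLAG_OVERLAPPED or of how qpress does it; the name `read_ahead` and the parameters `block_size` and `depth` are assumptions made up for this sketch.

```python
import queue
import threading

def read_ahead(path, block_size=64 * 1024, depth=4):
    """Yield blocks of `path` while a helper thread reads the following ones.

    Hypothetical helper: a portable stand-in for asynchronous reads,
    not a wrapper around the Windows overlapped i/o API.
    """
    blocks = queue.Queue(maxsize=depth)  # bounded, so the reader cannot run far ahead

    def reader():
        try:
            with open(path, "rb", buffering=0) as f:
                while True:
                    block = f.read(block_size)
                    blocks.put(block)        # an empty block marks end of file
                    if not block:
                        break
        except OSError as exc:
            blocks.put(exc)                  # hand i/o errors to the consumer

    threading.Thread(target=reader, daemon=True).start()

    while True:
        item = blocks.get()
        if isinstance(item, Exception):
            raise item
        if not item:
            return
        yield item
```

A compressor would loop over `read_ahead(path)` and compress each block; while it works, the helper thread is already blocked in the next read.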
<Shelwien> well, i'd try
<Shelwien> for an example, there's this thing called XBMC
<Lasse> "XBMC Media Center"?
<Shelwien> XBox Media Center
<Shelwien> yeah
<Shelwien> its all skin-driven
<Shelwien> and has a pretty large skin archive to load
<Lasse> but what about it? :)
<Shelwien> (with images compressed with LZO btw)
<Shelwien> and it uses overlapped i/o ;)
<Lasse> ah
<Lasse> ok
<Lasse> but xbox probably don't have these security settings (forgot their names)
<Shelwien> XBMC is ported to windows/linux/macos
<Shelwien> well, and another thing is memory-mapped files
<Shelwien> and then the thing i wanted to talk about next ;)
<Shelwien> specifically, some delay modelling for i/o
<Lasse> the problem is that FILE_FLAG_OVERLAPPED makes it possible to close a file beyond EOF so that you can read deleted data on disk, so system admins often prevent this on Windows Server. And all the Home and Basic editions have it prevented by default because it doesn't let the user specify them
<Shelwien> %)
<Lasse> and when this is deactivated, then Windows starts a lazy writer that writes 0 in parallel with your own disk I/O and that fucks everything up
<Shelwien> well, yeah, perfection here certainly requires writing handlers for too many different cases ;)
<Lasse> mem mapped files aren't really meant to be used for sequential I/O
<Shelwien> i thought about it
<Shelwien> still it has benefit of avoiding data copying
<Lasse> I haven't timed mem I/O though
<Lasse> data copying?
<Shelwien> at least in theory the file data is read directly where they are mapped
<Shelwien> (same like with overlapped i/o)
<Lasse> ah, you mean bypassing cache?
<Lasse> yeah
<Shelwien> i think there's not only cache
<Shelwien> or how to say this...
<Shelwien> i mean, the data for a common readfile() are read into some system buffer (or copied from cache)
<Lasse> but I'm already supporting FILE_FLAG_NO_BUFFERING which also prevents caching
<Shelwien> and then copied into your memory
<Lasse> yup
<Shelwien> and its even worse with writefile()
<Shelwien> as at least XP, it seems
<Shelwien> tries to really write the data and blocks it until all's done
<Lasse> WriteFile doesn't block if you issue just 8-64 kbyte at a time, or thereabout :)
<Lasse> yeah, they do
<Shelwien> anyway, what i wanted to say
<Shelwien> is that overlapped i/o and mapped files avoid some memcpys
<Lasse> FILE_FLAG_NO_BUFFERING does too which I'm using already :)
<Shelwien> it does anyway, i measured it, and it becomes slower when you write over a cluster size
<Shelwien> yeah, i noticed that ;)
<Shelwien> still, i think mapped i/o is worth trying too
<Lasse> maybe :)
<Shelwien> though for reading files like that
<Shelwien> you'd probably have to add some prefetching
<Shelwien> basically page accesses to read the data before your (de)compressor would need it
<Lasse> yup
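A minimal sketch of sequential reading through a mapped file with a prefetch hint for the next window. It assumes Python's `mmap` module; `scan_mapped`, `consume` and `window` are hypothetical names, and the `madvise` hint is only issued where the platform offers it (it is not available on Windows, where explicit page touches from another thread would be the rough equivalent).

```python
import mmap
import os

PAGE = mmap.PAGESIZE

def scan_mapped(path, consume, window=1 << 20):
    """Call consume(chunk) on successive read-only views of a mapped file."""
    window = max(PAGE, window - window % PAGE)   # keep window starts page-aligned
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return                               # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            can_hint = hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED")
            with memoryview(m) as view:
                for start in range(0, size, window):
                    nxt = start + window
                    if can_hint and nxt < size:
                        # Ask the OS to start paging in the next window now.
                        m.madvise(mmap.MADV_WILLNEED, nxt, min(window, size - nxt))
                    with view[start:nxt] as chunk:
                        consume(chunk)           # no copy: chunk is a view of the mapping
```

The `consume` callback sees a `memoryview` slice, so the data is not copied into a separate buffer first, which is the benefit mentioned above. Whether the hint actually helps depends on the OS and the disk.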
<Lasse> but optimizing disk I/O on Windows is a hell
<Shelwien> ...anyway, I'd like to have it all already tested by somebody so that I could just read a nice report and choose a method i'd like ;)
<Lasse> if the user has 1 physical disk, prefetch is hard because it interferes with writing
<Shelwien> yeah, that's why i mentioned i/o modelling ;)
<Lasse> i'd like that too ;D
* mathiasr has joined #compression
<mathiasr> Hello
<Shelwien> hi ;)
<Lasse> heyo mathiasr
<mathiasr> I've been reading encode.su forums for a while
<mathiasr> but I cannot register to post
<Shelwien> you can mail to "encode"
<Shelwien> apparently he has some problems with spammers and bots
<mathiasr> What is his complete email?
<Shelwien> http://encode.su/forum/sendmessage.php ?
<mathiasr> Does not work since I'm not registered !
<Shelwien> well, some encode@encode.su then i guess ;)
<Shelwien> ICQ# 224716672 also, but he's rarely online recently
<mathiasr> I tried that too does not work
<mathiasr> Could you send him a message asking him to get in contact with me at mathiasr@free.fr
<Shelwien> ok
<mathiasr> Many thanks
* mathiasr has quit IRC (Quit: )
<Shelwien> i mean, i/o operation delays and read/write overlapping
<Shelwien> are series of data too, and can be predicted with a CM ;)
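One way to picture "predicting i/o delays with a CM" is a tiny order-1 context model over quantized delays: the previous delay bucket is the context, and adaptive counts give a probability for the next bucket. This is a toy sketch under that assumption, not anything from an existing compressor; `DelayModel` and its bucket edges are invented for illustration.

```python
from collections import defaultdict

class DelayModel:
    """Order-1 adaptive model of i/o delays (hypothetical illustration)."""

    def __init__(self, bucket_edges_ms=(1, 4, 16, 64)):
        self.edges = bucket_edges_ms
        n = len(bucket_edges_ms) + 1
        # One row of counts per context; start at 1 so nothing has zero probability.
        self.counts = defaultdict(lambda: [1] * n)
        self.context = 0                     # bucket of the previous delay

    def bucket(self, delay_ms):
        """Quantize a delay into a bucket index."""
        for i, edge in enumerate(self.edges):
            if delay_ms < edge:
                return i
        return len(self.edges)

    def predict(self):
        """Probability of each bucket for the next delay, given the context."""
        row = self.counts[self.context]
        total = sum(row)
        return [c / total for c in row]

    def update(self, delay_ms):
        """Learn from an observed delay and make it the new context."""
        b = self.bucket(delay_ms)
        self.counts[self.context][b] += 1
        self.context = b
```

A scheduler could call `predict()` before issuing a read and `update()` with the measured time afterwards, for example to decide how many blocks to prefetch.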
<Lasse> problem is, you have caches in disk, in controller and in Windows. And, user can compress from a physical disk to the same physical disk, or from one physical disk to another. So many different cases to take care of. Also, Windows kernel is nondeterministic. WriteFile with 32 KB blocks and you get 100 MB/s. Use 64 KB blocks and you get 50 MB/s. Use 4 MB blocks and get 100 MB/s again
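Block-size effects like these are easy to probe on a given machine. The sketch below times unbuffered writes for one block size; it uses `os.write` rather than WriteFile, so it only approximates the case discussed here, and `time_writes` with its parameters is a hypothetical helper.

```python
import os
import time

def time_writes(path, total_bytes, block_size):
    """Write total_bytes to path in block_size pieces and return MB/s."""
    block = memoryview(os.urandom(block_size))
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += os.write(fd, block[:n])   # may write less than n; loop continues
        os.fsync(fd)                             # include the flush to disk in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written / max(elapsed, 1e-9) / 1e6
```

Running it for, say, 32 KB, 64 KB and 4 MB blocks on the same target disk, several times each, shows whether throughput really jumps around with block size on that system. Results will vary with caches, disk and OS.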
<Shelwien> yeah, thats why i'm talking about i/o behavior modelling ;)
<Lasse> for example, see what I bumped into:
http://social.technet.microsoft.com/Forums/en-US/winserverfiles/thread/09dd046e-8127-4550-8e26-5fba7a5a0743
<Lasse> so I've pretty much given up optimizing for disk I/O these days. Think disk I/O should be optimized by Microsoft instead
<Lasse> unfortunately that's not easy on Windows because it behaves nondeterministically
<Lasse> I found that performance flaw on the forum, btw :)
<Shelwien> ah, thanks for that link, as its cases like that which I wanted to know about ;)
<Shelwien> as I might have ideas on how it should be done in theory
<Shelwien> but then if it doesn't work like that for many other people, it'd be kinda wasted time ;)
<Shelwien> one example is how many compressor writers
<Shelwien> like to read all the input into memory, then compress, and then write all the output
<Lasse> haha yeah
<Shelwien> thinking that it's the fastest possible design, even if it kinda consumes too much memory ;)
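The alternative to "read everything, compress, write everything" is to stream fixed-size blocks, so memory use stays bounded. A minimal sketch, with zlib standing in for whatever codec is actually used; `compress_stream` and `block_size` are names assumed for the example.

```python
import zlib

def compress_stream(src, dst, block_size=64 * 1024, level=1):
    """Compress the file object src into dst one block at a time."""
    comp = zlib.compressobj(level)
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(comp.compress(block))   # may be empty while zlib buffers input
    dst.write(comp.flush())               # emit whatever the codec still holds
```

Only one input block and the codec's internal state are in memory at any time, regardless of file size. Combined with reads and writes that overlap the compression, this design does not have to be slower than the all-in-memory one.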
<Lasse> THOR has pretty good I/O in most use cases (hardware cases I mentioned earlier)
<Lasse> impressive since he's apparently using Pascal
<Lasse> which probably has some naive I/O wrapper
<Shelwien> well, stuff like that is most weird
<Lasse> yeah
<Shelwien> like when replacing fread/fwrite with winapi calls caused a slight slowdown in one of my experiments
<Lasse> fread/fwrite performs very fast using 4 MB blocks, whereas ReadFile/WriteFile performs extremely slow (at perhaps 60% of optimal speed) with the same 4 MB blocks
<Lasse> exactly
<Lasse> THOR is using 32 KB blocks
<Lasse> afair
<Shelwien> btw, did you try different compilers?
<Shelwien> like gcc vs intelc?
<Lasse> not for testing file I/O (tested for in-memory performance, though)
<Shelwien> and?
<Lasse> icc and vs were ~30% faster than gcc
<Shelwien> err... which gcc do you use then?
<Lasse> I used the very latest of all three, about 4-5 months ago
<Shelwien> gcc 4.3+ is usually comparable to intel
<Shelwien> which is significantly faster than vs
<Shelwien> also i've seen quite a few cases, exactly with LZ-like stuff
<Shelwien> where gcc-compiled program was faster
<Lasse> in the few tests where I've compared icc and vs, they have been the *exact* same speed. So exact that I was thinking if MS bought Intel's backend or something
<Shelwien> but that implies using PGO and recent gcc
<Shelwien> ah, they might have
<Shelwien> or, more like its probably somewhat of a joint development
<Shelwien> as intelc was initially a plugin for VS and all
<Lasse> yea