I'll check it out and post results later.
Originally Posted by Bulat Ziganshin
on my box, 256k is still much faster:
Code:
>timer tor-128k.exe -2 dll700.dll
Kernel Time  =  6.289 = 00:00:06.289 =   9%
User Time    = 32.767 = 00:00:32.767 =  50%
Process Time = 39.056 = 00:00:39.056 =  59%
Global Time  = 65.204 = 00:01:05.204 = 100%
>timer tor-256k.exe -2 dll700.dll
Kernel Time  =  5.848 = 00:00:05.848 =   9%
User Time    = 32.526 = 00:00:32.526 =  52%
Process Time = 38.375 = 00:00:38.375 =  62%
Global Time  = 61.559 = 00:01:01.559 = 100%
i think that this depends on HDD firmware
nor me
I didn't receive an activation mail yet.
for me too, i've tested it as well (actually, i once wrote a rather universal i/o library)
Originally Posted by Christian
For HDDs simple IO is almost as fast as async I/O.
well, i've compared it to tornado, maybe it's really just not optimized. i will experiment with slug if you claim that it uses simple i/o
Here are the results:
Code:
------------------ tor-128k -3 ------------------
User Time = 12.343   Global Time = 21.938
User Time = 12.125   Global Time = 20.828
User Time = 12.750   Global Time = 24.250
User Time = 12.187   Global Time = 22.437
-> 425.802.874
------------------ tor-256k -3 ------------------
User Time = 12.109   Global Time = 22.094
User Time = 12.140   Global Time = 21.844
User Time = 12.265   Global Time = 23.750
User Time = 12.218   Global Time = 22.031
-> 425.802.874
Hmmm, maybe Filemon doesn't work correctly, but it seems that tor-128k reads 128k blocks for the first 8M only. After that, it always reads 4k followed by 124k. It still outputs 8M blocks. It's the same with tor-256k: 256k for the first 8M, then 4k + 252k. Maybe this helps. Additionally, maybe the 8M output blocks are hurting, too.
I got it instantly, it only landed in the spam folder (gmail).
Originally Posted by Christian
Its compression is slightly worse and speed is really close. But compression speed (process time) might be better on systems with smaller cache. Maybe LovePimple can shed some light on the improvements.
Originally Posted by joey
If you look at my tools you'll notice that I don't like too many command-line parameters. Maybe I'll add a switch, but please don't count on it. Additionally, I chose 64k because many well-known tools use this size, too.
Originally Posted by joey
No plans for that, sorry. I'd really like to, but I don't have enough time. And the little spare time I have I spend with my girlfriend/friends/hobbies and sometimes developing compression algorithms.
Originally Posted by joey
I don't even know if Slug's IO works well on other systems (except mine and my girlfriend's laptop). But yes, it uses simple IO (e.g. fread and fwrite). The idea is to let the OS do all the async stuff. If we read/write only small blocks, the OS will do the async work with its read-ahead and cache-behind systems. We just have to make it easy for the OS to guess, so Slug always reads/writes exactly 64K. FYI, Thor always reads/writes exactly 32K. Maybe this is all bogus, but explorer, copy and some other tools work like this, too. Just use Filemon and look around a little bit.
Originally Posted by Bulat Ziganshin
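As a rough illustration of the fixed-block idea described above (not Slug's actual code, which is not shown in this thread), here is a minimal Python sketch that copies a file using plain sequential reads and writes of one fixed size, leaving read-ahead and write caching to the OS. The function name `copy_in_fixed_blocks` and the constant `BLOCK_SIZE` are made up for this example.

```python
BLOCK_SIZE = 64 * 1024  # 64K, the size the post says Slug uses (assumed here)


def copy_in_fixed_blocks(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst with simple sequential I/O in fixed-size requests.

    Returns the list of block lengths that were read, so the access
    pattern can be inspected. Hypothetical helper, for illustration only.
    """
    sizes = []
    # buffering=0 disables Python's own buffer, so every read() below is
    # one request of block_size bytes to the OS (less only at end of file).
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(block_size)
            if not block:  # empty result means end of file
                break
            # An unbuffered write may accept fewer bytes than offered,
            # so loop until the whole block has been written.
            view = memoryview(block)
            while len(view) > 0:
                written = dst.write(view)
                view = view[written:]
            sizes.append(len(block))
    return sizes
```

The only design point here is that the request size never changes except for the final short block, which is the "make it easy for the OS to guess" part of the post. Whether this actually matches async I/O in speed depends on the OS and the drive.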
thanks, i'll check it
Originally Posted by Christian
well, i mean that i thought that thor at least used some b/g thread or async calls, even with the same 32k blocks - i think it's hard to check this with Filemon?
Originally Posted by Christian
Yes, you cannot check this with Filemon. But Process Explorer shows only one thread.
Originally Posted by Bulat Ziganshin
I'm off to bed. Goodnight everyone!
there is also async i/o
Originally Posted by Christian
Where do you find this in PE?
Originally Posted by Bulat Ziganshin
Btw., my own tests showed that overlapped IO is not worth it when reading/writing small blocks (e.g. 32k, 64k). The OS does the same thing anyway, at least when you're not using things like "FILE_FLAG_NO_BUFFERING". Well, you might lose a little bit of memory bandwidth from the in-memory copying, but it does not make any difference here.
i mean that such apis exist and they may be used even in a 1-threaded program. but if this can be checked by analyzing the executable, it's great
Originally Posted by Christian
this depends. memcpy is 230mb/sec and tor -1 is >50mb/sec on my box
Originally Posted by Christian
I don't know a way to check if a program uses overlapped IO. But I can tell that Thor does not use threads and reads/writes 32k all of the time. Further, I checked overlapped IO and it didn't make a difference while being more complex.
Originally Posted by Bulat Ziganshin
Right, but L2-cache is coming into play here. And cached copying is much, much faster.
Originally Posted by Bulat Ziganshin
Nonetheless, imo, the best solution is to use threading. It's platform independent and you don't have to rely on the OS. But since simple IO works surprisingly well, I prefer it - well, because it's simple.
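The threading approach mentioned above could look roughly like this. It is only a sketch under assumptions: a single helper thread reads ahead into a small bounded queue while the main thread consumes the blocks. The name `read_blocks_in_background` and the `depth` parameter are hypothetical, and none of the tools discussed here are claimed to work this way.

```python
import queue
import threading


def read_blocks_in_background(path, block_size=64 * 1024, depth=4):
    """Yield blocks of `path` while a helper thread reads ahead.

    `depth` bounds how many blocks may wait in the queue, so the reader
    cannot run arbitrarily far ahead of the consumer.
    """
    blocks = queue.Queue(maxsize=depth)
    end_marker = None  # placed in the queue when the reader is finished

    def reader():
        try:
            with open(path, "rb") as f:
                while True:
                    block = f.read(block_size)
                    if not block:  # end of file
                        break
                    blocks.put(block)  # blocks while the queue is full
        finally:
            # Always signal the end. In this sketch a read error simply
            # ends the stream early instead of being re-raised.
            blocks.put(end_marker)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    while True:
        block = blocks.get()
        if block is end_marker:
            break
        yield block
    worker.join()
```

A caller would loop over `read_blocks_in_background("input.bin")` (file name made up) and compress each block as it arrives, so disk reads and compression overlap without any OS-specific API.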
Actually you can use a disassembler and check the fopen/CreateFile/whatever calls and look at the flags used. But I strongly dislike such practice.
Originally Posted by Bulat Ziganshin
Btw., I'm hoping that Metacompressor goes online again. I'd love to have some more detailed results for Slug. Testing on my system all the time is useless, because Slug was tailored to work well on it.