probably this xeon is hyper-threaded single-core
@bulat
thank you for the hint
it seems you are right.
1 processor = 1 core with 2 threads
made a few tests using natural language text (english.100MB) and linux kernel sources
1. sorting:
2. encoding
- btw, st4 sorting has the same speed as in grzip
- bsc supports huge dictionaries, and speed barely degrades, meaning that we can improve compression by 5-10% for free
- new st5 mode is amazing! 2x faster than bwt with only 2% less compression
3. filters
- with bwt, qlfc provides 1.5% better compression at 10% lower speed compared to grzip -m1
- st4+mtf+huf is still speed king
on english text, lzp+bwt+qlfc with maximum dictionary provides the same compression level as dict+lzp+ppmd while compressing and extracting 30% faster (single-threaded performance). on linux sources, the compression ratio is 4% worse while speed is still 30% better. one problem, though, is that ppmd effectively splits all data into 7-8 mb blocks, so it can be easily multithreaded, at least for extraction; bsc with maximum block size is single-threaded
- -H24 or so slightly increases compression level
- -M32 makes compression faster and, for ST, even better
- other filters just waste time on my tests, so i've disabled them
so, i recommend the following settings:
fast: grzip -m4
optimum: -m2 -b25 -M32 [-r -s]
best: -m3 -b1024 -H28 -M32
suggestions:
1. give us back the mtf encoder!
2. is it possible to multi-thread single-block compression? in particular, split the sorting work into several independent buckets, overlap sorting and encoding, make the encoding block 1/4 of the sorted block, or encode in 1 mb sub-blocks (see the sketch right after this list). btw, it seems that bsc -t -b1024 demonstrates some internal multithreading while -T works single-threaded
3. increase default block size to 25 mb or so
4. is it possible to make other filters faster, at least on detection stage? otherwise -r -s makes text compression much faster w/o losing ratio
5. add exe filter, preferably dispack with obj/exe/data detector
6. bssc provides great speed/compression for executables by using 22-bit ST transform. you may try...
7. there is an idea that DICT can provide better boost for BWT compressors that operate on 16-bit words..
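to make point 2 concrete, here is a minimal sketch of the 1 mb sub-block idea, not bsc's actual code: the names encode_subblocks and SubblockCoder are my own, and the coder callback stands in for whatever second-stage encoder (qlfc, mtf+huf, ...) is used.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// any routine that compresses one buffer independently of the others;
// bsc's real coder interface may differ.
using SubblockCoder = std::function<std::vector<uint8_t>(const uint8_t*, size_t)>;

// cut the sorted (bwt/st) output into fixed-size pieces and let every thread
// encode its own piece; storing each piece with its size lets the decoder
// split the stream back and decode the pieces in parallel too.
std::vector<std::vector<uint8_t>> encode_subblocks(const std::vector<uint8_t>& sorted,
                                                   const SubblockCoder& coder,
                                                   size_t sub = 1 << 20 /* 1 mb */)
{
    const size_t pieces = (sorted.size() + sub - 1) / sub;
    std::vector<std::vector<uint8_t>> out(pieces);

    #pragma omp parallel for schedule(dynamic)
    for (long long k = 0; k < (long long)pieces; ++k) {
        const size_t lo  = (size_t)k * sub;
        const size_t len = std::min(sub, sorted.size() - lo);
        out[k] = coder(sorted.data() + lo, len);   // independent of the other pieces
    }
    return out;
}

the price of the split is a small ratio loss at each sub-block boundary, which is why 1 mb pieces of a 25-1024 mb block are a reasonable trade-off.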
non-library code:
1. allow use of stdin/stdout/nul; count bytes read/written in the i/o procedures instead of using ftell
2. mimic bzip2 syntax, port program to unix and advertise it as possible bzip2 successor
3. ST stands for Schindler transform!
I tested BSC on a mix of TXT and Word DOC files... Dual-core CPU...
BSC (block size set to 64MB) is 8.5x(!) faster than Nanozip (memory=400MB, nz_cm mode), and the penalty is only +7% in size.
Unbelievable...
I am working on this. Probably I will add fast mode in next release.
BWT - Already have some optimizations.
ST4/ST5/LZP - Not possible with current implementation.
QLFC - Splitting of ST/BWT output in multiple blocks and process them in parallel is very good idea. Probably I will add this feature in next release.
Sounds good. I will add this in the next release.
Unfortunately no. You can have a look at the source code. Segmentation (the slowest part) is based on the work of Dmitry Shkarin, and I made it work as fast as possible.
I don't know. Maybe it is better to add bsc to FreeARC, instead of adding filters to bsc itself?
Sounds good. I will add ST3 in the next release.
Same as 5.
My goal is a data compression library. So I hope that someone can help me to build a nice compressor around it.
I know!
Enjoy coding, enjoy life!
Is the source code available somewhere?
MOC Benchmark failed with -m1 -m2 -m3 options!
the file is :
http://hea-www.harvard.edu/~fine/ima...ion-Nebula.jpg
MOC Benchmark passed with options:
-m1 -p
-m2 -p
-m3 -p
All is OK!
Hi!
Block Sorting Compressor version 2.0.0 is available for testing:
[+]: Bug fixes.
[+]: Order 3 sort transform.
[+]: Better multi-core systems support, even when compressing a file as a single block.
[+]: Fast "-f" compression mode. Works best with the ST4 & ST5 transforms.
[+]: Source code is available.
Source code and binaries for Windows are available on my website http://libbsc.com/default.aspx
Enjoy coding, enjoy life!
enwik9 tests for new 2.0.0 version:
1. enwik9 was compressed to 163888465 bytes in 237.193 seconds and decompressed in 199.031 seconds on an Intel Core 2 Quad Q9400, 8GB RAM, Win7 x64, using 5095Mb RAM with the "-b1000p" command line option.
2. enwik9 was compressed to 201321919 bytes in 25.834 seconds and decompressed in 64.413 seconds on an Intel Core 2 Quad Q9400, 8GB RAM, Win7 x64, using 1624Mb RAM with the "-b80p -m2f" command line option.
Enjoy coding, enjoy life!
Athlon 64 X2 4200+, 2GB RAM, Win7 x86. bsc103 = BSC 1.0.3, bsc200 = BSC 2.0.0.
Enwik8, default settings:
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 e enwik8 enw103
enwik8 compressed 100000000 into 23528564 in 29.046 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 d enw103 enw103_
enw103 decompressed 23528564 into 100000000 in 15.506 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 e enwik8 enw200
enwik8 compressed 100000000 into 22251068 in 29.105 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 d enw200 enw200_
enw200 decompressed 22251068 into 100000000 in 16.230 seconds.
Enwik8, -b128:
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 e enwik8 enw103 -b128
enwik8 compressed 100000000 into 20769950 in 58.665 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 d enw103 enw103_
enw103 decompressed 20769950 into 100000000 in 30.342 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 e enwik8 enw200 -b128
enwik8 compressed 100000000 into 20771832 in 42.290 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 d enw200 enw200_
enw200 decompressed 20771832 into 100000000 in 26.362 seconds.
avatarized.psd, default settings:
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 e avatarized.psd ava103
avatarized.psd compressed 120641788 into 60955976 in 48.686 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 d ava103 ava103_
ava103 decompressed 60955976 into 120641788 in 32.064 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 e avatarized.psd ava200
avatarized.psd compressed 120641788 into 49399442 in 47.870 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 d ava200 ava200_
ava200 decompressed 49399442 into 120641788 in 33.772 seconds.
avatarized.psd, -b128:
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 e avatarized.psd ava103 -b128
avatarized.psd compressed 120641788 into 16361477 in 58.388 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc103 d ava103 ava103_
ava103 decompressed 16361477 into 120641788 in 33.971 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 e avatarized.psd ava200 -b128
avatarized.psd compressed 120641788 into 16239295 in 42.077 seconds.
C:\Users\Black_Fox\Desktop\bsc-x86>bsc200 d ava200 ava200_
ava200 decompressed 16239295 into 120641788 in 31.125 seconds.
Some other compressors' performance at avatarized.psd:
ZIP, Best (via WinRAR): 94272788
7z, Ultra 64MB: 14560884
FreeArc -mex5: 16116052
Note that avatarized.psd is the first large compressible binary file I stumbled upon during tests!
I am... Black_Fox... my discontinued benchmark
"No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates
hello,
for those interested, you will find attached builds for linux x86/x64 and the crude buildscripts. compiled with very conservative switches; quick tests were ok.
best regards
evg
new -m2 -f mode is truly amazing - it provides about the same compression as grzip -m2 while being 2x faster! at 30 mb/sec compression speed on my core2quad@3.25GHz it's the best candidate for freearc -m3 text compression. btw, rar -m5 provides almost the same ratio on enwik8 while being 12x slower!!!
what about faster and slower methods? let's see - the st4/st5 transformation by itself needs only 1.5-2 secs (for enwik8 on my cpu). if you can build some weak encoding that uses just another 1.5-2 seconds, the resulting speed on a quad-core cpu will be about 100 mb/sec
in the other direction, i.e. good compression at ~10 mb/sec on my quad-core cpu, the only way i see is to parallelize bwt itself. an alternative is to detect the number of cores and the file size and split the file accordingly, but that means weaker compression on small files, which is not so interesting for a strong, bwt-based method
re suggestion 7 (DICT providing a better boost for BWT compressors that operate on 16-bit words): it requires support of 16-bit elements in bwt/st/coders (also lzp and so on). well, it's a big chunk of work without guaranteed results
btw, you can find some ideas for lzp in the freearc lzp sources. i took Shkarin's algorithm and even improved it a bit. in particular, my code doesn't make a random memory access on every decompression step, so decompression with a large hash is 2x faster than in your/Dmitry's code. also, you may try a multiplicative hash:
hash(x) = (x*123456791) >> (32-n)
probably, it has the same speed as your code but a bit better distribution
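for clarity, the same hash written as code (the constant is the one from the formula above; the function name lzp_hash is just illustrative):

#include <cstdint>

// multiplicative hash: multiply by a large odd constant and keep the top n
// bits, where n is log2 of the hash table size
inline uint32_t lzp_hash(uint32_t context, unsigned n)
{
    return (context * 123456791u) >> (32 - n);
}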
i think the cycle in bsc_unbwt_reconstruct may be multithreaded if several initial indexes are saved with the bwt block. for example, we may save an index after every 1 mb of bwt input data, so the cycle can run independently for every 1 mb unbwt output block. it should allow -m3 decompression to reach 90% utilization on multi-core cpus
the cycle in bsc_unbwt_sort that fills P[] may also be multithreaded, but it will require storing a bucket[] counted independently for every sub-block. these stats may also be used to improve post-bwt compression
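a minimal sketch of that multi-index idea, assuming the encoder stores one start index per output slice; the name unbwt_parallel and the starts[] layout are my own illustration, not bsc's actual bsc_unbwt_reconstruct:

#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// inverse bwt with extra start indexes, one per `chunk` bytes of output.
// starts[k] must be the LF-walk position that yields the last byte of slice k;
// the encoder would record these while producing the bwt block.
std::vector<uint8_t> unbwt_parallel(const std::vector<uint8_t>& L,
                                    const std::vector<int>& starts,
                                    int chunk)
{
    const int n = (int)L.size();

    // LF mapping, the same table a single-threaded inverse bwt builds
    std::array<int, 256> C{}, occ{};
    for (uint8_t c : L) ++C[c];
    int sum = 0;
    for (int c = 0; c < 256; ++c) { const int t = C[c]; C[c] = sum; sum += t; }
    std::vector<int> LF(n);
    for (int i = 0; i < n; ++i) LF[i] = C[L[i]] + occ[L[i]]++;

    std::vector<uint8_t> out(n);

    // every slice walks backwards from its own saved index, so slices are independent
    #pragma omp parallel for
    for (int k = 0; k < (int)starts.size(); ++k) {
        const int lo = k * chunk;
        const int hi = std::min(n, lo + chunk);
        int p = starts[k];
        for (int i = hi - 1; i >= lo; --i) { out[i] = L[p]; p = LF[p]; }
    }
    return out;
}

the extra cost is only a few stored integers per block, while the main reconstruction loop scales with the number of cores.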
btw, how fast is the slow unbwt (the second part of bsc_unbwt_reconstruct)? is it the same as the -s algorithm in bzip2? it would be useful to have an unbwt with memory requirements as small as bzip2 -s decompression
No, this is not the same as in bzip2. I have two versions of the bwt reconstruction algorithm for different cases, but both have the same memory requirements. The first one is for the case when there is no symbol with frequency >= 16Mb; in that case I can compact a symbol and its frequency into one unsigned int. The second case uses only the frequency and a binary search to reconstruct the symbol. For ST I use three cases: if the length is less than 8Mb, if there is no symbol with frequency >= 8Mb, and the remaining general case.
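a sketch of the first case (my own names, not bsc's source): when no count reaches 2^24 (16M), a symbol and its count can share one 32-bit word, which halves the memory traffic in the reconstruction loop.

#include <cstdint>

// 8 bits of symbol in the top byte, 24 bits of count in the low bytes
inline uint32_t pack(uint8_t symbol, uint32_t count)  { return (uint32_t(symbol) << 24) | (count & 0x00FFFFFFu); }
inline uint8_t  packed_symbol(uint32_t w)             { return uint8_t(w >> 24); }
inline uint32_t packed_count(uint32_t w)              { return w & 0x00FFFFFFu; }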
Note: ST can be easily parallelized for two cores, just by processing the data from the beginning and from the end. A parallel version for more cores is also possible, but it will be more tricky.
Enjoy coding, enjoy life!
To those who have Linux: I got an email that my multi-threading optimization in bsc is not working on Linux. Guys, could you please test bsc on Linux? I would also appreciate it if someone could create a makefile and binaries for x86 and x64 Linux, so I could upload them to the web site. Thanks.
Enjoy coding, enjoy life!
New version 2.1.0 is available for testing on my website:
[+]: Slightly faster compression and decompression on multi-core systems when compressing a file as a single block.
[+]: makefile for Linux systems. Tested on Linux Mint 9.
[+]: Removed compilation errors for GNU C++ compiler.
Note: this version is not compatible with the previous one.
Enjoy coding, enjoy life!
Hi,
from my user point of view, bsc version 2.1.0 is among the best (open-source; multithreaded; strong compression ratio for text; tunable) compressors so far.
Yet, one thing remains to be added/improved (maybe via a tuning option): better decompression speed.
While -b2, -pb2, -m0, -m1, -m2, -pm0, -pm1, -pm2 give better compression speed, decompression speed remains virtually the same; this crippled my greedy expectations, he-he.
591,360 bsc_INTEL.exe (32bit, from bsc-2.1.0-x86.zip)
229,376 bsc_Microsoft.exe (32bit, compiled with /Ox Visual C++ Toolkit 2003)
Following is the test (on a Pentium T3400 Merom-1M, 2 cores, 65nm, 2166 MHz) on all English OSHO books; this is real-world text, not some synthetic nonsense:
206,908,949 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt
101,903,677 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.LZ4
078,630,023 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.QuickLZ
062,740,017 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.AIN2.22
062,620,125 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.ULZ
062,218,572 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.PKZIP2.50
055,891,414 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.zipDeflate64
048,780,809 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.lzt [-29]
042,723,543 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.lzt [-39]
041,671,476 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.Plzip0.5
040,108,833 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.PPMSI1 [-o5]
035,109,880 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.4x4_4t [4t]
032,766,449 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.bsc [b25]
031,050,654 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.zpaq1.00 [c3]
030,316,502 Folio VIP - Osho's Books on CD-ROM Part 1-2-3-4-5.txt.bsc [b200]
Compression:
C:\WorkTemp\_Osho's Books_test>bsc_INTEL.exe e OSHO.TXT OSHO.TXT.bsc_INTEL
This is bsc, Block Sorting Compressor. Version 2.1.0. 17 May 2010.
Copyright (c) 2009-2010 Ilya Grebnov <Ilya.Grebnov@libbsc.com>.
OSHO.TXT compressed 206908949 into 32766449 in 36.500 seconds.

C:\WorkTemp\_Osho's Books_test>bsc_Microsoft.exe e OSHO.TXT OSHO.TXT.bsc_Microsoft
This is bsc, Block Sorting Compressor. Version 2.1.0. 17 May 2010.
Copyright (c) 2009-2010 Ilya Grebnov <Ilya.Grebnov@libbsc.com>.
OSHO.TXT compressed 206908949 into 32766449 in 61.969 seconds.

Decompression:

C:\WorkTemp\_Osho's Books_test>bsc_INTEL.exe d OSHO.TXT.bsc_INTEL OSHO.TXT
This is bsc, Block Sorting Compressor. Version 2.1.0. 17 May 2010.
Copyright (c) 2009-2010 Ilya Grebnov <Ilya.Grebnov@libbsc.com>.
OSHO.TXT.bsc_INTEL decompressed 32766449 into 206908949 in 24.531 seconds.

C:\WorkTemp\_Osho's Books_test>bsc_Microsoft.exe d OSHO.TXT.bsc_Microsoft OSHO.TXT
This is bsc, Block Sorting Compressor. Version 2.1.0. 17 May 2010.
Copyright (c) 2009-2010 Ilya Grebnov <Ilya.Grebnov@libbsc.com>.
OSHO.TXT.bsc_Microsoft decompressed 32766449 into 206908949 in 47.188 seconds.

A nifty console tool with a nifty web-site indeed,
good luck, regards.
I have a "noob" question - can MPI give advantage before OMP ?
Or MPI can be used only for cluster systems ?
Article in other languages:
English
Chinese
French
@gribok
you wrote: "And I don't know any MPI based compressor."
---
here is one:
---
http://compression.ca/mpibzip2/
---
Parallel MPI BZIP2 (MPIBZIP2) by Jeff Gilchrist
---
MPIBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses MPI and achieves significant speedup on cluster machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with mpibzip2 can be decompressed with bzip2). MPIBZIP2 should work on any system that has a pthreads compatible C++ compiler (such as gcc). It has been tested on: Linux and Solaris.
---
source from 2007-07-18 is downloadable:
http://compression.ca/mpibzip2/mpibzip2-0.6.tar.gz
---
best regards
and thanks again for your wonderful bsc
New version 2.1.5 is available for testing. This is a service release with some minor changes:
[+]: Improved multi-core systems support.
[+]: Improved segmentation algorithm.
[+]: Improved Linux systems support.
You can download new version from my web site http://libbsc.com/default.aspx
Enjoy coding, enjoy life!
Ilya, can you please technically explain the changes in the last two versions?
And can you please explain how you are able to use a 1 GB block with libdivsufsort! The max allocation block is 2 GB = 512 M int elements... What kind of trick do you use? Do you use divbwt() for BWT construction, and if so, is divbwt() able to deal with a 1 GB block, or do you use divsufsort() for suffix array construction?
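for context, a sketch of a plain divbwt() call as far as I remember the libdivsufsort header; with the default 32-bit saidx_t a 1 GB block needs a 4 GB suffix-array workspace in one piece, which is exactly where the 2 GB / 512 M-element question above comes from.

#include <divsufsort.h>
#include <vector>

// naive single-call BWT of one block; returns the primary index (or <0 on error)
saidx_t bwt_block(const sauchar_t* in, sauchar_t* out, saidx_t n)
{
    std::vector<saidx_t> A(n);            // temporary workspace, 4*n bytes with 32-bit indices
    return divbwt(in, out, A.data(), n);  // divbwt() may also allocate internally if A is NULL
}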