Thread: BCM v0.08 - The ultimate BWT-based file compressor!

  1. #61
    encode (The Founder), Moscow, Russia, joined May 2006
    I'm not really happy with my dynamic mixing results...

    Anyway, the new BCM v0.09 will be released very soon. I have two fully optimized versions - one with two SSE stages and a second with three SSE stages. I just haven't decided yet which one is superior in terms of efficiency and which one to use... Apart from a few SSE stages, which I have described in many past posts here, the current BCM has an improved main CM model, and this is important...

    Finally, I'm about to go back to the GZIP/BZIP2-like interface, i.e.

    bcm [options] file [output]


    Block sizes are in megabytes. Probably a 'finished' message instead of 'done'...


  2. #62
    Bulat Ziganshin (Programmer), Uzbekistan, joined Mar 2007
    SEE?

  3. #63
    Member, China, joined May 2009
    nice work, waiting for v0.09

  4. #64
    Member, USA, joined Jun 2008
    Lame request: please build a DOS (DJGPP?) version too.

  5. #65
    encode (The Founder), Moscow, Russia, joined May 2006
    SEE (Secondary Escape Estimation) is not really possible with CM since it has no escapes.

    DOS? That's out of the question. It might be cool if the compressor/decompressor needed less than, say, 64 KB, so that we would be able to run such a program even from an MS-DOS boot diskette. Never step back - I'm really thinking about 64-bit, multi-threaded, etc. releases!

    A question. Like I said I have two versions for now:
    1. 2 SSE stages. Moderate speed penalty over v0.08, nice compression gain.
    2. 3 SSE stages. Notable speed penalty over v0.08, really nice compression gain.

    I still haven't fully tested the final/optimized builds of both. But according to the regular builds, the speed difference between v0.09 (ver. 1) and v0.08 is much like the penalty of v0.08 over v0.07 - i.e. not that huge.

    So, which one to use?


  6. #66
    Shelwien (Administrator), Kharkov, Ukraine, joined May 2008
    Actually, most console compressors with available sources can be
    easily ported to DOS. For that, they have to be statically built with
    the MS/Intel compiler using the /FIXED:NO linker option (relocations),
    and then processed with something like WDOSX.

  7. #67
    Member, USA, joined Jun 2008
    Originally posted by Shelwien:
    Actually, most console compressors with available sources can be
    easily ported to DOS. For that, they have to be statically built with
    the MS/Intel compiler using the /FIXED:NO linker option (relocations),
    and then processed with something like WDOSX.

    I'm aware of this, also HX can run some Win32 stuff even without relocations. Typically OpenWatcom Win32 .EXEs work well with either (although slower than GCC). Hence I often also prefer DJGPP compiles (faster), esp. because they have no "non-free" and buggy dependencies (annoying). I still often see Win32 apps (as Nania found out the hard way) that have quirks or don't work right in Vista, hence I'm a bit worried that Win7 is the same.

  8. #68
    encode (The Founder), Moscow, Russia, joined May 2006
    People are keeping silent about any BCM suggestions.

    Anyway, I have chosen the variant with 3 SSE stages. So, now I'm thinking about serious interface changes.

    1. GZIP-like interface, e|d, c|d
    2. Ratio?
    3. Input/output file sizes
    4. Elapsed time?
    5. Display the file name?
    6. ?


  9. #69
    Black_Fox (Tester), [CZE] Czechia, joined May 2008
    SizeBefore -> SizeAfter (PercentageOfOriginal) in TimeSeconds.

    EDIT: One little clarification.
    Last edited by Black_Fox; 29th July 2009 at 15:13.

  10. #70
    Member, Kraków, Poland, joined Jun 2009
    I would prefer automatic behaviour, i.e.:

    On the command "bcm bcmfile" it should decompress the file, and

    on the command "bcm nonbcmfile" it should compress it.

    It should differentiate BCM files from non-BCM files by checking for the BCM header.

    This way it would work well with drag and drop.

  11. #71
    encode (The Founder), Moscow, Russia, joined May 2006
    Draft example:
    Code:
    > bcm book1
    BCM file compressor (v0.09)
    Copyright (c) 2009 Ilia Muraviev
    Compressing 0.73 MB block...
    768771 => 208726 (2.172 bpb)
    Elapsed time: 0.604s
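
For readers unfamiliar with the "bpb" figure in the draft output, here is a small Python sketch of the arithmetic, assuming bpb means compressed bits per input byte. The function names are hypothetical and not taken from BCM.

```python
def bits_per_byte(original_size, compressed_size):
    """Compressed bits spent per byte of input."""
    return compressed_size * 8 / original_size


def percent_of_original(original_size, compressed_size):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_size / original_size


def summary_line(original_size, compressed_size):
    """Format a summary in the style of the draft output above."""
    bpb = bits_per_byte(original_size, compressed_size)
    return f"{original_size} => {compressed_size} ({bpb:.3f} bpb)"


if __name__ == "__main__":
    # book1 figures from the draft example.
    print(summary_line(768771, 208726))
    print(f"{percent_of_original(768771, 208726):.1f}% of original")
```

With the book1 figures, 208726 * 8 / 768771 comes out at about 2.172, matching the draft output.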

  12. #72
    Member, Germany, joined May 2008
    i propose: write the results in one line

    > bcm book1
    BCM file compressor (v0.09)
    Copyright (c) 2009 Ilia Muraviev
    Compressing 0.73 MB block...
    ...
    time: 0.604s, used mem: 1024 MB, 208726/768771=27,2%

    best regards

  13. #73
    encode (The Founder), Moscow, Russia, joined May 2006
    The results of hopefully the final release candidate:

    book1 -> 208,726 bytes
    calgary.tar -> 775,271 bytes
    world95.txt -> 463,725 bytes
    fp.log -> 551,378 bytes
    ENWIK5 -> 1,195,166 bytes
    ENWIK8 -> 20,625,697 bytes
    bible.txt -> 721,591 bytes
    3200.txt -> 3,641,305 bytes


  14. #74
    Member, China, joined May 2009
    When to release?

  15. #75
    Skymmer (Member), Russia, joined Mar 2009
    A couple of suggestions from me.
    1.) A total percentage indicator for the whole file, not for the current block.
    2.) More info when BCM is started without any parameters. I mean an explanation of what e# means, what the default value is, and how much memory is required for a given # value. Something like: "BCM requires 5# of memory"
    3.) Allow the output file to be omitted on the command line when encoding, with smart behaviour here. I mean adding a .bcm extension to input files without an extension and changing the extension to .bcm for files that have one. For example: bcm e enwik8 will produce enwik8.bcm on output. bcm e Test.dat will produce Test.bcm
    4.) Check whether the input and output files are the same, because e.g. bcm enwik8 enwik8 does a rather bad thing. The original file is destroyed, and in its place a corrupted file is created with the same name, only 6 bytes long, where every byte is 0xFF
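
Suggestions 3 and 4 above can be sketched roughly as follows, in Python. This is only an illustration of the requested behaviour, not BCM's code, and the helper names `default_output_name` and `is_same_file` are made up.

```python
import os


def default_output_name(input_path):
    """Replace the input's extension with .bcm, or append .bcm if it has none."""
    root, _ext = os.path.splitext(input_path)
    return root + ".bcm"


def is_same_file(input_path, output_path):
    """True if both paths refer to the same file, so the tool can refuse to run."""
    if os.path.exists(input_path) and os.path.exists(output_path):
        return os.path.samefile(input_path, output_path)
    # At least one path does not exist yet: compare normalized absolute paths.
    a = os.path.normcase(os.path.abspath(input_path))
    b = os.path.normcase(os.path.abspath(output_path))
    return a == b


if __name__ == "__main__":
    print(default_output_name("enwik8"))    # no extension: .bcm is appended
    print(default_output_name("Test.dat"))  # extension is replaced by .bcm
    print(is_same_file("enwik8", "./enwik8"))
```

Note that an input already named something.bcm maps onto itself under this naming rule, which is one more reason to run the same-file check before opening the output for writing.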

  16. #76
    encode (The Founder), Moscow, Russia, joined May 2006
    The compression part of the brand new BCM was finished a long time ago. I just have no spare time at all, and I'm thinking about a more robust command-line user interface and error handling.

    The output can be separated into a few groups:

    1. Banner

    Code:
    BCM v0.09 (c) 2009 Ilia Muraviev
    Looks quite OK. I may add "(c) Copyright" or something.

    2. Progress Info

    Code:
    Compressing 12.5 MB block...
     23%
    0.1 precision for MB is OK. An integer percentage (no fractional part) is also OK, I guess.

    3. Done/Summary Message

    Code:
    1000000 -> 12340 in 4.123 sec
    Summary, in one line. Input Size -> Output Size and execution time. 0.001 float precision is OK, but I'm not that sure. Probably, 0.01 or even 0.1 might be OK as well.

    4. Error Messages

    Code:
    enwik8 already exists; Not overwritten
    Data error: n=121234124
    Data error: p=100
    etc.
    I never know how these should look. Probably "DATA ERROR: p=100" or "ERROR: xxx", or ...

    5. Usage Screen

    Code:
    Usage: BCM [options] file [output]
    
    Options:
      -b# Set block size to # MB (Default: -b64)
      -d  Decompress
      -f  Force overwrite
    Or "Usage: bcm [options] ..." (Lower case). Or something else.

    Any suggestions and ideas are welcome!
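
To make the proposed usage screen concrete, here is a hedged sketch of the same option set using Python's standard `argparse` module. BCM itself is not written this way; the destination names (`block_mb`, `decompress`, `force`) are assumptions chosen for readability.

```python
import argparse


def build_parser():
    """Build a parser mirroring 'bcm [options] file [output]' from the draft."""
    parser = argparse.ArgumentParser(
        prog="bcm",
        description="Illustrative sketch of the proposed BCM command line.",
    )
    parser.add_argument("-b", dest="block_mb", type=int, default=64, metavar="#",
                        help="Set block size to # MB (default: 64)")
    parser.add_argument("-d", dest="decompress", action="store_true",
                        help="Decompress")
    parser.add_argument("-f", dest="force", action="store_true",
                        help="Force overwrite")
    parser.add_argument("file", help="Input file")
    parser.add_argument("output", nargs="?", default=None,
                        help="Output file (optional)")
    return parser


if __name__ == "__main__":
    # The attached form "-b32" is accepted just like "-b 32".
    args = build_parser().parse_args(["-b32", "-d", "book1.bcm", "book1"])
    print(args.block_mb, args.decompress, args.force, args.file, args.output)
```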


  17. #77
    Member, Germany, joined May 2008
    >1. Banner
    >
    >BCM v0.09 (c) 2009 Ilia Muraviev

    nice

    >2. Progress Info
    >
    >Compressing 12.5 MB block... 23%

    progress info maybe in one line?

    >3. Done/Summary Message
    >
    >1000000 -> 12340 in 4.123 sec

    wonderful

    "1000000 -> 12340 in 4 sec" might be OK as well

    i propose:

    "1000000 -> 12340 in 4 sec .. using 1024 MB RAM"

    .. the maximum amount of RAM used

    >4. Error Messages

    i propose:

    error: enwik8 already exists; Not overwritten
    error: Data read error in position 121234124
    error: Data processing error in position 121234124
    error: parameter error p=100
    etc.

    >5. Usage Screen

    >Usage: BCM [options] file [output]
    >
    >Options:
    > -b# Set block size to # MB (Default: -b64)
    > -d Decompress
    > -f Force overwrite
    >

    if the output is not given on the command line,
    then the resulting filename = "file".bcm

    maybe all letters in lower case would look better

    best regards
    want to test

  18. #78
    encode (The Founder), Moscow, Russia, joined May 2006
    OK, just one comment. Great activity. Good.

    2. I don't like the one-line progress idea...

    3. 0.1 precision might be better - to measure processing times of less than one second.
    Memory usage is fixed, so "using 1024 MB RAM" in the summary message is a bad idea. Such a thing is for tree-based programs - i.e. cases where memory usage is variable. The new BCM uses (BLOCKSIZE*5) + ~8 MB, plus some memory that can be allocated during the sorting stage.

    5. The UPPER CASE program name is a Windows standard. Check out the CABARC help screen:
    Code:
    Microsoft (R) Cabinet Tool - Version 1.00.0601 (03/18/97)
    Copyright (c) Microsoft Corp 1996-1997. All rights reserved.
    
    Usage: CABARC [options] command cabfile [@list] [files] [dest_dir]
    
    Commands:
       L   List contents of cabinet (e.g. cabarc l test.cab)
       N   Create new cabinet (e.g. cabarc n test.cab *.c app.mak *.h)
       X   Extract file(s) from cabinet (e.g. cabarc x test.cab foo*.c)
    
    Options:
      -c   Confirm files to be operated on
      -o   When extracting, overwrite without asking for confirmation
      -m   Set compression type [LZX:<15..21>|MSZIP|NONE], (default is MSZIP
      -p   Preserve path names (absolute paths not allowed)
      -P   Strip specified prefix from files when added
      -r   Recurse into subdirectories when adding files (see -p also)
      -s   Reserve space in cabinet for signing (e.g. -s 6144 reserves 6K by
      -i   Set cabinet set ID when creating cabinets (default is 0)
    
    Notes
    -----
    When creating a cabinet, the plus sign (+) may be used as a filename
    to force a folder boundary; e.g. cabarc n test.cab *.c test.h + *.bmp
    
    When extracting files to disk, the <dest_dir>, if provided, must end in
    a backslash; e.g. cabarc x test.cab bar*.cpp *.h d:\test\
    
    The -P (strip prefix) option can be used to strip out path information
    e.g. cabarc -r -p -P myproj\ a test.cab myproj\balloon\*.*
    The -P option can be used multiple times to strip out multiple paths
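
As a side note on the memory figure quoted in point 3 of this post, (BLOCKSIZE*5) + ~8 MB, the arithmetic can be sketched in Python as below. The function name is hypothetical, the 8 MB overhead is the approximate value from the post, and the extra memory used during the sorting stage is not modelled.

```python
def estimated_memory_mb(block_size_mb, overhead_mb=8.0):
    """Approximate memory use in MB: five times the block size plus a fixed overhead.

    Memory allocated during the sorting stage is not included.
    """
    return block_size_mb * 5 + overhead_mb


if __name__ == "__main__":
    # Default block size from the draft usage screen (-b64).
    print(estimated_memory_mb(64))
```

For the default 64 MB block this gives roughly 328 MB before any sorting-stage allocations.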

  19. #79
    Member, Saint Petersburg, Russia, joined Aug 2008
    Originally posted by encode:
    5. The UPPER CASE program name is a Windows standard. Check out the CABARC help screen:
    I-i guess that's more of an anachronistic DOS standard
