
Thread: lzma recompressor

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts

    lzma recompressor

    http://nishi.dreamhosters.com/u/lzmarec_v3_bin.rar

    lzma recompressor - a tool which detects lzma streams in files,
    decodes them, and recompresses them with a stronger entropy coder.

    02-03-2011 00:37 v3 (coder v17e)
    - first complete version
    - known issue - skips 1-2 bytes when restoring incomplete streams

    recompression stats:
    Code:
          icl.exe  SFC        enwik8
    lzma  996740   12172597   24557177   
    17e   956937   11875485   24318308
    Related thread: http://encode.su/threads/1177-lzma-s...cy-measurement

  2. #2
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    333
    Thanks
    36
    Thanked 36 Times in 21 Posts
    Great news. I think it's possible to make a preprocessor:
    1- read and store the compression settings and save with no compression.
    2- restore the old compression, or optionally improve the compression.
    3- release the source code or combine with "precomp"..

    I'll test it with products that have LZMA streams.

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    How about plugging the coder into LZMA? I think it would be more interesting than a recompressor.

  4. #4
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    333
    Thanks
    36
    Thanked 36 Times in 21 Posts
    FoxitReader431_enu_Setup.exe

    Original 7,915,000
    7ZIPed 7,864,786

    Recomp 7,767,352
    7ZIPed 7,732,932


    - Testing with 7-Zip files reduces the file size but generates a broken, invalid archive.
    - It worked on UPXed files and the file size is reduced, but the resulting file does not work... I think it needs alignment or something.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    I'll post a bugfix later today; for now there's a question though.
    Now it's like this:
    Code:
    lzmarec d file.lzma file.rec
    lzmarec c file.rec file.unp.lzma
    ie the "d" command is used to recompress the file and "c" to restore it.
    Historically it makes some sense, because "d" decodes lzma and "c" encodes it.
    But now in v3+ "d" actually compresses the file, and "c" decompresses it...
    so should I swap the commands?

  6. #6
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    I'll post a bugfix later today; for now there's a question though.
    Now it's like this:
    Code:
    lzmarec d file.lzma file.rec
    lzmarec c file.rec file.unp.lzma
    ie the "d" command is used to recompress the file and "c" to restore it.
    Historically it makes some sense, because "d" decodes lzma and "c" encodes it.
    But now in v3+ "d" actually compresses the file, and "c" decompresses it...
    so should I swap the commands?
    I'd say it depends on your goals. If it's supposed to be more than a plaything then yes, it's weird now.
    Otherwise it doesn't matter.

  7. #7
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    333
    Thanks
    36
    Thanked 36 Times in 21 Posts
    It's working fine now and it can restore the original file, but how come it can improve compressed 7z files further? Can you explain what it does?

  8. #8
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    563
    Thanks
    213
    Thanked 200 Times in 93 Posts
    +1 for swapping ("c" for the first step, "d" for the second step), I think it's clearer that way round.

    If I read the output correctly, on non-lzma data lzmarec often finds lots of small "streams" (size < 32 bytes) that slow it down. In Precomp I added a "minimal size" switch to skip small streams because they are often false positives and it doesn't make much sense anyway to try to recompress such small streams, because the savings will be small or they may even get larger. I don't know if something like this works for lzmarec too, just something I noticed.

    For large XZ files (created with 7-Zip, if it matters), it seems that only a part of the LZMA stream (or the first of the streams) is detected, f.e. testing with a 214 K file, only one stream with size 58 K was detected (and the restored file missed one byte afterwards, but I guess that's the "known issue").
    Last edited by schnaader; 2nd March 2011 at 18:47.
    http://schnaader.info
    Damn kids. They're all alike.

  9. #9
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    333
    Thanks
    36
    Thanked 36 Times in 21 Posts
    Quote Originally Posted by schnaader View Post
    +1 for swapping ("c" for the first step, "d" for the second step), I think it's clearer that way round.

    If I read the output correctly, on non-lzma data lzmarec often finds lots of small "streams" (size < 32 bytes) that slow it down. In Precomp I added a "minimal size" switch to skip small streams because they are often false positives and it doesn't make much sense anyway to try to recompress such small streams, because the savings will be small or they may even get larger. I don't know if something like this works for lzmarec too, just something I noticed.

    For large XZ files (created with 7-Zip, if it matters), it seems that only a part of the LZMA stream (or the first of the streams) is detected, f.e. testing with a 214 K file, only one stream with size 58 K was detected (and the restored file missed one byte afterwards, but I guess that's the "known issue").
    I've noticed this while testing EXE files: speed dropped to under 1 bit/s and it kept stopping at false streams (size=0), so limiting the stream size should solve this drawback.

    It also seems to work with LZMA only and not LZMA2, as I've tested with 7-Zip.
    Last edited by maadjordan; 2nd March 2011 at 19:25.

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    http://nishi.dreamhosters.com/u/lzmarec_v4_bin.rar

    02-03-2011 22:12 v4
    + parser support for lzma's EOF code (distance=-1)
    (proper detection for end of streams written by LZMA SDK)
    + coder support for lzma's EOF code (a little redundancy)
    + rangecoder flush detection with 1-3 FFNum bytes (was only 0 before).
    + fixed the miscalculated stream size for incomplete streams
    + progress output for restoring
    + swap encode/decode function names
    (now c=compress, d=decompress, as expected)
    + check actual usable stream size instead of parsed size
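
    With the c/d swap, the usage example from the v3 post becomes:
    Code:
    lzmarec c file.lzma file.rec
    lzmarec d file.rec file.unp.lzma
    ie now "c" recompresses and "d" restores, as expected.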

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    @maadjordan:

    > Great news. I think it's possible to make a preprocessor:
    > 1- read and store the compression settings and save with no compression.

    Actually v2 did just that (you can get it from http://nishi.dreamhosters.com/u/lzmarec_v2_bin.rar,
    v0 and v1 are actually there too).
    But in this case you likely won't be able to compress the data produced by my parser
    even to original lzma size, even with paq8.
    lzmarec is different from precomp and doesn't completely decode the data,
    it only removes lzma's entropy coding.

    > 2- restore the old compression, or optionally improve the compression.

    Well, we kinda have this now.

    > 3- release the source code or combine with "precomp"..

    I can probably make a library if schnaader is interested.
    As to source, its not really a problem to release it, but
    I doubt that it would be of any help to schnaader, as its not
    in a library form atm.

    > I'll test it with products that have LZMA streams.

    Ok, thanks.

    > It's working fine now and it can restore the original file, but how come
    > it can improve compressed 7z files further?
    > Can you explain what it does?

    LZMA (like most other LZ codecs) is a combination of two nearly independent parts -
    an LZ transform (similar to rep, in a way) and a statistical coder.
    The statistical part of lzma is basically a simple order1 bitwise CM with linear counters,
    and it's possible to improve compression by replacing it with a stronger
    (but slower) CM.
    So lzmarec removes lzma's entropy coding and recompresses lzma's LZ transform data
    with a custom CM coder.
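
    For illustration, lzma's per-bit probability update is just a linear (shift)
    counter, roughly like this (a sketch, not the actual lzmarec coder):
    Code:
    #include <cstdio>
    
    // lzma-style bit model: 11-bit probability with a linear shift update
    // (kBitModelTotal=2048, kNumMoveBits=5 in the LZMA SDK)
    struct BitModel {
      unsigned p;
      BitModel() : p(1024) {}              // p = P(bit==0), scaled to 0..2048
      void update( int bit ) {
        if( bit==0 ) p += (2048-p) >> 5;   // move towards "always 0"
        else         p -= p >> 5;          // move towards "always 1"
      }
    };
    
    // a stronger CM replaces this with several context models whose predictions
    // are mixed and further mapped (SSE etc), which is slower but more precise
    int main() {
      BitModel m;
      int bits[8] = {0,0,1,0,0,0,1,0};
      for( int i=0; i<8; i++ ) { m.update( bits[i] ); printf( "%u\n", m.p ); }
      return 0;
    }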

    Unfortunately full decoding, like precomp does, is not feasible for lzma recompression -
    lzma encoding is slow as it is, and there're too many options to bruteforce.

    > I've noticed this while testing EXE files: speed dropped to under
    > 1 bit/s and it kept stopping at false streams (size=0), so limiting
    > the stream size should solve this drawback.

    In v4 I modified the checks to skip such streams, but actually it won't
    make it any faster. With lzma it's relatively hard to encounter any invalid
    codes - almost any 20-30 random bytes can be decoded w/o any errors - but
    it seems that some types of data (zero runs, for example) are even harder
    to verify, as they seem like a valid lzma stream with any options.
    Note also that lzmarec actually tries all 225 lc/lp/pb combinations at
    all file offsets, until it finds a stream which is long enough (1k+ atm).

    > It also seems to work with LZMA only and not LZMA2, as I've tested with 7-Zip.

    Yes. Unfortunately LZMA2 is not a solid stream, so it requires special support,
    which I don't plan to implement atm.


    @m^2:
    > How about plugging the coder into LZMA? I think it would be more
    > interesting than a recompressor.

    That's the plan actually - see http://nishi.dreamhosters.com/u/lzmarec_v1_bin.rar for example.
    lzmarec appeared only because I already had original and improved lzma entropy coders and
    wanted to test some things for another project.


    @schnaader:
    > If I read the output correctly, on non-lzma data lzmarec often finds
    > lots of small "streams" (size < 32 bytes) that slow it down.

    Again, that's a misunderstanding. lzmarec never reacted to streams
    smaller than 1024 bytes from the start. But it cuts the tails
    of incomplete streams to make sure that it can reproduce them
    (and won't miss a real stream because of overlap with misdetection)
    and the remaining size could even become 0 after that... I changed the check now,
    but it won't affect the speed (it would actually probably make it slower).

    > I don't know if something like this works for lzmarec too, just
    > something I noticed.

    Actually at first I intended to check for 00 at the stream start and
    some other things, like a signature check, but then lzmarec started
    working and detected a lot of streams which didn't match my theory.

    > For large XZ files (created with 7-Zip, if it matters), it seems
    > that only a part of the LZMA stream (or the first of the streams) is
    > detected, f.e. testing with a 214 K file, only one stream with size
    > 58 K was detected (and the restored file missed one byte afterwards,
    > but I guess that's the "known issue").

    Hopefully there shouldn't be missing bytes anymore (please tell if
    you'd find such a case), but LZMA2 probably won't be supported -
    the compression there is the same lzma, so its actually possible
    to put it into lzmarec format, but there're additional blockwise
    headers like this:

    Code:
    00000000  -  EOS
    00000001 U U  -  Uncompressed Reset Dic
    00000010 U U  -  Uncompressed No Reset
    100uuuuu U U P P  -  LZMA no reset
    101uuuuu U U P P  -  LZMA reset state
    110uuuuu U U P P S  -  LZMA reset state + new prop
    111uuuuu U U P P S  -  LZMA reset state + new prop + reset dic
      u, U - Unpack Size
      P - Pack Size
      S - Props
    And afaiu it's necessary to explicitly parse that for detection;
    a pure lzma bruteforce won't be able to detect where a block ends,
    even with weakened checks to support dictionary reuse
    (which would make it much slower btw).
    Anyway, atm I'm not interested.
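
    (For reference, a rough sketch of parsing that chunk header, based on the table
    above and assuming the usual LZMA2 conventions, i.e. sizes stored big-endian
    minus one; the names are just for illustration, this is not lzmarec code:)
    Code:
    #include <stdint.h>
    #include <stddef.h>
    
    // rough LZMA2 chunk header parser; assumes p points at a control byte
    // with at least 6 readable bytes after it
    struct Lzma2Chunk {
      int      eos, uncompressed, reset_state, new_props, reset_dic;
      uint32_t unpack_size, pack_size;
      uint8_t  props;       // valid only when new_props!=0
      size_t   hdr_size;
    };
    
    int parse_lzma2_chunk( const uint8_t* p, Lzma2Chunk* c ) {
      *c = Lzma2Chunk();
      uint8_t ctrl = p[0];
      if( ctrl==0x00 ) { c->eos=1; c->hdr_size=1; return 1; }
      if( ctrl<0x80 ) {                    // 00000001 / 00000010 = uncompressed
        if( ctrl>0x02 ) return 0;          // invalid control byte
        c->uncompressed = 1;
        c->reset_dic    = (ctrl==0x01);
        c->unpack_size  = ((uint32_t)p[1]<<8 | p[2]) + 1;
        c->hdr_size     = 3;
        return 1;
      }
      c->reset_state  = (ctrl>=0xA0);      // 101xxxxx and above
      c->new_props    = (ctrl>=0xC0);      // 110xxxxx, 111xxxxx
      c->reset_dic    = (ctrl>=0xE0);      // 111xxxxx
      c->unpack_size  = (((uint32_t)(ctrl&0x1F)<<16) | (uint32_t)p[1]<<8 | p[2]) + 1;
      c->pack_size    = ((uint32_t)p[3]<<8 | p[4]) + 1;
      if( c->new_props ) { c->props = p[5]; c->hdr_size = 6; }
      else c->hdr_size = 5;
      return 1;
    }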

  12. #12
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    563
    Thanks
    213
    Thanked 200 Times in 93 Posts
    Ah, I understand the problems with speed/detection now. Perhaps it would be interesting to add a faster mode anyway, although it would miss some LZMA streams, but it of course depends on a lot of things - mainly on how notable the speed gain is and how high the percentage of remaining streams is.

    Quote Originally Posted by Shelwien View Post
    > 3- release source code or combine with "precomp"..

    I can probably make a library if schnaader is interested.
    As to source, its not really a problem to release it, but
    I doubt that it would be of any help to schnaader, as its not
    in a library form atm.
    Combining with Precomp would be nice, but too soon atm. There are some Precomp changes I'd prefer to do before this and I guess lzmarec will also be improved in the near future. Also, lzmarec fits in the usual chains just fine (f.e. Precomp->lzmarec->srep->7-Zip) and, apart from FreeArc, we are far away from a combined "just one call/click" compressor.

    Quote Originally Posted by Shelwien View Post
    Hopefully there shouldn't be missing bytes anymore (please tell if
    you'd find such a case)
    Yes, the .7z and .xz files work now. By the way, XZ will likely be supported in the next Precomp release, so at least there, don't worry about LZMA2
    Last edited by schnaader; 3rd March 2011 at 03:31.
    http://schnaader.info
    Damn kids. They're all alike.

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > Ah, I understand the problems with speed/detection now. Perhaps it
    > would be interesting to add a faster mode anyway, although it would
    > miss some LZMA streams, but it of course depends on a lot of things -
    > mainly on how notable the speed gain is and how high the
    > percentage of remaining streams is.

    The main idea for that would be to reduce the set of tested lzma
    options, because most of the 225 combinations are likely never used.
    But atm I don't have a list of common lzma options... from what
    I tested, there're only 08 5D 5E 6C option byte values, but there
    should be more.
    Reducing 225 to ~4 iterations would certainly give it a huge speed
    boost though :)
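
    (For reference, the lzma properties byte packs them as (pb*5+lp)*9+lc, hence
    the 9*5*5=225 combinations; those values decode like this:)
    Code:
    #include <cstdio>
    
    // decode lzma properties bytes into lc/lp/pb (props = (pb*5 + lp)*9 + lc)
    int main() {
      unsigned bytes[4] = { 0x08, 0x5D, 0x5E, 0x6C };
      for( int i=0; i<4; i++ ) {
        unsigned d  = bytes[i];
        unsigned lc = d % 9; d /= 9;
        unsigned lp = d % 5;
        unsigned pb = d / 5;
        printf( "%02X -> lc=%u lp=%u pb=%u\n", bytes[i], lc, lp, pb );
      }
      return 0;
    }
    (which gives lc=8/lp=0/pb=0, lc=3/lp=0/pb=2, lc=4/lp=0/pb=2 and lc=0/lp=2/pb=2 respectively).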

    > Combining with Precomp would be nice, but too soon atm. There are
    > some Precomp changes I'd prefer to do before this and I guess
    > lzmarec will also be improved in the near future.

    That's ok, but its interesting what kind of API you'd suggest
    for such a library.

    > Also, lzmarec fits in the usual chains just fine (f.e.
    > Precomp->lzmarec->srep->7-Zip)

    More like precomp->lzmarec->srep->7-Zip->lzmarec then :)
    Also I wonder whether doing srep before recompression is actually a bad idea...

    > Yes, the .7z and .xz files work now.

    Good, thanks. I wonder if we'd be able to locate lzma streams in some
    unexpected places... like, for example, its used to compress jpeg metainfo
    by winzip jpeg codec.

    > By the way, XZ will likely be supported in the next Precomp release,
    > so at least there, don't worry about LZMA2

    lzma is the first case where I had to recompress modern arithmetic coding,
    so there were several technical problems and overall it should be
    a useful experience.
    But I don't see anything like that in lzma2, so I'd leave it to you :)

  14. #14
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    563
    Thanks
    213
    Thanked 200 Times in 93 Posts
    Quote Originally Posted by Shelwien View Post
    > Combining with Precomp would be nice, but too soon atm. There are
    > some Precomp changes I'd prefer to do before this and I guess
    > lzmarec will also be improved in the near future.

    That's ok, but its interesting what kind of API you'd suggest
    for such a library.
    At the moment, Precomp still uses lots of temporary files, so two simple functions (like lzma_recompress() and lzma_restore()) that work on files would be enough for that. But as I want to leave that path soon, something that works in memory would be better. Although the API is not perfect, gzip/bzip2 are my favorites at the moment because their state-machine streaming approach makes it possible to pass the file in parts and keep memory use low for cases where you don't need the (whole) output data or can pass it on to other parts of the chain right away.
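
    For illustration, the kind of z_stream loop meant here, using zlib's actual API
    with error handling trimmed down to a sketch (bzip2's bz_stream API follows the
    same pattern):
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>
    
    // streaming decompression: input is pushed in chunks and output is pulled in
    // chunks, so neither the whole input nor the whole output has to be in memory
    int inflate_stream( FILE* in, FILE* out ) {
      z_stream s; memset( &s, 0, sizeof(s) );
      if( inflateInit(&s)!=Z_OK ) return -1;   // use inflateInit2(&s,15+32) for gzip too
      static unsigned char ibuf[1<<16], obuf[1<<16];
      int ret = Z_OK;
      while( ret!=Z_STREAM_END ) {
        s.avail_in = fread( ibuf, 1, sizeof(ibuf), in );
        if( s.avail_in==0 ) break;             // truncated stream
        s.next_in = ibuf;
        do {
          s.next_out = obuf; s.avail_out = sizeof(obuf);
          ret = inflate( &s, Z_NO_FLUSH );
          if( ret==Z_NEED_DICT || ret==Z_DATA_ERROR ||
              ret==Z_MEM_ERROR || ret==Z_STREAM_ERROR ) { inflateEnd(&s); return -1; }
          fwrite( obuf, 1, sizeof(obuf)-s.avail_out, out );
        } while( s.avail_out==0 );
      }
      inflateEnd( &s );
      return ret==Z_STREAM_END ? 0 : -1;
    }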

    Quote Originally Posted by Shelwien View Post
    > Also, lzmarec fits in the usual chains just fine (f.e.
    > Precomp->lzmarec->srep->7-Zip)

    More like precomp->lzmarec->srep->7-Zip->lzmarec then
    Also I wonder whether doing srep before recompression is actually a bad idea...
    Some tests with FlashMX.pdf until I find a better testset:

    Code:
    PDF:          4,526,946
    PCF:         26,935,352
    PCF+srep08:  19,392,712
    PCF+srep295: 17,297,784
    
    PDF+7z:         3,706,680
    PDF+7z+lzmarec: 3,657,702 (98.68%)
    
    PCF+7z:         2,753,224
    PCF+7z+lzmarec: 2,603,437 (94.56%)
    
    PCF+srep08+7z:         2,754,817
    PCF+srep08+7z+lzmarec: 2,616,778 (94.99%)
    
    PCF+srep295+7z:         2,754,021
    PCF+srep295+7z+lzmarec: 2,623,245 (95.25%)
    
    7z = 7-Zip GUI, setting Ultra
    lzmarec = lzmarec v4
    PCF = Precomp 0.4.1 -c-
    At least in this case, it seems using srep before 7-Zip makes lzmarec results worse, but only a bit. I'll look for a testset that is much larger, contains some lzma streams itself and where srep improves 7-Zip ratio.
    Last edited by schnaader; 3rd March 2011 at 21:31.
    http://schnaader.info
    Damn kids. They're all alike.

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > At the moment, Precomp still uses lots of temporary files, so two
    > simple functions (like lzma_recompress() and lzma_restore()) that
    > work on files would be enough for that.

    That's funny, because this is lzmarec's main():
    Code:
    ALIGN(4096) 
    static union {
      LzmaModel<1> M0;
      LzmaModel<0> M1;
    };
    
    int main( int argc, char** argv ) {
      if( argc<4 ) return 1;
      FILE* f = fopen( argv[2], "rb" ); if( f==0 ) return 1;
      FILE* g = fopen( argv[3], "wb" ); if( g==0 ) return 2;
      if( argv[1][0]=='c' ) {
        M0.encodefile( f, g );
      } else {
        M1.decodefile( f, g );
      }
      fclose( f );
      fclose( g );
      return 0;
    }
    However all the detection/decoding/recompression is done in memory
    and files are read/written only sequentially, so it can be easily
    modified to work with pipes (tell me if anybody needs that).

    > But as I want to leave that path soon, something that works in
    > memory would be better. Although the API is not perfect, gzip/bzip2
    > are my favorites at the moment because their state-machine streaming
    > approach makes it possible to pass the file in parts and keep memory
    > use low for cases where you don't need the (whole) output data or
    > can pass it on to other parts of the chain right away.

    Once again I'd like to remind about that - http://stackoverflow.com/questions/4...-demo-source-2
    Afaik, all the other possibilities (iterators, callbacks, state machines)
    are much harder to implement and maintain.
    For example, the original lzma decoder is implemented 2 times in the sources -
    a real decoder and a "dummy" which only counts the bytes that the real one would read.
    So the real decoder works until fewer than LZMA_REQUIRED_INPUT_MAX (=20) bytes are
    left in the buffer, then it starts "data mining" by testing whether
    there're enough bytes for each iteration.
    And, I guess, it may be a cool speed optimization, especially if you have
    a way to automatically generate the "checked" decoder instance,
    but do you really want to maintain such code?..

    > At least in this case, it seems using srep before 7-Zip makes
    > lzmarec results worse, but only a bit.

    Well, I meant that using srep before lzmarec can reduce the amount
    of non-lzma data to check, and thus significantly improve the speed,
    taking into account how slow lzmarec detection is.
    But anyway, thanks for testing that.

  16. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I think that it could be better to dump (or make optional) stream detection and implement support for common LZMA file types. Sure reduces strength, but only in rare cases and performance improvement would be big.
    Though I still think that just making it a full general purpose codec would be better.

    However IF you intend to support this recompression code, I would find a use for it in the precomp competitor I will be writing soon.

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > I think that it could be better to dump (or make optional) stream
    > detection and implement support for common LZMA file types.

    Actually its what I did before making a detector -
    here's the coder used in v3-4 for consistency
    http://nishi.dreamhosters.com/u/lzmarec_v1a_bin.rar
    (compression w/o detector should be slightly better because lzmarec
    adds some overhead for lossless coding of incomplete streams).

    > Sure reduces strength, but only in rare cases and performance
    > improvement would be big.

    That's not exactly true, because the delay before lzmarec finds
    the lzma stream in a .7z archive is insignificant, and then
    processing is the same. In fact, integrated lzmarec would be
    probably faster than v1 parts working with intermediate dump file
    w/o detection.

    > Though I still think that just making it a full general purpose codec would be better.

    That's what I'm trying to do, although in a very roundabout way.
    I need (1) LZ77 codec; (2) not worse compression than lzma; (3) compatible
    with my framework, which lzma is not.
    But I never paid much attention to LZ (and still not really interested),
    so (2) is hard. And refactoring lzma to fit my needs is hard too, guessing
    from the time it took to refactor just the decoder (also its not fun to
    use a 3rd party codec when I can make one myself, given time).
    So I thought up a solution like this: improve the CM entropy coding part
    of lzma (which I know how to do), then write a simple encoder for it
    (w/o full parsing optimization etc) - worse LZ + better CM = ~lzma result
    with completely new codec (and encoder can be later improved too).
    lzmarec is just a side effect - I wanted to release (and test etc) the
    new entropy coder in some form, also collect info about recompressor design,
    as I plan to continue with more popular formats (deflate,jpeg,mp3).
    (I don't intend to directly compete with precomp though - I'm writing
    an integrated archiver).

    > However IF you intend to support this recompression code, I would
    > find a use for it in the precomp competitor I will be writing soon.

    Its not a problem if I don't have to write some new complex logic for it
    or redesign something. Its actually a very small project (~30k C++ source -
    smaller than original lzma decoder) and I don't have any reason to continue
    working on it, but I'm willing to fix bugs if somebody reports them, and
    make a library for it with encodefile() and decodefile() mentioned above.
    (Alternatively, a coroutine API - see ProcessFile() in http://stackoverflow.com/questions/3...ne-demo-source
    is also possible, I can export a function similar to coro_process() there).

  18. #18
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    > I think that it could be better to dump (or make optional) stream
    > detection and implement support for common LZMA file types.

    Actually its what I did before making a detector -
    here's the coder used in v3-4 for consistency
    http://nishi.dreamhosters.com/u/lzmarec_v1a_bin.rar
    (compression w/o detector should be slightly better because lzmarec
    adds some overhead for lossless coding of incomplete streams).

    > Sure reduces strength, but only in rare cases and performance
    > improvement would be big.

    That's not exactly true, because the delay before lzmarec finds
    the lzma stream in a .7z archive is insignificant, and then
    processing is the same. In fact, integrated lzmarec would be
    probably faster than v1 parts working with intermediate dump file
    w/o detection.
    Unless the .7z comes with a PPMd stream. :P
    The current implementation might be good on average, as long as one limits the use to .7z and .xz, but what concerns me is that its speed is highly unstable. Also, when there's a significant amount of non-LZMA data (e.g. many FreeArc archives), the average drops a lot.
    I think that good support for a few types or even one is better than not-so-great support for many more, especially when almost all of the world's LZMA data is in the form of .7z, with some as .xz, .msi, .exe and very little elsewhere. Though it's just my point of view.

    Quote Originally Posted by Shelwien View Post
    > Though I still think that just making it a full general purpose codec would be better.

    That's what I'm trying to do, although in a very roundabout way.
    I need (1) LZ77 codec; (2) not worse compression than lzma; (3) compatible
    with my framework, which lzma is not.
    But I never paid much attention to LZ (and still not really interested),
    so (2) is hard. And refactoring lzma to fit my needs is hard too, guessing
    from the time it took to refactor just the decoder (also its not fun to
    use a 3rd party codec when I can make one myself, given time).
    So I thought up a solution like this: improve the CM entropy coding part
    of lzma (which I know how to do), then write a simple encoder for it
    (w/o full parsing optimization etc) - worse LZ + better CM = ~lzma result
    with completely new codec (and encoder can be later improved too).
    I understand.
    Quote Originally Posted by Shelwien View Post
    lzmarec is just a side effect - I wanted to release (and test etc) the
    new entropy coder in some form, also collect info about recompressor design,
    as I plan to continue with more popular formats (deflate,jpeg,mp3).
    (I don't intend to directly compete with precomp though - I'm writing
    an integrated archiver).
    Nice that there are more people that got interested in recompression. I think that now it's one of the things that give the best gain/work ratio.

    Quote Originally Posted by Shelwien View Post
    > However IF you intend to support this recompression code, I would
    > find a use for it in the precomp competitor I will be writing soon.

    Its not a problem if I don't have to write some new complex logic for it
    or redesign something. Its actually a very small project (~30k C++ source -
    smaller than original lzma decoder) and I don't have any reason to continue
    working on it, but I'm willing to fix bugs if somebody reports them
    Bugs are the core of what I meant. I'd like to avoid having to dive into totally uncommented (just a guess) and likely complicated code that does things out of the scope of my knowledge or interest (CM).

    Quote Originally Posted by Shelwien View Post
    , and
    make a library for it with encodefile() and decodefile() mentioned above.
    (Alternatively, a coroutine API - see ProcessFile() in http://stackoverflow.com/questions/3...ne-demo-source
    is also possible, I can export a function similar to coro_process() there).
    For my use a simple streaming interface would be best; in my program a codec is something that takes a file or a stream and returns a set of files or streams.
    Though my use for it is in the rather distant future, end of this year maybe; I'm going to do solely deflate for now.

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > Nice that there are more people that got interested in recompression.

    Technically, I was one of the first though
    http://replay.waybackmachine.org/200...bout_us_i.html
    (actually made in 2004)

    http://nishi.dreamhosters.com/u/pjpg_v0_bin.rar

    > I think that now it's one of the things that give the best gain/work ratio.

    That's unlikely. Making new codecs is much easier, and making GUI apps from a bunch
    of 3rd-party components is more profitable.
    But at this point it became hard to compete with existing archive formats
    without some major changes - further breakthroughs in universal compression
    are very unlikely, and recompression seems like the main option.
    Stuffit, which currently specializes in it, is not especially popular though.

    > Bugs are the core of what I meant. I'd like to avoid having to dive
    > into totally uncommented (just a guess) and likely complicated code

    Well, I don't plan to disappear anytime soon.

    > that does things out of the scope of my knowledge or interest (CM).

    Now you're wrong here. Recompression = format parser + custom CM coder.
    Deflate is the only exception where you can usually reproduce the
    original binary stream given the uncompressed data - because zlib is used
    everywhere (so the encoding algo is the same) _and_ because it's fast
    enough for option bruteforce to make sense - e.g. for lzma it won't work
    the same, as there're many more encoding options + encoder versions,
    and encoding is 1000x slower than deflate.
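
    For illustration, the deflate "option bruteforce" idea looks roughly like this
    (a sketch using zlib; real tools also vary windowBits/strategy and process the
    data in chunks, and the function name here is just for illustration):
    Code:
    #include <string.h>
    #include <zlib.h>
    
    // try to find zlib settings which reproduce the original raw deflate stream
    // bit-exactly from the unpacked data; returns level*100+memLevel or -1
    int find_zlib_setting( const unsigned char* unpacked, unsigned long ulen,
                           const unsigned char* original, unsigned long olen ) {
      static unsigned char buf[1<<22];            // big enough for this sketch
      for( int level=1; level<=9; level++ )
        for( int memLevel=1; memLevel<=9; memLevel++ ) {
          z_stream s; memset( &s, 0, sizeof(s) );
          if( deflateInit2( &s, level, Z_DEFLATED, -15, memLevel,
                            Z_DEFAULT_STRATEGY )!=Z_OK ) continue;  // -15 = raw deflate
          s.next_in  = (Bytef*)unpacked; s.avail_in  = ulen;
          s.next_out = buf;              s.avail_out = sizeof(buf);
          int ret = deflate( &s, Z_FINISH );
          unsigned long clen = sizeof(buf)-s.avail_out;
          deflateEnd( &s );
          if( ret==Z_STREAM_END && clen==olen && memcmp(buf,original,olen)==0 )
            return level*100 + memLevel;          // found a matching setting
        }
      return -1;                                  // no tested setting reproduces it
    }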

    So for most formats, the only practical method is entropy coding improvement;
    I even tested it with http://encode.su/threads/271-mp3dump
    It's clear that even the best available universal compressors are not
    able to gain enough on uncompressed structured data.
    So the only working solution for most formats is to provide a specialized
    compression backend with the parser (it's also the best considering
    processing speed).

    And actually that also applies to deflate - "incompatible" deflate streams
    multiply with time, and afaik schnaader is already experimenting with
    an approach similar to what I did in lzmarec.

  20. #20
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    > Nice that there are more people that got interested in recompression.

    Technically, I was one of the first though
    http://replay.waybackmachine.org/200...bout_us_i.html
    (actually made in 2004)

    http://nishi.dreamhosters.com/u/pjpg_v0_bin.rar
    Interesting. I saw you listed as a team member, but because I saw your later experiments with MP3 recompression and it didn't look like you had experience with it, I thought there was an error somewhere. Could you shed some more light?
    Quote Originally Posted by Shelwien View Post
    > I think that now it's one of the things that give the best gain/work ratio.

    That's unlikely. Making new codecs is much easier, and making GUI apps from a bunch
    of 3rd-party components is more profitable.
    Profit is another matter; I was talking compression only. And I don't think that saving 10-20% on some popular file type with a better codec is easier than saving 10-20% with recompression.
    Quote Originally Posted by Shelwien View Post
    But at this point it became hard to compete with existing archive formats
    without some major changes - further breakthroughs in universal compression
    are very unlikely, and recompression seems like the main option.
    I see one more easy gain that I find orthogonal to recompression. Splitting files (and streams) out of containers and treating them according to what they are. The most extreme example that I see is bit-lossy compression of VM images. Parse extfs structures defragmenting files and removing deleted stuff, then apply file-specific codecs and you can easily halve the size (compared to straight compression). And virtualization is a big and growing theme, VM images get backed up, there's quite a lot of money to be made this way. And the main reason why I'm not implementing it is that it's a monkey job.
    Aside from the VM stuff, splitting a'la rar is far more interesting and at least for now would be good though.

    Quote Originally Posted by Shelwien View Post
    Stuffit, which currently specializes in it, is not especially popular though.
    Stuffit's problem is not compression. I don't know what it is, but the compression is great.

    Quote Originally Posted by Shelwien View Post
    > Bugs are the core of what I meant. I'd like to avoid having to dive
    > into totally uncommented (just a guess) and likely complicated code

    Well, I don't plan to disappear anytime soon.

    > that does things out of the scope of my knowledge or interest (CM).

    Now you're wrong here. Recompression = format parser + custom CM coder.
    CM is just the thing that you're applying; it might be the best one here, but it's not the only one, and certainly not the only way to do recompression.

    Quote Originally Posted by Shelwien View Post
    Deflate is the only exception where you can usually reproduce the
    original binary stream given the uncompressed data - because zlib is used
    everywhere (so the encoding algo is the same) _and_ because it's fast
    enough for option bruteforce to make sense - e.g. for lzma it won't work
    the same, as there're many more encoding options + encoder versions,
    and encoding is 1000x slower than deflate.
    Bzip2. FLAC. I didn't think about it much further than this though. I see that many multimedia formats have a larger number of competing encoders.

    Quote Originally Posted by Shelwien View Post
    So for most formats, the only practical method is entropy coding improvement;
    I even tested it with http://encode.su/threads/271-mp3dump

    It's clear that even the best available universal compressors are not
    able to gain enough on uncompressed structured data.
    So the only working solution for most formats is to provide a specialized
    compression backend with the parser (it's also the best considering
    processing speed).

    And actually that also applies to deflate - "incompatible" deflate streams
    multiply with time, and afaik schnaader is already experimenting with
    an approach similar to what I did in lzmarec.
    Interesting.

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > Interesting. I saw you listed as a team member,

    In fact I don't know half of the people listed there :)
    (some of them probably wrote the GUI though).
    Afair it went like this: Dmitry Vatolin came up with the idea
    to recompress mp3s, then Alexey Grishin (who's not mentioned there :)
    made the original mp3 parser and A.Sterjantov made the coder
    (barbershop.mp3 4282121->3992667), then I was called to improve
    compression, which I did (->3857515), but later had to write a new parser too
    (there were severe issues with mp3 extensions, fseek calls in the
    parser, lossless coding of broken frames etc).
    There was a notable problem with my first model though - it was made
    with floating point, so each executable build was compatible only with itself :)
    So after some workaround attempts (like a float emulation class) I
    had to rewrite the model again with integer components, and along
    the way compression improved again (->3775712).
    Then there were experiments with controlled information loss, where
    David Kharabadze was mainly involved afair, and then this ended up
    with a proprietary lossy audio codec development somehow
    (which continues even now).
    Anyway, its a fact that I'm the only person responsible for the
    final mp3zip/soundslimmer compression engine, and since then I didn't
    hear anything from them (soundgenetics) about its maintenance,
    so I'd not expect any news on that side.

    > but because I saw your later experiments with MP3 recompression and
    > it didn't look like you had experience with it,

    That wasn't my experiment technically. I signed a NDA so I can't just
    release the mp3zip source even now.
    So I tried posting just the parser (soundgenetics can't have
    exclusive rights on mp3 parsing :), to see whether its possible to
    gain something with modern universal coders and simple
    preprocessing, but in the end had to do most of the work myself :)

    > I was talking compression only. And I don't think that saving 10-20%
    > on some popular file type with a better codec is easier than saving
    > 10-20% with recompression.

    And I meant lossy codecs. Its much easier to make a new lossy format
    with 10-20% smaller size at the same perceptible quality, than make
    a lossless recompressor with a similar effect.
    Also the people with money (corporations etc) surprisingly don't
    care about losslessness. They do care about compression (well, it
    can be directly recalculated into money via bandwidth and storage costs),
    but lossless reconstruction of popular formats is more of a bother
    for them (because of licensing etc) - what they need is a proprietary
    format with all the possible DRM and watermarking options.

    > I see one more easy gain that I find orthogonal to recompression.
    > Splitting files (and streams) out of containers and treating them
    > according to what they are. The most extreme example that I see is
    > bit-lossy compression of VM images.

    That sounds like lossy recompression (it exists too; for example,
    mp3zip can discard irrelevant mp3 header details, which don't
    affect decoded wavs (3775712->3741518), and stuffit has such options
    for jpeg recompression).

    > And virtualization is a big and growing theme, VM images get backed
    > up, there's quite a lot of money to be made this way.

    Yeah, seems that Lasse already noticed that - http://exdupe.com/
    Also seems that its more about speed than compression though.

    > And the main reason why I'm not implementing it is that it's a monkey job.

    You are wrong. Lossless recompression is tricky even for a simple format,
    it requires more thinking than coding.
    And if its still not good enough for you, its possible to aim higher and
    try making a specialized parser generator (like http://flavor.sourceforge.net/ ,
    but something convenient for recompression).

    > Aside from the VM stuff, splitting a'la rar is far more interesting
    > and at least for now would be good though.

    "splitting a'la rar"? Are you talking about rar's codec switching and filters?
    Well, that's kinda obvious.

    > Stuffit's problem is not compression. I don't know what it is, but the compression is great.

    Its stability is at nearly WinRK level, and its hard to get a trial version,
    and its GUI is annoying, and afair there's no reasonable CL version, so most benchmark
    sites can't test it. Also I guess they're happy enough with their Mac users and
    don't care much about windows side (there may be more people, but less money).

    > CM is just the thing that you're applying; it might be the best one here,
    > but it's not the only one, and certainly not the only way to do recompression.

    No, its the only thing that makes sense there, not what I'm applying.
    When a coder chooses a code based on contextual statistics, its a CM, right?
    But that applies even to mainline jpeg. And lzma's entropy coder is a plain
    order1 bitwise CM, very similar to fpaq0p and such.
    Anyway, any "statistical coding" or "entropy coding" is basically CM,
    and for compression of format parser output its usually the main thing;
    I can't even think of a case where some transforms like LZ or BWT could
    be more important than entropy coding (for structured parser output) -
    usually they just don't make any sense at all.

    Also, when you're trying to losslessly recompress even a simple format,
    like AVI container (RIFF), for example, you'd usually end up with
    additional flags (like whether a certain structure is valid or whether
    it can be reproduced from other data with original encoder's algorithm),
    and its much cleaner (and faster) to encode these flags with a rangecoder,
    than ensure the losslessness with some other methods (eg. error correction schemes),
    or manually design an efficient bitcode for the flags.

    > 5.pjg

    Do you mean that it can't be decoded? That's ok actually :)
    Its a discontinued alpha version - I underestimated the paq8 jpeg model and hoped
    to beat it with a simple (but properly tuned) model, which obviously failed.
    To compete with paq, I need a completely different model design, and there're some
    issues with format parser (like no support for "progressive" jpegs), so I'd just rewrite
    it from scratch next time. The experience was very useful though

  22. #22
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    > Interesting. I saw you listed as a team member,

    In fact I don't know half of the people listed there
    (some of them probably wrote the GUI though).
    Afair it went like this: Dmitry Vatolin came up with the idea
    to recompress mp3s, then Alexey Grishin (who's not mentioned there)
    made the original mp3 parser and A.Sterjantov made the coder
    (barbershop.mp3 4282121->3992667), then I was called to improve
    compression, which I did (->3857515), but later had to write a new parser too
    (there were severe issues with mp3 extensions, fseek calls in the
    parser, lossless coding of broken frames etc).
    There was a notable problem with my first model though - it was made
    with floating point, so each executable build was compatible only with itself
    So after some workaround attempts (like a float emulation class) I
    had to rewrite the model again with integer components, and along
    the way compression improved again (->3775712).
    Then there were experiments with controlled information loss, where
    David Kharabadze was mainly involved afair, and then this ended up
    with a proprietary lossy audio codec development somehow
    (which continues even now).
    Anyway, its a fact that I'm the only person responsible for the
    final mp3zip/soundslimmer compression engine, and since then I didn't
    hear anything from them (soundgenetics) about its maintenance,
    so I'd not expect any news on that side.

    > but because I saw your later experiments with MP3 recompression and
    > it didn't look like you got experience with it,

    That wasn't my experiment technically. I signed a NDA so I can't just
    release the mp3zip source even now.
    So I tried posting just the parser (soundgenetics can't have
    exclusive rights on mp3 parsing), to see whether its possible to
    gain something with modern universal coders and simple
    preprocessing, but in the end had to do most of the work myself
    Thanks for the story.

    Quote Originally Posted by Shelwien View Post
    > I was talking compression only. And I don't think that saving 10-20%
    > on some popular file type with a better codec is easier than saving
    > 10-20% with recompression.

    And I meant lossy codecs. Its much easier to make a new lossy format
    with 10-20% smaller size at the same perceptible quality, than make
    a lossless recompressor with a similar effect.
    Also the people with money (corporations etc) surprisingly don't
    care about losslessness. They do care about compression (well, it
    can be directly recalculated into money via bandwidth and storage costs),
    but lossless reconstruction of popular formats is more of a bother
    for them (because of licensing etc) - what they need is a proprietary
    format with all the possible DRM and watermarking options.
    Mhm. I don't like it, but it's not unreasonable. For a consumer it doesn't matter, they won't notice a difference. I find lossless multimedia coding viable only for archival and it's more because I don't like throwing things away than because I spot the difference.

    Quote Originally Posted by Shelwien View Post
    > I see one more easy gain that I find orthogonal to recompression.
    > Splitting files (and streams) out of containers and treating them
    > according to what they are. The most extreme example that I see is
    > bit-lossy compression of VM images.

    That sounds like lossy recompression (it exists too; for example,
    mp3zip can discard irrelevant mp3 header details, which don't
    affect decoded wavs (3775712->3741518), and stuffit has such options
    for jpeg recompression).
    With VMs lossy, in general both lossy and lossless. Take Ocarina for example.

    Quote Originally Posted by Shelwien View Post
    > And virtualization is a big and growing theme, VM images get backed
    > up, there's quite a lot of money to be made this way.

    Yeah, seems that Lasse already noticed that - http://exdupe.com/
    Also seems that its more about speed than compression though.
    Cool. I like it. It's only the first part of what I meant (i.e. no file-specific stuff if I see correctly, I guess no solid mode (where splitting lets you sort files better) either), but still nice.

    Quote Originally Posted by Shelwien View Post
    > And the main reason why I'm not implementing it is that it's a monkey job.

    You are wrong. Lossless recompression is tricky even for a simple format,
    it requires more thinking than coding.
    When considering containers only? All cases that I considered (MPEG PS, zip, tar) are just trivial and I can't think about a single design that would make it hard to do (assuming container specification is available).
    Quote Originally Posted by Shelwien View Post
    And if its still not good enough for you, its possible to aim higher and
    try making a specialized parser generator (like http://flavor.sourceforge.net/ ,
    but something convenient for recompression).
    I want to make a basic framework for combining splitters and decompressors, what comes later - I don't know.

    Quote Originally Posted by Shelwien View Post
    > Aside from the VM stuff, splitting a'la rar is far more interesting
    > and at least for now would be good though.

    "splitting a'la rar"? Are you talking about rar's codec switching and filters?
    Well, that's kinda obvious.
    Maybe it's kinda obvious but somehow rarely used. Rar, nanozip, (?) paq, maybe some obscure stuff like StuffIt. Emerging formats that want to rule the world all (with a possible exception of ZPAQ, I don't know) miss it.

    Quote Originally Posted by Shelwien View Post
    > Stuffit's problem is not compression. I don't know what it is, but the compression is great.

    Its stability is at nearly WinRK level, and its hard to get a trial version,
    and its GUI is annoying, and afair there's no reasonable CL version, so most benchmark
    sites can't test it. Also I guess they're happy enough with their Mac users and
    don't care much about windows side (there may be more people, but less money).
    Mhm.

    Quote Originally Posted by Shelwien View Post
    > CM is just the thing that you're applying; it might be the best one here,
    > but it's not the only one, and certainly not the only way to do recompression.

    No, its the only thing that makes sense there, not what I'm applying.
    When a coder chooses a code based on contextual statistics, its a CM, right?
    But that applies even to mainline jpeg. And lzma's entropy coder is a plain
    order1 bitwise CM, very similar to fpaq0p and such.
    Anyway, any "statistical coding" or "entropy coding" is basically CM,
    and for compression of format parser output its usually the main thing;
    I can't even think of a case where some transforms like LZ or BWT could
    be more important than entropy coding (for structured parser output) -
    usually they just don't make any sense at all.

    Also, when you're trying to losslessly recompress even a simple format,
    like AVI container (RIFF), for example, you'd usually end up with
    additional flags (like whether a certain structure is valid or whether
    it can be reproduced from other data with original encoder's algorithm),
    and its much cleaner (and faster) to encode these flags with a rangecoder,
    than ensure the losslessness with some other methods (eg. error correction schemes),
    or manually design an efficient bitcode for the flags.
    I guess you're right. I don't really want to discuss it as CM is out of the scope of my interest for now.

    Quote Originally Posted by Shelwien View Post
    > 5.pjg

    Do you mean that it can't be decoded? That's ok actually
    Its a discontinued alpha version - I underestimated the paq8 jpeg model and hoped
    to beat it with a simple (but properly tuned) model, which obviously failed.
    To compete with paq, I need a completely different model design, and there're some
    issues with format parser (like no support for "progressive" jpegs), so I'd just rewrite
    it from scratch next time. The experience was very useful though
    Actually I didn't notice that it doesn't decompress.
    I wanted to show you that I found a file that compresses normally with PackJPG and not at all with pjpg_v0, so when you get back into it one day, you could have a good starting point for improvement. But I guess that, as you say, it's not worthwhile to do it w/out a full redesign, so the sample is irrelevant.

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    > Mhm. I don't like it, but it's not unreasonable. For a consumer it
    > doesn't matter, they won't notice a difference.

    Yes; also the content providers who're mostly interested in saving
    bandwidth usually do have the original content, so having to recode
    between lossy formats is not a problem for them.
    Well, the situation somewhat changed since 2005 though, as now
    there're many filehostings and such.

    > I find lossless multimedia coding viable only for archival and it's
    > more because I don't like throwing things away than because I spot
    > the difference.

    Actually there're no problems with playback from recompression formats -
    for Windows it's a matter of a single DirectShow filter.

    >> You are wrong. Lossless recompression is tricky even for a simple format,
    >> it requires more thinking than coding.

    > When considering containers only? All cases that I considered
    > (MPEG PS, zip, tar) are just trivial and I can't think about a
    > single design that would make it hard to do (assuming container
    > specification is available).

    1. Specification being available doesn't mean that its correct
    and comprehensible. Are you that confident in your patent language skills?

    2. It might become less trivial if you'd consider sequential processing,
    which is a common practical requirement.

    3. The main problem usually is about lossless coding of slightly broken streams,
    especially with sequential processing - you have to handle parsing errors at
    any point, and discarding the whole stream in such cases is usually not an option.
    Also frequently it requires format changes. For example, my improved lzma model
    couldn't handle distance=-1, so I had to add a special flag for coding that,
    because otherwise it was impossible to correctly terminate the SDK streams.
    It could be much more complicated if I tried to avoid extra redundancy there.

    > I want to make a basic framework for combining splitters and decompressors,
    > what comes later - I don't know.

    Here's another topic to discuss then -
    http://stackoverflow.com/questions/5...-a-byte-stream

    > Maybe it's kinda obvious but somehow rarely used.

    Not really. I'd also mention durilca, WinRK, uharc, ace (there're probably more; I don't
    really care to look into that). Also CCM has filters and Christian clearly intended
    to do codec switching.
    So it's basically only zip and 7z that don't support it (and it's probably still possible
    in 7z via multiway filters like bcj2).

    > I guess you're right. I don't really want to discuss it as CM is out of the scope
    > of my interest for now.

    I'd consider it in your place. Not necessarily CM, but arithmetic coding at least.
    There're special issues about its termination, so it requires special handling.
    Though if you're ok with adding redundancy by design, random access, and temp files,
    I guess its ok to just write it how its convenient for you.

  24. #24
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    Seems that I need to clarify some "known issues".

    1. LZMA stream detection works by attempting lzma decoding from each byte in the file,
    and 225 times per byte, because it tries all the possible lc/lp/pb combinations
    (a rough sketch of this loop is shown after this list).
    When random data are decoded as lzma code, usually an error appears at some point
    (match distance out of bounds most of the time); normally it takes 20-40 input bytes to encounter
    such an error, but it can take much longer when the data can be decoded as a sequence of literals
    (especially runs of 0).
    Thus the detection speed is 2-3 KB/s on average, but goes down to 1 byte/s at zero runs and the like.
    I can add a workaround for zero runs I guess, if somebody needs that.
    Also, any ideas about detection speed improvement are welcome; I won't do 7z/xz detection though.

    2. Currently complete streams (where rc flush matches) of <1024 bytes are discarded, also incomplete
    streams (terminated on error) of ~<1500 bytes (1024 + size of 2048 compressed cached LZ records).

    3. When a long lzma stream is detected, it's decoded without further detection until the end, so the processing
    speed is much faster, around 2 MB/s at least.

    4. lzmarec statically allocates 128M x2 for lzma and coder windows (they can be merged, but I'm lazy).
    So atm lzmarec can't fully parse long (>128M uncompressed) lzma streams with a >128M dictionary setting.
    Also its behavior with streams longer than 4G is somewhat unpredictable.
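
    For illustration, the detection loop is roughly like the sketch below; lzma_try_decode()
    here is a stub standing in for a real decoding attempt (it is not an actual lzmarec or
    SDK function), which should return how many input bytes decode cleanly with the given
    lc/lp/pb before the first error. The other names are made up for the sketch too.
    Code:
    #include <stddef.h>
    
    // stub: a real implementation would run an lzma decoder on buf[0..len)
    // with the given lc/lp/pb and return the number of input bytes consumed
    // before the first decoding error
    static size_t lzma_try_decode( const unsigned char* buf, size_t len,
                                   int lc, int lp, int pb ) {
      (void)buf; (void)len; (void)lc; (void)lp; (void)pb;
      return 0;
    }
    
    // try all 9*5*5 = 225 lc/lp/pb combinations at one offset
    static size_t longest_clean_prefix( const unsigned char* p, size_t len ) {
      size_t best = 0;
      for( int lc=0; lc<9; lc++ )
        for( int lp=0; lp<5; lp++ )
          for( int pb=0; pb<5; pb++ ) {
            size_t n = lzma_try_decode( p, len, lc, lp, pb );
            if( n>best ) best = n;
          }
      return best;
    }
    
    // brute-force scan: report a candidate stream once a long enough prefix
    // decodes without errors, then continue after its end
    static void scan( const unsigned char* buf, size_t len, size_t min_size ) {
      for( size_t pos=0; pos<len; pos++ ) {
        size_t n = longest_clean_prefix( buf+pos, len-pos );
        if( n>=min_size ) {
          // candidate lzma stream at [pos, pos+n): decode/recompress it here
          pos += n-1;
        }
      }
    }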

  25. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    http://nishi.dreamhosters.com/u/lzmarec_v4a_bin.rar

    17-03-2011 14:21 v4a
    + FIX: incomplete write for <128k files
    + FIX: crash on decoding of random files
    + stdin-to-stdout processing mode
    + heuristic for faster detection

  26. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    http://nishi.dreamhosters.com/u/lzmarec_v4b_bin.rar

    22-03-2011 18:34 v4b
    + FIX: wrong handling of output buffer overflow in detector
    + internal rc flush coding in lzma restore
    + minor speed optimizations

    @schnaader: thanks for finding that bug

  27. #27
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts

    feature request

    Early versions of lzmarec were able to partially decompress arbitrary lzma streams, even incomplete ones.
    It would be useful to see a full-featured unpacker, like stuns.

  28. #28
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    Ah, yeah, thanks for reminding about that.
    But I'm kinda too lazy, so how about making a stream extractor using lzmarec logs?
    I mean, lzmarec prints the stream offsets, lengths and option bytes, so it should be possible
    to extract the streams into files and decode them.

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    So I made it just like it was suggested before:
    http://nishi.dreamhosters.com/u/lzmadump_v0.rar

  30. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,348
    Thanks
    212
    Thanked 1,012 Times in 537 Posts
    ..And even more - now it zeroes out the detected lzma streams in a copy of the original file,
    and is able to insert the exported streams back:

    http://nishi.dreamhosters.com/u/lzmadump_v1.rar


