
Thread: seg_file

  1. #1
    Shelwien (Administrator)
    Join Date: May 2008 | Location: Kharkov, Ukraine
    Posts: 3,366 | Thanks: 213 | Thanked 1,018 Times in 540 Posts

    seg_file

    > how about making a small theme about durilca segmentation
    > and seg_file so we can point to it in the future?

    Well, here you go:

    1. I reformatted and patched some more stuff in seg_file
    (got rid of some of its dynamic allocation mostly).
    And also added an "unseg_file" utility to reconstruct
    the file from segments.

    http://ctxmodel.net/files/PPMd/segfile_sh1.rar

    Now it's used like this:
    Code:
      seg_file.exe acrord32.exe
      unseg_file acrord32.unp
    2. The seg_file utility is written by Dmitry Shkarin;
    the original version is at http://compression.ru/ds/seg_file.rar.
    It splits the file into several "segments": blocks of data with
    similar internal statistics, such that the statistics of different
    segments (supposedly) don't match.
    Its estimation is based on a bytewise frequency model
    with simple hashed contexts (only 256 contexts by default, but tunable).
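
    The criterion can be illustrated with a toy model (a hypothetical,
    much simplified sketch in Python, not Shkarin's actual code; the real
    seg_file uses hashed contexts and a different parse, and the window
    size and threshold below are made up):

```python
# Toy statistics-based segmentation (hypothetical sketch, NOT Shkarin's
# actual algorithm; seg_file uses hashed contexts and a different parse).
# A segment boundary is placed wherever coding a window together with
# the running segment would cost noticeably more bits than coding them
# separately.
import math

def entropy_cost(counts, total):
    """Shannon cost, in bits, of coding `total` symbols with freqs `counts`."""
    return -sum(c * math.log2(c / total) for c in counts.values() if c)

def segment(data, window=4096, threshold=256.0):
    """Return a list of (offset, length) segments of `data`."""
    segments = []
    start = 0
    seg_counts, seg_total = {}, 0
    for off in range(0, len(data), window):
        win = data[off:off + window]
        win_counts = {}
        for b in win:
            win_counts[b] = win_counts.get(b, 0) + 1
        if seg_total:
            # Compare the cost of merged statistics vs. separate statistics.
            merged = dict(seg_counts)
            for b, c in win_counts.items():
                merged[b] = merged.get(b, 0) + c
            gain = (entropy_cost(merged, seg_total + len(win))
                    - entropy_cost(seg_counts, seg_total)
                    - entropy_cost(win_counts, len(win)))
            if gain > threshold:  # merging wastes too many bits: cut here
                segments.append((start, off - start))
                start, seg_counts, seg_total = off, {}, 0
        for b, c in win_counts.items():
            seg_counts[b] = seg_counts.get(b, 0) + c
        seg_total += len(win)
    segments.append((start, len(data) - start))
    return segments
```

    For example, 16 KB of a single byte value followed by 16 KB of
    near-uniform data comes out as two segments under this toy criterion.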

    3. The same (or perhaps imperceptibly improved) algorithm is used in
    Shkarin's durilca compressors (http://compression.ru/ds/durilca.rar)
    for the -t1 mode.

    Older versions of durilca had a hidden -l option for dumping the
    segments, in a way similar to what seg_file does.
    http://shelwien.googlepages.com/durilca2_002a.rar

    However, durilca -t1 -l may generate more segments
    due to the disasm32 executable filter: x86-like segments
    are processed with it, and multiple preprocessed segments are
    written on output.
    Last edited by Shelwien; 1st May 2009 at 21:29.

  2. #2
    Bulat Ziganshin (Programmer)
    Join Date: Mar 2007 | Location: Uzbekistan
    Posts: 4,507 | Thanks: 742 | Thanked 665 Times in 359 Posts
    Btw, I always wondered why -t1 is so slow (it makes durilca'light several times slower). Maybe you know why?

  3. #3
    Shelwien (Administrator)
    http://ctxmodel.net/files/PPMd/segfile_sh2.rar
    - Concatenated output added by toffer's request

    Code:
      seg_file.exe acrord32.exe
      unseg_file acrord32.unp
    
      seg_file1.exe acrord32.exe acrord32.seg
      unseg_file1 acrord32.seg acrord32.unp

  4. #4
    Shelwien (Administrator)
    Code:
                ppmd_sh8   +seg_file
    
    wcc386_exe    262611   258120
    
    A10_jpg       833406   833534
    acrord32_exe 1401739  1313479
    english_dic  1103549   995135
    FlashMX_pdf  3713922  3709897
    fp_log        586432   588381
    mso97_dll    1755440  1706441
    ohs_doc       798394   774183
    rafale_bmp    809858   846419
    vcfiu_hlp     611433   597859
    <total>     11614173 11365328

  5. #5
    Skymmer (Member)
    Join Date: Mar 2009 | Location: Russia
    Posts: 681 | Thanks: 38 | Thanked 168 Times in 84 Posts
    Shelwien, thanks for the tool, but unfortunately both seg_file and seg_file1 failed on two test files I tried.
    The first file is DynamicBulkFileTextures.blk from the BioShock game, 586 033 664 bytes. Both seg_file and seg_file1 simply crash. As always, Windows shows its weird crash report message, and although it probably contains some useful info (module names, stack contents, memory ranges etc.), I can't see a way to attach it. According to Process Explorer, both versions quickly allocate 574 068 K of memory, then after a few seconds try to get 575 236 K or 575 240 K, and then the crash happens. For seg_file1 a zero-size output file is created. By the way, seg_file does nothing when the input filename is omitted; seg_file1 does nothing when the output file is omitted, but crashes when both input/output are omitted. It would be nice to add some argument-error handling to them.
    The second file, also from BioShock, is 4-recreationLevel.blk, 271 602 688 bytes. It causes no crash, but seg_file tries to allocate up to 802 580 K of memory, causing heavy HDD thrashing (on my system, of course), and then does nothing: it just sits in memory and no output segments are written, even after 10-15 minutes of waiting. seg_file1 does the same, except that it tried to allocate more memory, 851 120 K.
    I want to help improve your program, but I don't know how to give a more useful report. Maybe I should upload both problematic files somewhere so you can have a closer look?
    Anyway, thanks

  6. #6
    Shelwien (Administrator)
    1. As I mentioned before, this program requires 7*filesize of memory, so
    it's really troublesome on large files.
    2. It's not my program, and I'm not really interested in improving it
    (I'd just write my own when needed), so I won't add memory overflow
    checks either.
    3. seg_file and seg_file1 are the same program with slightly different
    output, so they can't really have different memory usage.
    Also, seg_file1 proved to be almost completely unusable, as it
    just reorders blocks within the file without allowing the model to be
    flushed between them.
    4. You can still try it with your files after splitting them into parts
    of reasonable size. I'd suggest something like rar -m0 -v100m
    for that.
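
    The split step can also be done without rar; here is a minimal Python
    sketch (hypothetical helper names; only the idea of fixed-size parts
    comes from the post):

```python
# Minimal split/join helpers as a stand-in for `rar -m0 -v100m`
# (hypothetical names; only the fixed-size-parts idea is from the post).

def split_file(path, part_size=100 * 1024 * 1024):
    """Split `path` into path.000, path.001, ... of at most `part_size` bytes."""
    parts = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            part = "%s.%03d" % (path, index)
            with open(part, "wb") as out:
                out.write(chunk)
            parts.append(part)
            index += 1
    return parts

def join_files(parts, out_path):
    """Concatenate the parts back into a single file."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                out.write(f.read())
```

    Each part can then be fed to seg_file separately, keeping its memory
    use bounded by roughly 7 times the part size.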

  7. #7
    Skymmer (Member)
    Shelwien, although, honestly speaking, I didn't expect such a "pessimistic" response from you, I understand your point. But I want to tell you that seg_file could certainly become a part of the great compression chain we're building here, no matter how banal that sounds. Furthermore, how is it not your tool? Since you provided modified sources and EXEs, it's now partially your tool. For example, I don't see PAQ contributors saying "We don't know anything, it's Matt who started it all!" when somebody reports issues with their version.

  8. #8
    Shelwien (Administrator)
    Well, I wouldn't say that if it was easy, you know.
    Okay, I guess it would be easy to add a try{} block around
    all the processing to catch memory overflows, but I don't
    like compiling executables with support for exceptions; it
    makes them slower and less portable.
    But if seg_file crashing on large files is a real problem for you,
    I probably still can add it, so that it would print some error
    message and exit (maybe after creating a few files etc., so
    still not completely safe).
    However, properly supporting large files requires a completely
    different algorithm, and memory consumption is not the main
    problem there: its complexity grows nonlinearly, and the entropy
    model is not good enough either. And handling memory overflows
    in a recursive algorithm with dynamic allocation is pretty tricky
    (without C++ exceptions), so I don't see a reason to waste time on it.
    Anyway, I think it can be used as it is for now, with files split
    into small enough blocks.
    And eventually I might write a new implementation, which is
    much more likely if somebody volunteers to discuss
    technical details with me.
    As for the current seg_file, I guess I can do some maintenance that
    doesn't require more than a few minutes.
    Last edited by Shelwien; 3rd May 2009 at 06:32.

  9. #9
    Member
    Join Date: Oct 2013 | Location: Filling a much-needed gap in the literature
    Posts: 350 | Thanks: 177 | Thanked 49 Times in 35 Posts
    The link for Shelwien's modified seg_file (segfile_sh2, in post #3) no longer works. If anybody has that lying around, please repost. Thanks in advance.

  10. #10
    FatBit (Member)
    Join Date: Jan 2012 | Location: Prague, CZ
    Posts: 189 | Thanks: 0 | Thanked 36 Times in 27 Posts
    Files enclosed.

    FatBit
    Attached Files

  11. Thanks (2):

    comp1 (15th April 2016),Paul W. (15th April 2016)
