
Thread: paq8pxv - virtual machine

  1. #1
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts

    paq8pxv - virtual machine

    paq8pxv_v3
    • jpeg model cfg (paq7 model)
    jpeg.cfg:
    • +2 new contexts (paq7 model + 2 contexts from paq8a)
    • fixed hash table lookup; compression is now as expected.
    This is based on PAQ8PXD_V62 and PAQ8PXD_V17v2 ( https://encode.su/showthread.php?p=47706#post47706 )
    First version (v1) ( https://encode.su/threads/1464-Paq8p...ll=1#post59098 )
    Original attempt here ( https://encode.su/threads/1464-Paq8p...ll=1#post42973 ) ( attempt to mix VM from fpaqvm (vm/jit) to paq8pxd_v16 )


    Code:
    File list (18 bytes)
    Compressed from 18 to 22 bytes.
    
    
    1/1  Filename: mill.jpg (7132151 bytes)
    Block segmentation:
     0           | jpeg      |   7132151 [0 - 7132150]
    
    
     Segment data size: 14 bytes
    
    
     TN |Type name |Count      |Total size
    -----------------------------------------
      3 |jpeg      |         1 |   7132151
    -----------------------------------------
    Total level  0 |         1 |   7132151
    
    
    jpeg      stream(1).  Total 7132151
    Compressed model from 16490 to 13387 bytes
    Compressing jpeg    stream(1).  Total 7132151
    JPEG 4912x3264 Stream(1) compressed from 7132151 to 5635767 bytes
     Segment data compressed from 14 to 17 bytes
     Total 7132151 bytes compressed to 5635792 bytes.
    Time 422.71 sec, used 641 MB (672399440 bytes) of memory
    squeezechart:
    PAQ7 (24.12.2005) -5 Matt Mahoney 57,1 5636813




    Source:
    https://github.com/kaitz/paq8pxv


    It is what it is, nothing more.
    KZo


  2. Thanks (4):

    Darek (5th February 2019),Mike (4th February 2019),moisesmcardona (6th February 2019),Stephan Busch (7th February 2019)

  3. #2
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    190
    Thanks
    89
    Thanked 18 Times in 14 Posts
    Hi, first of all, thanks for this new release.

    Second, the compressor creates .paq8pxdv(version) instead of .paq8pxv(version), so I'm wondering, is the compressor name actually paq8pxv or paq8pxdv, and if it's the first case, can you fix this tiny string in the code? Or do you have the command line to compile this?

    Thanks!

  4. #3
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    I updated the jpeg.cfg file on GitHub.

    I will fix the version info in the next release.

    My plan, if I can pull this off, is to use a config file for the compressor. It would look something like this:
    Code:
    stream -1
    detect recur.det
    decode recur.dec
    encode recur.enc
    compress -1
    stream 0
    detect -1
    decode -1
    encode -1
    compress test3d.cfg
    stream 1
    detect jpeg.det
    decode -1
    encode -1
    compress jpeg.cfg
    stream 2
    detect image1.det
    decode -1
    encode -1
    compress test3img.cfg
    stream 3
    detect image4
    decode -1
    encode -1
    compress test3d.cfg
    stream 4
    detect image8.det
    decode -1
    encode -1
    compress test3i8.cfg
    stream 5
    detect image24.det
    decode image24.dec
    encode image24.enc
    compress test3i24.cfg
    stream 6
    detect audio.det
    decode -1
    encode -1
    compress test3d.cfg
    stream 7
    detect exe.det
    decode exe.dec
    encode exe.enc
    compress test3d.cfg
    stream 8
    detect text.det
    decode text.dec
    encode text.enc
    compress test3d.cfg


    For example:
    stream 1
    detect jpeg.det
    decode -1
    encode -1
    compress jpeg.cfg

    That is: use stream 1, detect data with jpeg.det, no decode, no encode, compress with jpeg.cfg.
    The output file will contain the encode and compress models, which are used when decompressing.

    It's an idea; no code yet.
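The proposed layout is simple enough to read with a small line-oriented parser. The sketch below only illustrates that idea and is not code from paq8pxv: the `StreamCfg` struct and `parse_conf` function are hypothetical names, and it assumes that each `stream N` line opens a record which the following `detect`/`decode`/`encode`/`compress` lines fill in, with `-1` kept as the literal marker for "none".

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64

/* Hypothetical record for one "stream" section; the string "-1" means "none". */
typedef struct {
    int  id;
    char detect[NAME_LEN];
    char decode[NAME_LEN];
    char encode[NAME_LEN];
    char compress[NAME_LEN];
} StreamCfg;

/* Parse config text held in memory into at most `max` records.
   Returns the number of records read, or -1 on malformed input. */
int parse_conf(const char *text, StreamCfg *out, int max) {
    int n = 0;
    for (;;) {
        char key[16], val[NAME_LEN];
        int used = 0;
        int got = sscanf(text, " %15s %63s%n", key, val, &used);
        if (got == EOF) break;          /* only whitespace left */
        if (got != 2) return -1;        /* key without a value */
        text += used;

        if (strcmp(key, "stream") == 0) {
            if (n == max) return -1;
            memset(&out[n], 0, sizeof out[n]);
            if (sscanf(val, "%d", &out[n].id) != 1) return -1;
            n++;
        } else {
            char *dst;
            if (n == 0) return -1;      /* field before any "stream" line */
            if      (strcmp(key, "detect")   == 0) dst = out[n - 1].detect;
            else if (strcmp(key, "decode")   == 0) dst = out[n - 1].decode;
            else if (strcmp(key, "encode")   == 0) dst = out[n - 1].encode;
            else if (strcmp(key, "compress") == 0) dst = out[n - 1].compress;
            else return -1;             /* unknown keyword */
            strcpy(dst, val);           /* val holds at most NAME_LEN-1 chars */
        }
    }
    return n;
}
```

Fed the `stream 1` example above, this would yield one record with `detect` set to `jpeg.det`, `compress` set to `jpeg.cfg`, and the other two fields holding `-1`.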
    KZo


  5. #4
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v4



    • jpeg cfg updated from paq8fthis4
    • vm jit optimize array access and other code
    • fix vm and jit compression differences
    • vm is now signed (int,short, char), only << and >> are unsigned
    • add jpeg detection as external code for vm, for testing
    • change all config files


    jpeg.det contains the JPEG detection code. Its filename is currently hard-coded into the executable.
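The note that only << and >> are unsigned matters whenever a model pulls the top byte out of a 32-bit value: a signed (arithmetic) right shift of a negative number fills with ones, while an unsigned (logical) shift fills with zeros. The helpers below are a sketch of that behaviour in plain C, not the VM's implementation; `lsr32` and `lsl32` are hypothetical names.

```c
#include <stdint.h>

/* Logical (zero-filling) right shift of a signed 32-bit value, n in 0..31.
   The cast back to int32_t relies on two's complement wrap-around, which is
   implementation-defined in C but holds on mainstream compilers. */
int32_t lsr32(int32_t x, int n) {
    return (int32_t)((uint32_t)x >> n);
}

/* Left shift performed on the unsigned bit pattern, so no signed overflow. */
int32_t lsl32(int32_t x, int n) {
    return (int32_t)((uint32_t)x << n);
}
```

With these semantics, shifting a value whose high bit is set right by 24 gives a clean byte in the range 0..255 rather than a negative number.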
    KZo


  6. Thanks (3):

    Darek (16th February 2019),moisesmcardona (15th February 2019),xinix (17th February 2019)

  7. #5
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v5

    • use config file for detection, decoding, encoding and compression
    • remove unused code
    • change jpeg detection
    • add exe detection



    Encode and decode transforms not present right now.
    Code:
    stream id=0 model=test3d.cfg
    stream id=1 model=jpeg.cfg
    stream id=2 model=image1.cfg
    stream id=3 model=test3img.cfg
    stream id=4 model=test3i8.cfg
    stream id=5 model=test3i24.cfg
    type id=0 no detect model (-1)
    type id=0 no decode model
    type id=0 no encode model
    type id=0 stream id=0
    type id=1 model=jpeg.det
    type id=1 no decode model
    type id=1 no encode model
    type id=1 stream id=1
    type id=3 model=exe.det
    type id=3 no decode model
    type id=3 no encode model
    type id=3 stream id=0
    Total streams 6, total types 3
    Creating archive test.zip.paq8pxv5 with 1 file(s)...
    
    
    File list (17 bytes)
    Compressed from 17 to 21 bytes.
    
    
    1/1  Filename: test.zip (115864 bytes)
    Block segmentation:
     0           | default   |       897 [0 - 896]
     1           | exe.det   |      8986 [897 - 9882]
     2           | jpeg.det  |     52821 [9883 - 62703]
     3           | default   |        38 [62704 - 62741]
     4           | jpeg.det  |     52821 [62742 - 115562]
     5           | default   |       301 [115563 - 115863]
    
    
     Segment data size: 79 bytes
    
    
     TN |Type name |Count      |Total size
    -----------------------------------------
      0 |default   |         3 |      1236
      1 |jpeg.det  |         2 |    105642
      2 |exe.det   |         1 |      8986
    -----------------------------------------
    Total level  0 |         6 |    115864
    
    
    test3d.cfg   stream(0).  Total 10222
    Compressing test3d.cfg   stream(0).  Total 10222
    Stream(0) compressed from 10222 to 3306 bytes
        Model compressed from 13813 to 10632 bytes
    jpeg.cfg   stream(1).  Total 105642
    JPEG model v4
    Compressing jpeg.cfg   stream(1).  Total 105642
    Stream(1) compressed from 105642 to 55684 bytes
        Model compressed from 21587 to 16629 bytes
     Segment data compressed from 79 to 56 bytes
     Total 115864 bytes compressed to 86376 bytes.
    Time 3.93 sec, used 1229 MB (1289507343 bytes) of memory
    Example detection file:
    Code:
    // For XXXX detection
    int buf0,buf1,mystart;
    int type,state,jstart,jend;
    enum {DEFAULT=1,YOURTYPE}; //internal enum
    enum {NONE=0,START,INFO,END,RESET=0xfffffffe,REQUEST=0xffffffff}; //external enum
    // detect() reports its state to the caller.
    // If i is REQUEST (-1), it returns the result for the current state; otherwise i is the position.
    // c4 holds the last 4 bytes.
    void reset(){
        state=NONE,type=DEFAULT,jstart=jend=buf0=buf1=mystart=0;
    }
    int detect(int c4,int i) {
        // state parameters requested
        if (i==REQUEST){
            if (state==NONE)  return 0xffffffff;
            if (state==START) return jstart;
            if (state==END)   return jend;
            if (state==INFO)  return 0xffffffff;
        }
        if (i==RESET){
            reset();
            return 0xffffffff;
        }
        buf1=(buf1<<8)|(buf0>>24);
        buf0=c4;
        //detect header
        if (buf1==0xFFFFFFFF && mystart==0){
            mystart=i;
        }
    
    
        if (type==DEFAULT && mystart){
            type=YOURTYPE;
            state=START; 
            jstart=mystart-4;
            return state;
        }
        if (i-mystart>0x100){
            if (type==YOURTYPE){
                state=END;
                type=DEFAULT;
                jend=i;
                return state;
            }
            state=NONE;
            type=DEFAULT;
            mystart=0;
        }
        return NONE;
    }
    
    
    int main() {
        reset();
    }
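To make the calling convention easier to picture, here is a sketch of how a host might drive such a detector. It is not the paq8pxv host code. Everything in it is an assumption for illustration: the `scan` loop, the toy detector (a block starts at the 4-byte tag "DATA" and is `TOY_LEN` bytes long), and the convention that `i` is the index of the most recent byte.

```c
#include <stdint.h>

enum { NONE = 0, START, INFO, END };
/* As 32-bit signed ints, 0xfffffffe and 0xffffffff are -2 and -1. */
enum { RESET = -2, REQUEST = -1 };
enum { TOY_LEN = 8 };               /* hypothetical fixed block length */

typedef int (*detect_fn)(int c4, int i);

static int t_state, t_start, t_end;

/* Toy stand-in detector: same call protocol as the example above. */
int toy_detect(int c4, int i) {
    if (i == REQUEST) {
        if (t_state == START) return t_start;
        if (t_state == END)   return t_end;
        return -1;
    }
    if (i == RESET) { t_state = NONE; t_start = t_end = 0; return -1; }
    if (t_state != START && c4 == 0x44415441) {   /* "DATA" just completed */
        t_state = START;
        t_start = i - 3;                          /* first byte of the tag */
        return START;
    }
    if (t_state == START && i - t_start + 1 == TOY_LEN) {
        t_state = END;
        t_end = i + 1;                            /* one past the last byte */
        return END;
    }
    return NONE;
}

/* Feed bytes one at a time, keeping the last four in c4, and query the
   detector with REQUEST whenever it signals START or END.
   Returns 1 and fills [*start, *end) for the first block found, else 0. */
int scan(detect_fn det, const unsigned char *buf, int n, int *start, int *end) {
    uint32_t c4 = 0;
    det(0, RESET);
    for (int i = 0; i < n; i++) {
        c4 = (c4 << 8) | buf[i];
        int st = det((int)c4, i);   /* cast wraps for values above INT_MAX */
        if (st == START) {
            *start = det(0, REQUEST);
        } else if (st == END) {
            *end = det(0, REQUEST);
            return 1;
        }
    }
    return 0;
}
```

The point of the split is that the host never needs to know what the detector is looking for; it only reacts to the START and END states and asks for the positions.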
    KZo


  8. Thanks (3):

    Darek (17th February 2019),Mike (17th February 2019),moisesmcardona (17th February 2019)

  9. #6
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v6
    encode and decode now working. Only exe decode/encode for now.
    Recursion still missing.



    It looks like what I imagined at first.
    KZo


  10. Thanks (2):

    Darek (25th February 2019),moisesmcardona (5th March 2019)

  11. #7
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v8

    • enable recursion
    • detect base64, decode and encode
    • ---- v7
    • enable multithreading again
    • detect files for bmp 1,8,24 bit images, text
    • vm: free memory correctly, also in interpreted code allocated memory



    Base64 detection handles only a continuous stream with no line breaks, like in b64sample or .mht files.
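As an illustration of what "continuous stream" means here, the sketch below looks for an unbroken run of base64 characters. It is not the detection code shipped with paq8pxv; `find_b64_run`, the minimum-length threshold and the multiple-of-4 check are assumptions chosen to keep the example small.

```c
#include <stddef.h>

/* Returns 1 if c belongs to the standard base64 alphabet (A-Z a-z 0-9 + /). */
static int is_b64(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Find the first unbroken base64 run of at least min_len alphabet characters
   whose total length (including up to two '=' padding characters) is a
   multiple of 4. On success stores offset and length and returns 1. */
int find_b64_run(const unsigned char *buf, size_t n, size_t min_len,
                 size_t *start, size_t *len) {
    size_t i = 0;
    while (i < n) {
        size_t j = i;
        while (j < n && is_b64(buf[j])) j++;      /* extend the run */
        size_t run = j - i;
        size_t pad = 0;
        while (pad < 2 && j + pad < n && buf[j + pad] == '=') pad++;
        if (run >= min_len && (run + pad) % 4 == 0) {
            *start = i;
            *len = run + pad;
            return 1;
        }
        i = (j > i) ? j : i + 1;                  /* skip past this run */
    }
    return 0;
}
```

Any line break ends a run, so data wrapped at a fixed column would be seen as many short runs and rejected by the length threshold.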



    It seems that everything I had in mind at first has now been accomplished.
    Or... remove matchModel and add it as part of the cfg, add more detection and transform files, add some *Maps that are not present yet (IndirectMap), add the decode and conf files to the archive...
    KZo


  12. Thanks (2):

    Mike (5th March 2019),moisesmcardona (5th March 2019)

  13. #8
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v12
    Code:
    
    more changes to SIMD code
    fix compile errors in non-SIMD code
    
    
    PIC -s7 513216 bytes compressed to 33336 bytes. used 907 MB (951143616 bytes) of memory
    
    
    paq8pxv_v12_AVX2.exe -s7 Time 25.63 sec
    paq8pxv_v12_SSE42.exe -s7 Time 27.91 sec
    paq8pxv_v12_SSE4.exe -s7 Time 27.70 sec
    paq8pxv_v12_SSSE3.exe -s7 Time 27.63 sec
    paq8pxv_v12_SSE2.exe -s7 Time 28.70 sec
    paq8pxv_v12_MMX.exe -s7 Time 28.61 sec
    paq8pxv_v12_None.exe -s7 Time 184.61 sec
    
    
    This release is compatible with v11
    paq8pxv_v11
    Code:
    cleanup
    fix jpeg cfg
    change test3d cfg, add exe cm
    paq8pxv_v10
    Code:
    add small model to archive, now all decompress info is in archive
    change cfg models
    remove matchModel, add to test3d.cfg simple model
    paq8pxv_v9
    Code:
    add conf and decode to archive
    change test3d.cfg
    KZo


  14. Thanks (5):

    CompressMaster (7th August 2019),Darek (4th June 2019),Mike (4th June 2019),moisesmcardona (7th June 2019),Stephan Busch (5th June 2019)

  15. #9
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    paq8pxv_v12 aborts compression of app.tar at 55%:

    Segment data size: 17941 bytes

    TN |Type name |Count |Total size
    -----------------------------------------
    0 |default | 694 | 206215076
    1 |jpeg.det | 56 | 374204
    2 |exe.det | 335 | 100808758
    3 |bmp8.det | 113 | 442616
    4 |bmp1.det | 72 | 18432
    5 |bmp24.det | 78 | 892776
    7 |text.det | 32 | 27434506
    -----------------------------------------
    Total level 0 | 1380 | 336186368

    Decode compressed to : 1366
    test3d.cfg stream(0). Total 334458340
    jpeg.cfg stream(1). Total 374204
    test3img.cfg stream(3). Total 18432
    test3i8.cfg stream(4). Total 442616
    test3i24.cfg stream(5). Total 892776
    Compressing test3d.cfg stream(0). Total 334458340
    Stream(0) compressed from 334458340 to 61897340 bytes
    Model compressed from 16628 to 6142 bytes
    Compressing jpeg.cfg stream(1). Total 374204
    JPEG model v6
    Stream(1) compressed from 374204 to 258128 bytes
    Model compressed from 20420 to 8595 bytes
    Compressing test3img.cfg stream(3). Total 18432
    3 55.55%

  16. Thanks:

    Darek (8th June 2019)

  17. #10
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    Replace file provided in attachment.
    KZo


  18. Thanks:

    Stephan Busch (7th August 2019)

  19. #11
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    183
    Thanks
    49
    Thanked 13 Times in 13 Posts
    @kaitz,

    Problem with "paq8pxv_v12_NONE" solved. The EXE wasn't in the same folder as the configuration files and models. Now it works!

    Just in case, I attached correct version.

  20. #12
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    Code:
    -DEC Alpha transform
    works

    Creating archive query.dll.paq8pxv11 with 1 file(s)...

    File list (19 bytes)
    Compressed from 19 to 21 bytes.

    1/1 Filename: query.dll (1952016 bytes)
    Block segmentation:
    0 | default | 1952016 [0 - 1952015]

    Segment data size: 14 bytes

    TN |Type name |Count |Total size
    -----------------------------------------
    0 |default | 1 | 1952016
    -----------------------------------------
    Total level 0 | 1 | 1952016

    Decode compressed to : 517
    test3d.cfg stream(0). Total 1952016
    Compressing test3d.cfg stream(0). Total 1952016
    Stream(0) compressed from 1952016 to 711868 bytes
    Model compressed from 673 to 401 bytes
    Segment data compressed from 14 to 13 bytes
    Total 1952016 bytes compressed to 713525 bytes.
    Time 2.92 sec, used 35 MB (37658441 bytes) of memory
    Creating archive query.dll.paq8pxv11 with 1 file(s)...

    File list (19 bytes)
    Compressed from 19 to 21 bytes.

    1/1 Filename: query.dll (1952016 bytes)
    Block segmentation:
    0 | default | 186688 [0 - 186687]
    1 | dec.det | 1252244 [186688 - 1438931]
    2 | default | 513084 [1438932 - 1952015]

    Segment data size: 40 bytes

    TN |Type name |Count |Total size
    -----------------------------------------
    0 |default | 2 | 699772
    8 |dec.det | 1 | 1252244
    -----------------------------------------
    Total level 0 | 3 | 1952016

    Decode compressed to : 1055
    test3d.cfg stream(0). Total 1952016
    Compressing test3d.cfg stream(0). Total 1952016
    Stream(0) compressed from 1952016 to 679664 bytes
    Model compressed from 673 to 401 bytes
    Segment data compressed from 40 to 33 bytes
    Total 1952016 bytes compressed to 681879 bytes.
    Time 3.06 sec, used 39 MB (41007923 bytes) of memory
    EDIT:
    Above uses fast model.

    For comparison with default model:
    Creating archive query.dll.paq8pxv11 with 1 file(s)...

    File list (19 bytes)
    Compressed from 19 to 21 bytes.

    1/1 Filename: query.dll (1952016 bytes)
    Block segmentation:
    0 | default | 186688 [0 - 186687]
    1 | dec.det | 1252244 [186688 - 1438931]
    2 | default | 513084 [1438932 - 1952015]

    Segment data size: 40 bytes

    TN |Type name |Count |Total size
    -----------------------------------------
    0 |default | 2 | 699772
    8 |dec.det | 1 | 1252244
    -----------------------------------------
    Total level 0 | 3 | 1952016

    Decode compressed to : 1055
    test3d.cfg stream(0). Total 1952016
    Compressing test3d.cfg stream(0). Total 1952016
    Stream(0) compressed from 1952016 to 356213 bytes
    Model compressed from 16628 to 6142 bytes
    Segment data compressed from 40 to 33 bytes
    Total 1952016 bytes compressed to 364169 bytes.
    Time 137.36 sec, used 912 MB (956410810 bytes) of memory
    Quote Originally Posted by kaitz View Post
    Code:
    paq8pxd_v50 -s0   query.dll           1952016 1952125
    paq8pxd_v50 -s8:2 query.dll           1952016  346292 
    paq8px_v150 -8    query.dll.paq8pxd50 1952125  348056
    paq8px_v150 -8a   query.dll.paq8pxd50 1952125  349611
    paq8px_v150 -8    query.dll           1952016  383512
    paq8px_v150 -8a   query.dll           1952016  384962
    Last edited by kaitz; 5th November 2019 at 00:57. Reason: comparsion
    KZo


  21. Thanks:

    moisesmcardona (24th November 2019)

  22. #13
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    jpeg.v7
    KZo


  23. Thanks:

    moisesmcardona (24th November 2019)

  24. #14
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    • detect gif
    • encode/decode for text, static dict


    Dict is same as in https://encode.su/threads/1513-Calga...sion-challenge

    A gif transform is not possible with the current version; it needs multiple passes. The same goes for a dynamic dict.
    The conf file is included; it is the same as in git.

    I was thinking replacing pseudo-c to something like ada. Another insane idea
    KZo


  25. Thanks:

    moisesmcardona (24th November 2019)

  26. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,760
    Thanks
    274
    Thanked 1,200 Times in 668 Posts
    > I was thinking replacing pseudo-c to something like ada

    What's good in that? Simply different syntax?
    Some declarative syntax could be interesting (like in ZPAQL)... maybe grammar description language (like ANTLR) with bitfield extensions?

    Btw, I never understood this forward-compatibility VM idea...
    rather than adding redundancy to all archives (in the form of decompressor code),
    I think it would be much better to implement a flexible plugin system (like in 7-zip)
    with integrated ECDSA signing (as anti-malware protection).

  27. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,537
    Thanks
    758
    Thanked 676 Times in 366 Posts
    Signed DLL require to install public key of DLL developer and to trust him. Imagine that I share archive made with your DLL. Receiver can't extract it if he doesn't trust you, so I have to choose between making an archive smaller or making it available for larger audience.

    DLLs made by new developers may have too small audience for a long time.

    Or opposite situation - when someone developed interesting DLLs, a lot of users started to trust him, and then he releases trojan horse codec. Or just sell his key, monetizing his work in bad way.

    If a hacker stole private key of a developer, everyone trusting him may be hacked too.

    If you lost your private key, everyone need to load your new key again.

    Overall, it's good enough for single source of codecs, but not for a large society of independent developers.

  28. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,760
    Thanks
    274
    Thanked 1,200 Times in 668 Posts
    > Overall, it's good enough for single source of codecs, but not for a large society of independent developers.

    Yes, but isn't it normally the case?

    Also I'm not saying that we have to completely ban plugins not signed by framework developer.
    But rather there has to be a framework to control this.
    Somebody making a new plugin, adding it to repository, then creating an archive using it
    and sending it via email shouldn't be the reason for mailserver to automatically download
    the plugin and run it.

    Atm I see it like this:
    1. There're plugins developed and signed by main developer.
    2. There're 3rd-party plugins sent to main developer, checked and signed.
    3. Non-public custom plugins can be signed by a new key,
    with local framework configured to accept it too by default.
    Same idea solves the case where main developer disappears.
    4. Unsigned plugins can be used with explicit manual permission (for debug etc).

    It sounds simple, but there's plenty of work to implement it really.
    But I think that it could be more useful than a VM.
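The four cases above boil down to a small decision rule. The sketch below is one possible reading of it, not an existing framework: `Plugin`, `decide` and the key identifiers are hypothetical, and the actual signature verification (ECDSA or otherwise) is assumed to have happened elsewhere and is represented only by the `sig_valid` flag. Treating a valid signature from an unknown key the same as an unsigned plugin is also an assumption.

```c
#include <stddef.h>
#include <string.h>

typedef enum { PLUGIN_RUN, PLUGIN_NEEDS_PERMISSION, PLUGIN_REJECT } Decision;

typedef struct {
    const char *signer;    /* key id of the signer, NULL if unsigned */
    int         sig_valid; /* outcome of the signature check (not shown) */
} Plugin;

/* trusted[] holds the main developer's key plus any keys added to the local
   configuration (cases 1-3). manual_ok is the user's explicit permission
   for plugins that are not covered by a trusted key (case 4). */
Decision decide(const Plugin *p, const char *const *trusted, int ntrusted,
                int manual_ok) {
    if (p->signer != NULL) {
        if (!p->sig_valid) return PLUGIN_REJECT;      /* tampered or corrupt */
        for (int k = 0; k < ntrusted; k++)
            if (strcmp(p->signer, trusted[k]) == 0) return PLUGIN_RUN;
    }
    /* unsigned, or signed by a key nobody here trusts */
    return manual_ok ? PLUGIN_RUN : PLUGIN_NEEDS_PERMISSION;
}
```

A mail server scanning attachments would call this with `manual_ok` set to 0, so an untrusted plugin is never run automatically.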

    > Signed DLL require to install public key of DLL developer and to trust him.

    1. I'm talking about new custom signing system, not microsoft one.
    Since we'd want it to be portable and compatible with different exe formats.

    2. I don't see a problem with trusting new plugins signed by Matt Mahoney,
    when I already have to use the archiver developed by Matt Mahoney to run the plugins.

    3. A case where we'd have to support frequently appearing new plugins
    from multiple sources which can't agree on a single registration authority
    is quite unbelievable in this case.
    But still, we can define public key storage as another plugin,
    include all relevant keys, then get main developer to sign it once.

    > Imagine that I share archive made with your DLL.
    > Receiver can't extract it if he doesn't trust you,
    > so I have to choose between making an archive smaller or making it available for larger audience.

    Sure, but its mostly a matter of automatic execution of trusted plugins.
    Untrusted ones would have to be confirmed by user, I think its reasonable
    since its unknown executable code in the end.

    Also it could make sense to add extra exceptions for sfx archives,
    since if somebody runs these, it means they already trust the contents anyway.

    > DLLs made by new developers may have too small audience for a long time.

    Normally the main developer would just verify it and generate a signature.
    Or not - but then its reasonable to not expect the default archiver setup
    to automatically unpack an archive using an untrusted plugin.

    > Or opposite situation - when someone developed interesting DLLs,
    > a lot of users started to trust him, and then he releases trojan horse codec.
    > Or just sell his key, monetizing his work in bad way.

    This certainly won't be made worse by signing.

    > If a hacker stole private key of a developer, everyone trusting him may be hacked too.
    > If you lost your private key, everyone need to load your new key again.

    Sure, but there're solutions for this too.
    Like, https keeps working, despite all the leaks and hacks.

    Also, its not like VM solution is safe.
    For RarVM one obvious attack vector is vmcode crc checks.
    It could work like this:
    1. A new filter is introduced and significantly improves compression.
    Archiver v1 only handles it by interpretation.
    2. Archiver v2 adds integrated handler for new filter because of popular demand.
    Now when v2 sees the right crc, it automatically executes the compiled handler.
    3. During the period when end-users already upgraded the archiver,
    while 3rd-party applications (like AVs) which include it didn't,
    we can send archives with trojans, which wouldn't be detected by AV software,
    because there'd be different vmcode filters adjusted to have the same crc,
    so v1 archiver would decode "safe data", while user's v2 archiver would decode
    an actual virus.

  29. #18
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    Quote Originally Posted by Shelwien View Post
    > I was thinking replacing pseudo-c to something like ada

    What's good in that? Simply different syntax?
    Some declarative syntax could be interesting (like in ZPAQL)... maybe grammar description language (like ANTLR) with bitfield extensions?
    If I could explain it I would. There is more to Ada than just another syntax, right?
    Quote Originally Posted by Shelwien View Post
    >
    Btw, I never understood this forward-compatibility VM idea...
    rather than adding redundancy to all archives (in the form of decompressor code),
    I think it would be much better to implement a flexible plugin system (like in 7-zip)
    with integrated ECDSA signing (as anti-malware protection).
    This is an idea as a whole: there can be a transform without compression, and/or compression.
    Whether the decompressor is included or not does not matter; it can sit in the same place as the archiver, sort of like a plugin, or in the archive (as in the older versions).
    It's a measure for seeing things like: I change some code and it compresses better, but the final archive is the same size as before. Like a scale.
    Take this text transform: it's intended only for the Calgary files. In probably all other cases it increases the final size or leaves it the same.

    As for 7-zip: if a plugin is not on the author's site, I will probably never use it on real data.

    There will never be a whitepaper that describes this program, as there is for zpaq, ft, etc.
    Maybe if I had some expert knowledge in computer science (programming) and higher math or something like that... then.
    For me it's a half-hour-to-two-hours-a-week thing, trying to remember what the hell I did last time.

    It's a toy.

    Hope that's understandable.
    KZo


  30. Thanks:

    Mike (24th November 2019)

  31. #19
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    194
    Thanks
    133
    Thanked 12 Times in 11 Posts
    Is it possible for it to become a practical compressor in the near future, utilizing multiple cores? Or will it just be an experimental thing forever? What's your prognosis?

  32. #20
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    It compresses on multiple threads if the data is split by the compressor. So yes, multiple cores.

    Experimental? It works. OK, probably yes. But it's fun.
    KZo


  33. #21
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    int cxt[4]={}; // dynamic alloc memory for int array
    int cxt1,cxt2,cxt3,cxt4,N;
    enum {SMC=1,APM1,DS,AVG,RCM,SCM,CM,MX,ST};
    int update(int y,int c0,int bpos,int c4,int pos){
        int i;
        if (bpos==0) cxt4=cxt3,cxt3=cxt2,cxt2=cxt1,cxt1=(c4&0xff)*256;
        cxt[0]=(cxt1+c0);
        cxt[1]=(cxt2+c0+0x10000);
        cxt[2]=(cxt3+c0+0x20000);
        for (i=0;i<N;++i)
            vmx(DS,0,cxt[i]); // pr[0]--pr[2]
        vmx(APM1,0,c0); //
        return 0;
    }
    void block(int a,int b){}
    int main(){
        int i;
        N=3;
        vms(0,1,1,2,0,0,0,0,0); //APM1,DS,AVG
        vmi(DS, 0,18,1023,N); // pr[0]..pr[2]
        vmi(AVG,0,0,1,2); // pr[3]=avg(1,2)
        vmi(AVG,1,0,0,3); // pr[4]=avg(0,3)
        vmi(APM1,0,256,7,4); // pr[5]=apm(4) rate 7
        cxt1=cxt2=cxt3=cxt4=0;
    }


    The above works in the newer version.
    In update(), only the contexts are set. The prediction order depends on the order in which the components are set up in main(), sort of like a ZPAQL config file.
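The way prediction slots follow the set-up order can be shown with a stand-alone toy. This is only a sketch under stated assumptions, not the VM's internals: `Net`, `net_add` and `net_eval` are hypothetical, inputs stand in for the DS components, probabilities are plain integers, and AVG is taken to be a simple rounded mean of its two inputs.

```c
#include <stddef.h>

enum { K_INPUT, K_AVG };

typedef struct { int kind, in0, in1, out; } Comp;
typedef struct { Comp c[16]; int n; int npr; } Net;

/* Register `count` components of one kind. Each one takes the next free
   pr[] slot, so slot numbers depend purely on registration order.
   Returns the first slot used, or -1 if the table is full. */
int net_add(Net *net, int kind, int in0, int in1, int count) {
    int first = net->npr;
    if (net->n + count > 16) return -1;
    for (int k = 0; k < count; k++) {
        Comp *c = &net->c[net->n++];
        c->kind = kind;
        c->in0 = in0;
        c->in1 = in1;
        c->out = net->npr++;
    }
    return first;
}

/* Evaluate components in registration order. K_INPUT slots must already
   hold values; K_AVG slots are computed from earlier slots. */
void net_eval(const Net *net, int *pr) {
    for (int k = 0; k < net->n; k++) {
        const Comp *c = &net->c[k];
        if (c->kind == K_AVG)
            pr[c->out] = (pr[c->in0] + pr[c->in1] + 1) >> 1;
    }
}
```

Registering three inputs, then an average of slots 1 and 2, then an average of slots 0 and 3 reproduces the numbering in the comments above: pr[0]..pr[2], pr[3] and pr[4].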
    [Attached image: Capture.PNG]
    KZo


  34. #22
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    v13
    Code:
    -fix image detection, now reports width to compression model, fixes image compression
    -more comments on 1 bit image model (test3img.cfg)
    -not compatible with v12
    -removed more c functions
    -working examples are test3i24.cfg and test3img.cfg
    -removed more functions in c
    -not compatible with v12
    -only fast default and im24 compression model.
    There is no malloc, so dynamic memory is declared like this:
    int array[11]={};  // some dynamic memory

    // Model for 1 bit image data


    enum {SMC=1,APM1,DS,AVG,SCM,RCM,CM,MX,ST,MM};
    int r0,r1,r2,r3,N,w;
    int cxt[11]={}; // contexts
    char buffer[0x400000]={};
    enum {BMASK=0x3FFFFF};
    int bufpos;
    int buf(int i){
    return buffer[(bufpos-i)&BMASK];
    }

    // update is called in VM after every bit
    int update(int y,int c0,int bpos,int c4,int pos){
    int i;
    if (bpos== 0){
    buffer[bufpos]=c4&0xff;
    bufpos++;
    bufpos=bufpos&BMASK;
    }
    // update the contexts (pixels surrounding the predicted one)
    r0=(r0<<1)|y;
    r1=(r1<<1)|((buf(w-1)>>(7-bpos))&1);
    r2=(r2<<1)|((buf(w+w-1)>>(7-bpos))&1);
    r3=(r3<<1)|((buf(w+w+w-1)>>(7-bpos))&1);
    cxt[0]=0x100|(r0&0x7)+((r1>>4)&0x38)+((r2>>3)&0xc0);
    cxt[1]=0x200|((r0&1)|((r1>>4)&0x3e)|((r2>>2)&0x40)|((r3>>1)&0x80));
    cxt[2]=0x300|((r0&0x3f)^(r1&0x3ffe)^((r2<<2)&0x7f00)^((r3<<5)&0xf800));
    cxt[3]=0x400|((r0&0x3e)^(r1&0x0c0c)^(r2&0xc800));
    cxt[4]=0x100|(((r1&0x30)^(r3&0x0c0c))|(r0&3));
    cxt[5]=0x800|(((!r0)&0x444)|(r1&0xC0C)|(r2&0xAE3)|(r3&0x51C));
    cxt[6]=0xC00|((r0&1)|((r1>>4)&0x1d)|((r2>>1)&0x60)|(r3&0xC0));
    cxt[7]=0x1000|(((r0>>4)&0x2AC)|(r1&0xA4)|(r2&0x349)|((!r3)&0x14D));
    cxt[8]=0x2000 | ((r0&7)|((r1>>1)&0x3F8)|((r2<<5)&0xC00));//
    cxt[9]=0x10000| ((r0&0x3f)^(r1&0x3ffe)^(r2<<2&0x7f00)^(r3<<5&0xf800));
    cxt[10]=0x20000|((r0&0x3e)^(r1&0x0c0c)^(r2&0xc800));
    for (i=0;i<N;++i)
    vmx(DS,0,cxt[i]);
    vmx(MX,0,((r0&0x7)|((r1>>4)&0x38)|((r2>>3)&0xc0)) &0xff);
    vmx(MX,0,(((r1&0x30)^(r3&0x0c))|(r0&3)) &0xff);
    vmx(MX,0,((r0&1)|((r1>>4)&0x3e)|((r2>>2)&0x40)|((r3>>1)&0x80)) &0xff);
    vmx(MX,0,((r0&0x3e)^((r1>>8)&0x0c)^((r2>>8)&0xc8)) &0xff);
    vmx(MX,0,c0);
    vmx(SMC,0,c0);
    vmx(SMC,1,c0|(buf(1)<<8));
    return 0;
    }
    //VM calls this after every image block
    void block(int a,int b) {
    w=a; //get block info, image width
    }
    // main is called only once after VM init.
    int main() {
    N=11;
    //{SMC,APM1,DS,AVG,SCM,RCM,CM,MX,ST,MM};
    vms(1+1,0,1,1+1,1 ,0,0,1,1,N+1+1);
    vmi(DS,0,18,1023,N); //pr[0]..pr[10]
    vmi(MM,0,0,0,0); //mixer(0).add(pr[0]) //add predictions 0-10 to mixer(0)
    vmi(MM,1,0,1,0); //..
    vmi(MM,2,0,2,0);
    vmi(MM,3,0,3,0);
    vmi(MM,4,0,4,0);
    vmi(MM,5,0,5,0);
    vmi(MM,6,0,6,0);
    vmi(MM,7,0,7,0);
    vmi(MM,8,0,8,0);
    vmi(MM,9,0,9,0); //..
    vmi(MM,10,0,10,0); //mixer(0).add(pr[10])
    vmi(ST,0,3,0,-1); //pr[11]=0 //static prediction
    vmi(SMC,0,0x100,1023,-1); //pr[12]=smc(0,cxt)
    vmi(AVG,0,0,11,12); //pr[13]=avg(pr[11],pr[12])
    vmi(MM,11,0,13,0); //mixer(0).add(pr[13])
    vmi(SMC,1,0x10000,1023,-1); //pr[14]=smc(1,cxt)
    vmi(AVG,1,0,14,11); //pr[15]=avg(pr[14],pr[11])
    vmi(MM,12,0,15,0); //mixer(0).add(pr[15])
    vmi(SCM,0,8,0,0); //
    vmi(MX,0,N*2+2+2,256*5,5); //pr[16]=mixer(0).mix // mix all inputs
    mxs(0,256);
    mxs(0,256);
    mxs(0,256);
    mxs(0,256);
    mxs(0,256);
    r0=r1=r2=r3=0;
    bufpos=0;
    }
    Last edited by kaitz; 7th January 2020 at 05:33. Reason: im8
    KZo


  35. Thanks (2):

    Mike (6th January 2020),moisesmcardona (21st January 2020)

  36. #23
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v16
    Code:
    update jpeg model
    add new component DHS (used by 4bit image and jpeg)
    small changes
    reduce warnings at compile time
    make jit memory execute/read only
    fix text transform
    in vm add static bounds check on src->asm compile
    add runtime bounds check
    fix arm detection
    enable arm,dec in main conf
    remove mxs function.
    make mixer into one layer only
    fix SIMD mixer errors
    add 4 bit bmp detection
    add 4 bit image compression
    update git readme
    KZo


  37. Thanks:

    moisesmcardona (21st January 2020)

  38. #24
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,760
    Thanks
    274
    Thanked 1,200 Times in 668 Posts
    I always wanted to turn paq models into coroutines,
    so that a model would look like a normal parser/coder with branches and rc calls in multiple places,
    and the main engine would just sync multiple such coders.
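A minimal sketch of the coroutine idea in C, using the well-known switch-based resume trick. Everything here (`Model`, `CO_YIELD`, `toy_model`, `run_round`) is hypothetical and is not taken from paq8pxv or jojpeg; a yield stands in for an rc call and only reports the context the model wants its next bit coded under.

```c
#include <assert.h>

/* Hypothetical per-model coroutine state. */
typedef struct {
    int line;   /* resume point; 0 = not started, -1 = finished */
    int i;      /* loop counter, kept here so it survives a yield */
    int ctx;    /* context requested for the next bit */
} Model;

#define CO_BEGIN(m)    switch ((m)->line) { case 0:
#define CO_YIELD(m, c) \
    do { (m)->ctx = (c); (m)->line = __LINE__; return 1; case __LINE__:; } while (0)
#define CO_END(m)      } (m)->line = -1; return 0

/* Reads like a sequential parser: two header bits, then three payload bits.
   Returns 1 while it has a bit to code, 0 once it is done. */
static int toy_model(Model *m) {
    CO_BEGIN(m);
    for (m->i = 0; m->i < 2; m->i++)
        CO_YIELD(m, 0);            /* header bits share context 0 */
    for (m->i = 0; m->i < 3; m->i++)
        CO_YIELD(m, 1 + m->i);     /* payload bits use contexts 1..3 */
    CO_END(m);
}

/* Engine side: step every model once and collect the contexts of those
   that are still running. Returns how many models yielded. */
static int run_round(Model *models, int n, int *ctx_out) {
    int live = 0;
    for (int k = 0; k < n; k++)
        if (toy_model(&models[k]))
            ctx_out[live++] = models[k].ctx;
    return live;
}
```

Each `CO_YIELD` must sit on its own source line, since the line number doubles as the resume label. A real engine would also pass the coded bit back into the model on resume; that is left out to keep the sketch short.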

    Another point is memory allocation: dynamic memory alloc is quite slow and is usually totally unnecessary.
    For example, I managed to port the paq jpeg model (as jojpeg) to a standalone codec with static alloc.
    And here I can see that each vmi() does its own alloc.
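For illustration, static allocation can be as simple as a bump allocator over one fixed buffer. This is a sketch under assumptions, not how jojpeg or paq8pxv actually manage memory: `ARENA_BYTES`, `arena_alloc` and `component_table` are made-up names, and the 1 MB budget is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed budget shared by all component tables. */
#define ARENA_BYTES (1u << 20)

/* Backed by uint64_t so every slice can be 8-byte aligned. */
static uint64_t arena_words[ARENA_BYTES / 8];
static size_t arena_used = 0;

/* Bump allocator: returns a zeroed, 8-byte-aligned slice of the static
   buffer, or NULL when the budget is exhausted. Nothing is ever freed. */
static void *arena_alloc(size_t bytes) {
    uint8_t *base = (uint8_t *)arena_words;
    size_t start = (arena_used + 7u) & ~(size_t)7u;
    if (start > ARENA_BYTES || bytes > ARENA_BYTES - start) return NULL;
    arena_used = start + bytes;
    memset(base + start, 0, bytes);
    return base + start;
}

/* Stand-in for what a component's init might request:
   a table of 16-bit counters carved out of the arena. */
static uint16_t *component_table(size_t entries) {
    assert(entries <= SIZE_MAX / sizeof(uint16_t));
    return arena_alloc(entries * sizeof(uint16_t));
}
```

With this layout each component init reduces to a pointer bump, and total memory use is known before compression starts. Whether that yields a measurable speedup depends on how often allocation actually happens.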

  39. #25
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    paq8pxv_v16 -s8
    Code:
    dicke 1994733
    mozil 9952626
    mr    2098167
    nci   1069908
    ooff  1434245
    osdb  2080633
    reym   831310
    samba 2799197
    sao   3762286
    webst 4999505
    x-ray 3611304
    xml    288872
    total 34922786


    enwik8 17694997 7954.28 sec, used 1542 MB (i3 4160)
    On silesia (http://www.mattmahoney.net/dc/silesia.html) this is behind paq8pxd_v4 -8.
    Last edited by kaitz; 20th January 2020 at 18:05.
    KZo


  40. #26
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    update to v16
    Code:
    fix JIT
    update main compression, now a bit better compression

    Binary only; source is on GitHub.
    Only the pxv_VM version can decompress archives produced by this binary. Otherwise compression/decompression is compatible.

    --
    Quote: Originally posted by Shelwien

    Another point is memory allocation: dynamic memory alloc is quite slow, and is usually totally unnecessary -
    for example, I managed to port paq jpeg model (as jojpeg) to a standalone codec with static alloc.
    And here I can see that each vmi() does its own alloc.
    I think this would add no significant speedup. Then again, I could be wrong.

    ----
    Added bmp24 filter
    Last edited by kaitz; 24th January 2020 at 00:42. Reason: bmp
    KZo


  41. #27
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    I changed the detection part so that there can be multiple detectors for the same type. Detected data can go to the same stream for compression or to another one.
    The example below is for images: bmp8.2.det detects images whose width is larger than 256.
    There can also be multiple detectors for text: one for small files that are processed with a transform and another for larger files, or for UTF files, or for some binary/text mix,
    etc.

    I still need to make recursive types take priority over the above, for example to resolve a conflict between small base64 data and long text.

    Or something like that.
    Code:
     
     Segment data size: 4512 bytes
    
     TN |Type name |Count      |Total size
    -----------------------------------------
      0 |default   |       174 |   1328262
      2 |exe.det   |         1 |   2233470
      3 |bmp8.det  |       107 |     36352
      4 |bmp1.det  |        37 |      9984
     11 |bmp4.det  |        27 |     12716
     13 |bmp8.2.det |         1 |    250000
    -----------------------------------------
    Total level  0 |       347 |   3870784
    
    Decode compressed to : 1330
    test3d.cfg   stream(0).  Total 3561732
    test3img.cfg   stream(3).  Total 9984
    test3i8.cfg   stream(4).  Total 250000
    test3i4.cfg   stream(6).  Total 12716
    test3d.cfg   stream(7).  Total 36352
    Code:
    Segment data size: 4512 bytes
    
     TN |Type name |Count      |Total size
    -----------------------------------------
      0 |default   |       174 |   1328262
      2 |exe.det   |         1 |   2233470
      3 |bmp8.det  |       107 |     36352
      4 |bmp1.det  |        37 |      9984
     11 |bmp4.det  |        27 |     12716
     13 |bmp8.2.det |         1 |    250000
    -----------------------------------------
    Total level  0 |       347 |   3870784
    
    Decode compressed to : 1330
    test3d.cfg   stream(0).  Total 3561732
    test3img.cfg   stream(3).  Total 9984
    test3i8.cfg   stream(4).  Total 286352
    test3i4.cfg   stream(6).  Total 12716
    Compressing test3d.cfg   stream(0).  Total 3561732
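The routing shown in the two listings above can be sketched as a small lookup table. This is only an illustration: the `Detector` struct, its fields and the first-match rule are assumptions, not paq8pxv's actual detector interface. The stream numbers are read off the first listing, and the `min_width` of 0 for bmp8.det is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical detector entry; field names are illustrative only. */
typedef struct {
    const char *name;   /* detector file, e.g. "bmp8.2.det" */
    int min_width;      /* accepts images wider than this */
    int stream;         /* stream that receives the detected data */
} Detector;

/* Two detectors for the same type (8-bit images). The more specific one
   is listed first. Pointing both at stream 4 would merge their data, as
   in the second listing. */
static const Detector detectors[] = {
    { "bmp8.2.det", 256, 4 },
    { "bmp8.det",     0, 7 },
};

/* First matching detector wins (an assumed rule); NULL if none accepts. */
static const Detector *pick_detector(const Detector *d, size_t n, int width) {
    for (size_t i = 0; i < n; i++)
        if (width > d[i].min_width)
            return &d[i];
    return NULL;
}
```

Under these assumptions an image 500 pixels wide is routed to stream 4 and one 64 pixels wide to stream 7.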
    KZo


  42. Thanks:

    Mike (30th January 2020)

  43. #28
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    Code:
    change detection
    update affected .det files
    KZo


  44. Thanks:

    Mike (31st January 2020)

  45. #29
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    190
    Thanks
    89
    Thanked 18 Times in 14 Posts
    Quote: Originally posted by kaitz
    Code:
    change detection
    update affected .det files
    Why release v16 twice? Shouldn't it be v17?

  46. #30
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    465
    Thanks
    193
    Thanked 289 Times in 157 Posts
    Confusing for sure.

    It's compatible when decompressing.
    KZo

