Page 65 of 65
Results 1,921 to 1,938 of 1938

Thread: paq8px

  1. #1921
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Gotty View Post
    Thanks a lot! Oh, that's not it. The SIMD mixer code is clearly a winner, but the Bucket find() function is under investigation here.
    Please recompile the code with changes in ContextMap and ContextMap2. Look for 4+4 instances of the find function there, and modify the call like:

    table[ctx0].find(chk0, shared->chosenSimd);
    ->
    table[ctx0].findSsse3(chk0);


    vs. (which is faster/slower?)

    table[ctx0].find(chk0, shared->chosenSimd);
    ->
    table[ctx0].findNone(chk0);

    This forces paq8px to use the SIMD or the non-SIMD Bucket find function, respectively. Unfortunately you cannot control that from the command line.

    Edit: one more thing. Please test it with non-image and non-audio files. Images and audio files do not use these functions as much.
    With findSsse3:
    Code:
    paq8px_v187fix3_findSsse3.exe -9  F:\temp\test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 102.09 sec, used 3941 MB (4132631569 bytes) of memory
    With findNone:

    Code:
    paq8px_v187fix3_findNone.exe -9  F:\temp\test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 97.88 sec, used 3941 MB (4132631569 bytes) of memory
    The difference was only about 4 seconds (102.09 vs. 97.88 sec), but findNone was the winner.

    Attached compiled executables.

  2. Thanks:

    Gotty (15th June 2020)

  3. #1922
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    However, the result was the opposite when compiled with NATIVECPU=ON.

    findSsse3():
    Code:
    paq8px_v187fix3_findSsse3_NativeCPU.exe -9  F:\temp\test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 101.11 sec, used 3941 MB (4132631569 bytes) of memory
    findNone():
    Code:
    paq8px_v187fix3_findNone_NativeCPU.exe -9  F:\temp\test_bin
    paq8px archiver v187 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 104.91 sec, used 3941 MB (4132631581 bytes) of memory
    Still, the previous findNone from the unoptimized build seems to be the winner here.

  4. Thanks:

    Gotty (15th June 2020)

  5. #1923
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    470
    Thanks
    323
    Thanked 309 Times in 166 Posts
    Thank you moisesmcardona!
    Interesting findings. The second one is indeed unexpected. Did you try running it a couple of times to filter out noise from Windows background processes?
    And one more: could you try the same experiment with neon vs. none? I have a feeling that none will be the winner, but let's get solid proof.
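
    A minimal sketch of what "run it a couple of times" could look like in code, assuming the work being measured can be wrapped in a callable. The names `TimingSummary`, `summarize` and `timeRepeated` are made up for this illustration and are not part of paq8px. The minimum is usually the least disturbed by background processes, and the median is robust against a single outlier.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper type: summary of repeated wall-clock timings, in seconds.
struct TimingSummary {
  double best;    // minimum: the run least disturbed by background activity
  double median;  // middle value: robust against a single outlier
};

// Sorts a copy of the samples and returns their minimum and median.
TimingSummary summarize(std::vector<double> seconds) {
  assert(!seconds.empty());
  std::sort(seconds.begin(), seconds.end());
  const std::size_t n = seconds.size();
  const double median = (n % 2 == 1)
                            ? seconds[n / 2]
                            : 0.5 * (seconds[n / 2 - 1] + seconds[n / 2]);
  return {seconds.front(), median};
}

// Runs `work` exactly `runs` times (runs >= 1) and summarizes the timings.
template <typename Work>
TimingSummary timeRepeated(Work work, int runs) {
  std::vector<double> seconds;
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    seconds.push_back(elapsed.count());
  }
  return summarize(seconds);
}
```

    With whole-program runs like the ones in this thread, the same idea applies to the reported "Time ... sec" values: collect several per executable and compare the minimum or median rather than a single run.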

  6. #1924
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    509
    Thanks
    208
    Thanked 347 Times in 185 Posts
    The first link in my last post has an older version of the SIMD find. The one in pxd has additional code (below). I tested the older find on office; it was 4 sec faster on AVX. If you compare them you can see what was added; it is also commented in find.
    Maybe it helps to test this.
    I probably enabled this part and forgot about it.
     

    XMM lastl = _mm_set1_epi8((last & 15));
    XMM lasth = _mm_set1_epi8((last >> 4));
    XMM one1 = _mm_set1_epi8(1);
    XMM vm = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7);

    XMM lastx = _mm_unpacklo_epi64(lastl, lasth); // last&15 last>>4
    XMM eq0 = _mm_cmpeq_epi8(lastx, vm); // compare values

    eq0 = _mm_or_si128(eq0, _mm_srli_si128(eq0, 8)); // or low values with high

    lastx = _mm_and_si128(one1, eq0); // set to 1 if eq
    XMM sum1 = _mm_sad_epu8(lastx, xmmzero); // count values, abs(a0 - b0) + abs(a1 - b1) .... up to b8
    const U32 pcount = _mm_cvtsi128_si32(sum1); // population count
    /*for (int i=0; i<7; ++i) {
      bh[i][0]=i+1;
    }*/
    U32 t0 = (~_mm_movemask_epi8(eq0));
    for (int i = pcount; i < 7; ++i) {
      int bitt = ctz(t0); // get index
      //#if ((__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)))
      //asm("btr %1,%0" : "+r"(t0) : "r"(bitt)); // clear bit set and test again https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769
      //#else
      t0 &= ~(1 << bitt); // clear bit set and test again
      //#endif
      int pri = bh[bitt][0];
      if (pri < b) b = pri, bi = bitt;
    }
    /*
    //uncomment above SIMD version and comment out code below to use full SIMD (SSE2) version
    for (int i=0; i<7; ++i) {
      int pri=bh[i][0];
      if (pri<b && (last&15)!=i && (last>>4)!=i) b=pri, bi=i;
    }*/
    KZo
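
    For readers comparing the two variants: the commented-out scalar loop at the end of the snippet can be read as "pick the slot with the lowest priority, skipping the two slots named in the nibbles of `last`". Below is a stand-alone sketch of just that loop. The function name `pickReplacementSlot`, the `std::array` parameter (standing in for `bh[i][0]`) and the starting value 0xFFFF are assumptions for illustration, not taken from the paq8 sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the scalar loop.
// priorities[i] plays the role of bh[i][0]; `last` packs two slot indexes,
// one in the low nibble and one in the high nibble, that must not be chosen.
int pickReplacementSlot(const std::array<std::uint8_t, 7>& priorities,
                        std::uint8_t last) {
  int best = 0xFFFF;   // assumption: starts above any possible priority
  int bestIndex = -1;  // stays -1 only if no slot qualifies
  for (int i = 0; i < 7; ++i) {
    const int pri = priorities[i];
    if (pri < best && (last & 15) != i && (last >> 4) != i) {
      best = pri;
      bestIndex = i;
    }
  }
  return bestIndex;
}
```

    The strict `<` means that among equal priorities the lowest index wins. Since the SIMD loop visits the remaining slots in ascending bit order, it should pick the same index.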


  7. #1925
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Gotty View Post
    Thank you moisesmcardona!
    Interesting findings. The second one is indeed unexpected. Did you try running it a couple of times to filter out noise from Windows background processes?
    And one more: could you try the same experiment with neon vs. none? I have a feeling that none will be the winner, but let's get solid proof.
    Will test it and report back.


    Quote Originally Posted by kaitz View Post
    The first link in my last post has an older version of the SIMD find. The one in pxd has additional code (below). I tested the older find on office; it was 4 sec faster on AVX. If you compare them you can see what was added; it is also commented in find.
    Maybe it helps to test this.
    I probably enabled this part and forgot about it.
     

    The SSSE3 code in Bucket.hpp looks the same, except a few variable names are changed.

    Code:
          __m128i lastL = _mm_set1_epi8((mostRecentlyUsed & 15U));
          __m128i lastH = _mm_set1_epi8((mostRecentlyUsed >> 4U));
          __m128i one1 = _mm_set1_epi8(1);
          __m128i vm = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7);
          
          __m128i lastX = _mm_unpacklo_epi64(lastL, lastH); // mostRecentlyUsed&15 mostRecentlyUsed>>4
          __m128i eq0 = _mm_cmpeq_epi8(lastX, vm); // compare values
    
          eq0 = _mm_or_si128(eq0, _mm_srli_si128(eq0, 8));    // or low values with high
    
          lastX = _mm_and_si128(one1, eq0);                //set to 1 if eq
          __m128i sum1 = _mm_sad_epu8(lastX, _mm_setzero_si128());        // count values, abs(a0 - b0) + abs(a1 - b1) .... up to b8
          const uint32_t pCount = _mm_cvtsi128_si32(sum1); // population count
          uint32_t t0 = (~_mm_movemask_epi8(eq0));
          for( int i = pCount; i < 7; ++i ) {
            int bitt = ctz(t0);     // get index
            t0 &= ~(1 << bitt); // clear bit set and test again
            int pri = bitState[bitt][0];
            if( pri < worst ) {
              worst = pri;
              idx = bitt;
            }
          }
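
    The loop at the end of both versions uses a common pattern: find the lowest set bit of a mask with ctz, handle that index, clear the bit, repeat. Here is a portable sketch of the pattern, assuming a scalar stand-in for the compare/movemask step. `ctzPortable` and `pickViaMask` are names invented for this example, and the loop here runs until the mask is empty instead of using the fixed `pCount..7` trip count from Bucket.hpp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Portable stand-in for the ctz() helper used in the thread's code:
// returns the index of the lowest set bit; the caller guarantees x != 0.
int ctzPortable(std::uint32_t x) {
  assert(x != 0);
  int n = 0;
  while ((x & 1U) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical mask-based selection: bit i of `candidates` set means
// "slot i may be replaced".
int pickViaMask(const std::array<std::uint8_t, 7>& priorities,
                std::uint8_t mostRecentlyUsed) {
  // Scalar stand-in for the SIMD compare + movemask: start with slots 0..6
  // and remove the two slots named in the nibbles of mostRecentlyUsed.
  std::uint32_t candidates = 0x7FU;
  candidates &= ~(1U << (mostRecentlyUsed & 15U));
  candidates &= ~(1U << (mostRecentlyUsed >> 4U));

  int worst = 0xFFFF;  // assumption: starts above any possible priority
  int idx = -1;
  while (candidates != 0) {
    const int bit = ctzPortable(candidates);  // get index of lowest candidate
    candidates &= candidates - 1;             // clear the lowest set bit
    const int pri = priorities[bit];
    if (pri < worst) {
      worst = pri;
      idx = bit;
    }
  }
  return idx;
}
```

    `candidates &= candidates - 1` clears the same bit as `t0 &= ~(1 << bitt)` when `bitt` is the lowest set bit, without needing the shift.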

  8. #1926
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    ARM runtimes on a Snapdragon 845:

    findNeon:

    Code:
    paq8px_v187fix3_findneon -9 test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 574.75 sec, used 3941 MB (4132631529 bytes) of memory
    findNone:

    Code:
    paq8px_v187fix3_findnone -9 test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 552.66 sec, used 3941 MB (4132631529 bytes) of memory
    Again, findNone is the winner, this time by about 22 seconds (574.75 vs. 552.66 sec).

  9. Thanks:

    Gotty (17th June 2020)

  10. #1927
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Gotty View Post
    Thank you moisesmcardona!
    Interesting findings. The second one is indeed unexpected. Did you try running it a couple of times to filter out noise from Windows background processes?
    And one more: could you try the same experiment with neon vs. none? I have a feeling that none will be the winner, but let's get solid proof.
    Turns out that I renamed the wrong executable. The file size was bigger. I recompiled them both with NATIVECPU and made sure to run `make clean` to build fresh executables.

    NativeCPU findSsse3:

    Code:
    paq8px_v187fix3_findSsse3.exe -9  F:\temp\test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 102.38 sec, used 3941 MB (4132631569 bytes) of memory
    NativeCPU findNone:

    Code:
    paq8px_v187fix3_findNone_NativeCPU.exe -9  F:\temp\test_bin
    paq8px archiver v187fix3 (c) 2020, Matt Mahoney et al.
    
    Creating archive test_bin.paq8px187fix3 in single file mode...
    
    Filename: F:\temp\test_bin (606900 bytes)
    Block segmentation:
     0           | default          |    606900 bytes [0 - 606899]
    -----------------------
    Total input size     : 606900
    Total archive size   : 161470
    
    Time 101.63 sec, used 3941 MB (4132631569 bytes) of memory
    Guess who won? findNone again!

  11. Thanks:

    Gotty (17th June 2020)

  12. #1928
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    979
    Thanks
    96
    Thanked 396 Times in 276 Posts
    enwik6 plus round 4k extra:
    195,030 bytes, 108.598 sec., paq8px_v187fix3_findSSSe3 -12
    195,030 bytes, 108.482 sec., paq8px_v187fix3_findNone -12

  13. Thanks:

    Gotty (17th June 2020)

  14. #1929
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Does anyone know why PAQ sometimes gets stuck at 100% when extracting? The output file is invalid too.

  15. #1930
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    470
    Thanks
    323
    Thanked 309 Times in 166 Posts
    I experienced that a long time ago, but it was fixed: an uninitialized variable.
    There are sometimes issues with compiler optimizations. Sometimes with MSVC, sometimes with GCC.
    Which file is it? (The compiler and compiler settings would be interesting too.)

  16. #1931
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Gotty View Post
    I experienced that a long time ago, but it was fixed: an uninitialized variable.
    There are sometimes issues with compiler optimizations. Sometimes with MSVC, sometimes with GCC.
    Which file is it? (The compiler and compiler settings would be interesting too.)
    I was compressing some scanned images stored as uncompressed TIFF files. It's weird in that v187 worked fine but v187fix3 failed on some files while others decompressed fine. I compiled them using cmake and gcc. No additional flags were passed to it.

    I'm using paq8px on a private BOINC distributed computing project where files are distributed to my machines to compress them, making the most use of my hardware. I doubt that differences between CPUs would cause such issues.

  17. #1932
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    470
    Thanks
    323
    Thanked 309 Times in 166 Posts
    Quote Originally Posted by moisesmcardona View Post
    I was compressing some scanned images stored as uncompressed TIFF files. It's weird in that v187 worked fine but v187fix3 failed on some files while others decompressed fine. I compiled them using cmake and gcc. No additional flags were passed to it.

    I'm using paq8px on a private BOINC distributed computing project where files are distributed to my machines to compress them, making the most use of my hardware. I doubt that differences between CPUs would cause such issues.
    It could be a (new) bug, or an old bug coming to the surface because of the recent changes. This is an experimental compressor and the tests are not thorough. I, for example, have only a couple of TIFF test cases, so certainly not every case is covered. If you could share a failing sample I could investigate it. If it is not publicly shareable, please contact me privately.
    I'm sure we can find the source of the problem.

  18. #1933
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Gotty View Post
    It could be a (new) bug, or an old bug coming to the surface because of the recent changes. This is an experimental compressor and the tests are not thorough. I, for example, have only a couple of TIFF test cases, so certainly not every case is covered. If you could share a failing sample I could investigate it. If it is not publicly shareable, please contact me privately.
    I'm sure we can find the source of the problem.
    Or maybe the files compressed incorrectly in the first place. I compiled your v187fix5 and I haven't experienced an issue, yet. Hope it stays that way!

  19. Thanks:

    Gotty (24th June 2020)

  20. #1934
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    @Gotty, just finished compressing and extracting to validate 108 tiff files. paq8px v187fix5 worked fine on my machines. 2 Intel i7 and 2 AMD machines.

  21. Thanks:

    Gotty (28th June 2020)

  22. #1935
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    470
    Thanks
    323
    Thanked 309 Times in 166 Posts
    That's good news then.

  23. #1936
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Does paq8px v187fix5 contain any compression algorithm changes, or is it only a time optimization?

  24. #1937
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    230
    Thanks
    125
    Thanked 43 Times in 33 Posts
    Quote Originally Posted by Darek View Post
    Does paq8px v187fix5 contain any compression algorithm changes, or is it only a time optimization?
    Just some tweaks to the code.

    Code:
    paq8px_v187fix4
    2020.06.13
    - Cosmetic changes (formatting for better readability)
    - Added a couple of remarks
    - Restricted input buffer size to 1GB (on compression level 12 less memory is used: 28702 MB -> 27678 MB)
    - Restored INJECT #definitions for accessing Shared members (speed improvement)
    - Bucket find() function is hardwired to the non-SIMD version (speed improvement; SIMD version introduced in v184 seems to be somewhat slower than the non-SIMD one)
    - Archives are still expected to be binary compatible with v183fix1
    
    
    paq8px_v187fix5
    2020.06.14 
    - More cosmetic changes (formatting for better readability), more remarks in code
    - "Shared" and "UpdateBroadcaster" instances are compositioned instead of using a singleton
    - Archives are still expected to be binary compatible with v183fix1

  25. #1938
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    470
    Thanks
    323
    Thanked 309 Times in 166 Posts
    Yes, they are nothing special.
    I just needed to commit those so that the next "real" version does not carry too many changes. These are many small changes so it is already not easy to diffview anyway.

  26. Thanks:

    LucaBiondi (29th June 2020)
