Thread: SSE2(o0,o1) demo aka 2d interpolated mapping of linear inputs

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,378
    Thanks
    215
    Thanked 1,025 Times in 546 Posts

    Here's a new BWT postcoder based on a static mix of o0 and o1; it gets 216838 on book1:
    http://nishi.dreamhosters.com/u/o01_v0.rar
    I planned to test SSE2 with it, but for now I'm stuck on redesign and speed optimizations.
    Can you compare its speed and compression to your coder?

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    I can't compile it properly with Visual C++ because of the weird inline asm. My current state-of-the-art BCM compiles only with VC.

    Still, here is a comparison of the unoptimized BCM (VC build) against your highly optimized ICL build:

    o01_v0 - 12 sec on enwik8.bwt, 216xxx on book1
    bcm-o0+sse - 12 sec on enwik8.bwt, 214xxx on book1
    bcm-o01+sse - 16 sec on enwik8.bwt, 211xxx on book1
    bcm-o012+sse - 18 sec on enwik8, 210xxx on book1

    o01_v0 is far from perfect...

  3. #3
    Administrator Shelwien's Avatar
    To compile with VC, modify coro.inc to use setjmp.h instead of my_setjmp.h.
    And thanks for testing, anyway. I'll try to improve it.
    One interesting thing, though: here again gcc 4.5 loses in speed for some reason:
    enc.    dec.
    6.000s  6.578s  // IC
    7.203s  7.250s  // gcc/mingw 4.5

  4. #4
    The Founder encode's Avatar
    Better get back to SSE2!

  5. #5
    Administrator Shelwien's Avatar
    OK, here's a version with interpolated SSE2.
    http://nishi.dreamhosters.com/u/o01_v1.rar
    The compression improvement from it is pretty good (and very close to what I expected),
    but unfortunately it's also slow, which is no wonder with 11 extra MULs and 1 DIV.
    So I guess it won't be of much help for BWT postcoders, even if we optimize it,
    but still, it's a component with a better effect than logistic mixing, so it has its uses.

    Code:
    book1bwt  enwik8bwt  enctime  dectime
    216838    21337208    6.015s   6.797s  // ic111, o01_v0, static linear mix
    216816    21334589    6.344s   6.688s  // ic110_no_PGO, mixtest_v2, static linear mix
    216098    21204128    6.797s   7.156s  // ic110_no_PGO, mixtest_v2, adaptive linear mix
    214695    21065497    9.954s  10.609s  // ic110_no_PGO, mixtest_v3, adaptive logistic mix
    212624    21220420   21.469s  21.031s  // ic110, o01_v1 = SSE2i(o0,o1), book1bwt profile
    213129    21040485   22.188s  22.718s  // ic110, o01_v1 = SSE2i(o0,o1), enwik8bwt profile
    212478    21098548   23.234s  23.578s  // ic110_no_PGO, o01_v1, o1 dual update, book1bwt profile
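
    For readers who have not seen a 2d interpolated SSE before, here is a minimal sketch of the general idea: two input probabilities select a cell in a grid, the output is the bilinear interpolation of the four surrounding nodes, and the update moves those four nodes toward the coded bit in proportion to their weights. This is not the SSE2i class from o01_v1. The name Sse2dSketch, the 12-bit probability scale, the initialization to the average of both inputs and the update rate are all assumptions made for illustration, and no attempt is made to match the MUL/DIV count quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, NOT the SSE2i class from o01_v1.
// Probabilities are 12-bit fixed point (assumption): p in 0..4095.
template <int Q1 = 7, int Q2 = 7>
struct Sse2dSketch {
    static_assert(Q1 >= 2 && Q2 >= 2, "need at least two nodes per axis");
    static constexpr int kBits = 12;
    static constexpr int kOne  = 1 << kBits;  // 4096
    static constexpr int kRate = 4;           // assumed adaptation speed

    uint16_t t[Q2][Q1];      // grid of node probabilities
    int i1 = 0, i2 = 0;      // lower node index per axis (saved for update)
    int w1 = 0, w2 = 0;      // fractional position per axis, 0..kOne-1

    // Assumed initialization: each node starts as the average of the two
    // input probabilities it stands for.
    Sse2dSketch() {
        for (int j = 0; j < Q2; ++j)
            for (int i = 0; i < Q1; ++i) {
                int v = (i * kOne / (Q1 - 1) + j * kOne / (Q2 - 1)) / 2;
                t[j][i] = static_cast<uint16_t>(v > kOne - 1 ? kOne - 1 : v);
            }
    }

    // Bilinear interpolation between the four nodes around (p1, p2).
    int predict(int p1, int p2) {
        assert(p1 >= 0 && p1 < kOne && p2 >= 0 && p2 < kOne);
        int x = p1 * (Q1 - 1), y = p2 * (Q2 - 1);
        i1 = x >> kBits;  w1 = x & (kOne - 1);
        i2 = y >> kBits;  w2 = y & (kOne - 1);
        int64_t lo = t[i2][i1]     * (kOne - w1) + t[i2][i1 + 1]     * w1;
        int64_t hi = t[i2 + 1][i1] * (kOne - w1) + t[i2 + 1][i1 + 1] * w1;
        int p = static_cast<int>((lo * (kOne - w2) + hi * w2) >> (2 * kBits));
        return p < 1 ? 1 : (p > kOne - 1 ? kOne - 1 : p);  // keep p codable
    }

    // Move the four nodes used by the last predict() toward the coded bit,
    // each in proportion to its interpolation weight.
    void update(int bit) {
        const int target = bit ? kOne - 1 : 0;
        const int wx[2] = { kOne - w1, w1 };
        const int wy[2] = { kOne - w2, w2 };
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx) {
                int w = (wx[dx] * wy[dy]) >> kBits;       // 0..kOne
                int v = t[i2 + dy][i1 + dx];
                v += (target - v) * w / (kOne << kRate);  // truncating division
                t[i2 + dy][i1 + dx] = static_cast<uint16_t>(v);
            }
    }
};
```

    In use, a coder would call predict(p_o0, p_o1) to get the final probability for the bit and then update(bit) once the bit is known. With this particular initialization a fresh table starts out as a plain average of its two inputs.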

  6. #6
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi,

    to enhance 2d SSE, you might want to try:

    - add a nonlinear quantization for each input probability to get more bins near 0/1,
    - allocate a space of N and M partition bins with N:M <> 1:1 (more bins for the more accurate model?).

    AFAIK no one has tried both of these. It'd be nice if you could find some time to test them.
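
    The first suggestion can be sketched as follows, assuming the nonlinear mapping is the logistic stretch(p) = ln(p / (1 - p)) and that probabilities are 12-bit values in 1..4095. The function names and the scaling to [0, q-1] are hypothetical and only show how such a mapping spends more of the bin range near 0 and 1 than a linear one does.

```cpp
#include <cassert>
#include <cmath>

// stretch(p) = ln(p / (1 - p)); the inverse of the logistic squash function.
inline double stretch(double p) { return std::log(p / (1.0 - p)); }

// Hypothetical helper: map a 12-bit probability (1..4095) to a fractional
// bin position in [0, q-1] using the stretched domain.
inline double stretchedPosition(int p12, int q) {
    assert(p12 >= 1 && p12 <= 4095 && q >= 2);
    const double lim = stretch(4095.0 / 4096.0);  // largest stretch value
    const double s = stretch(p12 / 4096.0);       // in [-lim, lim]
    return (s + lim) / (2.0 * lim) * (q - 1);
}

// Linear mapping of the same probability, for comparison.
inline double linearPosition(int p12, int q) {
    assert(p12 >= 0 && p12 <= 4095 && q >= 2);
    return p12 / 4096.0 * (q - 1);
}
```

    With q = 7, the inputs 1/4096 and 64/4096 are well over one bin apart in the stretched domain but less than a tenth of a bin apart in the linear one, which is the "more bins near 0/1" effect.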

    Greets
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  7. #7
    Administrator Shelwien's Avatar
    > - add a nonlinear quantization for each input probability to get more bins near 0/1,

    Yes, something like stretch() helps, but the effect is insignificant... perhaps
    the interpolation, especially in update, has to be fixed to support that...
    but I'm not sure how.

    > - allocate a space of N and M partition bins with N:M <> 1:1 (more bins for the more accurate model?).

    My SSE2 class supports that:
    Code:
    template<int Q1=7, int Q2=7, int InitFlag=1>
    struct SSE2i {
      word T[Q2][Q1];
      word P[Q2][Q1];
    [...]
    But I've never actually seen any noticeable improvement from tweaking this.
    In fact, besides <5,5>, <6,6> and <7,7>, hardly any other configuration was ever
    useful... maybe there are some side effects in the implementation...
    for example, with the 0->0, 1->Q-1 mapping there, 0.5 only gets its own entry
    if Q is odd.
    But still, the effect is pretty good even with these restrictions,
    so I never bothered to clarify these things.
    Note, though, that I've only used these kinds of SSE components (1d too)
    with linear counters... interpolation might not be so easily applicable
    to other types.
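
    The remark about 0.5 can be checked with a little arithmetic: under a linear 0->0, 1->Q-1 mapping, a probability num/den sits at grid coordinate num*(Q-1)/den, which is an exact node only when that value is an integer. A tiny sketch, where the helper name landsOnNode is made up for illustration:

```cpp
#include <cassert>

// Hypothetical helper: with nodes at coordinates 0..q-1 and the linear
// mapping p -> p * (q - 1), does p = num/den fall exactly on a node?
inline bool landsOnNode(int num, int den, int q) {
    assert(den > 0 && num >= 0 && num <= den && q >= 2);
    return (num * (q - 1)) % den == 0;
}
```

    For p = 1/2 this is true for Q = 5 and Q = 7 (coordinates 2 and 3) and false for Q = 6 (coordinate 2.5), so with even Q a probability of 0.5 is always interpolated between two neighbours.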

    > AFAIK noone tried both of these. It'd be nice if you'd find some time to test these.

    Maybe I'll try again later... It would be reasonable to do something about its speed first, though.
    But for now the plan is to introduce FSM counters (my version) and a parallel rc.
    Also maybe a multiplication-free rc.

  8. #8
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    I can't find any SSE in the linked file?

    Also I wonder how to interpolate in SSE2D between the two models?
    Obviously p1'=Interpolate(p1), p2'=Interpolate(p2) and averaging the two isn't too clever, but one could place a Mixer there.
    Also it's clear that SSE works best with low variance input, so using it directly on counters could hurt.

  9. #9
    Administrator Shelwien's Avatar
    http://nishi.dreamhosters.com/u/o01_v1.rar
    \o01_v1\o01_src\sh_SSE2.inc
