I just made some interesting observations:
Code:
C:\>bsc
Platform specific options:
-G Enable Sort Transform acceleration on NVIDIA GPU, default: disable
-P Enable large 2MB RAM pages, default: disable
-t Disable parallel blocks processing, default: enable
-T Disable multi-core systems support, default: enable
C:\>timer bsc d m:\b9 nul -P
Kernel Time = 3.244 = 00:00:03.244 = 20%
User Time = 74.521 = 00:01:14.521 = 479%
Process Time = 77.766 = 00:01:17.766 = 499%
Global Time = 15.554 = 00:00:15.554 = 100%
C:\>timer bsc d m:\b9 nul
Kernel Time = 1.544 = 00:00:01.544 = 9%
User Time = 97.500 = 00:01:37.500 = 576%
Process Time = 99.045 = 00:01:39.045 = 585%
Global Time = 16.926 = 00:00:16.926 = 100%
C:\>timer bsc d m:\b9 nul -PT
Kernel Time = 0.592 = 00:00:00.592 = 1%
User Time = 55.832 = 00:00:55.832 = 98%
Process Time = 56.425 = 00:00:56.425 = 99%
Global Time = 56.519 = 00:00:56.519 = 100%
C:\>timer bsc d m:\b9 nul -T
Kernel Time = 1.232 = 00:00:01.232 = 1%
User Time = 72.400 = 00:01:12.400 = 98%
Process Time = 73.632 = 00:01:13.632 = 99%
Global Time = 73.648 = 00:01:13.648 = 100%
It seems that bsc already has a parallel unBWT. Large pages make single-threaded (s/t) unBWT ~1.5x faster (about 18 seconds of the overall time is -e1 model execution), but don't change much in multi-threaded (m/t) mode: global time only drops from 16.9 to 15.6 seconds. Maybe that is because there aren't enough threads, since CPU time without large pages is still 23 seconds higher (97.5 vs 74.5 seconds of user time). Unfortunately, there is no way to increase the number of threads to check that assumption.
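For anyone who wants to recheck the arithmetic, here is a small sketch that recomputes those ratios from the timer output above. The 18-second model time is only my rough estimate, not a measured value, and the dictionary keys and function name are made up for this example.

```python
# Timings in seconds, copied from the timer output above.
runs = {
    "-P":   {"user": 74.521, "global": 15.554},  # m/t, large pages
    "none": {"user": 97.500, "global": 16.926},  # m/t, default pages
    "-PT":  {"user": 55.832, "global": 56.519},  # s/t, large pages
    "-T":   {"user": 72.400, "global": 73.648},  # s/t, default pages
}

# Assumption: rough estimate of the -e1 model time, not measured here.
MODEL_SECONDS = 18.0


def unbwt_speedup(slow_global, fast_global, model=MODEL_SECONDS):
    """Speedup of the unBWT stage alone, after subtracting the model time."""
    return (slow_global - model) / (fast_global - model)


# Single-threaded: compare -T against -PT with the model time removed.
st_speedup = unbwt_speedup(runs["-T"]["global"], runs["-PT"]["global"])

# Multi-threaded: whole-run global time ratio and user (CPU) time saved.
mt_global_ratio = runs["none"]["global"] / runs["-P"]["global"]
mt_user_saved = runs["none"]["user"] - runs["-P"]["user"]

print(f"s/t unBWT speedup with large pages: {st_speedup:.2f}x")
print(f"m/t global time ratio:              {mt_global_ratio:.2f}x")
print(f"m/t user time saved:                {mt_user_saved:.1f} s")
```

With these inputs the s/t speedup comes out at about 1.44x, the m/t global time ratio at about 1.09x, and the m/t user time saving at about 23 seconds, which is where the "~1.5x" and "23 seconds" figures come from.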
EDIT: from BSC history:
Changes in 2.2.0 (June 15, 2010)
- Added parallel version of reverse BWT transform
So sorry, you reinvented the wheel.
Look up "num_indexes" in the bsc sources.