The question is:
Suppose we have two sequences
1: AAABAA
2: AABAAA
Use ST2 (STX):
And from both the 1st and the 2nd string we get the same last columns:
AA
AA
AA
AA
AB
BA
Different strings give the same result.
Or where am I wrong?
With STn you still sort the whole string shifts like in BWT, and still output the last column.
The difference is that you only include n symbols in comparisons.
Also, in STn you'd still need an EOF symbol or an index position, like in BWT.
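Here is a naive reference sketch of that description (my own code, not anything from this thread): build the matrix of cyclic shifts, sort it while comparing only the first n symbols, and output the last column plus the position of the unrotated row. The function name and the tie-break are my assumptions; ties here keep the row order of the right-rotation shift matrix used in the example below, which is only one possible convention.
Code:
#include <stdlib.h>

/* One row of the shift matrix: where the rotation starts and which row it was. */
typedef struct { int start; int row; } Row;

static const char *g_s;   /* input string (file-scope for the qsort comparator) */
static int g_len;         /* input length */
static int g_order;       /* n = number of symbols compared (2 for ST2) */

static int cmp_row( const void *a, const void *b ) {
  const Row *x = (const Row*)a, *y = (const Row*)b;
  int k;
  for( k=0; k<g_order; k++ ) {
    unsigned char cx = g_s[ (x->start + k) % g_len ];
    unsigned char cy = g_s[ (y->start + k) % g_len ];
    if( cx != cy ) return (int)cx - (int)cy;
  }
  return x->row - y->row;   /* tie-break: keep the shift-matrix row order */
}

/* out receives the last column (len chars + NUL); returns index0,
   the sorted position of the unrotated string. */
int stn_forward_naive( const char *s, int len, int order, char *out ) {
  Row *rows = (Row*)malloc( len * sizeof(Row) );
  int i, index0 = 0;
  g_s = s; g_len = len; g_order = order;
  for( i=0; i<len; i++ ) {
    rows[i].start = (len - i) % len;   /* rows built by repeated right rotation */
    rows[i].row   = i;
  }
  qsort( rows, len, sizeof(Row), cmp_row );
  for( i=0; i<len; i++ ) {
    out[i] = s[ (rows[i].start + len - 1) % len ];   /* last column */
    if( rows[i].start == 0 ) index0 = i;
  }
  out[len] = 0;
  free( rows );
  return index0;
}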
1: AAABAA
2: AABAAA
Use ST2 (STX):
1.
shift matrix:
AAABAA <- index0
AAAABA
AAAAAB
BAAAAA
ABAAAA
AABAAA
after sort:
AAABAA <- index0
AAAABA
AAAAAB
AABAAA
ABAAAA
BAAAAA
Result : AAAABA
index0: 0
2.
shift matrix:
AABAAA <- index0
AAABAA
AAAABA
AAAAAB
BAAAAA
ABAAAA
after sort:
AABAAA <- index0
AAABAA
AAAABA
AAAAAB
ABAAAA
BAAAAA
Result : AAAABA <- equal to 1
index0: 0
Or should I take as the result not the second column but the last column?
For 1: AABAAA
For 2: AAABAA
You have to take the last column.
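For example, with the naive stn_forward_naive sketch above (hypothetical code, with its particular tie-break), taking the last column does give different outputs for the two strings, and index0 = 0 in both cases:
Code:
#include <stdio.h>

int main( void ) {
  char out[16];
  int idx;
  idx = stn_forward_naive( "AAABAA", 6, 2, out );
  printf( "1: %s index0=%d\n", out, idx );   /* 1: AABAAA index0=0 */
  idx = stn_forward_naive( "AABAAA", 6, 2, out );
  printf( "2: %s index0=%d\n", out, idx );   /* 2: AAABAA index0=0 */
  return 0;
}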
Well, there's a different (and much simpler) way to build ST2.
Just count the occurrences of symbol pairs, then allocate a buffer for each pair, then put
the symbols encountered in the context of that pair into that buffer.
Something like this:
Code:
#include <string.h>

typedef unsigned char byte;
typedef unsigned int  uint;

enum { CNUM = 256 };               // alphabet size (assumed; not shown in the original post)
static uint o1[CNUM+1];            // per-symbol counts/offsets (assumed declaration)
static uint o2[CNUM*CNUM+1];       // per-pair counts/offsets (assumed declaration)

uint Forward_ST2( byte* inpbuf, byte* outbuf, uint inplen ) {
  uint i,c,p,q;
  memset( o2, 0, sizeof(o2) );
  // count each context pair (p,q), with cyclic wrap-around
  p = inpbuf[inplen-2]; q = inpbuf[inplen-1];
  for( i=0; i<inplen; i++ ) { c = inpbuf[i]; o2[1+p+(q<<8)]++; p = q; q = c; }
  // counts -> bucket offsets; data starts at 2, after the two context bytes
  for( i=0,c=2; i<CNUM*CNUM; i++ ) o2[i]=c, c+=o2[i+1];
  // store the wrap-around context, then drop each symbol into its pair's bucket
  p = inpbuf[inplen-2]; outbuf[0] = p;
  q = inpbuf[inplen-1]; outbuf[1] = q;
  for( i=0; i<inplen; i++ ) { c = outbuf[ o2[p+(q<<8)]++ ] = inpbuf[i]; p = q; q = c; }
  return inplen+2;
}

uint Inverse_ST2( byte* inpbuf, byte* outbuf, uint inplen ) {
  uint i,j,c,p,q;
  memset( o1, 0, sizeof(o1) );     // original had sizeof(o2) here; sizeof(o1) fits the sizes above
  memset( o2, 0, sizeof(o2) );
  for( i=2; i<inplen; i++ ) c=inpbuf[i], o1[1+c]++;              // symbol freqs
  for( i=0,c=2; i<CNUM; i++ ) o1[i]=c, c+=o1[i+1]; o1[CNUM]=c;   // now offsets
  for( j=0; j<CNUM; j++ ) {        // symbol loop: rebuild the pair counts
    q = j;
    for( i=o1[j]; i<o1[j+1]; i++ ) { c = inpbuf[i]<<8; o2[1+q+c]++; }   // byte freqs
  }
  for( i=0,c=2; i<CNUM*CNUM; i++ ) o2[i]=c, c+=o2[i+1];
  // walk the buckets, starting from the stored wrap-around context
  p = inpbuf[0]; q = inpbuf[1];
  for( i=0; i<inplen-2; i++ ) { c = outbuf[i] = inpbuf[ o2[p+(q<<8)]++ ]; p = q; q = c; }
  return inplen-2;
}
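And a minimal round-trip check for the routines above (the main(), buffer names and sizes are my own additions, not part of the original code):
Code:
#include <stdio.h>

int main( void ) {
  byte inp[] = "AAABAA", st[16], dec[16];
  uint n = Forward_ST2( inp, st, 6 );   /* returns 6+2: two wrap-around context bytes are prepended */
  uint m = Inverse_ST2( st, dec, n );   /* returns 8-2 = 6 */
  dec[m] = 0;
  printf( "decoded: %s (length %u)\n", (char*)dec, m );   /* expect AAABAA */
  return 0;
}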
With something as simple as
AABAAA and AAABAA
why not use the BWTS transform? Then you don't need a stupid INDEX.
AABAAA > AAB AAA, two Lyndon words. The strings of interest
are AAA AAA AAA AAB ABA BAA, which sort to
AAA
AAA
AAA
AAB
ABA
BAA
giving AAABAA as the BWTS of AABAAA
while
AAABAA > AAAB AA, two Lyndon words, with strings
AA AA AAAB AABA ABAA BAAA
sorting to
AAA  (note: the string is repeated to match the length of the longest Lyndon word)
AAA
AAAB
AABA
ABAA
BAAA
Use the last symbol of the strings of interest; in the case of the length-2 Lyndon words you use the second symbol,
giving AABAAA as the BWTS of AAABAA, and again there is no need for an INDEX.
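To make the scheme concrete, here is a naive sketch I put together (my own illustration, not biject.bwts's actual code): Duval's algorithm gives the Lyndon factorization, every rotation of every factor is sorted by comparing the rotations as infinite periodic strings, and the last symbol of each sorted rotation is output; no index is needed. It factors AABAAA as AAB+A+A+A and AAABAA as AAAB+A+A, which gives the same results as the grouping used above.
Code:
#include <stdio.h>
#include <stdlib.h>

typedef struct { int start, len, off; } Rot;   /* word start/length, rotation offset */

static const char *b_s;                        /* input string, for the comparator */

/* compare two rotations as infinite periodic strings (omega-order) */
static int cmp_omega( const void *pa, const void *pb ) {
  const Rot *a = pa, *b = pb;
  int n = a->len + b->len, k;                  /* a->len + b->len characters suffice */
  for( k=0; k<n; k++ ) {
    unsigned char ca = b_s[ a->start + (a->off + k) % a->len ];
    unsigned char cb = b_s[ b->start + (b->off + k) % b->len ];
    if( ca != cb ) return (int)ca - (int)cb;
  }
  return 0;                                    /* equal periodic strings: order doesn't matter */
}

/* out must hold len+1 chars; no index is returned because none is needed */
void bwts_naive( const char *s, int len, char *out ) {
  Rot *rot = (Rot*)malloc( len * sizeof(Rot) );
  int i = 0, r = 0;
  b_s = s;
  while( i < len ) {                           /* Duval's Lyndon factorization */
    int j = i + 1, k = i;
    while( j < len && s[k] <= s[j] ) { if( s[k] < s[j] ) k = i; else k++; j++; }
    while( i <= k ) {                          /* emit Lyndon words of length j-k */
      int w, wlen = j - k;
      for( w=0; w<wlen; w++ ) { rot[r].start = i; rot[r].len = wlen; rot[r].off = w; r++; }
      i += wlen;
    }
  }
  qsort( rot, len, sizeof(Rot), cmp_omega );
  for( i=0; i<len; i++ )                       /* last symbol of each sorted rotation */
    out[i] = s[ rot[i].start + (rot[i].off + rot[i].len - 1) % rot[i].len ];
  out[len] = 0;
  free( rot );
}

int main( void ) {
  char out[16];
  bwts_naive( "AABAAA", 6, out ); printf( "BWTS(AABAAA) = %s\n", out );   /* AAABAA */
  bwts_naive( "AAABAA", 6, out ); printf( "BWTS(AAABAA) = %s\n", out );   /* AABAAA */
  return 0;
}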
Thanks to everyone.
biject.bwts: I don't fully understand this tricky algorithm. Is it just a plain BWT?
Shelwien: Thanks, that significantly speeds up ST2 decoding.
And could you explain how the inverse transform algorithms differ for different X, e.g. how does ST8 decoding differ from ST2 decoding?
I also wonder what the optimized versions look like in general.
You can see the ST3,ST4,ST5 implementation in libbsc.
In short, ST4+ has to be much more complicated than ST2, because allocating 16G static arrays for it (256^4 contexts at 4 bytes per counter) is still troublesome atm.
I looked at the STX decoding in bsc_st.cpp: it is not that difficult to implement, but it takes a long time to execute.
In fact, it turns out that the work of sorting on full-length contexts has simply been shifted from compression to decompression. However, decompression is usually used much more often than compression, so on the whole we lose. Or is something wrong with that reasoning?
And which should presumably be faster overall, encoding + decoding for STX or for BWT, i.e. which is faster: unSTX(STX) or unBWT(BWT)?
Only the reverse ST3 transform is faster than BWT. Also, BWT can easily be optimized for multi-core systems, whereas ST cannot.
Enjoy coding, enjoy life!
Yes. But there is the possibility of using all the cores: counting the numbers of different characters can be done by all the cores and threads, and shared memory can be used to store the intermediate data.
I think it could be sped up tenfold compared to the CPU.
Even if we only speed it up a little, it would still share the load between the CPU and the GPU.
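As a sketch of the counting idea on the CPU side (my own illustration with OpenMP, not code from bsc or libbsc; a GPU version would keep the per-block histograms in shared memory in the same way): each thread counts its part of the buffer into a private histogram, and the histograms are merged at the end. Compile with -fopenmp.
Code:
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Count byte frequencies of buf[0..len-1] into counts[256]. */
void parallel_counts( const uint8_t *buf, size_t len, uint32_t counts[256] ) {
  memset( counts, 0, 256*sizeof(uint32_t) );
  #pragma omp parallel
  {
    uint32_t local[256] = {0};          /* private per-thread histogram */
    long i;
    int c;
    #pragma omp for nowait
    for( i=0; i<(long)len; i++ ) local[buf[i]]++;
    #pragma omp critical                /* merge into the shared result */
    for( c=0; c<256; c++ ) counts[c] += local[c];
  }
}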
Random memory access is slow on CPU too.
RAM works well with sequential access; with random access it is the cache that adds the speed.
A block larger than 1 MB can hardly use the L2 cache at all, so we can only use L3.
With the -b10 .. -b100 switches, the CPU will get no benefit from using the cache.
http://www.1024cores.net/home/in-rus...ata-structures
http://www.1024cores.net/home/parall...ous-algorithms