Let's compare output of ST4 transformation with snapshot of HT4 matcfinder table at some moment. They are pretty similar - both contains pointers to buffer sorted by 4 pointed chars. The difference is that, 1) ST4 doesn't suffer from match collisions, and 2) ST4 includes all buffer positions while HT4 holds only last N entries for every hash value
ST4 output even may be used to search matches like the HT4 table - first, we should find pointer to the current position in the buffer (using some sort of binary search or index structure), and then scan preceding positions - either all with the same 4 bytes, or limiting search to fixed depth
Going further, we can replace ST4 transformation with HST4 - sorting pointers by some hash of 4 bytes pointed plus position. HST4 advantages over ST4: 1) the transformation can be done in single pass and work faster, 2) we can save indexes used in the transformation and employ them for the following search operations:
this brings us the complete matchfinder solution but it's hardly any better that plain HT4. But unlike HT4 and any other dynamic matchfinders, HST4 allows to implement multi-threaded lazy/greedy matchfinder. The key is that hst4_output[] doesn't updated after sorting so it may be used in multiple threads. so we can split input buffer into muliple pieces and compress them all simultaneously. all we need to replicate for every thread is the index[] array. modified code parts are:Code://HST4: sorting for(i=0; i<N; i++) word = *(uint32*)(buf+i) index[hash(word)]++ for(i=1; i<index_elems; i++) index[i+1] += index[i] for(i=0; i<N; i++) word = *(uint32*)(buf+i) idx = index[hash(word)]++ hst4_output[idx] = buf+i for(i=index_elems-1; i>0; i--) index[i] = index[i-1] // Now index[hash(word)] points to the first hst4_output[] element having the same hash(word) // As we will go through the buf[], these indexes will be updated to point to hst4_output[] element where we should start search: // HST4: searching for(i=0; i<N; i++) word = *(uint32*)(buf+i) idx = index[hash(word)]++ for (; hash(*(uint32*)hst4_output[idx]) == hash(word); idx--) check match length at postion pointed by hst4_output[idx]
Code:for (core=0; core<number_of_cores; core++) // first, save initial indexes for this part of buffer for(i=0; i<index_elems; i++) local_index[core][i] = index[i] // second, process the next N/number_of_cores positions for(i=core*N/number_of_cores; i<(core+1)*N/number_of_cores; i++) word = *(uint32*)(buf+i) idx = index[hash(word)]++ hst4_output[idx] = buf+i // HST4 searching in the piece number K for(i=K*N/number_of_cores; i<(K+1)*N/number_of_cores; i++) word = *(uint32*)(buf+i) idx = local_index[K][hash(word)]++ for (; hash(*(uint32*)hst4_output[idx]) == hash(word); idx--) check match length at postion pointed by hst4_output[idx]