1. ## Sorting Algorithms Benchmark

I just want to tell you about my new project, where I will develop some variations of existing sorting algorithms and also mix them in some creative and effective ways.

Project is here: https://github.com/tarsa/SortingAlgorithmsBenchmark

Right now there are mostly variations of heapsort, but more are to come. Also, in the long run, the output will be changed to JSON to allow easy and robust processing by external tools.

I think a sorting algorithms benchmark could be interesting for the encode.su audience, as sorting is used somewhat frequently within compression algorithms.

If you have ideas on how to improve the performance or flexibility of the algorithms, feel free to share those ideas with me - either here or through GitHub facilities.

ps:
Right now it compiles under Clang++ 3.3 and fails to compile under GCC 4.6. I'm testing it on Ubuntu 12.04 64-bit.

2. ## Thanks (2):

Bulat Ziganshin (10th April 2014),Cyan (15th May 2014)

3. In case someone is interested:

I'm still playing with heapsorts. Some interesting points:
- I've made clustering heapsorts, i.e. heaps of small heaps, with the intention of reducing random memory accesses - it turned out that this didn't help and even made things worse; now I think it's because speculative out-of-order execution causes speculative memory fetches (which are very often useful) with very low overhead compared to the extra arithmetic for handling clusters,
- I've made ternary and quaternary heapsorts - there are some efficiency improvements from that,
- I've made a backtracking heapsort, e.g.: https://github.com/tarsa/SortingAlgo...levarianta.hpp ; it reduces the number of comparisons by almost a half for deep heaps (less for shallow heaps); AFAIK nobody else discovered that trick before; that variant of heapsort is very efficient when it comes to the number of comparisons - it does many fewer comparisons than quicksort on average and is rather close to the optimal number of comparisons,
- I've made a cascading heapsort, e.g.: https://github.com/tarsa/SortingAlgo...ngvarianta.hpp ; it allows for very efficient memory prefetching - basically, the number of concurrent prefetches can be as high as the number of heap levels (which is logarithmic); for sorting ints on big enough heaps it achieves much higher speed than regular heapsort; as above, I think nobody discovered such a trick before,
- now I'm thinking about a hybrid heapsort which will combine the backtracking heapsort (to reduce the number of comparisons) and the cascading heapsort (for efficient prefetching); it'll be somewhat tricky,
- also I'm thinking about a cached-comparisons heapsort and maybe some more,
- in general I think that I'm able to develop a heapsort that will be faster than quicksort + insertion sort for medium-sized arrays (bigger than a few dozen elements but still fitting in fast CPU caches),

4. ## Thanks:

Bulat Ziganshin (9th May 2014)

5. Sorting is extremely well-studied. For that reason, I myself wouldn't work on that problem. I think mergesort is pretty good for locality. I know there are a couple different ways to build a heap. I'd have a high index of suspicion toward any inkling that I'd discovered anything new about such an old subject.

Are you interested in suffix sorting? There might be opportunities there to advance the state of the art. More than general sorting.

6. Are you interested in suffix sorting? There might be opportunities there to advance the state of the art. More than general sorting.
That was the reason I started the Suffix Sorting Benchmark. I've done a suffix sorting algorithm before: http://encode.su/threads/1291-My-con...light=qrotsort I plan to work on it further, but first I need the Suffix Sorting Benchmark done to select the best performing sort algorithm for each step of suffix sorting. By 'best' I mean one that is fastest by absolute measure, not merely asymptotic complexity, as it's easy today to develop an O(n lg n) sorting algorithm based on those that are already well known. Also I need scalability, so developing an algorithm that divides the input into many roughly equal partitions at once and does that in parallel is crucial.

Sorting is extremely well-studied. For that reason, I myself wouldn't work on that problem. I think mergesort is pretty good for locality. I know there are a couple different ways to build a heap. I'd have a high index of suspicion toward any inkling that I'd discovered anything new about such an old subject.
Sorting is well studied, and suffix sorting is well studied too. But there are always problems with implementations. libdivsufsort is based on many algorithms that have good theory behind them, but the devil is in the implementation details. A good implementation can bring big speedups.

As for locality: I'm upset that most benchmarks in papers and presentations about sorting study sorting of bare integers. Quicksort obviously shines in such benchmarks, because of the way memory prefetchers work and because radix sort is not included in such benchmarks - why not include radix sort where it can easily work?
What about multipartition algorithms? The most used quicksorts, mergesorts, heapsorts, etc. are binary, i.e. quicksorts have one pivot and partitions are split into two parts, mergesort divides arrays in two until reaching singletons and then merges exactly two subarrays, heapsorts always use binary heaps, etc. That's not good for cache locality, as it requires more passes over input data and/or more random memory accesses.

I was thinking about a SIMD-enriched integer-sorting heapsort, but then I switched to other experiments. Why use heapsort when you can use radix sort?
Anyway, I couldn't find a SIMD heapsort even though it would be easily doable - e.g. use an 8-ary heapsort and at each step in siftDown use SSE2/AVX/whatever to select the index of the maximum element from the 8-element vector.

If you're aware of a project similar to mine then please tell me. I would be extremely happy to know about it. I would be even happier if you could find a paper with tricks and observations similar to mine.

BTW:
Does anyone know a super-effective way to select the index of the maximum element in a short vector using SSE2/AVX/etc.? If someone gives me a complete solution then I can quickly build a complete primitive-sorting heapsort from it.

7. Originally Posted by Piotr Tarsa
That was the reason I started the Suffix Sorting Benchmark. I've done a suffix sorting algorithm before: http://encode.su/threads/1291-My-con...light=qrotsort I plan to work on it further, but first I need the Suffix Sorting Benchmark done to select the best performing sort algorithm for each step of suffix sorting. By 'best' I mean one that is fastest by absolute measure, not merely asymptotic complexity, as it's easy today to develop an O(n lg n) sorting algorithm based on those that are already well known. Also I need scalability, so developing an algorithm that divides the input into many roughly equal partitions at once and does that in parallel is crucial.
There are suffix sort algorithms that achieve O(n)-time, which is obviously equal to the lower bound for suffix sorting. But, supposedly, they are not as fast as some O(n lg n) algorithms in practice. In reality, O(n lg n) is no different than O(n), because if n fits in a machine word, lg n < 64.

Sorting is well studied, and suffix sorting is well studied too. But there are always problems with implementations. libdivsufsort is based on many algorithms that have good theory behind them, but the devil is in the implementation details. A good implementation can bring big speedups.
People only became interested in suffix sorting relatively recently. I think it started with the Manber and Myers paper on suffix arrays around 1990. Knuth's volume on sorting and searching was first published in 1973.

What's wrong with divsufsort? I thought the implementation was a strong point.

As for locality: I'm upset that most benchmarks in papers and presentations about sorting study sorting of bare integers. Quicksort obviously shines in such benchmarks, because of the way memory prefetchers work and because radix sort is not included in such benchmarks - why not include radix sort where it can easily work?
What about multipartition algorithms? The most used quicksorts, mergesorts, heapsorts, etc. are binary, i.e. quicksorts have one pivot and partitions are split into two parts, mergesort divides arrays in two until reaching singletons and then merges exactly two subarrays, heapsorts always use binary heaps, etc. That's not good for cache locality, as it requires more passes over input data and/or more random memory accesses.

I was thinking about a SIMD-enriched integer-sorting heapsort, but then I switched to other experiments. Why use heapsort when you can use radix sort?
Anyway, I couldn't find a SIMD heapsort even though it would be easily doable - e.g. use an 8-ary heapsort and at each step in siftDown use SSE2/AVX/whatever to select the index of the maximum element from the 8-element vector.

If you're aware of a project similar to mine then please tell me. I would be extremely happy to know about it. I would be even happier if you could find a paper with tricks and observations similar to mine.

BTW:
Does anyone know a super-effective way to select the index of the maximum element in a short vector using SSE2/AVX/etc.? If someone gives me a complete solution then I can quickly build a complete primitive-sorting heapsort from it.
There are tons of papers on sorting. If you use the wrong search engine, or the wrong search terms, it can seem like there are no papers. But here is a sample: http://scholar.google.com/scholar?q=simd+sort https://www.google.com/search?q=site....org+fast+sort

Here is a competition for sorting that's been going on for many years: http://sortbenchmark.org/

8. People only became interested in suffix sorting relatively recently. I think it started with the Manber and Myers paper on suffix arrays around 1990. Knuth's volume on sorting and searching was first published in 1973.

What's wrong with divsufsort? I thought the implementation was a strong point.
divsufsort is great; I only stressed that it has many low-level optimizations that make it perform really quickly. It's based on several algorithms, e.g. SA-IS, which was the subject of a scientific paper, but originally that algorithm was many times slower than the final divsufsort incarnation.

OTOH, divsufsort has one weak point - it parallelizes poorly, probably never reaching a 2x speedup even on many cores (within one block, of course). My QRotSort parallelized better, though far from perfectly, but it was so slow that even with many threads it was often slower. I need to look for faster and more parallelizable sorting algorithms to make it faster.

There are tons of papers on sorting. If you use the wrong search engine, or the wrong search terms, it can seem like there are no papers. But here is a sample: http://scholar.google.com/scholar?q=simd+sort https://www.google.com/search?q=site....org+fast+sort
Hmm, there are not many free papers (I took a quick look and didn't find any). I'll look at it more deeply another time.

Here is a competition for sorting that's been going on for many years: http://sortbenchmark.org/
But I'm developing in-memory-only sorting, not a file sorter. Doing a full BWT is several times slower than reading even from relatively slow HDDs, so I/O is not an issue for me.

As to the doubling technique from Manber and Myers - AFAIR divsufsort doesn't use that at all.

As to sorting being well studied - TimSort was developed relatively recently and found its way into Python (a dozen years ago) and the Java 7 reference implementation (though TimSort isn't a part of the spec). TimSort is basically an adaptive mergesort which achieves O(n) complexity when there's a lot of presortedness and O(n lg n) in the worst case, with smooth transitions between those two. It also performs much better than Dijkstra's smoothsort, but unfortunately is not an in-place sort. If Tim Peters was able to develop an algorithm that improves on real use cases after so many years of global research, then why couldn't someone else?

And if you're so good at searching papers, tell me e.g. which paper presents the backtracking heapsort trick. Everywhere heapsort is presented, it uses the early-stop siftDown procedure, which requires almost 2x more comparisons than the backtracking one in the average case.

PS:
Reading through multiple papers and trying to implement ideas only to find that the papers are outdated is no fun. And without an easy-to-use benchmark of the available algorithms it is not easy to see what is relevant today.

9. Originally Posted by Piotr Tarsa
divsufsort is great; I only stressed that it has many low-level optimizations that make it perform really quickly. It's based on several algorithms, e.g. SA-IS, which was the subject of a scientific paper, but originally that algorithm was many times slower than the final divsufsort incarnation.

OTOH, divsufsort has one weak point - it parallelizes poorly, probably never reaching a 2x speedup even on many cores (within one block, of course). My QRotSort parallelized better, though far from perfectly, but it was so slow that even with many threads it was often slower. I need to look for faster and more parallelizable sorting algorithms to make it faster.

Hmm, there are not many free papers (I took a quick look and didn't find any). I'll look at it more deeply another time.

But I'm developing in-memory-only sorting, not a file sorter. Doing a full BWT is several times slower than reading even from relatively slow HDDs, so I/O is not an issue for me.

As to the doubling technique from Manber and Myers - AFAIR divsufsort doesn't use that at all.

As to sorting being well studied - TimSort was developed relatively recently and found its way into Python (a dozen years ago) and the Java 7 reference implementation (though TimSort isn't a part of the spec). TimSort is basically an adaptive mergesort which achieves O(n) complexity when there's a lot of presortedness and O(n lg n) in the worst case, with smooth transitions between those two. It also performs much better than Dijkstra's smoothsort, but unfortunately is not an in-place sort. If Tim Peters was able to develop an algorithm that improves on real use cases after so many years of global research, then why couldn't someone else?

And if you're so good at searching papers, tell me e.g. which paper presents the backtracking heapsort trick. Everywhere heapsort is presented, it uses the early-stop siftDown procedure, which requires almost 2x more comparisons than the backtracking one in the average case.
I was just wondering if you were aware of all the research into sorting. That's all. You don't want to risk repeating work that's already been done. Just looking through all the papers on sorting is exhausting.

I don't think sortbenchmark.org was meant to be limited to external sorting.

My point about Manber and Myers was only that their paper marked the beginning of people's recent interest in suffix sorting, AFAIK. Before suffix arrays, there wasn't much need to sort suffixes.

PS:
Reading through multiple papers and trying to implement ideas only to find the papers are outdated is no fun. And without an easy to use benchmark of available algorithms it is not easy to see what is relevant today.
That's a real problem. When I read papers, my main concern is less that they might be outdated than that they are low-quality and irrelevant. You can't be certain of finding only worthwhile papers on a search engine. What I'd do is look at the algorithms used in the sortbenchmark.org winners. The competitors are well-informed and they're sure to use only the best algorithms in their entries. Even if the purpose of the competition is not strictly to test CPU time, they would probably choose algorithms with all-around excellent performance.

I looked at divsufsort at one point and my impression was that a huge amount of optimization and tuning had gone into it. Even if you have one good idea for a suffix sort, divsufsort would probably still be faster due to all that additional tuning. I've had ideas about suffix sorting, but I'm not going to bother implementing them unless or until I have some idea that's absolutely brilliant, because to try to outmatch divsufsort would most likely be a losing battle.

Your post on that other thread about BWT compressors making poor use of long repeats is interesting. That's the kind of problem I'd rather work on.

10. It turned out that I was wrong about the backtracking heapsort.
std::partial_sort uses it, and it was also described (well, almost the same idea) at: http://www.inference.phy.cam.ac.uk/m...g/sorting.html (that yellow highlight).

sortbenchmark.org doesn't have any in-memory benchmark, AFAIK after skimming through the page. I was once (or many times) looking for existing sort benchmarks, but they were mostly uninteresting. I aim to test algorithms with different item types, problem sizes, distributions, etc. and also to make a tool to visualize the results. And I aim to gather them in one place, so somebody else looking for a performance evaluation could just look at my benchmark and instantly decide what's best for them without rereading lots of papers.

Your post on that other thread about BWT compressors making poor use of long repeats is interesting. That's the kind of problem I'd rather work on.
qsufsort is a refinement of the Manber-Myers algorithm. In both, long repeats mean lots of duplicated keys during sorting. Designing a good adaptive sort should help a lot there. Long repeats are an interesting problem by themselves and they need special treatment, but anyway - dealing quickly with short repeats is more important than focusing on redundant data IMHO. Especially since Manber-Myers-derived algos don't exhibit pathological cases on any sort of data.

11. Originally Posted by Piotr Tarsa
It turned out that I was wrong about the backtracking heapsort.
std::partial_sort uses it, and it was also described (well, almost the same idea) at: http://www.inference.phy.cam.ac.uk/m...g/sorting.html (that yellow highlight).
Interesting link. There might be something there that you could directly apply to compression, since sorting algorithms are also algorithms for compressing permutations. To represent a permutation, just sort it and record the outcome of each comparison. I'm not sure if that analysis has anything to do with actual runtime performance, though. There are so many variables involved that you can't just make an algorithm more complicated, skip empirical testing, and assume you've made it faster.

sortbenchmark.org doesn't have any in-memory benchmark, AFAIK after skimming through the page. I was once (or many times) looking for existing sort benchmarks, but they were mostly uninteresting. I aim to test algorithms with different item types, problem sizes, distributions, etc. and also to make a tool to visualize the results. And I aim to gather them in one place, so somebody else looking for a performance evaluation could just look at my benchmark and instantly decide what's best for them without rereading lots of papers.
This paper looks like one you'd be interested in:
http://203.144.248.23/ACM.FT/1810000...351-satish.pdf

It looks on-topic and authoritative to me.

qsufsort is a refinement of the Manber-Myers algorithm. In both, long repeats mean lots of duplicated keys during sorting. Designing a good adaptive sort should help a lot there. Long repeats are an interesting problem by themselves and they need special treatment, but anyway - dealing quickly with short repeats is more important than focusing on redundant data IMHO. Especially since Manber-Myers-derived algos don't exhibit pathological cases on any sort of data.
Does divsufsort have pathological cases? Empirically, I've never run into any. The big-O could understate its true performance. Big-O is only an upper bound and it might not be tight.

12. ## Thanks:

Piotr Tarsa (12th May 2014)

13. Thanks for the link. It seems to be relatively up to date. But, OTOH, it's focused on sorting small keys with no additional payload or only a small additional payload. No arbitrary objects, no arbitrary comparators, etc. Simply: not suitable as a generic sort. But interesting nonetheless, and I'll read it.

I don't know whether divsufsort has pathological cases. AFAIR the algorithm it's based on has pretty good worst-case complexity, but I could be wrong.

14. Originally Posted by Piotr Tarsa
Does anyone know some super effective way to select index of maximum element in a short vector using SSE2/ AVX/ etc? If someone gives me a complete solution then I can quickly build a complete primitive-sorting heapsort from that.
PHMINPOSUW

15. Too late. I actually spotted that instruction today while researching the topic: http://stackoverflow.com/questions/2...10050_23590610

I'm ironing out bugs from other heapsorts now so the SIMDified heapsort will be delayed.

16. Originally Posted by Piotr Tarsa
Thanks for the link. It seems to be relatively up to date. But, OTOH, it's focused on sorting small keys with no additional payload or only a small additional payload. No arbitrary objects, no arbitrary comparators, etc. Simply: not suitable as a generic sort. But interesting nonetheless, and I'll read it.
Have you looked at burstsort? I might have seen papers on sorting large keys, too, but I didn't know what you were looking for.

17. In general the benchmark has to be interesting to as many people as possible :P so including many datatypes is crucial. Burstsort looks promising; I have read about it in the past and maybe I'll play with it. The main selling point of the benchmark will be the ease of performance comparison between various algorithms. Usually, in each scientific paper, authors design their own benchmarks on their own datasets and their own computers, and often the benchmark programs aren't provided - and even if they are, they aren't very flexible or easy to integrate with other benchmarks.

A sorting algorithm that is interesting to the highest number of people has to be generic, i.e. you have an array of custom objects with a custom comparator and you need to sort them. In short: a possible replacement for std::sort, java.util.Arrays.sort and so on. And I want to focus primarily on that. Specialized sorts are useful too, and including them in the benchmark to measure potential speedups will tell whether it's sensible to use them instead of generic sorts.

18. Originally Posted by Piotr Tarsa
In general the benchmark has to be interesting to as many people as possible :P so including many datatypes is crucial. Burstsort looks promising; I have read about it in the past and maybe I'll play with it. The main selling point of the benchmark will be the ease of performance comparison between various algorithms. Usually, in each scientific paper, authors design their own benchmarks on their own datasets and their own computers, and often the benchmark programs aren't provided - and even if they are, they aren't very flexible or easy to integrate with other benchmarks.

A sorting algorithm that is interesting to the highest number of people has to be generic, i.e. you have an array of custom objects with a custom comparator and you need to sort them. In short: a possible replacement for std::sort, java.util.Arrays.sort and so on. And I want to focus primarily on that. Specialized sorts are useful too, and including them in the benchmark to measure potential speedups will tell whether it's sensible to use them instead of generic sorts.
Custom objects with custom comparators? That just sounds slow.

I suspect that in order to do more than the most basic optimizations, you have to constrain the problem in some way.

19. TimSort is a generic sort designed for (semi-)expensive comparisons and for exploiting various kinds of presortedness. It happens that a lot of sorts are done using fancy comparators, and the order of the data to be sorted is not completely random.

I think that lots of sorts (i.e. applications of sorting to particular problems) actually sort pointers to objects, and the comparators compare the objects pointed to. Additionally, the objects can contain further pointers. That means a lot of random memory accesses, and the key to improving performance would be to reduce the number of passes through the array to be sorted, I think. Usual sorts - I mean ordinary quicksort and mergesort here - only reduce the problem by two on each pass, and that requires quite a lot of passes through the data. N-way partitioning, where N is substantially bigger than 2, should improve things, I guess. That needs to be checked on some real-world use cases and also some synthetic ones.

20. Originally Posted by Bulat Ziganshin
PHMINPOSUW
That might actually be pronounceable in Polish.

21. I've pushed simple AVX2 versions of heapsort to the git repo:
https://github.com/tarsa/SortingAlgo...ngvariantb.hpp
https://github.com/tarsa/SortingAlgo...rdvariantb.hpp

They're faster than std::sort for random arrays that fit in the caches. The funny thing is that the cascaded version is faster than the non-cascaded version for an array of size 1234567 (int32_t's). That array fits in the last-level cache so access should be pretty fast. It turns out that prefetching cache lines from the last-level cache to the first-level cache can actually bring a speedup in heapsort - or I've broken something.

The benchmark code is still very limited, so I can't reliably test sorting performance on small arrays, but it seems the advantage of the AVX2 heapsort over std::sort should grow as the array gets smaller. This means that instead of switching late into insertion sort, we could switch earlier to the SIMD-ified heapsort and gain performance.

I have used code from a StackOverflow thread. I haven't yet implemented the version that uses the PHMINPOSUW instruction.

22. An interesting challenge would be to compute and return the inversion vector. Inversion vectors are integral to compression, and I could not find an efficient implementation for computing MTF, so I wrote my own. Mine is certainly not the last word.

23. Erm, isn't it enough to just append an index to the key and sort the keys together with indexes?

And if you want to compute the inverse of a permutation, then I've posted an efficient solution somewhere on this forum. Basically, instead of the straightforward few-line solution, which causes lots of cache misses, you can do an LSD radix sort which utilizes CPU caches well, causing an overall speedup.

24. I'm thinking about how to use that PHMINPOSUW instruction to select e.g. the index of the minimum in a vector of 32-bit unsigned dwords, but I can't come up with anything sensible. Any ideas?

25. Originally Posted by Piotr Tarsa
Erm, isn't it enough to just append an index to the key and sort the keys together with indexes?
Well, that's not a complete algorithm, but mine was O(n lg n) and had the same space requirements, I think. And that sounds pretty close to a viable way to invert an inversion vector, i.e. recover the permutation. But it's not what I actually did.

It's straightforward to adapt any stable sort to count the inversions. I used mergesort, and it switches to insertion sort for small numbers of elements, so that's two different sort algorithms I did it for.

I doubt you can change the asymptotic runtime much from O(n lg n), but you can make incremental improvements.

And if you want to compute an inverse of permutation then I've posted somewhere on this forum an efficient solution. Basically, instead of straight few-line solution which causes lots of cache misses, you can do LSD radix sort which utilizes CPU caches well causing overall speedup.
That's different. I was considering using that on the bwts, but it would have increased the memory requirements. I might pick up on the bwts again some time.

26. The only major application for inversion vectors that I know of is in BWT second-step transforms, but asymptotically slow algorithms are fast for small alphabets, so there's not much point in working on that. However, what about a word BWT? That would seem to demand a better MTF algorithm, and it might be a good way to compress text.

27. I would prefer to work on fixing and improving QRotSort rather than looking for some exotic transformations. But first I need some fast sorting algos.

As to PHMINPOSUW, I was thinking about an algorithm that looks like this:
- Goal: Find unsigned minimum element index,
- Input: 8 uint32_ts in a single vector,
- Steps:
1. Extract 8 hi-words into separate register - but how? I've examined the various PACK and UNPACK instructions and can't find anything suitable,
2. Do PHMINPOSUW on hi-words
3. Broadcast the value to the other 7 words in vector - as above - I don't have any idea how to do it quickly,
4. Do a compare to compute the mask of minimum hi-words,
5. Extract 8 lo-words from input vector,
6. Use the mask to saturate lo-words for which the hi-words aren't minimum,
7. Do PHMINPOSUW on lo-words,
8. If minimum lo-word == 0xFFFF then use index from first PHMINPOSUW else use index from second PHMINPOSUW
- Output: Index of minimum element

As you can see, the algorithm is pretty complicated - more complicated than the solutions that aren't based on PHMINPOSUW.

28. http://doremifasort.web.fc2.com/doremifasort_eng.html
This is an improvement over quicksort.

29. Originally Posted by Piotr Tarsa
I would prefer to work on fixing and improving QRotSort rather than looking for some exotic transformations. But first I need some fast sorting algos.
I don't know what you dislike about exotic transforms, but it's up to you. Btw, I searched the web for that instruction and I got lots of hits.

30. DoReMiFaSort looks like a tuned Bentley-Sedgewick quicksort: http://www.cs.princeton.edu/~rs/talk...tIsOptimal.pdf
They're using nontrivial pivot selection, but that's still not super clever. std::sort also uses nontrivial pivot selection.
They didn't count the number of comparisons. With nontrivial elements to sort, the number of comparisons becomes important. Bentley-Sedgewick quicksort is good for data with lots of duplicates or for collections of primitives.
So no real breakthrough here, I think.

I have a possibly very good improvement to quicksort / radix sort in mind, but first I need to polish my other sorting algorithms.

I don't know what you dislike about exotic transforms, but it's up to you.
It's just priorities. Now I'm working on sorting algorithms per se. Jumping from task to task can be frustrating in the long run.

Btw, I searched the web for that instruction and I got lots of hits.
You mean PHMINPOSUW? And have you found an efficient solution on how to extend PHMINPOSUW to work on vectors of dwords or qwords?

31. Originally Posted by Piotr Tarsa
You mean PHMINPOSUW? And have you found an efficient solution on how to extend PHMINPOSUW to work on vectors of dwords or qwords?
I did not read them. But people have written their experiences with that instruction, and what you're trying to do is a fairly natural thing to want to do.

32. I googled that instruction and I failed to find solution to my problem.

33. Originally Posted by Piotr Tarsa
I googled that instruction and I failed to find solution to my problem.
That could mean nothing, or it could mean that it's not a good idea to use the instruction that way.

