I've recently done more work on the buffered radix matchfinder I used in the experimental Radyx archiver (https://encode.su/threads/2134-Radyx-archiver). It's simpler and faster than the one in Radyx, and I doubt there's much more scope for improvement. The library is written in C. I used some performance tricks and portability code from Zstandard.
I've posted a graph comparing performance on the Silesia corpus with 2 threads in the GitHub project: https://github.com/conor42/fast-lzma2
The performance gap over 7-Zip is even larger with 4 threads than with 2.
It's a work in progress, and there's no DLL build yet, only a fuzzer and a benchmark. I've made a 7-Zip fork that uses the library by default. I'll upload it along with binaries once I've sorted out building all the components.
Regards,
Conor