
Thread: TarsaMatchFinder - complete match finder based on radix sort

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts

    TarsaMatchFinder - complete match finder based on radix sort

    Project sources: https://github.com/tarsa/TarsaMatchFinder

    Description:
    TarsaMatchFinder outputs matches which can be used in LZ77-style algorithms. Unlike other approaches ( https://encode.su/threads/2710-LZ-M-...ll=1#post51865 ) it doesn't output only the longest match for each position. Instead, for every position and every match length in a specified interval (between min-match and max-match, inclusive) it outputs the match with the smallest offset, provided a match of that length exists at that position.
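
    To make the definition concrete, here is a minimal brute-force sketch of it in Scala. This is an illustration only, not the project's BruteForceMatchFinder, and it is far too slow for anything but tiny inputs:
    Code:
    // For every position and every length in [minMatch, maxMatch], report the
    // smallest offset at which the data repeats, if any. Complexity is roughly
    // O(n^2 * maxMatch), so use it only on tiny inputs.
    object BruteForceSketch {
      case class Match(position: Int, length: Int, offset: Int)

      def findMatches(input: Array[Byte], minMatch: Int, maxMatch: Int): Seq[Match] =
        for {
          position <- input.indices
          length <- minMatch to math.min(maxMatch, input.length - position)
          offset <- (1 to position).find { candidateOffset =>
            (0 until length).forall { i =>
              input(position - candidateOffset + i) == input(position + i)
            }
          }
        } yield Match(position, length, offset)
    }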

    To reduce memory usage during match finding, the matches are filtered before saving, and a linear interpolation process later recovers the filtered matches at very small cost. (Intuitively, dropping the first byte of a match at position p yields a match one byte shorter at position p+1 with the same offset, so many matches are implied by their neighbours and don't have to be stored explicitly.)

    Requirements:
    - Java 8,
    - SBT (Scala Build Tool) or Lightbend Activator (which is a wrapper over SBT),

    Sample usage:
    Code:
    piotrek@piotr-desktop:~/projekty/TarsaMatchFinder$ java -version
    java version "1.8.0_121"
    Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
    piotrek@piotr-desktop:~/projekty/TarsaMatchFinder$ ll corpora/
    total 106720
    drwxrwxr-x 2 piotrek piotrek     4096 Feb 18 14:34 ./
    drwxrwxr-x 9 piotrek piotrek     4096 Feb 18 15:18 ../
    -rw------- 1 piotrek piotrek   768771 Jul 15  1991 book1
    -rw-rw-r-- 1 piotrek piotrek 18217472 Feb 18 15:15 book1.flt
    -rw-rw-r-- 1 piotrek piotrek 65442800 Feb 18 15:15 book1.int
    -rw-rw-r-- 1 piotrek piotrek  1066978 Feb 18 14:33 calgary.zip
    -rw-rw-r-- 1 piotrek piotrek   100000 Feb 17 01:11 enwik5
    -rw-rw-r-- 1 piotrek piotrek  1229072 Feb 18 14:30 enwik5.flt
    -rw-rw-r-- 1 piotrek piotrek 11336848 Feb 18 14:02 enwik5.int
    piotrek@piotr-desktop:~/projekty/TarsaMatchFinder$ alias activ
    alias activ='~/devel/activator-1.3.12-minimal/bin/activator'
    piotrek@piotr-desktop:~/projekty/TarsaMatchFinder$ activ
    [info] Loading global plugins from /home/piotrek/.sbt/0.13/plugins
    [info] Updating {file:/home/piotrek/.sbt/0.13/plugins/}global-plugins...
    [info] Resolving org.fusesource.jansi#jansi;1.4 ...
    [info] Done updating.
    [info] Loading project definition from /home/piotrek/projekty/TarsaMatchFinder/project
    [info] Set current project to TarsaMatchFinder (in build file:/home/piotrek/projekty/TarsaMatchFinder/)
    > run
    [info] Updating {file:/home/piotrek/projekty/TarsaMatchFinder/}tarsamatchfinder...
    [info] Resolving jline#jline;2.14.1 ...
    [info] Done updating.
    [info] Compiling 7 Scala sources to /home/piotrek/projekty/TarsaMatchFinder/target/scala-2.12/classes...
    [info] Running pl.tarsa.matchfinders.Main 
    Please specify a command
     Available commands (case sensitive):
      help
        displays this help
      find-matches <input> <finder> <min> <max> <compacted>
        finds matches in input file and stores the essential ones
        input: input file with original data
        finder: match finder, one of:
          bfmf: brute force match finder
          tmf: Tarsa match finder
        min: minimum match size, min >= 1, min <= max
        max: maximum match size, max >= min, max <= 120
        compacted: file where filtered matches will be written
      interpolate <input> <compacted> <interpolated>
        reconstructs full set of matches from essential ones
        input: input file with original data
        compacted: file with filtered matches
        interpolated: file where full set of matches will be written
      verify <input> <interpolated>
        uses brute force match finder to verify presence of all matches
        input: input file with original data
        interpolated: file with full set of matches
          
    [success] Total time: 4 s, completed 2017-02-18 15:15:05
    > run find-matches corpora/book1 tmf 3 120 corpora/book1.flt
    [info] Running pl.tarsa.matchfinders.Main find-matches corpora/book1 tmf 3 120 corpora/book1.flt
    [success] Total time: 1 s, completed 2017-02-18 15:15:14
    > run interpolate corpora/book1 corpora/book1.flt corpora/book1.int
    [info] Running pl.tarsa.matchfinders.Main interpolate corpora/book1 corpora/book1.flt corpora/book1.int
    [success] Total time: 0 s, completed 2017-02-18 15:15:21
    > run verify corpora/book1 corpora/book1.int
    [info] Running pl.tarsa.matchfinders.Main verify corpora/book1 corpora/book1.int
    Verification OK
    [success] Total time: 801 s, completed 2017-02-18 15:28:47
    >
    The source code isn't very convoluted yet, so it should be relatively easy to follow. To understand what's going on, I recommend starting with the verifier, then the brute force match finder, then the interpolator, and finally TarsaMatchFinder itself (the trickiest part of the whole toolkit).


    Beware that memory usage is quite high and performance isn't impressive yet. I recommend starting experiments with small files (tens of kilobytes) and not exceeding one megabyte. If you don't tune the JVM memory settings (e.g. raise the heap limit with -Xmx), you can run into memory exhaustion, and this is not easy to spot: before throwing an OutOfMemoryError the JVM sometimes runs the multithreaded GC for a long time. This program is single threaded, so if you notice it using many cores, you've probably run into memory exhaustion and the GC has gone into overdrive. The JVM doesn't try to use all of the available memory for Java objects; instead it has its own heap with a fixed size limit set at JVM startup.
    Last edited by Piotr Tarsa; 18th February 2017 at 17:59.

  2. Thanks (2):

    Alexander Rhatushnyak (19th February 2017),Bulat Ziganshin (18th February 2017)

  3. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Has anyone succeeded in running the program? It shouldn't be hard.

    I've created proper documentation (with https://github.com/tarsa/TarsaMatchF...oc/GLOSSARY.md and others), simplified the algorithms a bit (some match filtering techniques were infeasible to get right in a radix-sort based match finder), and added basic speed optimizations. Everything is available at the same address as before: https://github.com/tarsa/TarsaMatchFinder

    Current stats from running TarsaMatchFinder on enwik8 with minMatch=2 and maxMatch=120:
    - single threaded mode only (multi-threading is on the todo list),
    - CPU: Intel i5-4670, RAM: DDR3 1600 MHz CL9 or something like that,
    - time spent on match finding and collecting essential matches in memory: 28.3 s, which equals about 3.5 MB/s of processing speed,
    - number of essential matches: around 210 million; each match in packed form takes 8 bytes (one possible packing is sketched after this list), so together they take around 1680 megabytes,
    - number of all optimal matches: around 1421 million, which would amount to over 11 gigabytes of match data; however, as described in the documentation, the interpolation from essential matches to all optimal matches is a linear process with low resource usage.
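
    For illustration, here is one way such a match could be packed into a single 8-byte Long. The field widths below are just my guess (27 bits each for position and offset cover a 100 MB input, 7 bits cover lengths up to 120); the project's actual packed format may differ:
    Code:
    object PackedMatch {
      // layout (an assumption, not the project's format):
      // bits 34..60 = position, bits 7..33 = offset, bits 0..6 = length
      def pack(position: Int, offset: Int, length: Int): Long =
        (position.toLong << 34) | (offset.toLong << 7) | length.toLong

      def position(packed: Long): Int = (packed >>> 34).toInt
      def offset(packed: Long): Int = ((packed >>> 7) & ((1L << 27) - 1)).toInt
      def length(packed: Long): Int = (packed & 127).toInt
    }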

    TarsaMatchFinder gives the same results as BruteForceMatchFinder and should be bug free.

    I was aiming to make the match filtering techniques smart enough to keep the number of essential matches below the number of input bytes, but that seems infeasible, perhaps even impossible. However, the factor by which essential matches outnumber the input symbols is not high: in the example above (enwik8 with minMatch=2 and maxMatch=120) the factor is 2.1x. I'm pretty sure we could apply integer sequence compression techniques to compress the matches very quickly and efficiently (like the ones implemented by Hamid: https://github.com/powturbo/TurboPFor ).

    I also have some speed optimization techniques to try. Speeding up the processing two or three times is very realistic even without rewriting in an unmanaged language or adding multi-threading. Multi-threading should improve the speed considerably, as radix sort is known to scale well.

    In its current form the matcher looks for optimal matches only. It doesn't do any fuzzy matching, and it doesn't try to reuse offsets from the recent-offsets queue used in several LZ77 variations (LZX, LZMA, zstd, Brotli, etc.). Do you know of any match finder that has explicit support for such things (i.e. fuzzy matches or offset recycling)? I imagine it should be relatively easy for simple match finders, e.g. hash table based ones, but such match finders offer poor efficiency to begin with, so it doesn't make much sense on its own. A hybrid approach with a perfect match finder plus an offset-queue-aware hash table based match finder would make more sense.

  4. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    There's no question that such an approach will work; the only questions are speed and memory. Radix sort has some hidden costs. My first C implementation was about 25x slower than the raw instruction execution time would suggest, and once I found and applied a TLB-related speedup trick, it became 5x faster (i.e. still about 5x slower than that baseline).

    Since you perform an MSB sort, you probably re-sort only segments sharing the same prefix? While this data usually fits in the L3 cache, the TLB speed cap still limits your speed.

    You don't need to find matches starting from len=2. Later, the parser will sequentially scan the data, and it can find such matches itself as long as this is cheap enough, i.e. the match finder fits in the L2/L3 cache, which depends on maxdist for a particular len. In particular, LZMA finds matches with len<=3 at the parser side, so you can experiment with minlen=4..6.

    While you can speed up your approach by using multi-threading, the same is possible for sequential match finders, even with binary trees. Like other sorting approaches, I find your MF more interesting for GPUs than for CPUs, because sequential MFs are incompatible with GPU mass parallelism.

    Optimal parsers can't expect REP matches from the match finder, since the MF doesn't know the previous match history and therefore can't check those matches. Instead, the parser itself checks matches at those distances when it processes each position. You can look at the tornado sources:


    // Evaluate all the encoding possibilities starting at the position `p`,
    // including literal, matches at repeated distances and all found matches.
    // The [matches,matches_end) buffer should contain length/distance for every match found,
    // starting at the position `p`, in the order of strictly increasing `length`.
    UINT evaluate_literal_and_repdist (BYTE *p)
    { ... }

  5. Thanks:

    Piotr Tarsa (26th February 2017)

  6. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    There's no question that such an approach will work; the only questions are speed and memory. Radix sort has some hidden costs. My first C implementation was about 25x slower than the raw instruction execution time would suggest, and once I found and applied a TLB-related speedup trick, it became 5x faster (i.e. still about 5x slower than that baseline).
    By "raw command execution time" you mean CPU instructions execution time? Well, I think that my radix sort is more limited by random memory access than reordering suffix array. I've checked now and for enwik8 half of the time elements are not actually moved during reordering pass. But probably the TLB trick should make a difference for big segments when doing early passes (ie ones for low prefix lengths).

    Since you perform an MSB sort, you probably re-sort only segments sharing the same prefix? While this data usually fits in the L3 cache, the TLB speed cap still limits your speed.
    I'm doing an ordinary MSB radix sort, so yes, there are multiple nested segments. Most of the time is currently spent sorting the smallest segments, below 70 elements in size. For enwik8 those smallest segments account for 2/3 of the total processing time.
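
    Roughly, such an MSB radix pass over suffix indices looks like the sketch below (not the actual project code; names are illustrative). Reading input(suffixes(i) + depth) is exactly the random access mentioned above, because the suffix array holds only starting indices, not the suffix bytes themselves:
    Code:
    object MsbRadixSketch {
      val SmallSegmentThreshold = 70 // below this, hand the segment to insertion sort

      def sortSegment(input: Array[Byte], suffixes: Array[Int],
                      start: Int, end: Int, depth: Int): Unit = {
        if (end - start <= SmallSegmentThreshold) {
          sortSmallSegment(input, suffixes, start, end, depth)
        } else {
          // counting pass: histogram of the key byte at `depth`
          // (bucket 0 is reserved for suffixes that end before `depth`)
          val counts = new Array[Int](257)
          for (i <- start until end)
            counts(keyByte(input, suffixes(i), depth)) += 1
          // compute bucket start offsets
          val starts = new Array[Int](257)
          var acc = start
          for (b <- 0 until 257) { starts(b) = acc; acc += counts(b) }
          // distribution pass into a scratch buffer, then copy back
          val scratch = new Array[Int](end - start)
          val cursors = starts.clone()
          for (i <- start until end) {
            val suffix = suffixes(i)
            val bucket = keyByte(input, suffix, depth)
            scratch(cursors(bucket) - start) = suffix
            cursors(bucket) += 1
          }
          System.arraycopy(scratch, 0, suffixes, start, end - start)
          // recurse into non-trivial buckets (a real match finder would
          // also stop descending once depth reaches maxMatch)
          for (b <- 1 until 257 if counts(b) > 1)
            sortSegment(input, suffixes, starts(b), starts(b) + counts(b), depth + 1)
        }
      }

      private def keyByte(input: Array[Byte], suffix: Int, depth: Int): Int =
        if (suffix + depth < input.length) (input(suffix + depth) & 0xFF) + 1 else 0

      private def sortSmallSegment(input: Array[Byte], suffixes: Array[Int],
                                   start: Int, end: Int, depth: Int): Unit = {
        // stand-in: the project finishes small segments with an insertion
        // sort (LCP-aware, see later in this thread) and emits matches there
      }
    }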

    You don't need to find matches starting from len=2. Later, the parser will sequentially scan the data, and it can find such matches itself as long as this is cheap enough, i.e. the match finder fits in the L2/L3 cache, which depends on maxdist for a particular len. In particular, LZMA finds matches with len<=3 at the parser side, so you can experiment with minlen=4..6.
    Interesting idea. I've prepared some quick statistics and indeed, excluding short or near matches can reduce their number considerably. Output for enwik8:
    Code:
    TarsaMatchFinder.run took 30116 ms
    Tarsa accepted:  209938792
    Tarsa discarded: 1211051522
    Accepted matches' lengths histogram:
      <= 2 - 45875441
      == 3 - 30620705
      == 4 - 22957449
      == 5 - 19017258
      >= 6 - 91467939
    Accepted matches' offsets histogram:
           offset == 1e0 - 381061
     1e0 < offset <= 1e1 - 3399160
     1e1 < offset <= 1e2 - 24811818
     1e2 < offset <= 1e3 - 37869073
     1e3 < offset <= 1e4 - 33505504
     1e4 < offset <= 1e5 - 30765117
     1e5 < offset <= 1e6 - 31017943
     1e6 < offset <= 1e7 - 29848850
     1e7 < offset <= 1e8 - 18340266
     1e8 < offset        - 0
    Accepted matches with length >= 3 and offset >= 64KiB - 84206274
    While you can speed up your approach by using multi-threading, the same is possible for sequential match finders, even with binary trees. Like other sorting approaches, I find your MF more interesting for GPUs than for CPUs, because sequential MFs are incompatible with GPU mass parallelism.
    If I understood correctly, you've stated that working on a single binary tree from many threads causes so much contention that it's slower than single threaded match finding, and that the alternative was to have multiple binary trees, making memory usage proportional to the number of threads. Radix sort can be parallelized without building multiple suffix arrays: you can work on a single suffix array from many threads without causing much contention (see the sketch below). Therefore memory usage is independent of the number of threads, which would be an advantage over binary trees.
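
    A sketch of that claim (names are mine, not the project's): after a counting pass, every radix bucket owns a disjoint slice of the one shared suffix array, so separate buckets can be handed to separate threads with no per-element locking and no extra copies of the array. The .par call is Scala 2.12's built-in parallel collections, used here only for brevity:
    Code:
    object ParallelBucketsSketch {
      // bucketBounds(b) and bucketBounds(b + 1) delimit bucket b's slice
      // of the shared suffix array after a distribution pass
      def sortBucketsInParallel(input: Array[Byte], suffixes: Array[Int],
                                bucketBounds: Array[Int], depth: Int): Unit =
        (0 until bucketBounds.length - 1).par.foreach { b =>
          val start = bucketBounds(b)
          val end = bucketBounds(b + 1)
          // each task touches only its own disjoint slice, so no locking
          // is needed and memory usage doesn't grow with the thread count
          if (end - start > 1)
            sortSegmentSequentially(input, suffixes, start, end, depth)
        }

      private def sortSegmentSequentially(input: Array[Byte], suffixes: Array[Int],
                                          start: Int, end: Int, depth: Int): Unit = {
        // stand-in for the sequential per-segment sort (e.g. the MSB radix
        // pass sketched earlier in the thread)
      }
    }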

    Optimal parsers can't expect REP matches from the match finder, since the MF doesn't know the previous match history and therefore can't check those matches. Instead, the parser itself checks matches at those distances when it processes each position. You can look at the tornado sources:
    Yes, if the match finder and parser are decoupled then it would be really hard to search for repeated offsets in the match finder. But many LZ77-style compressors don't decouple match finding from parsing, so maybe someone has actually made an offset-recycling-aware match finder?


    I was looking at the Brotli RFC ( https://tools.ietf.org/html/rfc7932 ) and its codes for offset recycling:
    Code:
          0: last distance
          1: second-to-last distance
          2: third-to-last distance
          3: fourth-to-last distance
          4: last distance - 1
          5: last distance + 1
          6: last distance - 2
          7: last distance + 2
          8: last distance - 3
          9: last distance + 3
         10: second-to-last distance - 1
         11: second-to-last distance + 1
         12: second-to-last distance - 2
         13: second-to-last distance + 2
         14: second-to-last distance - 3
         15: second-to-last distance + 3
    It looks like it would require some arcane magic to use these codes effectively. I wonder how much savings these codes bring in practice?

  7. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    By "raw command execution time" you mean CPU instructions execution time? Well, I think that my radix sort is more limited by random memory access than reordering suffix array. I've checked now and for enwik8 half of the time elements are not actually moved during reordering pass. But probably the TLB trick should make a difference for big segments when doing early passes (ie ones for low prefix lengths).
    Yes, each radix pass performs only a few CPU instructions per element.

    Where do you need to perform random memory accesses? Inside sorting, the input data is accessed sequentially, and the output is written into 256 segments, each accessed sequentially.

    Interesting idea. I've prepared some quick statistics and indeed, excluding short or near matches can reduce their number considerably.
    Excluding near matches is indeed an interesting improvement, although we need an efficient way to find all these matches at parsing time. It will probably require a binary tree.

    It looks like it would require some arcane magic to use these codes effectively.
    It can be done with small changes to the tornado procedure, although I'm not sure about the compression efficiency: tornado always selects the locally optimal match, while the lzma parser speculates over the next few matches, IMHO exactly in order to find globally optimal decisions.

    But many LZ77-style compressors don't decouple match finding from parsing, so maybe someone has actually made an offset-recycling-aware match finder?
    Well, the MF can check REP matches first and skip checking regular matches. But with a binary tree you need to insert new data anyway, and this takes almost the same time as checking matches, so it doesn't make much sense. And without a binary tree, you can't do an exhaustive match search.

    If I understood correctly, you've stated that working on a single binary tree from many threads causes so much contention that it's slower than single threaded match finding, and that the alternative was to have multiple binary trees, making memory usage proportional to the number of threads.
    1. I was incorrect: Igor didn't make any tests, he only speculated about it back when the Core2 had a serious performance hit for concurrent memory access from 2 cores. So it should be tested before drawing any conclusions.

    2. The memory occupied by a binary tree is proportional to the number of elements, so ALL threads together will occupy 12*dictsize bytes for their binary trees, compared to 8*dictsize for a single binary tree, where we capitalize on the fact that tree_index==dict_index and can therefore use the same number for indexing both arrays:

    struct single_tree_element {
        int left_index, right_index;  // same number indexes both dictionary and tree
    };

    struct multi_tree_element {
        int dict_index;               // dictionary index corresponding to the current tree element
        int left_tree_index, right_tree_index;
    };

  8. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Where do you need to perform random memory accesses? Inside sorting, the input data is accessed sequentially, and the output is written into 256 segments, each accessed sequentially.
    Well, I'm not caching whole suffixes in the suffix array; basically I'm only storing the starting index of each suffix. It would be infeasible to store a 120-byte suffix per input byte. I've seen that your radix sort match finder is limited to a maximum of 8 bytes; in that case it would be acceptable to cache those 8 bytes at the beginning, and then we would have only sequential accesses. But as I've said, the maximum match length in my match finder is 120, so I can cache only a small portion of each suffix. Because the average LCP in a suffix array for e.g. enwik8 is quite large (many dozens, I would assume), I need to re-cache data very often.

    1. I was incorrect: Igor didn't make any tests, he only speculated about it back when the Core2 had a serious performance hit for concurrent memory access from 2 cores. So it should be tested before drawing any conclusions.
    2. The memory occupied by a binary tree is proportional to the number of elements, so ALL threads together will occupy 12*dictsize bytes for their binary trees, compared to 8*dictsize for a single binary tree, where we capitalize on the fact that tree_index==dict_index and can therefore use the same number for indexing both arrays
    So basically the solution is to make many separate binary trees? Where is the improvement then? With separate trees I'd need to search all of them to find the best match; the gain would come only when adding matches, since a single match has to be added to only one tree. Or have I misunderstood something?

  9. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    Well, you can try another approach: instead of sorting the data all the way, use sorting only to split the data into small enough segments, and then use traditional match finders like BT on each segment. This approach still allows exploiting multi-threading, and it reduces the size of each BT to an amount that may fit into the L3 cache. AFAIR from my experiments with enwik9, the largest segment with the same first 4 bytes was 16 MB or so.

    So basically the solution is to make many separate binary trees? Where is the improvement then? With separate trees I'd need to search all of them to find the best match; the gain would come only when adding matches, since a single match has to be added to only one tree. Or have I misunderstood something?
    The whole point of the m/t match finders I described is that all positions are split into N categories based on a hash value, so each thread maintains only 1/N of all positions. I.e., if the match finder searches for 4+ byte matches, then we compute

    thread_num = hash(*(uint32*)pos) % N

    and then perform all MF operations on this position only in that thread.

  10. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Well, you can try another approach: instead of sorting the data all the way, use sorting only to split the data into small enough segments, and then use traditional match finders like BT on each segment. This approach still allows exploiting multi-threading, and it reduces the size of each BT to an amount that may fit into the L3 cache. AFAIR from my experiments with enwik9, the largest segment with the same first 4 bytes was 16 MB or so.
    I'll try various approaches and benchmark them. Right now I'm caching only one byte for big segments and the match finder already has decent speed IMHO.

    The whole point of the m/t match finders I described is that all positions are split into N categories based on a hash value, so each thread maintains only 1/N of all positions. I.e., if the match finder searches for 4+ byte matches, then we compute

    thread_num = hash(*(uint32*)pos) % N

    and then perform all MF operations on this position only in that thread.
    OK, got it, that makes sense now. So basically a single tree contains all substrings with a given 4-byte prefix, and we only need to search that tree for a match with that prefix.
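
    For illustration, a tiny Scala sketch of that partitioning rule (the hash function and names are mine, just to show the idea that every position belongs to exactly one worker, which then does all insertions and searches for it):
    Code:
    object PartitionedMatchFinderSketch {
      // hash of the 4 bytes starting at pos (assumes pos + 3 < input.length)
      def hash4(input: Array[Byte], pos: Int): Int = {
        val v = (input(pos) & 0xFF) |
                ((input(pos + 1) & 0xFF) << 8) |
                ((input(pos + 2) & 0xFF) << 16) |
                ((input(pos + 3) & 0xFF) << 24)
        (v * 0x9E3779B1) >>> 16 // any reasonable mixing function works
      }

      // worker that owns this position - it alone inserts it and searches it
      def workerFor(input: Array[Byte], pos: Int, workers: Int): Int =
        (hash4(input, pos) & Int.MaxValue) % workers
    }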

  11. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    OK, got it, that makes sense now. So basically a single tree contains all substrings with a given 4-byte prefix, and we only need to search that tree for a match with that prefix.
    It's enough to have one BT array per thread, so threads don't try to modify the same cache line (64 bytes) simultaneously.

    But a BT array isn't a single binary tree, exactly like a hash chain array isn't a SINGLE hash chain. Instead, we have a head[] array, indexed by hash(pos)%HEAD_SIZE, which points to the heads of multiple small binary trees or hash chains. You may want to look into the tornado/lzma/zstd sources.
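
    For illustration, a minimal Scala rendering of that head[] idea, shown for the simpler hash-chain case (names and sizes are made up, not taken from tornado/lzma/zstd; a binary-tree variant keeps the same head[] array but stores left/right links per node instead of a single prev link):
    Code:
    class HashChainSketch(input: Array[Byte], headSizeLog: Int = 16) {
      private val headSize = 1 << headSizeLog
      private val head = Array.fill(headSize)(-1)     // newest position per hash bucket
      private val prev = Array.fill(input.length)(-1) // previous position with the same hash

      // hash of the 4 bytes at pos (assumes pos + 3 < input.length)
      private def hashAt(pos: Int): Int = {
        val v = (input(pos) & 0xFF) |
                ((input(pos + 1) & 0xFF) << 8) |
                ((input(pos + 2) & 0xFF) << 16) |
                ((input(pos + 3) & 0xFF) << 24)
        (v * 0x9E3779B1) >>> (32 - headSizeLog)
      }

      // insert the current position and return earlier candidate positions
      // sharing its hash bucket, newest first
      def insertAndCandidates(pos: Int, maxCandidates: Int): List[Int] = {
        val h = hashAt(pos)
        var candidate = head(h)
        var collected = List.empty[Int]
        var remaining = maxCandidates
        while (candidate >= 0 && remaining > 0) {
          collected ::= candidate
          candidate = prev(candidate)
          remaining -= 1
        }
        prev(pos) = head(h)
        head(h) = pos
        collected.reverse
      }
    }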

    Right now I'm caching only one byte for big segments and the match finder already has decent speed IMHO.
    The BT match finder in lzma works at 5-7 MB/s, while having smaller memory requirements.
    Last edited by Bulat Ziganshin; 26th February 2017 at 23:35.

  12. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Well, you can try another approach: instead of sorting the data all the way, use sorting only to split the data into small enough segments, and then use traditional match finders like BT on each segment. This approach still allows exploiting multi-threading, and it reduces the size of each BT to an amount that may fit into the L3 cache. AFAIR from my experiments with enwik9, the largest segment with the same first 4 bytes was 16 MB or so.
    The binary tree or a suffix array segment can fit in the L3 cache, but the data they point to is scattered throughout the whole file. For example, the substring "the " appears 713909 times in enwik8, and those occurrences are spread over every part of the file.

    The size of a segment is not a problem for radix sorting because the suffix array indexes/pointers are accessed in a pretty much sequential way.

  13. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,943
    Thanks
    293
    Thanked 1,286 Times in 728 Posts
    > It looks like it would require some arcane magic to use these codes effectively. I wonder how much savings these codes bring in practice?

    I did some related experiments here:
    https://encode.su/threads/1288-LZMA-...ll=1#post25165

    I'd expect +x/-x code to help even less, if there's no explicit price estimation for fuzzy matches.

  14. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    The binary tree or a suffix array segment can fit in the L3 cache, but the data they point to is scattered throughout the whole file.
    A CPU cache line is 64 bytes, but it seems that the default BIOS setting is to read two adjacent cache lines from memory on a cache miss. This means that each match on average occupies 150-160 bytes of CPU cache, so an 8 MB cache can hold ~50K matches. So you can use a BT starting from about this segment size.

    It's not necessary to build the tree in the same way as LZMA. Any container can be used as long as it allows fast building of the match list. In particular, you may try B-trees with nodes of ~64 bytes.

    You can cache match bytes in the tree nodes. You can also use a different node structure at different depth levels (in particular, store less info in the leaves: half of the binary tree nodes are leaves, but they participate in only 1/tree_depth of the string comparisons).

  15. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    I have made some progress, mainly by implementing an LCP-aware insertion sort that outputs filtered matches. That insertion sort turned out to be pretty tough, but I've finally done it the way it should be done. Match finding now runs at a rate of 4 MB/s on a 3.8 GHz i5 4670 CPU with 1600 MHz CL8 RAM; the code is still in Scala. There are still many things to improve, though.
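
    For illustration, here is a much simplified sketch of that final small-segment stage: just a plain insertion sort that starts comparisons at the segment's shared depth, plus an LCP helper (match lengths are ultimately derived from LCPs between sorted neighbours). The real LCP-aware version carries LCPs through the insertion itself to skip redundant byte comparisons and emits filtered matches along the way:
    Code:
    object SmallSegmentSketch {
      // plain insertion sort of suffix indices; all suffixes in
      // [start, end) are assumed to already share `depth` bytes
      def insertionSort(input: Array[Byte], suffixes: Array[Int],
                        start: Int, end: Int, depth: Int): Unit =
        for (i <- start + 1 until end) {
          val current = suffixes(i)
          var j = i - 1
          while (j >= start && suffixGreaterThan(input, suffixes(j), current, depth)) {
            suffixes(j + 1) = suffixes(j)
            j -= 1
          }
          suffixes(j + 1) = current
        }

      // lexicographic "suffix a > suffix b", skipping the shared prefix
      private def suffixGreaterThan(input: Array[Byte], a: Int, b: Int, depth: Int): Boolean = {
        var i = depth
        while (a + i < input.length && b + i < input.length &&
               input(a + i) == input(b + i)) i += 1
        if (b + i == input.length) a + i < input.length // b ended first => a is greater
        else if (a + i == input.length) false           // a ended first => a is smaller
        else (input(a + i) & 0xFF) > (input(b + i) & 0xFF)
      }

      // length of the longest common prefix of two suffixes
      def lcp(input: Array[Byte], a: Int, b: Int): Int = {
        var i = 0
        while (a + i < input.length && b + i < input.length &&
               input(a + i) == input(b + i)) i += 1
        i
      }
    }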

    The BT match finder in lzma works at 5-7 MB/s, while having smaller memory requirements.
    Is this figure for single threaded or multi-threaded match finding?

    Hamid has made a benchmark where the lzma program works at a rate of 1.26 MB/s on enwik8 (on a 4.5 GHz i5 2500k CPU, unknown RAM): https://sites.google.com/site/powturbo/home/benchmark
    Code:
    Text file: enwik8
        Size      Ratio   C.Time   D.Time   Compressor
                            MB/s     MB/s
    24763989       24.8     1.26    93.14   lzma 9
    Going from 5-7 MB/s (assuming that's a single thread result) to 1.26 MB/s would mean that lzma compression is heavily dominated by things other than match finding (i.e. presumably parsing). Additionally, I think "lzma 9" uses a window substantially smaller than the 100 MB of full enwik8.

  16. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    Yes, my numbers were probably for a non-exhaustive MF. On my 4770 (3.9 GHz Haswell, 1600-CL10 RAM), the exhaustive lzma BT match finder with a 100 MB window runs at 2 MB/s for enwik8 and at 4.3 MB/s for a binary file.

  17. #15
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Development of TarsaMatchFinder will be stalled for some time, but I've already sketched out the direction of further development. I am planning to use tries to implement streaming processing: https://github.com/tarsa/TarsaMatchF...r/doc/TREES.md

    Accidentally, I've invented a low RAM usage mode (described in that TREES.md file). It could be used to implement exhaustive match finding with very low memory requirements, e.g. 1.2x the sliding window size. The downside is that it would require a lot of I/O, but the transfers to/from disk would be almost fully sequential, which makes it somewhat practical.

