
Thread: Unaligned bitstring matching experiment

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,375
    Thanks
    214
    Thanked 1,023 Times in 544 Posts

    Unaligned bitstring matching experiment

    Here is a tool I made to find bitstring matches, including ones that
    are not aligned to byte boundaries:

    http://nishi.dreamhosters.com/u/uam_find_v0.rar
    Usage:
    uam_find inputfile tempfile

    For example, it finds stuff like this in mp3s:
    Code:
    002EC19D : 4C 41 4D 45 33 2E 39 37 AA AA AA AA AA AA AA AA    LAME3.97кккккккк
    002EC1AD : AA AA AA AA AA AA AA AA AA AA AA 22 00 02 D3 9B    ккккккккккк" ╙Ы
    
    008EDDA6 : 98 82 9A 8A 66 5C 72 6F 55 55 55 55 55 55 55 55    ШВЪКf\roUUUUUUUU
    008EDDB6 : 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55    UUUUUUUUUUUUUUUU
    It only finds fairly long matches (like 64+ bytes) though.
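
    To illustrate what an unaligned match is: 0xAA is 10101010 and 0x55 is 01010101, so runs like the two above are the same bit pattern one bit apart. The following is a minimal Python sketch of bit-level match finding, not the uam_find code. The MSB-first bit order, the function names, and the 32-bit window are all assumptions made for illustration.

```python
def to_bits(data: bytes) -> str:
    """Bytes as a '0'/'1' string, most significant bit first (assumed order)."""
    return "".join(f"{b:08b}" for b in data)


def from_bits(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes, zero-padding the last byte."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def find_bit_matches(data: bytes, min_bits: int = 32):
    """Return (src, dst, length) tuples, in bit units, for repeated bit runs.

    Naive: one dict entry per bit position, so only usable on small inputs.
    """
    bits = to_bits(data)
    first_seen = {}  # min_bits-wide window -> first bit position it occurred at
    matches = []
    for dst in range(len(bits) - min_bits + 1):
        src = first_seen.setdefault(bits[dst:dst + min_bits], dst)
        if src == dst:
            continue  # first occurrence of this window
        if src > 0 and bits[src - 1] == bits[dst - 1]:
            continue  # could be extended one bit backwards, so not a match start
        length = min_bits
        while dst + length < len(bits) and bits[src + length] == bits[dst + length]:
            length += 1
        matches.append((src, dst, length))
    return matches


# Runs of 0xAA and 0x55 are the same bit pattern, one bit apart.
assert to_bits(b"\xAA\xAA")[1:9] == to_bits(b"\x55")

# The same 8 bytes twice, with the second copy displaced by 3 bits.
payload = to_bits(b"LAME3.97")
data = from_bits(payload + "101" + payload)
for src, dst, length in find_bit_matches(data):
    kind = "unaligned" if (dst - src) % 8 else "aligned"
    print(f"bit {src} -> bit {dst}: {length} bits, {kind}")
```

    The second copy starts at bit 67, so none of its bytes line up with the original and a bytewise matcher would not see it. A real tool would hash the windows instead of storing them as strings.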

    Now, I'm interested in whether there are any filetypes with nontrivial
    matches like that (besides mp3). I tested some videos, but didn't
    find anything interesting. But then, there were some bitmask
    table matches in executables and some redundant codes in
    zip archives, so I have some hope that there are more interesting cases.

  2. #2
    Administrator Shelwien's Avatar
    I guess it's better to apply this kind of analysis to files already processed
    by rep/srep: http://freearc.org/research/SREP.aspx
    Otherwise it can find a block of zeroes and happily report it as the longest match.
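
    A small Python sketch of why a block of zeroes is a problem: it matches itself at every bit shift, aligned or not, so it swamps the interesting matches. The helper name and the MSB-first bit order are assumptions for illustration, not anything taken from uam_find.

```python
def overlap_match_length(data: bytes, shift: int) -> int:
    """Bits for which data (from bit 0) equals itself displaced by `shift` bits."""
    bits = "".join(f"{b:08b}" for b in data)  # MSB-first, an assumption
    length = 0
    while shift + length < len(bits) and bits[length] == bits[shift + length]:
        length += 1
    return length


zeros = bytes(64)  # 512 zero bits
for shift in (1, 3, 8):
    # Every shift matches all the way to the end of the block.
    print(shift, overlap_match_length(zeros, shift))

# Ordinary data stops matching almost immediately at an unaligned shift.
print(overlap_match_length(b"LAME3.97", 3))
```

    Removing long repeats first with rep/srep, as suggested above, takes such runs out of the input before the bit-level search.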

    Code:
    Z:\>SREP32I.EXE -l16 -m2 PIC 1
    0 mb used for hash
    Compression ratio: 513216 -> 192184: 37.45%. Cpu 32.846 mb/sec, real 19.853 mb/sec
    
    Z:\>uam-find.exe 1 2
    sum=1537472 x=6788 avglen=226.4
    67880 bytes in hashtable
    Reading the hashtable
    Converting the hashtable
    Sorting the hashes
    Counting the matches
    Total matching data = 8874 bits = 1109 bytes, 55 matches, average = 161.3 bits/match
    Unaligned matching data = 1593 bits = 199 bytes, 11 matches, average = 144.8 bits/match
    Longest unaligned match is ofs 0000C063 bit 6 and ofs 0000C6CF bit 5, 186 bits
    As for bitwise images: these could certainly be used as an example
    (at least some unaligned matches would be found for sure), but
    that's not really interesting.
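
    The figures in the log above can be checked with a few lines of Python. The averages and byte counts follow directly from the totals. For the longest match, the sketch assumes that "ofs" is a byte offset and that "bit" counts upward within the byte; the log does not say which bit order uam-find uses, so the exact distance depends on that assumption.

```python
total_bits, total_matches = 8874, 55
unaligned_bits, unaligned_matches = 1593, 11

print(round(total_bits / total_matches, 1))          # 161.3 bits/match
print(round(unaligned_bits / unaligned_matches, 1))  # 144.8 bits/match
print(total_bits // 8, unaligned_bits // 8)          # 1109 199 (whole bytes)


def bit_position(byte_offset: int, bit: int) -> int:
    """Absolute bit position, assuming bit numbers grow with the stream."""
    return byte_offset * 8 + bit


distance = bit_position(0xC6CF, 5) - bit_position(0xC063, 6)
print(distance, distance % 8)  # 13151 7
```

    The distance between the two copies is not a multiple of 8 (it would not be under the opposite bit numbering either), which is what makes the match unaligned.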
