Results 1 to 15 of 15

Thread: De-compressing unknown file (CTF riddle)

  1. #1
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts

    De-compressing unknown file (CTF riddle)

    I am trying to de-compress file that was compressed with unknown algorithm.
    I am pretty new in this area. I read about algoirthm such LZ 77, 78, LZW, LZRW 1, etc.
    I need to find the algorithm.


    Here is the file:
    https://ufile.io/648r6


    The extension of the file is:



    I am not sure if this is kind of clue.


    This is a compressed PNG file.


    I think they are using some kind of LZRW 1, I am not sure.


    I succeed to de-compress the IHeader of the file correctly (I verified it with CRC).
    This the comparing between them:



    This is the algorithm I noticed it worked with:



    There is a 1 byte flag every each 8 bytes.
    If the flag is > 0 I am checking its bits.
    For example:
    08 (0000 1000)
    00 00 02 58 (00 83) 5A 08 06


    The flag is 08 => 0000 1000
    So it need to go 5 step forward after the flag => 00 83 ....


    The dictionary looks to be built by 2 bytes, for example: 00 83
    In binary:
    0000 0000 1000 0011


    The first 4 bits from the right are might be the length and the rest (12 bits) are probably the absolute offset.
    offset length
    (0000 0000 1000) (0011)


    Check again the the picture of the above algorithm if you still didn't understand.


    I wrote decompress algorithm according to the above picture, every 8 bytes checking the flag, if its != 0, I am going to the index it point on and take the length and the offset according to what I wrote.
    The problem is that it worked for me only on the first chunk, on rest (IDAT) it didn't work.


    I looked to the next flags after the first chunk dictionary (0x0 0x83) and saw a new dictionaries (0x9 0x84):



    My algorithm takes the 0x98 as the offset and 0x4 as the length.
    Maybe there is something else that the 0x9 is response to.
    I tried to calculate it in any way like that:
    0x9 + 0x8, 0x9 | 0x8 and etc.
    I don't see the pattern.


    This is the "best" result I received with my algorithm:



    My algorithm:
    Code:
    Algoritm pesudo code:
    
    
    data = readFile(filePath)
    i = 0
    while i < len(data):
        controlbits = data[i] // controlbits is 8 bites
        i += 1
        for bit in controlbits:
            if bit == 0:
                output += data[i]
                i += 1
            else: // bit is 1
                // Example: data[i] = 0x0, data[i+1] = 0x83
                // data[i] = 0000 0000
                // data[i+1] = 1000 0011
                // dict = 0000 0000 1000 0011
                dict = data[i] << 8
                dict |=  data[i+1]
    
    
                offset = dict[0:12]
                len = dict[12:16]
                output += output[offset: offset+len]
                i += 2
    Any idea ?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    Sorry, I don't get it either.
    But I think that field can't just be absolute offset.
    Only 4k fits in 12 bits, but even this file is longer.
    Also, I think there's some correlation with current offset in unpacked data:
    Code:
    00000014:  000083 ; match: dist=8 len=3
    0000001A:  000083 ; match: dist=8 len=3
    000000AA:  000984 ; match: dist=152 len=4
    000000B3:  000984 ; match: dist=152 len=4
    000000BB:  000984 ; match: dist=152 len=4
    000000C4:  000994 ; match: dist=153 len=4
    000000CC:  000995 ; match: dist=153 len=5 
    000000D5:  0009A4 ; match: dist=154 len=4
    000000DD:  0009A4 ; match: dist=154 len=4
    000000E6:  0009B3 ; match: dist=155 len=3
    I think these all refer to the same zero string at offset 8... and distance kinda correlates with ofs/16.

  3. #3
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Yes, it probably not absolute.
    I tried also to use relative like: 0x0 0x83 => 8 + 3 = 11 => go 11 backward => didn't help.

    Why you think they all related to the zero string at offset 8 ?
    How for example, 0009A4 can be related ?

    The problem starts when the first bye is not zero.
    For example: 00 83 => it worked well
    For 09 84 => it doesn't.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    > Why you think they all related to the zero string at offset 8 ?

    Just look at some other png file in a hex viewer.
    To me it seems like the most likely match in these locations.

    > How for example, 0009A4 can be related ?

    0x08 appears at offset 0x14/1A, 0x9A appears at offset 0xD5 (and may actually refer to 0x1A instead).
    Also, there can be different nibble order (ie 9A4 could be len=4, dist=0xA9 instead), or even some fixed
    displacement for the distance field. (dist=0x80 at pos=0x14 is not really a problem, if the window is zeroed at init).

  5. #5
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Also, there can be different nibble order

    It won't work with 0x083 (len=3, offset=0x80)
    I tried with and exception, only if the first byte is != I will try to reverse the bytes.
    So with 0x0 0x83 => offset = 0x8
    With 0x9A4 => offset = 0xA9

    The picture still blured

    0x08 appears at offset 0x14/1A, 0x9A appears at offset 0xD5 (and may actually refer to 0x1A instead).
    I still don't see how can I assume that.


    Lets go on the byte after the last 0x83, 0x984 => the "offset" is 0x98.
    How from 0x98 I can understand that I need to go to offset 0x8 (the original 3 zeros), 0x14 or 0x1A.
    I am not sure this is the way.


    If I were need to compress this file, I would compress that thing that repeat, like the 3 zeros:
    offset
    0008: 00
    0009: 00
    000A: 00
    000B: 0d
    ..


    At offset 0014 and 001A he author so that they repeat, so he compressed them.


    So 0x984 should point on some other bytes that return, but I can't understand whay.


    I think like you, that there is a correlation between the current offest.
    Maybe te 8 left bits have some other role.



  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    > I still don't see how can I assume that.

    Well, length field is pretty much confirmed, right?
    I mean, you even checked png header crc?
    Actually len=3 encoded as 3 is pretty rare - normally there's like +3 increment for length.

    And then it means that at least literals would be at right locations in decoder output?

    > So 0x984 should point on some other bytes that return, but I can't understand whay.

    I was thinking that there could be something like ((pos&0xFFF0)-dist). Then 0xAA&0xFFF0-0x98=8.
    Or it might even copy matches from packed data (literals are as is there anyway).

    So, any more samples? A correct decoded version for this sample?
    Do you have the program that decodes them?

  7. #7
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Well, length field is pretty much confirmed, right?
    I mean, you even checked png header crc?
    Yes, this is the IHDR code after I de-compressed it:
    49 48 44 52
    00 00 02 58 00 00 00 5a 08 06 00 00 00
    64 31 28 fe

    When we check according to this website: https://www.lammertbies.nl/comm/info...lculation.html
    CRC(49484452000002580000005a0806000000) = 0x643128fe
    So the CRC 0x643128fe is the one they wrote in the file.




    So, any more samples? A correct decoded version for this sample?
    Do you have the program that decodes them?
    I have two samples:
    https://ufile.io/648r6 (the first file from my first post)
    <not_relevant>(other file compressed in the same method)

    I don't have a correct decoded version for this sample, I need to find by my self the de-compressor algorithm.
    Last edited by cr33p; 3rd August 2017 at 16:18.

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    Well, this is more than nothing too.
    Code:
    0017.0014:  0083 ; match: dist=8 pos=FF8 len=3; 00 00 00 
    001D.001A:  0083 ; match: dist=8 pos=FF8 len=3; 00 00 00 
    00BE.00AB:  0993 ; match: dist=99 pos=FF7 len=3; 00 00 00 
    00C7.00B4:  0994 ; match: dist=99 pos=7 len=4; 0A 00 00 00 
    00CE.00BC:  0994 ; match: dist=99 pos=7 len=4; 0A 00 00 00 
    00D5.00C4:  0995 ; match: dist=99 pos=17 len=5; 5A 08 06 00 00 
    00DB.00CD:  09A5 ; match: dist=9A pos=16 len=5; 00 5A 08 06 00 
    00E2.00D6:  09B3 ; match: dist=9B pos=25 len=3; 49 44 41 
    00EA.00DE:  09B3 ; match: dist=9B pos=25 len=3; 49 44 41 
    00FD.00F0:  0C73 ; match: dist=C7 pos=19 len=3; 06 00 00 
    0102.00F6:  0A83 ; match: dist=A8 pos=38 len=3; 8B 2E 82 
    0104.00F9:  0994 ; match: dist=99 pos=47 len=4; 7A 24 BC 81 
    0106.00FD:  0AB4 ; match: dist=AB pos=35 len=4; EE 72 91 8B 
    0108.0101:  0994 ; match: dist=99 pos=57 len=4; 7A 34 53 A8 
    010B.0105:  0AF4 ; match: dist=AF pos=41 len=4; 77 54 CC 6F 
    010D.0109:  0995 ; match: dist=99 pos=57 len=5; 7A 34 53 A8 4C 
    010F.010E:  0B34 ; match: dist=B3 pos=3D len=4; 90 4B 28 94 
    0111.0112:  09A5 ; match: dist=9A pos=66 len=5; 3C 96 72 A4 87 
    0113.0117:  0B74 ; match: dist=B7 pos=49 len=4; BC 81 64 89 
    0115.011B:  09B3 ; match: dist=9B pos=65 len=3; D1 3C 96 
    0117.011E:  0BB5 ; match: dist=BB pos=45 len=5; 49 2A 7A 24 BC 
    0119.0123:  09B3 ; match: dist=9B pos=75 len=3; A0 A0 8B
    (first 4 digits is offset in packed file, 2nd set is offset in unpacked file).
    As you can see, there's kinda a dependency on offset value, but it resets after 0x100?

  9. #9
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I prefer that we will work on the first file I gave (https://ufile.io/648r6) just to prevent from mistakes.

    As you can see, there's kinda a dependency on offset value, but it resets after 0x100?
    I don't think it is good to based on further flags because the output is changing all the time and the offset will also be changed.
    But the 5 first dictionaries (0083, 0083, 000984, 000984 ,...) are better to be based on.

  10. #10
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I just noticed that someone wrote that he solved it here:
    https://reverseengineering.stackexch...nown-algorithm

    He wrote:
    Managed to properly decompress enough of the image to get the flag I needed.
    Turns out the file was compressed with what seems like a homebrew hybrid of LZRW1 with some strange dictionary build routine.
    Can't upload yet, but if anyone is interested I can do so in about a week.

    I he said that it kind of LZRW1 hybrid, so maybe he meant regarding the flags.
    LZRW1 has 2 byte flags but we are using here 1 byte flag.

    Regarding the dictionary I still not sure.
    I tried over the weekend number of combination with the first byte:
    00 83

    Code:
    a = 00 << 4 
    offset = a + 8 
    
    0984
    a = 09 << 4
    offset = a + 0x98
    nothing interesting

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    Well, he only "managed to decompress enough of image", which doesn't really require to understand how these matches work.
    Most of matches there are zeroes, so its in theory possible to just build the correct unpacked file manually.

    As to "strange dictionary build routine", what we know is
    (1) "distance" can be the same in multiple matches, while supposedly referring to the same data string
    (2) "distance" is increased by 1 every 16 bytes or so
    (3) "distance" increment is reset every 256 bytes or so

    Anyway, it should be possible to do a bruteforce/manual mapping of .bcmp dictionary indexes (="distance")
    to actual file positions. I mean, we know these for 0x008 and 0x099, and other values also supposedly
    refer to some strings in unpacked data, so actual mapping can be bruteforced by png parsing / deflate decoding.

  12. #12
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I want to try the bruteforcing idea.
    But I am not sure I understand the idea.

    This is the info of the first file I gave:

    Code:
    00000014:  000083 ; match: dist=8 len=3
    0000001A:  000083 ; match: dist=8 len=3
    000000AA:  000984 ; match: dist=152 len=4
    000000B3:  000984 ; match: dist=152 len=4
    000000BB:  000984 ; match: dist=152 len=4
    000000C4:  000994 ; match: dist=153 len=4
    000000CC:  000995 ; match: dist=153 len=5 
    000000D5:  0009A4 ; match: dist=154 len=4
    000000DD:  0009A4 ; match: dist=154 len=4
    000000E6:  0009B3 ; match: dist=155 len=3
    So, the idea is like to try to add +1 to the distance each time ?

    Like that:


    Code:
    for i=0..200:
        dec0mpress(i)
            ...
            byte1
            byte2
            len = byte1 & 0x0F
            offset = byte1 >> 4
            offset = byte2 + offset
            ...


    This is what you mean?
    Example:
    i = 0:
    byte1 = 0x84
    byte2 = 0x9
    len = 4
    offset = 0x8
    offset = byte2 + offset = 0x9 + 0x8 = 0x11

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    > But I am not sure I understand the idea.

    See http://nishi.dreamhosters.com/u/defl_matches_v0.rar etc
    After IDAT tag there's 2 bytes of zlib header and then deflate stream (type=2).
    And there're not many matches, so we can bruteforce their meaning by looking at the length of data correctly decoded from deflate.

    > This is what you mean?

    I don't know how exactly it works either - need more examples.

    But I don't think its simply encrypted for no reason (like nibble1+nibble2 or whatever you tried).
    More likely, there's really some method of dictionary construction, instead of normal LZ distances.
    Maybe it works like a cpu cache, or some such.

  14. #14

  15. #15
    Member
    Join Date
    Aug 2017
    Location
    England
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    No

Similar Threads

  1. de-zlibber ?
    By cbloom in forum Data Compression
    Replies: 3
    Last Post: 26th July 2016, 12:54
  2. [LZ] M/T & GPU (de)compression
    By Bulat Ziganshin in forum Data Compression
    Replies: 0
    Last Post: 31st December 2015, 12:09
  3. compressing a really small 1k .COM file
    By Rugxulo in forum Data Compression
    Replies: 3
    Last Post: 28th November 2009, 00:32
  4. Replies: 8
    Last Post: 12th April 2009, 02:39
  5. Can't allocate memory required for (de)compression..help!
    By Duarte in forum Data Compression
    Replies: 19
    Last Post: 18th July 2008, 18:14

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •