# Thread: De-compressing unknown file (CTF riddle)

1. ## De-compressing unknown file (CTF riddle)

I am trying to de-compress file that was compressed with unknown algorithm.
I am pretty new in this area. I read about algoirthm such LZ 77, 78, LZW, LZRW 1, etc.
I need to find the algorithm.

Here is the file:
https://ufile.io/648r6

The extension of the file is: I am not sure if this is kind of clue.

This is a compressed PNG file.

I think they are using some kind of LZRW 1, I am not sure.

I succeed to de-compress the IHeader of the file correctly (I verified it with CRC).
This the comparing between them: This is the algorithm I noticed it worked with: There is a 1 byte flag every each 8 bytes.
If the flag is > 0 I am checking its bits.
For example:
08 (0000 1000)
00 00 02 58 (00 83) 5A 08 06

The flag is 08 => 0000 1000
So it need to go 5 step forward after the flag => 00 83 ....

The dictionary looks to be built by 2 bytes, for example: 00 83
In binary:
0000 0000 1000 0011

The first 4 bits from the right are might be the length and the rest (12 bits) are probably the absolute offset.
offset length
(0000 0000 1000) (0011)

Check again the the picture of the above algorithm if you still didn't understand.

I wrote decompress algorithm according to the above picture, every 8 bytes checking the flag, if its != 0, I am going to the index it point on and take the length and the offset according to what I wrote.
The problem is that it worked for me only on the first chunk, on rest (IDAT) it didn't work.

I looked to the next flags after the first chunk dictionary (0x0 0x83) and saw a new dictionaries (0x9 0x84): My algorithm takes the 0x98 as the offset and 0x4 as the length.
Maybe there is something else that the 0x9 is response to.
I tried to calculate it in any way like that:
0x9 + 0x8, 0x9 | 0x8 and etc.
I don't see the pattern.

This is the "best" result I received with my algorithm: My algorithm:
Code:
```Algoritm pesudo code:

i = 0
while i < len(data):
controlbits = data[i] // controlbits is 8 bites
i += 1
for bit in controlbits:
if bit == 0:
output += data[i]
i += 1
else: // bit is 1
// Example: data[i] = 0x0, data[i+1] = 0x83
// data[i] = 0000 0000
// data[i+1] = 1000 0011
// dict = 0000 0000 1000 0011
dict = data[i] << 8
dict |=  data[i+1]

offset = dict[0:12]
len = dict[12:16]
output += output[offset: offset+len]
i += 2```
Any idea ?  Reply With Quote

2. Sorry, I don't get it either.
But I think that field can't just be absolute offset.
Only 4k fits in 12 bits, but even this file is longer.
Also, I think there's some correlation with current offset in unpacked data:
Code:
```00000014:  000083 ; match: dist=8 len=3
0000001A:  000083 ; match: dist=8 len=3
000000AA:  000984 ; match: dist=152 len=4
000000B3:  000984 ; match: dist=152 len=4
000000BB:  000984 ; match: dist=152 len=4
000000C4:  000994 ; match: dist=153 len=4
000000CC:  000995 ; match: dist=153 len=5
000000D5:  0009A4 ; match: dist=154 len=4
000000DD:  0009A4 ; match: dist=154 len=4
000000E6:  0009B3 ; match: dist=155 len=3```
I think these all refer to the same zero string at offset 8... and distance kinda correlates with ofs/16.  Reply With Quote

3. Yes, it probably not absolute.
I tried also to use relative like: 0x0 0x83 => 8 + 3 = 11 => go 11 backward => didn't help.

Why you think they all related to the zero string at offset 8 ?
How for example, 0009A4 can be related ?

The problem starts when the first bye is not zero.
For example: 00 83 => it worked well
For 09 84 => it doesn't.  Reply With Quote

4. > Why you think they all related to the zero string at offset 8 ?

Just look at some other png file in a hex viewer.
To me it seems like the most likely match in these locations.

> How for example, 0009A4 can be related ?

0x08 appears at offset 0x14/1A, 0x9A appears at offset 0xD5 (and may actually refer to 0x1A instead).
Also, there can be different nibble order (ie 9A4 could be len=4, dist=0xA9 instead), or even some fixed
displacement for the distance field. (dist=0x80 at pos=0x14 is not really a problem, if the window is zeroed at init).  Reply With Quote

5. Also, there can be different nibble order

It won't work with 0x083 (len=3, offset=0x80)
I tried with and exception, only if the first byte is != I will try to reverse the bytes.
So with 0x0 0x83 => offset = 0x8
With 0x9A4 => offset = 0xA9

The picture still blured

0x08 appears at offset 0x14/1A, 0x9A appears at offset 0xD5 (and may actually refer to 0x1A instead).
I still don't see how can I assume that.

Lets go on the byte after the last 0x83, 0x984 => the "offset" is 0x98.
How from 0x98 I can understand that I need to go to offset 0x8 (the original 3 zeros), 0x14 or 0x1A.
I am not sure this is the way.

If I were need to compress this file, I would compress that thing that repeat, like the 3 zeros:
offset
0008: 00
0009: 00
000A: 00
000B: 0d
..

At offset 0014 and 001A he author so that they repeat, so he compressed them.

So 0x984 should point on some other bytes that return, but I can't understand whay.

I think like you, that there is a correlation between the current offest.
Maybe te 8 left bits have some other role.  Reply With Quote

6. > I still don't see how can I assume that.

Well, length field is pretty much confirmed, right?
I mean, you even checked png header crc?
Actually len=3 encoded as 3 is pretty rare - normally there's like +3 increment for length.

And then it means that at least literals would be at right locations in decoder output?

> So 0x984 should point on some other bytes that return, but I can't understand whay.

I was thinking that there could be something like ((pos&0xFFF0)-dist). Then 0xAA&0xFFF0-0x98=8.
Or it might even copy matches from packed data (literals are as is there anyway).

So, any more samples? A correct decoded version for this sample?
Do you have the program that decodes them?  Reply With Quote

7. Well, length field is pretty much confirmed, right?
I mean, you even checked png header crc?
Yes, this is the IHDR code after I de-compressed it:
49 48 44 52
00 00 02 58 00 00 00 5a 08 06 00 00 00
64 31 28 fe

When we check according to this website: https://www.lammertbies.nl/comm/info...lculation.html
CRC(49484452000002580000005a0806000000) = 0x643128fe
So the CRC 0x643128fe is the one they wrote in the file.

So, any more samples? A correct decoded version for this sample?
Do you have the program that decodes them?
I have two samples:
https://ufile.io/648r6 (the first file from my first post)
<not_relevant>(other file compressed in the same method)

I don't have a correct decoded version for this sample, I need to find by my self the de-compressor algorithm.  Reply With Quote

8. Well, this is more than nothing too.
Code:
```0017.0014:  0083 ; match: dist=8 pos=FF8 len=3; 00 00 00
001D.001A:  0083 ; match: dist=8 pos=FF8 len=3; 00 00 00
00BE.00AB:  0993 ; match: dist=99 pos=FF7 len=3; 00 00 00
00C7.00B4:  0994 ; match: dist=99 pos=7 len=4; 0A 00 00 00
00CE.00BC:  0994 ; match: dist=99 pos=7 len=4; 0A 00 00 00
00D5.00C4:  0995 ; match: dist=99 pos=17 len=5; 5A 08 06 00 00
00DB.00CD:  09A5 ; match: dist=9A pos=16 len=5; 00 5A 08 06 00
00E2.00D6:  09B3 ; match: dist=9B pos=25 len=3; 49 44 41
00EA.00DE:  09B3 ; match: dist=9B pos=25 len=3; 49 44 41
00FD.00F0:  0C73 ; match: dist=C7 pos=19 len=3; 06 00 00
0102.00F6:  0A83 ; match: dist=A8 pos=38 len=3; 8B 2E 82
0104.00F9:  0994 ; match: dist=99 pos=47 len=4; 7A 24 BC 81
0106.00FD:  0AB4 ; match: dist=AB pos=35 len=4; EE 72 91 8B
0108.0101:  0994 ; match: dist=99 pos=57 len=4; 7A 34 53 A8
010B.0105:  0AF4 ; match: dist=AF pos=41 len=4; 77 54 CC 6F
010D.0109:  0995 ; match: dist=99 pos=57 len=5; 7A 34 53 A8 4C
010F.010E:  0B34 ; match: dist=B3 pos=3D len=4; 90 4B 28 94
0111.0112:  09A5 ; match: dist=9A pos=66 len=5; 3C 96 72 A4 87
0113.0117:  0B74 ; match: dist=B7 pos=49 len=4; BC 81 64 89
0115.011B:  09B3 ; match: dist=9B pos=65 len=3; D1 3C 96
0117.011E:  0BB5 ; match: dist=BB pos=45 len=5; 49 2A 7A 24 BC
0119.0123:  09B3 ; match: dist=9B pos=75 len=3; A0 A0 8B```
(first 4 digits is offset in packed file, 2nd set is offset in unpacked file).
As you can see, there's kinda a dependency on offset value, but it resets after 0x100?  Reply With Quote

9. I prefer that we will work on the first file I gave (https://ufile.io/648r6) just to prevent from mistakes.

As you can see, there's kinda a dependency on offset value, but it resets after 0x100?
I don't think it is good to based on further flags because the output is changing all the time and the offset will also be changed.
But the 5 first dictionaries (0083, 0083, 000984, 000984 ,...) are better to be based on.  Reply With Quote

10. I just noticed that someone wrote that he solved it here:
https://reverseengineering.stackexch...nown-algorithm

He wrote:
Managed to properly decompress enough of the image to get the flag I needed.
Turns out the file was compressed with what seems like a homebrew hybrid of LZRW1 with some strange dictionary build routine.
Can't upload yet, but if anyone is interested I can do so in about a week.

I he said that it kind of LZRW1 hybrid, so maybe he meant regarding the flags.
LZRW1 has 2 byte flags but we are using here 1 byte flag.

Regarding the dictionary I still not sure.
I tried over the weekend number of combination with the first byte:
00 83

Code:
```a = 00 << 4
offset = a + 8

0984
a = 09 << 4
offset = a + 0x98```
nothing interesting  Reply With Quote

11. Well, he only "managed to decompress enough of image", which doesn't really require to understand how these matches work.
Most of matches there are zeroes, so its in theory possible to just build the correct unpacked file manually.

As to "strange dictionary build routine", what we know is
(1) "distance" can be the same in multiple matches, while supposedly referring to the same data string
(2) "distance" is increased by 1 every 16 bytes or so
(3) "distance" increment is reset every 256 bytes or so

Anyway, it should be possible to do a bruteforce/manual mapping of .bcmp dictionary indexes (="distance")
to actual file positions. I mean, we know these for 0x008 and 0x099, and other values also supposedly
refer to some strings in unpacked data, so actual mapping can be bruteforced by png parsing / deflate decoding.  Reply With Quote

12. I want to try the bruteforcing idea.
But I am not sure I understand the idea.

This is the info of the first file I gave:

Code:
```00000014:  000083 ; match: dist=8 len=3
0000001A:  000083 ; match: dist=8 len=3
000000AA:  000984 ; match: dist=152 len=4
000000B3:  000984 ; match: dist=152 len=4
000000BB:  000984 ; match: dist=152 len=4
000000C4:  000994 ; match: dist=153 len=4
000000CC:  000995 ; match: dist=153 len=5
000000D5:  0009A4 ; match: dist=154 len=4
000000DD:  0009A4 ; match: dist=154 len=4
000000E6:  0009B3 ; match: dist=155 len=3```
So, the idea is like to try to add +1 to the distance each time ?

Like that:

Code:
```for i=0..200:
dec0mpress(i)
...
byte1
byte2
len = byte1 & 0x0F
offset = byte1 >> 4
offset = byte2 + offset
...```

This is what you mean?
Example:
i = 0:
byte1 = 0x84
byte2 = 0x9
len = 4
offset = 0x8
offset = byte2 + offset = 0x9 + 0x8 = 0x11  Reply With Quote

13. > But I am not sure I understand the idea.

See http://nishi.dreamhosters.com/u/defl_matches_v0.rar etc
After IDAT tag there's 2 bytes of zlib header and then deflate stream (type=2).
And there're not many matches, so we can bruteforce their meaning by looking at the length of data correctly decoded from deflate.

> This is what you mean?

I don't know how exactly it works either - need more examples.

But I don't think its simply encrypted for no reason (like nibble1+nibble2 or whatever you tried).
More likely, there's really some method of dictionary construction, instead of normal LZ distances.
Maybe it works like a cpu cache, or some such.  Reply With Quote

14. The CTF is over Anybody has the solution ?
https://reverseengineering.stackexch...nown-algorithm
https://reverseengineering.stackexch...ssion-was-used
http://forum.zenhax.com/viewtopic.php?t=4716  Reply With Quote

15. No   Reply With Quote

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•