
Thread: Decompress a paq8l file

  1. #1
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts

    Decompress a paq8l file

    Hi,

    I compressed some data a long time ago with paq8l, as far as I remember. But when I try to decompress the archive with paq8l, it says:


    Extracting 225 file(s) from archive.paq8l -7
    archive.paq8l: header corrupted at 13608

    Is there any chance to get the data back?


    Thanks in advance.

  2. #2
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Sorry,
    it looks like all CR+LF were replaced by LF. This is irreversible so it is impossible to extract original data.
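
    To see why the conversion loses information, here is a minimal sketch. The helper name crlf_to_lf is made up for illustration and is not taken from paq8l or from whatever tool did the damage. An input containing CR+LF and an input containing a bare LF both produce the same output, so the output alone does not say which LFs originally had a CR in front of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src to dst, dropping every CR that is
   directly followed by LF. Returns the number of bytes written.
   dst must have room for at least n bytes. */
size_t crlf_to_lf(const unsigned char *src, size_t n, unsigned char *dst) {
  size_t i, out = 0;
  assert(src != NULL && dst != NULL);
  for (i = 0; i < n; i++) {
    if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n') {
      continue; /* skip the CR; the LF is copied on the next pass */
    }
    dst[out++] = src[i];
  }
  return out;
}
```

    Feeding it the bytes 'A', CR, LF, 'B' and the bytes 'A', LF, 'B' gives the same three bytes back in both cases.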

  3. #3
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    How could this happen? In the encoding step?

  4. #4
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    619
    Thanks
    267
    Thanked 245 Times in 123 Posts
    Quote Originally Posted by Jan Ondrus View Post
    Sorry,
    it looks like all CR+LF were replaced by LF. This is irreversible so it is impossible to extract original data.
    Hm.. repairing the header works with this code (executable, code and "converted" archive attached):

    Code:
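    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
      int i;
      int c;
      FILE* fin = fopen("archive.paq8l", "rb");
      FILE* fout = fopen("archive_.paq8l", "wb");
      if (fin == NULL || fout == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
      }
      /* Header part: 0x3528 == 13608, the offset from the paq8l
         error message. Turn every LF back into CR+LF here. */
      for (i = 0; i <= 0x3528; i++) {
        c = fgetc(fin);
        if (c == EOF) break;
        if (c == '\n') {
          fputc('\r', fout);
        }
        fputc(c, fout);
      }
      /* Rest of the archive: copy unchanged. */
      for (;;) {
        c = fgetc(fin);
        if (c == EOF) break;
        fputc(c, fout);
      }
      fclose(fout);
      fclose(fin);
      return 0;
    }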
    This seems to restore the first few files correctly (Tracks\Challenges\My Challenges\Buster.Challenge.Gbx ... six.Challenge.Gbx), because the headers ("GBX...") look fine. After that, everything is garbage.
    It should be possible to brute-force the conversion of LF back to CR+LF: try a change, proceed if the result looks better, and revert if it doesn't. I'm not sure whether there are signs of corruption that could be detected automatically; since paq8l doesn't have checksums, IIRC, I don't think so, which means checking the results would still be a lot of manual work. And for larger non-text files, it will be hard to check validity.
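
    The "does the result look better" test for these particular files could start from the "GBX" header mentioned above. A minimal sketch, assuming that the first three bytes are the only thing checked (looks_like_gbx is a made-up name, and a real validity check would need to look at more than the magic bytes):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does an extracted file start with "GBX"?
   Only the three magic bytes are tested, nothing else is validated. */
int looks_like_gbx(const unsigned char *buf, size_t n) {
  assert(buf != NULL);
  return n >= 3 && memcmp(buf, "GBX", 3) == 0;
}
```

    This only helps at file boundaries; it says nothing about damage in the middle of a file.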

    By the way, there are 6801 LF codes after the header, so a complete brute-force without validity checks isn't feasible as it would need 2**6801 extractions.
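
    For scale: each LF is either an original LF or a former CR+LF, so k candidates give 2**k combinations, and 2**6801 is roughly 10**2047. A sketch of how such a count could be taken is below; the function name and the start parameter are assumptions, with start standing for wherever the already repaired header ends.

```c
#include <assert.h>
#include <stddef.h>

/* Count LF bytes at or after offset `start` in a buffer. */
size_t count_lf_from(const unsigned char *buf, size_t n, size_t start) {
  size_t i, count = 0;
  assert(buf != NULL);
  for (i = start; i < n; i++) {
    if (buf[i] == '\n') count++;
  }
  return count;
}
```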
    Last edited by schnaader; 8th April 2012 at 12:53.
    http://schnaader.info
    Damn kids. They're all alike.

  5. #5
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by thometal View Post
    How could this happen? In the encoding step?
    I suppose some other software modified the file contents afterwards. I think it misinterpreted the file as a text file instead of a binary file and mistakenly applied a CR+LF->LF conversion.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,007
    Thanks
    301
    Thanked 1,322 Times in 755 Posts
    paq8 uses a carryless range coder (rc) which is not perfectly bijective, so if the distance between LFs is not too short, it might be possible
    to fix some of them automatically (decode, re-encode, compare the code, and replace the last LF with CRLF on mismatch).
    Also, after an error the decompressed data would become totally random very soon (within the next 100-200 bytes), so bits-per-character (bpc)
    tracking can be used (unless the files in the archive are already compressed, which is unlikely).
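
    A rough illustration of the tracking idea follows. Proper bpc tracking would use the coder's own statistics; this sketch swaps in a much simpler measure, the order-0 byte entropy of a window of decoded output, as a stand-in. All names are assumptions. Since a short window of random bytes measures below 8 bits, it seems safer to compare against values from the known-good part of the output than against a fixed threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Base-2 logarithm for x > 0, computed by repeated squaring so that
   the sketch does not depend on libm. */
static double log2_approx(double x) {
  double result = 0.0, frac = 0.5;
  int i;
  while (x >= 2.0) { x /= 2.0; result += 1.0; }
  while (x < 1.0)  { x *= 2.0; result -= 1.0; }
  for (i = 0; i < 40; i++) {   /* x is in [1, 2) here */
    x *= x;
    if (x >= 2.0) { x /= 2.0; result += frac; }
    frac /= 2.0;
  }
  return result;
}

/* Order-0 entropy of buf[0..n-1] in bits per byte (0.0 to 8.0). */
double order0_bits_per_byte(const unsigned char *buf, size_t n) {
  size_t counts[256] = {0};
  size_t i;
  double bits = 0.0;
  assert(buf != NULL && n > 0);
  for (i = 0; i < n; i++) counts[buf[i]]++;
  for (i = 0; i < 256; i++) {
    if (counts[i] > 0) {
      double p = (double)counts[i] / (double)n;
      bits -= p * log2_approx(p);
    }
  }
  return bits;
}
```

    Calling order0_bits_per_byte on successive windows of the extracted data and watching for a sudden jump is one way this could be used to guess where decoding went wrong.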

  7. #7
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    Thanks to all, I found out how the \r characters were removed. Because the file was sent via email, it was stored as base64, and during the base64 conversion it was treated as a text file, as was said before. So I converted the base64 back manually, and then I could decompress it with the normal version.

    Last question:

    How did you recognize that \r\n was replaced by \n?

  8. #8
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by thometal View Post
    How did you recognize that \r\n was replaced by \n?
    Simply by looking at the file in the hexadecimal view of my viewer, I could see it in the header. Then I searched for CR+LF and there was none in the whole file (about 16 are expected in 1 MB of random/compressed data).
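
    The figure of about 16 follows from treating compressed data as uniformly random bytes: any given two-byte pair matches CR+LF with probability 1/65536, and 1 MB has about 2**20 pair positions, so 2**20 / 2**16 = 16. Under that assumption, seeing none at all by chance has a probability of roughly e**-16, about 1 in 9 million. A small sketch of the check, with made-up function names:

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of the byte pair CR, LF in a buffer. */
size_t count_crlf(const unsigned char *buf, size_t n) {
  size_t i, count = 0;
  assert(buf != NULL);
  for (i = 0; i + 1 < n; i++) {
    if (buf[i] == '\r' && buf[i + 1] == '\n') count++;
  }
  return count;
}

/* Expected number of CR+LF pairs in n uniformly random bytes:
   n - 1 pair positions, each matching with probability 1/65536. */
double expected_crlf(size_t n) {
  return n > 1 ? (double)(n - 1) / 65536.0 : 0.0;
}
```

    A count of zero next to an expected value in the double digits is the kind of mismatch described above.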

  9. #9
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    That looks like Trackmania files to me!

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    One interesting thing is that regular corruption recovery schemes (e.g. Reed-Solomon) are helpless against such an issue. I wonder whether there are erasure codes that don't work blockwise and can succeed with a lot of tiny errors spread all over the file.
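
    One way to picture the problem (this is an illustration of the point, not something measured on the archive from this thread): an overwritten byte damages only the block it sits in, while a deleted byte shifts everything after it, so with fixed block boundaries nearly every later block no longer matches. The sketch below counts mismatching blocks; the function name and parameters are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count fixed-size blocks that differ between an original buffer and
   a damaged one. Blocks that are missing or only partially present in
   the damaged buffer count as differing. */
size_t count_bad_blocks(const unsigned char *orig, size_t orig_len,
                        const unsigned char *bad, size_t bad_len,
                        size_t block) {
  size_t off, badblocks = 0;
  assert(orig != NULL && bad != NULL && block > 0);
  for (off = 0; off < orig_len; off += block) {
    size_t len = orig_len - off < block ? orig_len - off : block;
    if (off + len > bad_len || memcmp(orig + off, bad + off, len) != 0) {
      badblocks++;
    }
  }
  return badblocks;
}
```

    With 16 distinct bytes and a block size of 4, overwriting byte 1 gives one bad block, while deleting byte 1 makes all four blocks mismatch.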

  11. #11
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    Quote Originally Posted by Intrinsic View Post
    That looks like Trackmania files to me!
    Yeah, you are right. Should I upload these? =P

    Oh, OK, via a statistical assumption. I could have had this idea too =/
