Thread: New archive format

  1. #1
    Expert
    Matt Mahoney
    I have been working on a new archive format that supports files over 2 GB without any nonportable code. As a demo, I created an archiver that stores both uncompressed files and files compressed with lpaq1 7 (needs 387 MB memory):
    http://cs.fit.edu/~mmahoney/compression/#lpq1

    To make the code portable without any Windows or Linux specific code, all files have to be accessed sequentially without using fseek(). Files are divided into blocks (about 64KB to 1MB) with the format:

    "lPq" 1 [filename [0 mode oldsize newsize data]...]...

    The first 4 bytes identify the archive and version number. The compressed files are concatenated together with a filename and a sequence of blocks. Each block has a mode ('c' for compressed or 's' for stored), the uncompressed size (4 bytes), compressed size (4 bytes) and the compressed data.
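
    A minimal sketch of this layout in Python, not the lpq1 source. It assumes that the `0` in the layout is a single zero byte in front of every block (so the first block's zero byte also ends the filename) and that the two 4-byte sizes are big-endian; the post does not state either. The helper names are hypothetical, and only stored ('s') blocks are written.

```python
import io
import struct

MAGIC = b"lPq\x01"  # "lPq" followed by the version byte 1


def write_block(out, mode, original_size, payload):
    """Write one block: 0, mode, oldsize, newsize, data."""
    out.write(b"\x00")  # assumption: a zero byte starts every block
    out.write(mode)     # b"c" (compressed) or b"s" (stored)
    # Assumption: both sizes are 4-byte big-endian unsigned integers.
    out.write(struct.pack(">II", original_size, len(payload)))
    out.write(payload)


def write_stored_file(out, name, data, block_size=1 << 16):
    """Write a filename followed by its data split into stored blocks."""
    out.write(name.encode("utf-8"))
    # An empty file still gets one empty block so the name is terminated
    # (a choice made for this sketch, not taken from the post).
    for start in range(0, max(len(data), 1), block_size):
        chunk = data[start:start + block_size]
        write_block(out, b"s", len(chunk), chunk)


archive = io.BytesIO()
archive.write(MAGIC)
write_stored_file(archive, "a.txt", b"hello")
write_stored_file(archive, "b.bin", b"x" * 70000)  # becomes two blocks
```

    Because each block header carries both sizes, a writer for 'c' blocks would need the finished compressed block in memory before it can emit the header.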

    To list the contents, the program has to read the whole file, skipping over the data, but there is no other easy way to do this portably with large files. It is fast enough, though. The compressor doesn't know the file size until it is done, and it has to buffer the compressed data so it can write the block headers.
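
    The sequential listing could look roughly like the sketch below. It is an illustration under the same assumptions as the layout description allows (a zero byte in front of each block, big-endian 4-byte sizes, end of file ends the archive), and the function names are hypothetical. Data is skipped with read() only, never seek().

```python
import io
import struct

MAGIC = b"lPq\x01"


def skip(stream, count, chunk=1 << 16):
    """Discard count bytes using read() only."""
    while count > 0:
        got = stream.read(min(count, chunk))
        if not got:
            raise EOFError("archive truncated inside block data")
        count -= len(got)


def list_archive(stream):
    """Return [(filename, uncompressed_size)] by walking the block headers."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an lPq version 1 archive")
    entries = []
    name = bytearray()  # filename bytes collected so far
    size = None         # None until the first block of the current file
    while True:
        byte = stream.read(1)
        if byte == b"\x00":  # assumption: zero byte marks the start of a block
            header = stream.read(9)  # mode, oldsize, newsize
            if len(header) != 9:
                raise EOFError("archive truncated inside block header")
            old_size, new_size = struct.unpack(">II", header[1:])
            size = (size or 0) + old_size
            skip(stream, new_size)
        else:
            # End of input, or the first byte of the next filename.
            if size is not None:
                entries.append((name.decode("utf-8"), size))
                name, size = bytearray(), None
            if not byte:
                if name:
                    raise EOFError("archive truncated inside filename")
                break
            name += byte
    return entries


sample = (MAGIC
          + b"a.txt" + b"\x00s" + struct.pack(">II", 5, 5) + b"hello"
          + b"b.bin" + b"\x00s" + struct.pack(">II", 3, 3) + b"xyz"
          + b"\x00s" + struct.pack(">II", 2, 2) + b"zz")
listing = list_archive(io.BytesIO(sample))
```

    Zero bytes inside block data do not confuse the walk, because the data is skipped by its recorded size rather than scanned.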

    The archiver uses commands similar to 7zip and rar (just a, x, and l). To keep it simple, it only supports solid archives that can't be updated. (The compressor is not initialized between files, so compression is better.) Also, it will never clobber files; it just skips over existing files and keeps going. It lets you rename files when you extract them, something whose absence is an annoying problem with some archivers. It doesn't create directories (which would not be portable).
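
    The "never clobber, just skip" rule can be sketched in Python with exclusive-create mode. This only illustrates the behavior described above, not how lpq1 implements it; `extract_file` is a hypothetical helper.

```python
import os
import tempfile


def extract_file(path, data):
    """Write data to path unless it already exists; return True if written."""
    try:
        # Mode "x" makes open() fail instead of overwriting an existing file.
        with open(path, "xb") as out:
            out.write(data)
    except FileExistsError:
        return False  # skip the existing file and let the caller keep going
    return True


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "a.txt")
    first = extract_file(target, b"new")     # file is created
    second = extract_file(target, b"other")  # file exists, so it is skipped
    with open(target, "rb") as handle:
        kept = handle.read()
```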

    I will probably use this new format in my upcoming paq9.

  2. #2
    Moderator

    Thanks Matt! I'm looking forward to updates of this, and the upcoming paq9.

  3. #3
    Member
    Yeah, nice: lpaq with an archive format, and an extended one too. It rocks, thank you.

  4. #4
    Member Vacon
    Hello everyone,

    quite an interesting attempt!
    I'm wondering whether this format will make it possible to lower the barriers between different operating systems and make it easier to exchange data.
    I had some really ugly discussions about something like that on OpenOffice.org's dev list (German) about the content and format of the files to be handed out to users (cab vs. 7z, about one year ago), so I'm really curious! Good luck!

    Best regards!

  5. #5
    Programmer Bulat Ziganshin
    Quote Originally Posted by Vacon
    I'm wondering whether this format will make it possible to lower the barriers between different operating systems and make it easier to exchange data.
    this format will be used only in his own tools (and maybe others with the same philosophy); it simply lets him skip all the complexities of OS-specific programming in a program whose main goal is compression, not full-scale archive support

    this format is inappropriate for general-purpose archivers like 7z/fa

  6. #6
    Member Vacon
    Hello everyone,

    Quote Originally Posted by Bulat Ziganshin
    this format is inappropriate for general-purpose archivers like 7z/fa
    This makes sense to me -> as I understand Matt, his main goal isn't compression strength (for now), but *maybe* it can be enhanced someday to better serve some needs (for instance, the missing ability to create directories).
    Concerning compression and comfort it's *far* away from FreeArc and 7-zip, but TAR is useful although it's no compressor at all!
    Nevertheless -> it's an interesting idea.

    Best regards!

  7. #7
    Member
    No, I think you understand it wrong. The goal of this is to support all kinds of files. It has nothing (or not much) to do with compression. But for a real archiver it takes too long to step through the whole file without knowing the sizes, for example. The reasons have nothing to do with compression, which will be very good here.

  8. #8
    Member Vacon
    Hello everyone,

    Quote Originally Posted by Simon Berger
    No, I think you understand it wrong.
    Sh...t happens, so maybe you are right.
    Matt?! Some more comments?

    Best regards!

  9. #9
    Expert
    Matt Mahoney
    Well, it really is an experimental format, not intended to replace archivers like zip or 7zip or rar. The goal is not lots of features or support for existing formats. The goal is a simple design where many files of different types can be compressed together to improve compression, but still be as useful as a file compressor that lets you name the output file.

    Suppose I want to compress a folder with many different file types like text, .exe, .wav, .bmp, etc. The problem with single file compressors is I either have to compress the files one at a time, so I can't use redundancy between text files, or tar them together first, so I can't use special filters for the different file types. The lpq1 format will solve this problem by creating solid archives but allowing different file types to be compressed using different algorithms. Right now it supports -c (compress) and -s (store) but I plan to add other types in paq9. The default will be to let the program decide by using file name extensions or guessing by looking at the beginning of the file, but you can override that with options on a file by file basis.
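
    A rough Python sketch of the planned default: pick a mode from the file name extension, fall back to peeking at the first bytes, and let an explicit override win. The post does not say which file types map to which mode. Storing data that is already compressed is an assumption made for this sketch, and the extension list, the signature list, and `choose_mode` are all hypothetical. Only the two existing modes, 'c' and 's', are used.

```python
import os

# Hypothetical examples of types that rarely shrink further.
STORED_EXTENSIONS = {".zip", ".gz", ".jpg", ".7z"}
# Well-known leading signatures: ZIP local header, gzip, JPEG.
STORED_MAGIC = (b"PK\x03\x04", b"\x1f\x8b", b"\xff\xd8\xff")


def choose_mode(filename, head, override=None):
    """Return "c" (compress) or "s" (store) for one file.

    head holds the first few bytes of the file; override is the
    per-file option given by the user, if any.
    """
    if override is not None:
        return override  # the user's choice always wins
    extension = os.path.splitext(filename)[1].lower()
    if extension in STORED_EXTENSIONS:
        return "s"
    if head.startswith(STORED_MAGIC):  # guess from the start of the file
        return "s"
    return "c"


modes = [
    choose_mode("notes.txt", b"hello"),
    choose_mode("photo.JPG", b""),
    choose_mode("data", b"\x1f\x8b\x08"),
    choose_mode("a.zip", b"", override="c"),
]
```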

    Another reason to keep features to a minimum is that if I decide to make self-extracting archives, the decompressor needs to be small. So it doesn't make directories, search recursively, strip paths, ask if you want to overwrite files, update the archive, or do all the other stuff you already know how to work around without learning a bunch of new commands. Wildcard handling is built into g++, so I don't plan to build it into the program just to make it work with other compilers like VC++, Intel, Borland, etc. I would rather spend my effort improving speed and compression. Someone else can build a GUI and add all these features outside the engine if they want.

  10. #10
    Member Vacon
    Hello everyone,

    thank you Matt for making that clearer to us!
    Quote Originally Posted by Matt Mahoney
    I would rather spend my effort improving speed and compression. Someone else can build a GUI and add all these features outside the engine if they want.
    Divide et impera.

    Best regards!
