
Thread: Database compressors / preprocessors?

  1. #1
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts

    Question Database compressors / preprocessors?

    Do you know of a compressor or archiver that's content-aware regarding sqlite or other databases?

    Just to clarify, I am NOT talking about the VACUUM command or similar, where the output is still a functional database.

  2. #2
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by Gonzalo View Post
    Do you know of a compressor or archiver that's content-aware regarding sqlite or other databases?

    Just to clarify, I am NOT talking about the VACUUM command or similar, where the output is still a functional database.
    It really depends on the database.
    For example, mysql/mariadb dumps are typically text-based.
    Others (e.g. MS SQL) use binary .BAK files.

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    https://stackoverflow.com/questions/...qlite-database

    I'm aware of some preprocessors for simple DB formats (like .dbf) in storage engines - they try to detect the sizes of fields and rows
    and reorder them to improve compression.

    However, for more advanced formats that's not really practical - there it's already a kind of recompression,
    and is usually considered too slow.

    Binary DB formats are also rather obscure, usually have multiple different versions and in many cases are proprietary and undocumented.
    So compressing their dumps really makes sense, though it's not really lossless.
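    The field/row reordering idea is easy to sketch. Below is a minimal toy version (the field widths are a made-up example, not a real .dbf layout): fixed-width records are rewritten column-major, so bytes from the same field end up adjacent, which generic compressors then model much better.

    ```python
    # Toy sketch of the row/field reordering preprocessing described above.
    # RECORD holds hypothetical fixed field widths in bytes; a real tool
    # would detect these from the file instead.
    RECORD = [4, 8, 2]
    RECLEN = sum(RECORD)

    def to_column_major(data: bytes) -> bytes:
        """Rewrite fixed-width records so each field's bytes are contiguous."""
        assert len(data) % RECLEN == 0
        rows = [data[i:i + RECLEN] for i in range(0, len(data), RECLEN)]
        out = bytearray()
        off = 0
        for width in RECORD:
            for row in rows:
                out += row[off:off + width]   # gather this field from every row
            off += width
        return bytes(out)
    ```

    The transform is trivially invertible (apply the same gather with rows and columns swapped), so it can sit in front of any compressor as a lossless filter.
    
    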

  4. #4
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    I see... Yes, I guess it makes sense to try and compress the dump. Sadly, that's not my use case. I just want to be able to archive a .db file among the other files in a program folder for distribution.

    I tried the usual asymmetric lossless compressors and even some fast symmetric ones like bsc, bcm, ccm plus the most common preprocessors. The only program that showed a significant improvement over baseline was razor somehow (I consider wimlib+rep+srep+flzma2 my baseline as it usually compresses better and way faster than 7z, rar, FA and similar with full dictionaries).

    So now I'm thinking maybe the rolz family is better suited for the task? That or Christian just happened to add some very clever transforms like he usually does...

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    Try Bulat's delta preprocessor maybe?
    Razor has dedup, an exe preprocessor and delta tokens; otherwise it's not really better than lzma.

  6. #6
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    Yup, that was one of the first things I did. I tried Bulat's delta but it doesn't help that much.

  7. #7
    Member
    Join Date
    Mar 2013
    Location
    Berlin
    Posts
    50
    Thanks
    16
    Thanked 79 Times in 33 Posts
    An important feature of many modern in-memory-databases is that data is always compressed, even when the DB system is running.
    While a typical SQL dump is row-by-row (which doesn't compress very well), these systems store data column-wise.

    A short blog posting about HANA (SAP's database): https://blogs.sap.com/2018/12/02/hana-data-compression-algorithms-dictionary-encoding/
    They just use dictionary encoding, prefix encoding, run-length encoding and cluster encoding.
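    The dictionary encoding mentioned there is simple to illustrate. A toy sketch (not HANA's actual implementation): each distinct column value is stored once, and the column becomes a list of small integer indices, which compress far better than the original strings.

    ```python
    # Toy dictionary encoding for one column: distinct values go into a
    # sorted dictionary, and the column is rewritten as indices into it.
    def dictionary_encode(column):
        values = sorted(set(column))
        index = {v: i for i, v in enumerate(values)}
        return values, [index[v] for v in column]

    # dictionary_encode(["DE", "US", "DE", "DE"]) -> (["DE", "US"], [0, 1, 0, 0])
    ```
    
    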

    You could easily write a script that dumps/exports your SQLite database column-by-column instead of row-by-row and try the usual compression tools.
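    A minimal sketch of such a script, using Python's built-in sqlite3 module (the database path is a placeholder, and quoting of table/column names is simplified):

    ```python
    # Dump an SQLite database column-by-column instead of row-by-row, so
    # similar values end up adjacent before handing the result to an
    # ordinary compressor. Sketch only: no BLOB/NULL handling, naive quoting.
    import sqlite3

    def dump_column_wise(db_path: str) -> bytes:
        con = sqlite3.connect(db_path)
        out = []
        tables = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        for table in tables:
            cols = [r[1] for r in con.execute(f"PRAGMA table_info({table})")]
            for col in cols:
                out.append(f"-- {table}.{col}\n".encode())
                for (val,) in con.execute(f'SELECT "{col}" FROM "{table}"'):
                    out.append(str(val).encode() + b"\n")
        con.close()
        return b"".join(out)
    ```
    
    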

  8. #8
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Which db exactly?
    For example, in mariadb you can compress the innodb tablespace with lz4, lzo, zlib and others at page level.
    Is your focus on sqlite?

  9. #9
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    Quote Originally Posted by fcorbelli View Post
    Which db exactly?
    For example, in mariadb you can compress the innodb tablespace with lz4, lzo, zlib and others at page level.
    Is your focus on sqlite?
    Yes, I just thought there was maybe a preprocessor written for databases in general. My focus is on sqlite, but not exclusively.
    Thanks for the info, but I'm looking for a stronger compressor, above lzma levels, and solid. The idea is to archive the database offline, it doesn't need to be live.

  10. #10
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by Gonzalo View Post
    Yes, I just thought there was maybe a preprocessor written for databases in general. My focus is on sqlite, but not exclusively.
    Thanks for the info, but I'm looking for a stronger compressor, above lzma levels, and solid. The idea is to archive the database offline, it doesn't need to be live.

    The compression of a database in use is very different from that of an unused one.
    In the first case, for example MariaDB, the pages, even if compressed, are written entirely to disk.
    If a page with 4KB of data is compressed to, let's say, 3KB, then 3KB + 1KB of zeros will be written (at least today).
    Clearly, filesystems with sparse management are preferred, but this leads to problems of "holes" which can be very numerous.
    Therefore it is not that simple to have a system with high efficiency (high compression ratio).

    There are cases of row compression instead of page, but the efficiency is understandably very low.

    There is a much higher margin, typically, in indexes.
    They, in general, are highly compressible.
    But, in common use, they are mostly kept in RAM (except for really large databases), which mitigates the advantage of compression: they must be accessed at the highest possible speed rather than with great space efficiency.

    SQLite has a rather crude but complex format, owing to its origin as little more than a .dbf file that gradually evolved into what it is today.

    https://www.sqlite.org/fileformat.html


    I hardly ever use it, except in WhatsApp chats, and I have not looked deeply into the topic of its compression.
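    The padding arithmetic above can be made concrete with a small sketch (the numbers are illustrative, not measurements of MariaDB):

    ```python
    # Effective on-disk size of one compressed page. Without sparse-file
    # support the zero tail is written anyway, so nothing is saved; with
    # it, only the blocks that actually contain data are allocated.
    PAGE = 4096

    def on_disk(compressed_size: int, sparse: bool, block: int = 4096) -> int:
        if not sparse:
            return PAGE                        # tail of zeros still written
        blocks = -(-compressed_size // block)  # ceiling division
        return blocks * block

    # on_disk(3 * 1024, sparse=False)            -> 4096 (no space saved)
    # on_disk(3 * 1024, sparse=True, block=1024) -> 3072
    ```

    Note that with a 4KB filesystem block even a sparse file saves nothing here, which is part of why high efficiency is hard to get from live page compression.
    
    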

  11. #11
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    I understand now why razor had such great performance. Maybe it just happened to correctly guess where the indexes were. In the particular db from my tests, there are also other IDs stored as alphanumeric fields that can be treated as numbers, and as part of a sequence.

    I guess there is a lot of room for improvement here for a theoretical precompressor or a specialized compressor. Databases are usually full of timestamps and dates too. String or VARCHAR fields can be compressed very well if they are handled in the same context, especially those containing names or email addresses. Base64 encoding can be reversed, etc.
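    The base64 reversal mentioned above is one transform that is easy to make strictly lossless. A sketch (not taken from any existing tool): decode a field only if re-encoding reproduces it byte-for-byte, so the preprocessor can always restore the original.

    ```python
    # Try to undo base64 on a field before compression. Decoding removes the
    # 4/3 size expansion and the 64-symbol alphabet that hurts byte-oriented
    # models; the round-trip check guarantees losslessness.
    import base64, binascii

    def try_decode_base64(field: bytes):
        try:
            raw = base64.b64decode(field, validate=True)
        except binascii.Error:
            return None                       # not valid base64, leave as-is
        # accept only if re-encoding reproduces the field exactly
        return raw if base64.b64encode(raw) == field else None
    ```
    
    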

