Thread: TurboPFor compression questions

  1. #1
    Member
    Join Date
    Jun 2020
    Location
    Meadowood
    Posts
    6
    Thanks
    2
    Thanked 2 Times in 2 Posts

    TurboPFor compression questions

    I am trying to use TurboPFor for compressing huge data files for a school project. However, I have run into some very basic issues that I can't seem to resolve with just the readme. Any help is appreciated:


    1. What is the difference between the ./icapp and ./icbench commands for compression?


    2. After cloning the git repository and building with make, running $./icbench yields the error 'No such file or directory'. Also, running just $make icbench results in errors on a completely new Linux system. What could I be doing wrong?


    3. Other than the readme, what are good resources for learning how to use the software? Specifically, I want to compress a huge file of floating-point numbers. Do you have any guidance on how to do this (with TurboPFor or any other software that seems better)?


    Thank you for any help

  2. #2
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    > 1. What is the difference between ./icapp and ./icbench commands for compression?

    icapp is supposedly the new benchmark framework, while icbench is the old one.
    The current makefile only builds icapp.

    > 2. After downloading the git and making, running $./icbench yields the error 'No such file or directory'.
    > Also, trying to specifically just $make icbench results in errors on a
    > completely new linux system. What could I be doing wrong?

    I checked it, and git clone + make seems to successfully build icapp.

    > 3. Other than the readme, what are good resources for learning how to use
    > the software? Specifically, I want to compress a huge file of floating
    > point numbers, do you have any guidance for how to do this(with TurboPFor
    > or any other software that seems better)?

    Read the whole readme page at https://github.com/powturbo/TurboPFo...er-Compression

    Note that icapp is a benchmark; it's not intended for actual file-to-file compression.
    For actual compression you're supposed to make your own frontend using some of the provided
    header files (like fp.h).

    You can also look at other libraries referenced here: https://github.com/powturbo/TurboPFo...ree/master/ext

  3. Thanks:

    AlexBa (1st July 2020)

  4. #3
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Make isn't installed by default on most Linux distros. You have to install make before trying to build; that's what your error message sounds like to me. Did you install it?

  5. #4
    Member
    Quote Originally Posted by Shelwien
    Thank you for your help. After posting, I did realize that only icapp was built, which explains the error. I'm rereading the readme, but I still feel like a lot of implementation details are lacking (it seems to focus mainly on results). I'm guessing it just assumes more knowledge than I have, so I will have to keep working on that.

    If you are able to, could you provide some guidance on how to adapt files like fp.h into a header file? I want to learn the process, but the question still feels too open-ended for me to tackle blindly. My specific project requires me to compress massive files of sensor data; a short snippet of sample data is attached below. The main complications I foresee are the few lines of extra header filler and an inconsistent separator between individual columns. I would want to compress all data to within a small tolerance, like 1e-9.
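    For an absolute tolerance like 1e-9, one common approach (similar in spirit to what error-bounded compressors such as SZ do) is to round each value to the nearest multiple of 2*eps and keep only the integer index; the reconstruction error is then at most eps, apart from floating-point rounding. A minimal sketch, assuming an absolute (not relative) tolerance and values small enough that x / (2*eps) fits in int64_t. `quantize` and `dequantize` are hypothetical names, not part of TurboPFor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replace each value by the index of the nearest
   multiple of step = 2*eps, so |x - index*step| <= eps (up to FP rounding).
   Assumes |x| / step fits in int64_t. */
void quantize(const double *x, int64_t *q, size_t n, double eps)
{
    assert(eps > 0.0);
    const double step = 2.0 * eps;
    for (size_t i = 0; i < n; i++) {
        double t = x[i] / step;
        /* round half away from zero without needing libm */
        q[i] = (int64_t)(t < 0.0 ? t - 0.5 : t + 0.5);
    }
}

/* Hypothetical helper: rebuild approximate values from the indices. */
void dequantize(const int64_t *q, double *x, size_t n, double eps)
{
    const double step = 2.0 * eps;
    for (size_t i = 0; i < n; i++)
        x[i] = (double)q[i] * step;
}
```

    This step is lossy by design; the integer indices are what would then go through delta coding and a compressor.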

    Thanks again for everything. I am just a lowly engineer trying to learn the world of CS, and people like you make it a lot easier.

    Best,
    Alex
    Attached Files

  6. #5
    Administrator Shelwien
    > If you would be able to, could you help to provide some guidance on how to
    > adapt files like fp.h into a header file.

    It is already a header file with declarations of FP-related functions.

    > My specific project requires me to compress massive files of sensor data,
    > with a short snippet of sample data attached below.

    Well, that's text. Just converting it to binary would give you
    a compression ratio of 3.2 (1536/480 = 3.2).
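    The gain comes from storing each number as a fixed-size IEEE value instead of a string of digits and separators. The attached utility is not reproduced here; this is a minimal sketch of such a parser, assuming that any line containing a non-numeric token is header filler and that any whitespace, commas and semicolons all count as separators (which covers the inconsistent-separator case). `parse_numeric_text` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse numbers separated by whitespace, commas or
   semicolons from a NUL-terminated text buffer into out[0..cap-1].
   Any line containing a non-numeric token (e.g. header filler) is skipped.
   Returns the number of values stored. */
size_t parse_numeric_text(const char *text, double *out, size_t cap)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        if (end == NULL)
            end = line + strlen(line);
        size_t line_start = n;   /* rollback point if the line is a header */
        int numeric = 1;
        const char *p = line;
        while (p < end) {
            if (isspace((unsigned char)*p) || *p == ',' || *p == ';') {
                p++;             /* skip any run of separators */
                continue;
            }
            char *q;
            double v = strtod(p, &q);
            if (q == p) {        /* token is not a number: header line */
                numeric = 0;
                break;
            }
            if (n < cap)
                out[n++] = v;
            p = q;
        }
        if (!numeric)
            n = line_start;      /* discard values taken from a header line */
        line = (*end == '\n') ? end + 1 : end;
    }
    return n;
}
```

    The resulting array can be written out with fwrite. Note that strtod also accepts words like "nan" and "inf", so a header made only of such words would not be detected.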

    I made a simple utility for this (see attachment), but in this case
    it would most likely be better to write each column to a separate file.
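    Splitting by column usually helps because consecutive samples from one sensor channel tend to be similar, while a row interleaves unrelated channels. A minimal sketch, assuming the values were already parsed into a row-major array of doubles with a fixed column count; `split_columns`, `write_column_files` and the "<prefix><c>.bin" naming scheme are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy a row-major table (nrows x ncols) into
   column-major order, so each column's values are contiguous. */
void split_columns(const double *rows, double *cols, size_t nrows, size_t ncols)
{
    assert(rows != cols);  /* not an in-place transpose */
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < ncols; c++)
            cols[c * nrows + r] = rows[r * ncols + c];
}

/* Hypothetical helper: write column c to "<prefix><c>.bin" as raw doubles.
   Returns 0 on success, -1 on any I/O error. */
int write_column_files(const double *cols, size_t nrows, size_t ncols,
                       const char *prefix)
{
    char name[256];
    for (size_t c = 0; c < ncols; c++) {
        snprintf(name, sizeof name, "%s%zu.bin", prefix, c);
        FILE *f = fopen(name, "wb");
        if (f == NULL)
            return -1;
        size_t written = fwrite(cols + c * nrows, sizeof(double), nrows, f);
        if (fclose(f) != 0 || written != nrows)
            return -1;
    }
    return 0;
}
```

    Each .bin file can then be compressed or benchmarked on its own.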

    Also you have to understand that float compression libraries are
    usually intended for _binary_ floats.

    It's also easier to run icapp: just "icapp -Ff out3.bin".
    Attached Files

  7. Thanks:

    AlexBa (1st July 2020)

  8. #6
    Member
    Quote Originally Posted by Shelwien
    Thank you so much for your help. This all makes sense. I have done a little testing, and you were right, creating individual binary files from each column results in better compression. I will continue working on code and a header file to run this compression easily. As I'm getting started, I have a few more questions:

    Do you have any advice on resources for learning C++ as a language? I have taken a course with C++ in high school, but that was a few years ago.
    Do you think TurboPFor is the best method for this task? I have researched other compressors (bzip2, zfp, SZ, ...), and it seems to have the best performance, but I am by no means an expert. I am still planning on implementing other methods for comparison.
    Once I have progressed further, can I break apart the TurboPFor code? Could I take just the few files needed to run the method that works best in benchmarking, or would this result in errors due to all of the interdependencies within TurboPFor?

    Thank you for any more help and advice. You have done so much for me!

    Alex

  9. #7
    Member SolidComp
    Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text?

    Do you convert floats to IEEE binary?

  10. #8
    Administrator Shelwien
    > Do you have any advice on resources for learning C++ as a language?

    https://stackoverflow.com/questions/...guide-and-list

    But you don't really need to know everything about C++ syntax and libraries to start programming in it.
    There are reference sites like https://www.cplusplus.com/ where you can always look up specific features.

    Basically just find some open-source project that you like and read the source,
    while looking up things that you don't know.

    > Do you think TurboPFor is the best method for this task?

    If you need very high processing speed, then probably yes.

    > I have researched other compressors(bzip2, zfp, SZ..) and it seems to have
    > the best performance, but I am by no means an expert. I am still planning
    > on implementing other methods for comparison.

    The best compression would be provided by a custom statistical (CM) compressor
    (since there are probably correlations between columns).

    TurboPFor doesn't have any really complex algorithms; it's mostly
    just delta (subtracting a predicted value from each number) and bitfield
    rearrangement/transposition.
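    To make those two steps concrete, here is a plain scalar sketch, assuming 64-bit IEEE doubles: the "prediction" is simply the previous value's bit pattern, and the transposition is done at byte level (byte 0 of every value, then byte 1, and so on). This only illustrates the idea; TurboPFor's SIMD implementations are not reproduced here and may use other predictors and a bit-level layout. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Delta step: take each double's 64-bit IEEE pattern and subtract the
   previous pattern (the prediction). Unsigned wrap-around keeps it exact. */
void delta_encode(const double *in, uint64_t *out, size_t n)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);  /* well-defined type pun */
        out[i] = bits - prev;
        prev = bits;
    }
}

void delta_decode(const uint64_t *in, double *out, size_t n)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = in[i] + prev;
        memcpy(&out[i], &bits, sizeof bits);
        prev = bits;
    }
}

/* Transposition step: out holds n*8 bytes; byte b of value i goes to
   out[b*n + i], so similar high-order bytes end up next to each other. */
void byte_transpose(const uint64_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < 8; b++)
            out[b * n + i] = (uint8_t)(in[i] >> (8 * b));
}

void byte_untranspose(const uint8_t *in, uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t v = 0;
        for (size_t b = 0; b < 8; b++)
            v |= (uint64_t)in[b * n + i] << (8 * b);
        out[i] = v;
    }
}
```

    Both steps are exactly reversible, so on their own they lose nothing; the rearranged bytes are then handed to an entropy coder or general-purpose compressor.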

    The main purpose of this library is that it provides efficient SIMD
    implementations of these algorithms for multiple platforms.

    But if gigabytes-per-second speed is not really necessary for you,
    then you can just as well use something else, like self-written delta + zstd.
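    A rough sketch of the "self-written delta" half of that suggestion, assuming the samples have already been turned into integers (for example by quantizing to an error tolerance). Signed deltas are zigzag-encoded so that small negative and small positive differences both become small unsigned numbers. The zstd call itself (ZSTD_compress from zstd.h) is left out so the sketch has no dependencies; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zigzag: 0,-1,1,-2,2 -> 0,1,2,3,4 (portable, no signed right shift). */
static uint64_t zigzag(int64_t v)
{
    return ((uint64_t)v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

static int64_t unzigzag(uint64_t u)
{
    return (u & 1) ? -(int64_t)(u >> 1) - 1 : (int64_t)(u >> 1);
}

/* Delta + zigzag over integer samples. `out` is what would then be passed
   to a general-purpose compressor such as zstd.
   Assumes consecutive differences do not overflow int64_t. */
void delta_zigzag_encode(const int64_t *in, uint64_t *out, size_t n)
{
    int64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = zigzag(in[i] - prev);
        prev = in[i];
    }
}

void delta_zigzag_decode(const uint64_t *in, int64_t *out, size_t n)
{
    int64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += unzigzag(in[i]);
        out[i] = prev;
    }
}
```

    For slowly varying sensor data most outputs are small numbers with many zero high bytes, which is the kind of redundancy a general-purpose compressor handles well.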

    > Once I have progressed further, can I break apart the TurboPFor code?
    > Could I take a small amount of the files to run the method that seems
    > to work best from benchmarking or would this result in errors
    > with all of the interdependencies within TurboPFor?

    You can drop some of the files; in fact, the build seems to produce a library
    (libic.a) containing the relevant part.

    Unfortunately, it's not very readable due to all the speed optimizations.

  11. #9
    Administrator Shelwien
    > Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text?

    It converts out3.txt to binary and back to text losslessly (except for the headers).

    > Do you convert floats to IEEE binary?

    Yes.
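    For reference, a small sketch of what "convert to IEEE binary" and a lossless return trip can look like; this is not Shelwien's utility. It assumes the platform's double is IEEE 754 binary64, as on common x86 and ARM systems. Printing with 17 significant digits guarantees the text parses back to the identical double, but it does not necessarily reproduce the original spelling of the number; reproducing the exact text would need extra information, such as the number of decimals used in each column. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a double (memcpy avoids aliasing issues). */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* 17 significant digits are enough for any finite double to parse back
   (e.g. with strtod) to the identical value. The text may differ from the
   original: 0.1 prints as "0.10000000000000001". */
int double_to_text(double d, char *buf, size_t size)
{
    return snprintf(buf, size, "%.17g", d);
}
```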
