(first post reserved for topic heading)
Some time ago I started a project to rewrite FreeArc from scratch. Now I'm approaching the first public alpha, so I will briefly describe its current state. Improvements over the existing FreeArc:
- 64-bit versions for Windows and Linux
- global deduplication a-la ZPAQ (but without archive updating/generations) - see the chunking sketch at the end of this post
- Lua-programmable program options
Main drawbacks:
- incompatible archive format (this will be fixed only in the next version)
- archive updating and lots of other features aren't implemented yet (the help screen shows the features already implemented)
The current agenda:
- April: first public alpha release
- May: FreeArc archive format, automatic filetype detection based on contents
- June: simple GUI (a-la HtmlArc)
- July: not determined yet, but probably archive updating/generations
Now you can stare at the program docs and fix my dirty English. Editing of the Wiki should be available to anyone with a GitHub account, but if it doesn't work, I can add you to the list of project contributors.
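Since "global deduplication a-la ZPAQ" may be unfamiliar, here is a minimal, self-contained sketch of its usual building block: content-defined chunking with a rolling hash, plus a global fingerprint index. All parameters below (buzhash-style window, ~8 KiB average chunks, FNV-1a fingerprints) are illustrative assumptions, not FA's actual code.
Code:
// Sketch: content-defined chunking with a buzhash-style rolling hash plus a
// fingerprint index. Window size, average chunk size, and the FNV-1a
// fingerprint are illustrative choices, not FA's real implementation.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

static uint64_t rotl(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// FNV-1a digest of a whole chunk (a stand-in for a stronger hash).
static uint64_t fingerprint(const uint8_t* p, size_t n) {
    uint64_t h = 1469598103934665603ULL;
    while (n--) { h ^= *p++; h *= 1099511628211ULL; }
    return h;
}

int main() {
    // Demo input: 512 KiB of pseudo-random bytes followed by an exact copy,
    // so roughly half of the data should deduplicate away.
    std::vector<uint8_t> data(1 << 20);
    uint32_t s = 1;
    for (size_t i = 0; i < data.size() / 2; ++i) {
        s = s * 1103515245u + 12345u;
        data[i] = uint8_t(s >> 24);
    }
    std::copy(data.begin(), data.begin() + data.size() / 2,
              data.begin() + data.size() / 2);

    uint64_t T[256];                         // random per-byte hash table
    for (auto& t : T) {
        s = s * 1103515245u + 12345u; uint64_t hi = s;
        s = s * 1103515245u + 12345u;
        t = (hi << 32) | s;
    }

    const int W = 48;                        // rolling-hash window, bytes
    const uint64_t MASK = (1 << 13) - 1;     // ~8 KiB average chunk size
    std::unordered_set<uint64_t> seen;       // global fingerprint index
    uint64_t h = 0;
    size_t start = 0, stored = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        h = rotl(h, 1) ^ T[data[i]];                      // byte enters window
        if (i >= (size_t)W) h ^= rotl(T[data[i - W]], W); // oldest byte leaves
        bool boundary = (h & MASK) == 0 && i + 1 - start >= (size_t)W;
        if (boundary || i + 1 == data.size()) {
            uint64_t fp = fingerprint(&data[start], i + 1 - start);
            if (seen.insert(fp).second)      // unseen chunk: store its bytes
                stored += i + 1 - start;
            start = i + 1;
        }
    }
    std::printf("stored %zu of %zu bytes after dedup\n", stored, data.size());
}
Because the rolling hash depends only on the last few dozen bytes, chunk boundaries depend only on local content, so an inserted or shifted region disturbs at most a chunk or two and the rest of the data still deduplicates.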
Why rewrite?
Still Haskell + C++?
Any chance for a library?
Please do not use an advanced but slower hash if it's not necessary.
SHA-1 is much better than SHA-256, for example, because it's twice as fast; MD5 is even better.
Also make extraction of a single file fast,
and finally provide a very fast list function.
In other words... zpaq++!
Last edited by fcorbelli; 11th April 2015 at 15:53.
Yes, my initial idea with FB was a zpaq++, but I changed it to implement a freearc++ too.

Hashing in FA is already pretty fast, 500 MB/s on an i7-4770, and it can be further improved using a better SHA implementation. SHA-1 and MD5 aren't recommended by NIST for new programs, since current cryptanalysis can break MD5 and is close to breaking SHA-1, but for those who don't believe in cryptography I can provide alternative hashing methods with a speed of 20 GB/s.
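For those curious how both options could coexist, a hypothetical sketch of a pluggable checksum interface; the names (Hasher, Fnv1a) are mine, not FA's real classes, and FNV-1a merely stands in for a genuinely fast hash like the 20 GB/s methods mentioned above.
Code:
// Hypothetical pluggable checksum interface: the archiver can offer SHA-256
// for the cautious and a fast non-cryptographic hash for everyone else.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

struct Hasher {
    virtual void update(const void* p, size_t n) = 0;
    virtual std::string finish() = 0;   // hex digest
    virtual ~Hasher() = default;
};

struct Fnv1a final : Hasher {           // fast, but NOT collision-resistant
    uint64_t h = 1469598103934665603ULL;
    void update(const void* p, size_t n) override {
        auto b = static_cast<const uint8_t*>(p);
        while (n--) { h ^= *b++; h *= 1099511628211ULL; }
    }
    std::string finish() override {
        char buf[17];
        std::snprintf(buf, sizeof buf, "%016llx", (unsigned long long)h);
        return buf;
    }
};
// A SHA-256 implementation (e.g. via OpenSSL or Boost) would implement the
// same interface, so the choice becomes a command-line option, not a rebuild.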
About fast adding/listing/extraction - I especially care about scenarios with millions of files and gigabytes of data. In particular, I plan to provide an alternative archive format that allows faster handling of archives with millions of files, but ATM I'm still implementing more basic things.
Technically, it's already a library. At least I took into account all the problems I had with the old FreeArc code: all errors are handled with exceptions, there are no global vars, and all user interaction is performed in a single, replaceable module. But looking at the lzturbo story, I've changed my mind about open-sourcing it. The same story holds for a DLL - it would allow anyone from China to incorporate FreeArc compression into his own program, name himself GmbH and start selling his program. If you have an idea how to distribute a DLL without the risk of it being sold outside my control - I'm all ears. If you just need a .lib library for your closed-source project - drop me a mail.
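To make the "single, replaceable user-interaction module" concrete, a minimal sketch; the interface below is my illustration, not FA's actual API.
Code:
// Hypothetical sketch of a replaceable user-interaction module: the core
// library reports progress and asks questions only through this interface,
// so a CLI, GUI, or silent library client can each supply their own.
#include <cstdint>
#include <stdexcept>
#include <string>

struct UserInterface {
    virtual void progress(uint64_t done, uint64_t total) = 0;
    virtual bool confirmOverwrite(const std::string& path) = 0;
    virtual void warning(const std::string& msg) = 0;
    virtual ~UserInterface() = default;
};

// A library client that never interacts: refuse overwrites, ignore progress,
// and escalate warnings to exceptions (matching "all errors are exceptions").
struct SilentUI final : UserInterface {
    void progress(uint64_t, uint64_t) override {}
    bool confirmOverwrite(const std::string&) override { return false; }
    void warning(const std::string& msg) override {
        throw std::runtime_error(msg);
    }
};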
I rewrote the program because it's 10 years old and there are lots of places that need a rewrite. I believe that programs should be rewritten from scratch every few years in order to keep them tight, so it was even overdue. It was started as FB, with Haskell and C++ code, but finally I switched to C++14 and Boost. I believe it's more portable than Haskell.
I assume that you are interested in getting some profit from your code. Several times I have been asked to recommend the best compressor for a specific type of data and given bandwidth requirements.
Once the best compressors from my experiments were NanoZip, FreeArc, CCM, and CMM. The company decided against closed-source programs, because they have a bigger chance of messy and buggy code, and FreeArc won.
IMHO it's not a problem. If your code is not available, they will take 7-zip or another compressor. Do you really lose anything? Maybe one day they will have to legalize their source code, and will buy a license from you.
Last edited by inikep; 12th April 2015 at 13:01.
It may be my bad English. I don't plan to make it open-source nor produce a DLL, because I fear that the source/DLL would be stolen and distributed outside my control.
I don't get any profit from that.
inikep, if you believe that yourself, why don't you distribute all your software as open-source and free?
All my significant programs are open-source and can be downloaded from http://pskibinski.pl/. Programs that I wrote later are fully owned by my employer with a transfer of copyrights.
There is an additional advantage of open-source programs: people can help you find bugs and introduce improvements. For example, I was working with you on 4x4/tornado issues in 2009. I didn't know that the company didn't buy a license from you; I think they chose another compressor.
Maybe he stole your code and built lzturbo from it, but I'm sure that currently lzturbo hardly uses your code at all. For example, lzturbo -29 (bytewise encoding) gives much better compression than tornado. Actually, I have a working decompressor for lzturbo -29. This compression strength can be achieved only with a parser similar to lzma's (which is better than tornado's).
Last edited by inikep; 12th April 2015 at 21:44.
PS: back in the old days, the late 80s, there was a gentlemen's agreement to credit the original creator.
Today... not so common.
I suggest a couple of features.
First: alphanumeric notes when adding, so that in the future a version can be extracted by name, not by sequential number.
Second: an append-only format; this makes rsync very easy and fast.
Third: stdio support.
Sorry for my terrible language, but sometimes I use a smartphone with an Italian keyboard that doesn't like English very much.
And, as you are changing the format, maybe replace the outdated Huffman to catch up with lzturbo: http://encode.su/threads/2017-LzTurb...ll=1#post39815
Is it fast enough? But the current version can extract only the entire archive.
Code:
C:\FB>timer fa.exe create m:\a c:\ d:\ e:\ z:\ --no-data --no-warnings -ds
Scanning: 7,532,018,850,656 bytes in 328,525 folders and 2,700,795 files (RAM 204 MiB, cpu 2.964 sec, real 20.065 sec)
Archive directory: 153,917,518 bytes

Kernel Time  = 16.692 = 00:00:16.692 =  82%
User Time    =  3.120 = 00:00:03.120 =  15%
Process Time = 19.812 = 00:00:19.812 =  97%
Global Time  = 20.281 = 00:00:20.281 = 100%

C:\FB>timer fa.exe l m:\a |tail
   172,586,496 2011-02-22 03:39:35 .A.... z:\vs2010\sp1\VS10sp1-KB983509.msp
===============================================================================
7,531,763,218,119 => 0 bytes in 328,546 folders and 2,700,803 files

Kernel Time  =  0.421 = 00:00:00.421 =   7%
User Time    =  4.929 = 00:00:04.929 =  91%
Process Time =  5.350 = 00:00:05.350 =  99%
Global Time  =  5.383 = 00:00:05.383 = 100%

C:\FB>timer fa.exe l m:\a -nNO-SUCH-FILE |tail
===============================================================================
0 => 0 bytes in 328,546 folders and 0 files

Kernel Time  =  0.312 = 00:00:00.312 =  13%
User Time    =  1.887 = 00:00:01.887 =  84%
Process Time =  2.199 = 00:00:02.199 =  97%
Global Time  =  2.247 = 00:00:02.247 = 100%
Last edited by Bulat Ziganshin; 11th April 2015 at 16:51.
Skymmer, I have got used to the fact that data compression specialists are literally Huffman worshipers (see e.g. http://pages.cs.brandeis.edu/~dcc/Pr...rogram2015.pdf ) - because they have written dozens of papers/compressors based on it... but now it just means wasting both space and time ( http://encode.su/threads/1920-In-mem...entropy-coders ).
I see there is an issue with understanding ANS - I could help, there are also descriptions by a few other people, or one can just use e.g. FSE... changing the format is the perfect moment for the upgrade.
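For reference, a sketch of how FSE could slot in as the Huffman replacement. This assumes the FiniteStateEntropy library; the calls and return-value conventions are from my recollection of its fse.h and should be verified against the actual header.
Code:
// Sketch of using FSE (https://github.com/Cyan4973/FiniteStateEntropy) as
// a drop-in entropy coder. API names per my reading of fse.h - verify them.
#include <cstdio>
#include <vector>
#include "fse.h"   // assumed include path

int main() {
    std::vector<unsigned char> src(65536, 'a');
    for (size_t i = 0; i < src.size(); i += 3) src[i] = 'b';  // skewed stats

    std::vector<unsigned char> dst(FSE_compressBound(src.size()));
    size_t csize = FSE_compress(dst.data(), dst.size(), src.data(), src.size());
    if (FSE_isError(csize)) { std::puts("compression error"); return 1; }
    // By convention (as I recall), 0 means "not compressible" and 1 means
    // "whole input is one repeated byte"; both need special-case handling.
    std::printf("%zu -> %zu bytes\n", src.size(), csize);

    std::vector<unsigned char> out(src.size());
    size_t dsize = FSE_decompress(out.data(), out.size(), dst.data(), csize);
    if (FSE_isError(dsize)) { std::puts("decompression error"); return 1; }
    return out == src ? 0 : 1;
}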
Jarek, forum frequenters like me know ANS very well. But FreeArc is an archiver, not a compression program, and it relies on lzma, ppmd and other existing algorithms. I don't plan to change the archive format and don't have time to work on improving these compression algorithms.
I apologize, I was thinking about Tornado.
I know MD5 and SHA-1 pretty well, and NIST's guidance does not cover this case.
You do not need to take the fingerprints of an SSL certificate, nor do a DH exchange.
You need something with a very low, or virtually null, collision rate.
Do not use a hammer if you need a needle.
Returning to notes: I suggest adding alphanumeric text to each version.
Notes are almost useless for traditional archives, but here you could use them to extract versions.
Suppose you have the FreeArc codebase and you want to keep all its versions in one single file. You can do that pretty easily with zpaq, but how do you extract v27.2 from the archive? You cannot, because you would need to know that, say, version 3 of the zpaq archive contains your source code 27.3.
As in a database, I want a table that associates internal version numbers 1,2,3,4... with ASCII labels: initial, ok, deployed, newhash. Then I can restore with, say, extract -version=deploy.
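A minimal sketch of what such a label table could look like on the archiver side (all names here are hypothetical, not from any existing archiver):
Code:
// Hypothetical sketch: a label->version table stored with the archive,
// letting "extract -version=deploy" resolve to an internal version number.
#include <map>
#include <stdexcept>
#include <string>

struct VersionIndex {
    std::map<std::string, unsigned> labels;   // "deploy" -> 3, etc.

    void tag(unsigned version, const std::string& label) {
        labels[label] = version;              // later tags overwrite earlier
    }
    unsigned resolve(const std::string& label) const {
        auto it = labels.find(label);
        if (it == labels.end())
            throw std::runtime_error("no version labeled '" + label + "'");
        return it->second;
    }
};

// Usage: index.tag(3, "deploy"); extraction then calls
// index.resolve("deploy") instead of requiring a raw version number.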
About working with very big numbers of files: in the future, I suggest splitting the data blocks from the metadata index, so you can have your little database with, say, a binary tree or whatever.
And yes, I'm a senior, or better, a MySQL guru, so I like to think in DB ways.
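For illustration, a hypothetical sketch of such a split layout - a compact, separately loadable metadata index pointing into the bulk data section (not FA's actual format):
Code:
// Hypothetical on-disk layout separating bulk data from a compact metadata
// index, so listing/searching millions of files touches only the index.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct DataBlockRef {        // where a file's bytes live in the data section
    uint64_t offset;         // byte offset of the compressed block
    uint64_t size;           // compressed size
};

struct FileEntry {           // one metadata record
    std::string path;
    uint64_t    origSize;
    uint64_t    mtime;
    DataBlockRef data;
};

struct ArchiveIndex {        // loaded alone for list/search operations
    std::vector<FileEntry> entries;   // kept sorted by path
};

// Binary search the sorted index without touching the data section; a real
// design might use an on-disk B-tree so lookups need not load all of it.
const FileEntry* find(const ArchiveIndex& ix, const std::string& path) {
    auto it = std::lower_bound(ix.entries.begin(), ix.entries.end(), path,
        [](const FileEntry& e, const std::string& p) { return e.path < p; });
    return (it != ix.entries.end() && it->path == path) ? &*it : nullptr;
}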
About stdio: it doesn't matter for extraction, but it does for compression, because it's very common to dump a database to ASCII and restore it directly.
Last edited by fcorbelli; 12th April 2015 at 13:08.
http://encode.su/threads/1747-Extrem...ll=1#post34220
> turning on notes i suggest to add alphanumeric text to the version.
OK, I got it.
> about working with very big file number, in future, i suggest to split data block file from metadata index
> so you can have your little database with say a binary tree or whatever
Yes, we have already discussed it. It may even be my own idea.
Hello,
<off topic>Argh! I... Must... Resist...
I can't.
As I am a real bastard, dirty grammar nazi in my native language, could you please use next time "à la" instead of a-la? Thanks a lot.
</off topic>
Anyway, this is a nice job you're trying to accomplish here! Congratulations!
AiZ
Added the "Handling the compression method" chapter to the docs. Now the docs have reached 20 KB for this single feature. Mein Gott!
Open source is nonsense in the non-USA/EU world.
So when someone posts source code, almost nothing can prevent a commercial closed-source rebrand.
This can be bad, but it's true.
So open source is launch-and-forget, like a Sidewinder.
Some care, some do not.
Well, open source also means at least the theoretical possibility that, if one day the original author disappears or abandons the project for various reasons, there is still a chance for users that someone will take over, not to speak of bugs etc... Of course the chance is bigger with smaller and simpler programs, but still... From a user's point of view, closed source is not good news.
But it's nice to see you working on this nice/useful program, and the plans. Good luck!