
Thread: C as default for data compression?; Disturbing deletion of posts here

  1. #1
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    88
    Thanks
    89
    Thanked 16 Times in 15 Posts

    I noticed that my post here asking whether C++ is considered the official language for data compression vanished about a week after I posted it. I had pointed out that the majority of code, and even pseudocode, in nearly all references is written in it, and asked how people felt about that and whether it should be so, considering that Java is as popular as or more popular than C. A few people responded that sticking to C was mostly about speed. But if that were the case, wouldn't raw x86 assembly be best, since most people here are posting Windows binaries?

    Do I have to worry about honest questions getting deleted or hit with Facebook-like censorship? If so, then I guess there are other forums where that won't happen. Your thoughts and ideas are still welcome, even if this gets deleted again.

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    I think someone suggested that there was a database backup restore, which is why some recent posts are gone. Hosting for this forum doesn't seem to be very stable (but anyway, it's awesome that this forum exists), so such a thing could happen.

    OTOH, some time ago I requested deletion of a thread about (IIRC) Ukraine, because I had started to feel uneasy about my posts. They were quoted by others within the thread, so deleting only my posts wouldn't have erased them completely. I think such a scenario is less probable in this case than a backup restore. Anyway, censorship seems very unlikely, especially when it's about a technical discussion.

  3. #3
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    140
    Thanks
    26
    Thanked 94 Times in 31 Posts
    I'm not sure what the cause was, but the loss appeared to be systemic, as all posts on the main forum (for that same time period) disappeared as well. I seriously doubt there was any foul play involved. I had replied to this topic back then, but that was just prior to the loss, so the reply disappeared very quickly as well.

    C++ is the language of choice for any seriously high-performance application, not just data compression. For instance, my field is HFT, where nanoseconds make all the difference. Any language with non-deterministic behavior, such as garbage collection, would never be suitable for this type of work because consistency is crucial. Unexpected delays can cause bad trades and the loss of large amounts of money in a very short time, simply because the information is stale by a few microseconds.

    C++ has huge advantages over assembly for many reasons: maintainability, superior compilers, abstraction in the form of templates, and compile-time optimizations such as template metaprogramming, constexpr, etc.
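
    A minimal sketch of the constexpr point (not part of the original post): the CRC-32 lookup table below is computed by the compiler, so no table-building code runs at program start. The names Crc32Table, make_crc32_table, kCrc32Table and crc32 are illustrative, and C++14 or later is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Plain struct wrapper so the table can be returned from a constexpr function.
struct Crc32Table {
    std::uint32_t entry[256];
};

// Builds the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
constexpr Crc32Table make_crc32_table() {
    Crc32Table table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table.entry[i] = c;
    }
    return table;
}

// Evaluated at compile time; the finished table ends up in the binary's data.
constexpr Crc32Table kCrc32Table = make_crc32_table();

// Table-driven CRC-32; usable both at run time and in constant expressions.
constexpr std::uint32_t crc32(const char* data, std::size_t length) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        const std::uint32_t byte = static_cast<unsigned char>(data[i]);
        crc = kCrc32Table.entry[(crc ^ byte) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}

// Both checks are performed by the compiler, not at run time.
static_assert(kCrc32Table.entry[1] == 0x77073096u, "table built at compile time");
static_assert(crc32("123456789", 9) == 0xCBF43926u, "standard CRC-32 check value");
```

    Hand-written assembly would need either a precomputed table pasted into the source or an init routine; here the generator stays readable and costs nothing at run time.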

    In the end though, I personally write everything in C++ simply because it's my native language. (^:
    Last edited by michael maniscalco; 21st November 2018 at 18:06. Reason: clarification of language

  4. #4
      webmaster's Avatar
    Join Date
    Jun 2010
    Location
    Saint-Petersburg, Russia
    Posts
    70
    Thanks
    12
    Thanked 53 Times in 24 Posts
    This server has a big hardware issue. The forum will be moved to the new server tomorrow. The forum database was automatically restored from the full backup. I apologize for the lost messages.

  6. #5
      webmaster's Avatar
    Join Date
    Jun 2010
    Location
    Saint-Petersburg, Russia
    Posts
    70
    Thanks
    12
    Thanked 53 Times in 24 Posts
    We are on a new server.

  8. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    webmaster, look at https://encode.su/threads/1739-BCM-T...ll=1#post58687

    and while we are here, can you please add .arc to the list of allowed attachment extensions, as well as increase the max. attachment size to 25 MB?

  9. #7
      webmaster's Avatar
    Join Date
    Jun 2010
    Location
    Saint-Petersburg, Russia
    Posts
    70
    Thanks
    12
    Thanked 53 Times in 24 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    thx. fixed.

    Quote Originally Posted by Bulat Ziganshin View Post
    and while we are here, can you please add .arc to the list of allowed attachment extensions, as well as increase the max. attachment size to 25 MB?
    done. check, plz.

  11. #8
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by JamesWasil View Post
    I noticed that my post here asking whether C++ is considered the official language for data compression vanished about a week after I posted it. I had pointed out that the majority of code, and even pseudocode, in nearly all references is written in it, and asked how people felt about that and whether it should be so, considering that Java is as popular as or more popular than C. A few people responded that sticking to C was mostly about speed. But if that were the case, wouldn't raw x86 assembly be best, since most people here are posting Windows binaries?

    Do I have to worry about honest questions getting deleted or hit with Facebook-like censorship? If so, then I guess there are other forums where that won't happen. Your thoughts and ideas are still welcome, even if this gets deleted again.
    I'm working on a fundamentally new programming language, that would be a C/C++ replacement, so this is an interesting topic for me. C and C++ (and Visual C++) are the standard right now. Maybe C especially, because I've seen cases where a new codec had been written in C++ and had to be rewritten in C to gain acceptance. (This is probably due to pressure from Unix/Linux camps.)

    Java isn't C/C++'s opposite number here. The new competition is from Rust and Go. Java can't give you the performance and determinism that C/C++ and Rust can. Go can give almost as much performance and almost as much determinism, since the GC latency is impressively low.

    (My language will be designed scientifically, based on research, to elicit more interest in programming, while beating C in performance. The whole effort is based on the observation that all popular programming languages in our timeline are terrible, with hardly any design insight or research basis, and that they deter millions of smart people from pursuing programming careers, especially women.)

  12. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Quote Originally Posted by SolidComp View Post
    Java isn't C/C++'s opposite number here. The new competition is from Rust and Go. Java can't give you the performance and determinism that C/C++ and Rust can. Go can give almost as much performance and almost as much determinism, since the GC latency is impressively low.
    Java has already at least a few low latency GCs to choose from:
    - Azul C4 GC, which is commercial, highly optimized and has been available for many years already
    - ZGC https://wiki.openjdk.java.net/display/zgc/Main which is included in Java 11
    - Shenandoah https://wiki.openjdk.java.net/display/shenandoah/Main which is going to be included in Java 12

    AFAIK at least Shenandoah GC works on Windows.

    Also HFT can be done in Java: https://www.youtube.com/watch?v=iINk7x44MmM

    JVM-based sort algorithms can win in efficiency metrics. Look at http://sortbenchmark.org/ . http://sortbenchmark.org/NADSort2016.pdf sorts 100 TB of data for $144. If you look into the paper, you'll see that this sort was based on Apache Spark, which itself is written mostly in Scala.

    Java guys are working on reducing the overhead of calling native code and on implementing value types, including vector (SIMD) types with corresponding operations that map well to low-level instructions (like SSE, AVX, etc). These are the goals of two (highly connected) projects: http://openjdk.java.net/projects/panama/ and http://openjdk.java.net/projects/valhalla/ Expect the performance gap between pure Java code and pure C++ code to get significantly smaller.

    Despite the already implemented low-latency GCs and future low-level optimizations (SIMD instructions, value types, etc), I don't think Java will replace C as a language for writing libraries used by everyone. Every popular language can talk to libraries that expose a C API, but talking to Java libraries is usually reserved for other JVM-based code.

  13. #10
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Java has already at least a few low latency GCs to choose from:
    - Azul C4 GC, which is commercial, highly optimized and has been available for many years already
    - ZGC https://wiki.openjdk.java.net/display/zgc/Main which is included in Java 11
    - Shenandoah https://wiki.openjdk.java.net/display/shenandoah/Main which is going to be included in Java 12

    AFAIK at least Shenandoah GC works on Windows.

    Java guys are working on reducing the overhead of calling native code and on implementing value types, including vector (SIMD) types with corresponding operations that map well to low-level instructions (like SSE, AVX, etc). These are the goals of two (highly connected) projects: http://openjdk.java.net/projects/panama/ and http://openjdk.java.net/projects/valhalla/ Expect the performance gap between pure Java code and pure C++ code to get significantly smaller.

    Despite the already implemented low-latency GCs and future low-level optimizations (SIMD instructions, value types, etc), I don't think Java will replace C as a language for writing libraries used by everyone. Every popular language can talk to libraries that expose a C API, but talking to Java libraries is usually reserved for other JVM-based code.
    Thanks, I didn't know about those projects, except for Azul. I thought there was some new sort of compiler called Graal or Grael. Is it precompilation/AOT?

    I was impressed by the Simple Binary Encoding (SBE) that was implemented in Java. I think it was for HFT or just trading in general.

  14. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    GraalVM has a full-fledged website now: www.graalvm.org

    GraalVM has both a JIT mode and an AOT mode. The JIT mode is much more advanced than in the standard JVM (HotSpot VM based), but it excels in high-level optimizations (devirtualization, partial escape analysis, etc) rather than low-level optimizations (like careful selection of SIMD instructions).

    AOT mode in GraalVM works only for Java code. Peak performance of Java code in AOT mode is unfortunately significantly worse than in JIT mode, but OTOH startup is much faster and there's no warm-up penalty (i.e. JITing lag). For short-lived Java programs AOT is actually a big win. Example: https://sites.google.com/a/athaydes....esonly4mbofram - AOT compiled Java programs (HTTP client and HTTP server) were faster than their native counterparts written in C (curl and Apache Server).

    JIT mode in GraalVM works for many languages, i.e. GraalVM is a polyglot engine. You can run JavaScript, Python, Ruby, LLVM bitcode, Java bytecode, etc on GraalVM JIT. Even regexps can be JITed into efficient native code: https://github.com/oracle/graal/tree/master/regex The downside actually is that warm-up takes a lot of time compared to competing solutions, e.g. it takes orders of magnitude more iterations to get full JavaScript code speed than it takes Google Chrome to do it. Part of the reason is that GraalVM itself is written in Java. IIUC there's ongoing work to AOT compile GraalVM so the warm-up overhead will be reduced.

    One of GraalVM's biggest goals is to be an efficient research platform to implement new languages. With a thing called Truffle API you can implement your language relatively quickly. IIUC it was used to implement the various languages on top of GraalVM. Here is an article about implementing your own language on GraalVM: http://www.graalvm.org/docs/graalvm-...ment-language/ With the Truffle API, your job is to implement a lexer and parser for your language and, finally, a generator of the Truffle AST representing the operations in a program. GraalVM together with Truffle will then interpret the generated AST and JIT it into efficient native code. Together with the interpreter and JIT you'll also get a debugger, profiler, sampler, garbage collector, etc for free. I'm not sure if you can "beat C performance" with GraalVM, but even the LLVM bitcode JIT does reasonably well: https://2016.splashcon.org/event/vmi...-ir-on-truffle (keep in mind that GraalVM sandboxes the code and that adds overhead). GraalVM can be a good starting point to develop your language.

  15. #12
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Piotr Tarsa View Post

    AOT mode in GraalVM works only for Java code.
    Is this strict? So not even Kotlin can compile?

  16. #13
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Piotr Tarsa View Post

    JIT mode in GraalVM works for many languages, i.e. GraalVM is a polyglot engine. You can run JavaScript, Python, Ruby, LLVM bitcode, Java bytecode, etc on GraalVM JIT. Even regexps can be JITed into efficient native code: https://github.com/oracle/graal/tree/master/regex The downside actually is that warm-up takes a lot of time compared to competing solutions, e.g. it takes orders of magnitude more iterations to get full JavaScript code speed than it takes Google Chrome to do it. Part of the reason is that GraalVM itself is written in Java. IIUC there's ongoing work to AOT compile GraalVM so the warm-up overhead will be reduced.

    One of GraalVM's biggest goals is to be an efficient research platform to implement new languages. With a thing called Truffle API you can implement your language relatively quickly.
    How do GraalVM and Java handle what C/C++ compilers call Link-Time Optimization (LTO) or Whole Program Optimization? (-flto on gcc and clang) That makes a big difference for a lot of C/C++ programs, much more than -O3.

    I know that the new language Zig does it automatically. It's another precompiled language: https://ziglang.org

  17. #14
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    Correction: native image generation works on Java bytecode (not source code), so AOT compilation for e.g. Kotlin and Scala is possible. Example: https://www.graalvm.org/docs/examples/java-kotlin-aot/ Native image generation is subject to many limitations https://github.com/oracle/graal/blob...LIMITATIONS.md which restricts the number of frameworks and languages usable with AOT.

    I don't know about LTO, though. OTOH, there's PGO.

  18. #15
    Member
    Join Date
    Mar 2018
    Location
    sun
    Posts
    34
    Thanks
    19
    Thanked 16 Times in 7 Posts
    Quote Originally Posted by SolidComp View Post
    I'm working on a fundamentally new programming language, that would be a C/C++ replacement
    There is only one thing I can think of that is really needed, and that's a freshly revamped C. C++ updates, clones and other derivatives already exist, and they are mostly not good for many reasons. Here is what C needs:

    - abolish "long", "short", "unsigned" etc and use mandatory int8, int16... uint8, uint32, uint64... float8, float16..., ufloat64
    - bool being native part
    - introduce "private" keyword into "struct {}" for variables, but *nothing more*, and certainly no inheritance whatsoever
    - a simple (and single) vector/list capability like the STL's, nothing more

    That's it. Some of the above is already in C through extra headers, but it should be enforced and native. Anything more and you end up with complicated, overbloated garbage like C++, Rust and plenty of others.
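
    For reference, a small sketch of what the "extra headers" already provide, written here as C++ (C gets the same types from <stdint.h> and bool from <stdbool.h>). The helper names wrap_add_u32 and fits_in_u16 are made up for illustration; float8 and ufloat64 from the list have no counterpart in these headers and are left out.

```cpp
#include <climits>
#include <cstdint>

// The classic types only guarantee minimum widths; the exact size is up to
// the implementation, which is the complaint in the list above.
static_assert(sizeof(short) * CHAR_BIT >= 16, "short: at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long: at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long: at least 64 bits");

// The exact-width aliases have the stated size wherever they are defined.
static_assert(sizeof(std::int8_t) * CHAR_BIT == 8, "int8_t is 8 bits");
static_assert(sizeof(std::uint16_t) * CHAR_BIT == 16, "uint16_t is 16 bits");
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

// Unsigned arithmetic wraps modulo 2^32 on every platform; the cast keeps the
// result at 32 bits even where int is wider than 32 bits.
constexpr std::uint32_t wrap_add_u32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}

// bool is a built-in keyword in C++; no header or macro is needed.
constexpr bool fits_in_u16(std::uint32_t value) {
    return value <= UINT16_MAX;
}
```

    Nothing here forces anyone to avoid short or long, so the "mandatory" part of the wish list is exactly what the headers cannot deliver.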

  19. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,943
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    > - abolish "long", "short", "unsigned" etc and use mandatory int8, int16...
    > uint8, uint32, uint64... float8, float16..., ufloat64

    AFAIK the original idea behind "variable-size" char/short/int/long types was
    to use native platform types for variables (cpu registers etc).
    For example, see http://processors.wiki.ti.com/index...._the_C28x_CPU#

    I think that was a valid idea, although developers got used to
    "int" having at least 32 bits and so lots of code won't work
    e.g. when compiled under DOS with 16-bit int.

    > - bool being native part

    As it is, bool is kinda bad for code efficiency.
    On each access the compiler has to perform a value conversion -
    for example, if we pass "(x>>31)" as bool, it would add something like
    "test al,al; setne al;".

    I'd accept it if it was possible to create bit arrays using bool,
    but it's actually char-sized, so I can't think of any use for it.
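
    To illustrate both points, here is a rough sketch (assuming C++14; the names as_flag, BitArray and bit_array_roundtrip_ok are hypothetical). An integer-to-bool conversion normalizes any non-zero value to true, which is where the extra instructions come from, and a bit array has to be packed by hand into machine words because an array of bool takes at least one byte per element.

```cpp
#include <cstddef>
#include <cstdint>

// Integer -> bool maps every non-zero value to true (which converts back to 1),
// so the compiler may need a compare/set step unless it can prove the input
// is already 0 or 1.
constexpr bool as_flag(std::uint32_t value) {
    return value != 0;
}

// Hand-packed bit array: N flags stored in ceil(N / 64) 64-bit words.
template <std::size_t N>
class BitArray {
    static_assert(N > 0, "BitArray needs at least one bit");
    static constexpr std::size_t kWords = (N + 63) / 64;
    std::uint64_t words_[kWords];

public:
    constexpr BitArray() : words_{} {}

    // Reads bit i (caller guarantees i < N).
    constexpr bool get(std::size_t i) const {
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Writes bit i (caller guarantees i < N).
    constexpr void set(std::size_t i, bool value) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) {
            words_[i / 64] |= mask;
        } else {
            words_[i / 64] &= ~mask;
        }
    }
};

// Small compile-time self-check: set, read back, clear, read back.
constexpr bool bit_array_roundtrip_ok() {
    BitArray<100> bits{};
    bits.set(70, true);
    const bool was_set = bits.get(70) && !bits.get(69) && !bits.get(71);
    bits.set(70, false);
    return was_set && !bits.get(70);
}

static_assert(sizeof(bool[256]) >= 256, "an array of bool uses at least a byte per flag");
static_assert(sizeof(BitArray<256>) == 4 * sizeof(std::uint64_t), "256 flags in four words");
```

    std::bitset and std::vector<bool> in the standard library pack bits in a similar way, for those who don't mind using the STL.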

    On the other hand, it could be cool to have language support for cpu flags
    (bool would be useful there).
    As it is, x86-32 still has only 8 registers (7 since compilers don't want to use SP)
    and 5+ flags (CF,SF,ZF,PF,OF) remain ignored, although they are anyway updated by cpu for all scalar
    arithmetic instructions and could be used to store bool values
    (and are used in manually written asm code).

    > - introduce "private" keyword into "struct {}" for variables, but *nothing
    > more*, and certainly no inheritance whatsoever

    Sure, if there's another method to add fields of struct1 into struct2.

    As it is, I'm actually relying on inheritance quite a lot (including multiple),
    and even virtual members and virtual inheritance could be useful
    if their implementation was more open and portable.

    So instead, I think that structures need several features to deal
    with modern demands, like https://en.wikipedia.org/wiki/AOS_and_SOA
    and data alignment.
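
    A rough sketch of the AoS vs SoA difference, assuming C++14. The record layout (offset/length/literal) and all names are hypothetical, loosely modelled on LZ-style match records; the 64-byte alignment is an assumed cache-line size.

```cpp
#include <cstddef>
#include <cstdint>

// Array of structures (AoS): fields of one record sit next to each other.
// Scanning only `length` still pulls `offset` and `literal` through the cache.
struct MatchAoS {
    std::uint32_t offset;
    std::uint16_t length;
    std::uint8_t literal;  // typically followed by one byte of padding
};

// Structure of arrays (SoA): each field gets its own contiguous array, so a
// pass over one field touches only that field's memory.
template <std::size_t N>
struct MatchesSoA {
    alignas(64) std::uint32_t offset[N];  // assumed 64-byte cache line
    std::uint16_t length[N];
    std::uint8_t literal[N];
};

// Sums the match lengths from the AoS layout.
template <std::size_t N>
constexpr std::uint64_t total_length_aos(const MatchAoS (&matches)[N]) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += matches[i].length;
    }
    return sum;
}

// Sums the match lengths from the SoA layout; this loop reads one dense array.
template <std::size_t N>
constexpr std::uint64_t total_length_soa(const MatchesSoA<N>& matches) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += matches.length[i];
    }
    return sum;
}

// The same three records in both layouts give the same result.
constexpr bool layouts_agree() {
    MatchAoS aos[3] = {{10, 3, 97}, {20, 4, 98}, {30, 5, 99}};
    MatchesSoA<3> soa = {{10, 20, 30}, {3, 4, 5}, {97, 98, 99}};
    return total_length_aos(aos) == 12 && total_length_soa(soa) == 12;
}

static_assert(layouts_agree(), "AoS and SoA hold the same data");
static_assert(alignof(MatchesSoA<3>) == 64, "alignas request is honoured");
```

    Switching between the two layouts by hand means rewriting every access, which is the kind of chore language-level support could remove.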

    In fact, I have an idea to write a preprocessor for structure construction,
    which would work kinda like C++, but with more flexibility, like
    sorting fields of the whole inheritance tree based on some priority markup.
    (With standard C++, if some buffer is a part of base class, it would
    remain in the same place in all derived classes, even though there
    could be new important fields deserving to be cached along with ones
    from the base class).

    > Anything more and you end up with complicated, overbloated garbage
    > like C++, Rust and plenty of others.

    Actually specifically C++ is not overbloated at all.
    It's still perfectly possible to compile a 2k exe from C++ source on Windows,
    and I don't see why it would be impossible on Linux, although I didn't try.
    Of course, STL is pure bloat, but you're not forced to use it,
    and exceptions and RTTI can be disabled in all compilers.

    At the same time, C++ templates are a very useful feature for code optimization,
    and even most of C++1x new features are pretty good (constexpr,auto,lambda).

    That's the problem with Rust and Go actually - there may be some benefits,
    but they also add problems (GC, extra strictness with pointer manipulation),
    while their compilers remain much worse than C++ ones.

  21. #17
    Member
    Join Date
    Mar 2018
    Location
    sun
    Posts
    34
    Thanks
    19
    Thanked 16 Times in 7 Posts
    Greetings Shelwien. Re int types:
    Indeed, that was the reason. The idea was that "int" would always be native to whatever platform you compile for. So if you compiled for 16-bit in the old days, int would be 16-bit; on 32-bit Windows you would get 32-bit. But, as is typical in this area, it has become a mess. Today, for example, on 64-bit platforms int often still means 32-bit rather than 64-bit, and you can have things like "long long" and _int64 on the same platform. It may no longer be certain what "short" means (it could be 16-bit or 32-bit), and so on. I know there were, for example, differences between msys264 and VS64 in some int types last time I coded. Things have become outright stupid. Today, developers don't think in terms of "which int type do I choose to match the native CPU/platform" anyway; they simply choose "what size do I need to hold the maximum value here". CPUs have evolved significantly since then, and where someone needs the maximum possible speed, I often see ASM routines, as in srep. Having mandatory and precise types like uint8, int64, etc., knowing that you always get 8 bits or 64 bits regardless of platform or CPU, and that you get exactly what you *SEE* and can still see it clearly years later, without guessing or confusion, is the most wonderful thing I have encountered in programming over the years. I would be willing to give up any extra C++ "feature" just to have this natively (and as the *only* option, to make sure everyone sticks to this system) in a new language. Besides, the programmer surely knows what platform (s)he compiles for, whether 32-bit or 64-bit, so using int32/int64 will still get you a native-to-CPU type.

    Re bool:
    I may not have stated it clearly. By "native" I meant "part of the default language"; how it is done internally for the best speed is not important to me. C++ has it by default, that's the point, while in C you need a specific header or a macro definition. It's not a big deal to implement manually, but still..

    Re "private" keyword:
    The idea was that even without needing full classes, you will most often need at least some private variable to hold an internal or temporary value or calculation within a struct. Since you seem to appreciate the full benefits of advanced C++ features, you may need C++ or its derivatives, not C. I still think most of that is not needed, and even though it may seem helpful at first glance, it is a strain to maintain and remember, not to mention reading other people's code. In plzip, for example, which I forked to make xnlz, some parts are pure C++ and others C. The C++ parts really took me time and effort to digest. C was fine. Remember why all those C++ OO features like classes propagated in the first place: supposedly to make your code easier to read and better organized. That was the reason for classes and C++. It's not the case in reality. And you may say "well, just use the subset of C++ you appreciate and leave the rest". First, the ints are still the same confusing mess in C++ as in current C anyway, and second, others won't, meaning all the C++ you read from someone else will differ in difficulty. This variation in style will grow on a log scale with each new feature, which is why C is great(er) IMO. If you had this enhanced C with mandatory types and only a few well-known, popular features (e.g. vector, private in struct), that would be it, and it would still be as simple as C, even cleaner thanks to the specific int types. Every person's code, including your own after years, would be as easy to understand as C is now, unless you really code badly.

    Re overbloated C++:
    I did not mean binary size, as you probably guessed from the previous paragraph; the above is what I meant.

    Regards

  22. #18
    Member
    Join Date
    Mar 2018
    Location
    sun
    Posts
    34
    Thanks
    19
    Thanked 16 Times in 7 Posts
    Quote Originally Posted by JamesWasil View Post
    wouldn't raw x86 assembly be best then, since most here are posting windows binaries?
    Since C is a relatively simple language, it is easier for a compiler to produce assembly that is on par with, if not better than, your own ASM, unless you are really good. C compilers also tend to be faster (in compile time) than C++ ones for that reason. As for managed/preprocessed languages like Java etc., on paper they should be even better than native ones because of JIT advantages etc., but again reality is different (just like with the C vs C++ claims about usability, organization and easy-to-read code):

    https://benchmarksgame-team.pages.de.../gcc-java.html

  23. #19
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    482
    Thanked 279 Times in 119 Posts
    A relatively under-appreciated property of C is its ABI stability.

    That is, once one writes a library in C, it's immediately accessible to any other program, even those written in different languages.
    This is extremely powerful, and no other language gets there. Not even C++.
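
    A minimal sketch of how this usually looks in practice when the implementation is C++ but the exported interface is C. The function name demo_rle_encode and the toy (count, byte) run-length format are made up for illustration; the point is only the extern "C" boundary with plain pointers and sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace {

// Internal C++ implementation: free to use containers and exceptions.
// Toy format: each run becomes a (count, byte) pair with count in 1..255.
std::vector<unsigned char> rle_encode(const unsigned char* src, std::size_t n) {
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < n) {
        std::size_t run = 1;
        while (i + run < n && run < 255 && src[i + run] == src[i]) {
            ++run;
        }
        out.push_back(static_cast<unsigned char>(run));
        out.push_back(src[i]);
        i += run;
    }
    return out;
}

}  // namespace

// Exported with C linkage: no name mangling, only C types in the signature,
// and no exception may cross the boundary.
// Returns the number of bytes written to dst, or 0 if the input is empty,
// dst is too small, or an allocation failed.
extern "C" std::size_t demo_rle_encode(const unsigned char* src, std::size_t src_len,
                                       unsigned char* dst, std::size_t dst_cap) noexcept {
    assert(src != nullptr || src_len == 0);
    try {
        const std::vector<unsigned char> encoded = rle_encode(src, src_len);
        if (encoded.size() > dst_cap) {
            return 0;
        }
        std::copy(encoded.begin(), encoded.end(), dst);
        return encoded.size();
    } catch (...) {
        return 0;
    }
}
```

    A C program, or any language with a C FFI, only needs the one-line declaration `size_t demo_rle_encode(const unsigned char*, size_t, unsigned char*, size_t);` to call it. Exporting the std::vector version directly would tie callers to one compiler and standard library.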

    Another well-known strength of C is its popularity, which simply guarantees the presence of a compliant compiler (at least C90) on any potential computing target, well beyond the gcc/clang/visual world.
    In terms of reach, it's a huge boost.
    C++ gets close to this goal, but as the number of versions continues to grow, portability becomes difficult (it's not enough to say "C++"; it must be more clearly labelled "C++03", "C++11", etc.)

    As one can see, I've not mentioned speed anywhere yet.
    While it's nice (kind of bare minimum), several competing languages can get it right on this front.
    imho, speed is simply not enough to justify C.
