Can Block Cipher ( n bits input n bits output ) ever produces different distributions of symbols from input distribution of symbols ?
Can Block Cipher ( n bits input n bits output ) ever produces different distributions of symbols from input distribution of symbols ?
Entropy coder (accurate: AC/ANS), if replacing encoder with decoder, can transform a looking random sequence of bits (e.g. encrypted data) into a sequence of symbols/bits of practically any chosen statistical model ... not making it easier to decrypt.
Something like that can be done for steganography/watermarking ( https://en.wikipedia.org/wiki/Steganography ) applications, e.g. to encrypt data as QR code (0/1 sequence) resembling a chosen image (grayness of pixel determines probability of using '1' in code for this pixel): http://ieeexplore.ieee.org/xpl/login...number=7348695
Examples of such codes obtained with https://github.com/JarekDuda/PRcodes/ , rate is the number of hidden bits/pixel:
Mike (22nd March 2020)
It would appear input symbols of 49 1s 51 0s would not possible encrypt into 100 bits of 4
eg40 1s 60 0s ?
any sane cipher produces white nois with equal distribution of byte values
Sure, enforcing some distribution (constraints e.g. statistical) on encoding sequence is at cost of reduced information content - "rate" in example above saying how many bits/constrained bit can we hide there.
Theoretical limit is just Shannon entropy: enforcing p probability of '1' in sequence of 0/1, we can store at most h(p) = -p lg(p) - (1-p)lg(1-p) bits/value.
It is obtained with entropy coder, but requiring that both encoder and decoder know the statistical model (e.g. 'p' here, image to resemble for QR code).
This requirement can be weakened to only encoder knowing the statistical model ('p' or image here), still approaching the same theoretical boundary (at higher computational cost for encoder) - generalization of Kuznetsov-Tsybakov problem in github and paper above.
In simple few words how would encoder needs to encode ( knowing the defective cells ) so decoder can similar correct decode here ( not knowing the defective cells ) ?
This is Kuznetsov-Tsybakov problem, you can see it as that we have some 'freedom bits' chosen to satisfy the constraint, and some complex bijection T:
encoder searches for 'freedom bits' such that
T(encoded bits|freedom bits) satisfies the constraints, then decoder just inverts T and discards 'freedom bits'.
It would reach the theoretical limit, but requiring exponential computation cost ... however, approximations turn out quite good, see the paper.
Supposing of 49 1s 51 0s input , 29 1s 21 0s in 1st 50 bits & 20 1s 30 0s in 2nd 50 bits ....will block cipher always produce white noise e.g. 24 1s 26 0s in 1st 50 bits & e.g. 25 1s 25 0s in 2nd 50 bits ?
Does Block Cipher always produce exact same 49 vs 51 output symbols frequencies ?
If so , would input's existing 2-bits-unit symbol frequencies ( ie 00s 01s 10s 11s) be ALSO exact same preserved in Cipher Block's output of the 49 vs 51s single bits symbol frequencies ?
Last edited by LawCounsels; 22nd March 2020 at 18:53.
I think that the widely used encryption algorithms tend to make the output distribution near uniform, effectively prohibiting compression. They are also as unpredictable as possible, because predictable encryption algorithms are by definition weak. Therefore you cannot rely on any particular structure of encrypted file.
Do you have URL s to such Block Cipher .exe ( simplest ones , N input N output ) to quick easy verify this so ( produces white noise binary output distributions) for any skewed input N bits
It had seemed theoretic that Block Cipher could only maintain exact same binary output symbol frequencies as binary input symbol frequencies . Suggest one such white noise output symbols distribution simple Block Cipher scheme
It had also seemed the original input's 2-bits-units symbols frequencies is NOT exact preserved, except only so far as falling within the general usual fluctuation chances dictated by the exact same output binary symbol distributions preserved
Last edited by LawCounsels; 23rd March 2020 at 02:06.
Yes, standard ciphers should produce looking completely random sequence of bits: i.i.d. Pr(0)=Pr(1)=1/2, carrying 1 bit/bit (rate=1).
But if needed, e.g. its output can be transformed into sequence of any chosen statistical model (using reversed entropy coding or Kuznetsov-Tsybakov) - not making it easier to decrypt, at cost of longer encoding sequence: the original length has to be divided by rate.
Yes
But NOT applicable when N bits input N bits output strictly
In N constrained bits you can hide N/rate bits of information, where rate is average Shannon entropy of these constraints.
If you want to store N bits of information in N bits, then there is no place for constraints, you can only apply some 2^N -> 2^N bijection on them.
Can there even be one simplest N bits to N bits Block Cipher .exe url link , where the output symbol frequencies does not EXACT preserve input symbol frequencies ?
N to N bit cipher is a bijection - for every input sequence there exists exactly one output sequence.
Hence there exists e.g. input leading to output being all zeros, so YES - output frequencies can be very different from input.
However, these are relatively rare cases, on average and typically ( https://en.wikipedia.org/wiki/Typical_set ) they have the same i.i.d. Pr(0)=Pr(1)=1/2 frequencies.
Yes very rare exceptions
If input symbols ratio is 30 vs 70 ( or 50 vs 50 ) , then overall in general
Output symbol ratios in general will be exact same 30 vs 70 ( or 50 vs 50 )