# Thread: Block Cipher to produce more skewed or just different distributions of symbols?

1. ## Block Cipher to produce more skewed or just different distributions of symbols?

Can a block cipher (n bits input, n bits output) ever produce a distribution of symbols different from the input distribution of symbols?

2. An entropy coder (to be accurate: AC/ANS), if you replace the encoder with the decoder, can transform a random-looking sequence of bits (e.g. encrypted data) into a sequence of symbols/bits following practically any chosen statistical model ... without making it easier to decrypt.

Something like this can be done for steganography/watermarking ( https://en.wikipedia.org/wiki/Steganography ) applications, e.g. to encrypt data as a QR code (0/1 sequence) resembling a chosen image (the grayness of a pixel determines the probability of using '1' in the code for that pixel): http://ieeexplore.ieee.org/xpl/login...number=7348695

Examples of such codes obtained with https://github.com/JarekDuda/PRcodes/ ; the rate is the number of hidden bits/pixel:

3. ## Thanks:

Mike (22nd March 2020)

4. It would appear that input symbols of 49 1s and 51 0s could not possibly encrypt into 100 bits of e.g. 40 1s and 60 0s?

5. Any sane cipher produces white noise with an equal distribution of byte values.

6. Sure, enforcing some distribution (constraints, e.g. statistical) on the encoding sequence comes at the cost of reduced information content - the "rate" in the example above says how many bits per constrained bit we can hide there.

The theoretical limit is just Shannon entropy: enforcing probability p of '1' in a 0/1 sequence, we can store at most h(p) = -p lg(p) - (1-p) lg(1-p) bits/value.
It is achieved with an entropy coder, but requires that both encoder and decoder know the statistical model (e.g. 'p' here, or the image to resemble for the QR code).
This requirement can be weakened to only the encoder knowing the statistical model ('p' or the image here), still approaching the same theoretical boundary (at a higher computational cost for the encoder) - a generalization of the Kuznetsov-Tsybakov problem in the github repository and paper above.
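As a quick numerical check of this limit, here is a minimal Python sketch of the binary entropy function (the function name `h` simply mirrors the formula above):

```python
from math import log2

def h(p):
    """Binary Shannon entropy: the maximum number of hidden bits
    per constrained bit when enforcing Pr('1') = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(h(0.5))            # 1.0 - no constraint, full bit per bit
print(round(h(0.11), 4))  # about 0.5 - a strongly skewed constraint halves the rate
```

So enforcing Pr('1') ≈ 0.11 already costs half the capacity: hiding N bits of information needs about 2N constrained bits.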

7. In a few simple words, how would the encoder need to encode (knowing the defective cells) so that the decoder can correctly decode (not knowing the defective cells)?

8. This is the Kuznetsov-Tsybakov problem. You can see it as having some 'freedom bits' chosen to satisfy the constraint, and some complex bijection T:
the encoder searches for 'freedom bits' such that
T(encoded bits|freedom bits) satisfies the constraints; then the decoder just inverts T and discards the 'freedom bits'.

This would reach the theoretical limit, but requires exponential computation cost ... however, approximations turn out to be quite good, see the paper.
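A minimal toy sketch of this scheme (all sizes and choices here are illustrative assumptions, not the paper's construction): 4 data bits plus 4 freedom bits, the bijection T(x) = 29x mod 256 (29 is odd, so it is invertible mod 256), and two "defective cell" constraints on the encoded word.

```python
# Toy Kuznetsov-Tsybakov-style encoder/decoder (assumed parameters:
# 4 data bits, 4 'freedom' bits, constraints = output bit 0 forced to 1
# and output bit 5 forced to 0, playing the role of defective cells).

DATA_BITS, FREE_BITS = 4, 4

def T(x):
    return (29 * x) % 256       # a simple bijection on 8-bit words

def T_inv(y):
    return (53 * y) % 256       # 29 * 53 = 1537 = 6*256 + 1, so 53 inverts 29 mod 256

def satisfies(y):
    # the encoder-only constraints ('defective cells')
    return (y & 1) == 1 and ((y >> 5) & 1) == 0

def encode(data):
    # encoder searches freedom bits until T(data|freedom) meets the constraints
    for f in range(1 << FREE_BITS):
        y = T((data << FREE_BITS) | f)
        if satisfies(y):
            return y
    raise ValueError("no freedom bits satisfy the constraints")

def decode(y):
    # decoder does NOT need to know the constraints: invert T, drop freedom bits
    return T_inv(y) >> FREE_BITS

for data in range(1 << DATA_BITS):
    y = encode(data)
    assert satisfies(y) and decode(y) == data
print("round trip ok")
```

The search over freedom bits is the exponential part; with 4 freedom bits and 2 constrained output bits a solution happens to exist for every data value here, but in general one trades more freedom bits for a higher success probability.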

9. Supposing an input of 49 1s and 51 0s, with 29 1s and 21 0s in the 1st 50 bits & 20 1s and 30 0s in the 2nd 50 bits ... will a block cipher always produce white noise, e.g. 24 1s and 26 0s in the 1st 50 bits & e.g. 25 1s and 25 0s in the 2nd 50 bits?

Does a block cipher always produce exactly the same 49 vs 51 output symbol frequencies?
If so, would the input's existing 2-bit-unit symbol frequencies (i.e. 00s, 01s, 10s, 11s) ALSO be exactly preserved in the block cipher's output alongside the 49 vs 51 single-bit symbol frequencies?

10. I think that the widely used encryption algorithms tend to make the output distribution near-uniform, effectively prohibiting compression. They are also as unpredictable as possible, because predictable encryption algorithms are by definition weak. Therefore you cannot rely on any particular structure of an encrypted file.

11. Do you have URLs to such block cipher .exe files (simplest ones, N bits input, N bits output) to quickly and easily verify that this is so (that they produce white-noise binary output distributions) for any skewed input of N bits?

It had seemed, in theory, that a block cipher could only maintain exactly the same binary output symbol frequencies as the binary input symbol frequencies. Suggest one such simple block cipher scheme with a white-noise output symbol distribution.

It had also seemed that the original input's 2-bit-unit symbol frequencies are NOT exactly preserved, except only so far as they fall within the usual fluctuations dictated by the exactly preserved binary output symbol distribution.

12. Yes, standard ciphers should produce a completely random-looking sequence of bits: i.i.d. Pr(0)=Pr(1)=1/2, carrying 1 bit/bit (rate=1).
But if needed, their output can e.g. be transformed into a sequence following any chosen statistical model (using reversed entropy coding or Kuznetsov-Tsybakov) - without making it easier to decrypt, at the cost of a longer encoding sequence: the original length has to be divided by the rate.
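The "reversed entropy coding" step can be sketched in a few lines: run an arithmetic-coding *decoder*, fed with a uniform number in [0,1) (standing in here for cipher output), under a Bernoulli(p) model - the emitted symbols then follow the chosen model. All parameters below (p = 0.8, block sizes) are illustrative assumptions.

```python
import random

def decode_biased(x, p, n):
    """Arithmetic-decode n Bernoulli(p) symbols from a uniform x in [0,1).
    Running the decoder on (pseudo)random input produces symbols of the model."""
    lo, hi = 0.0, 1.0
    out = []
    for _ in range(n):
        split = lo + (hi - lo) * (1.0 - p)   # sub-interval for symbol '0'
        if x < split:
            out.append(0)
            hi = split
        else:
            out.append(1)
            lo = split
    return out

random.seed(1)
p = 0.8
symbols = []
for _ in range(10000):                       # uniform randomness stands in for cipher output
    symbols += decode_biased(random.random(), p, 10)
freq = sum(symbols) / len(symbols)
print(round(freq, 2))                        # close to 0.8, as the model dictates
```

Each uniform input is mapped to symbols with Pr('1') = 0.8, illustrating the skew - and since h(0.8) ≈ 0.72, this encoding carries only about 0.72 bits of the original stream per output bit.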

13. Yes.

But NOT applicable when it is strictly N bits input, N bits output.

14. In N constrained bits you can hide N·rate bits of information, where the rate is the average Shannon entropy of these constraints.

If you want to store N bits of information in N bits, then there is no room for constraints; you can only apply some 2^N -> 2^N bijection to them.

15. Can there even be one simplest N-bits-to-N-bits block cipher .exe URL link where the output symbol frequencies do NOT exactly preserve the input symbol frequencies?

16. An N-to-N-bit cipher is a bijection - for every input sequence there exists exactly one output sequence.

Hence there exists e.g. an input leading to the output being all zeros, so YES - output frequencies can be very different from input frequencies.

However, these are relatively rare cases; on average and typically ( https://en.wikipedia.org/wiki/Typical_set ) outputs have the same i.i.d. Pr(0)=Pr(1)=1/2 frequencies.
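This can be illustrated with a random permutation of 16-bit words standing in for an ideal block cipher (an assumption for the sketch, not a real cipher): even when the inputs are strongly skewed, the outputs are typically balanced.

```python
import random

n = 16
random.seed(0)
# a random bijection on 16-bit words stands in for an ideal block cipher
perm = list(range(1 << n))
random.shuffle(perm)

# feed it only strongly skewed inputs: exactly 4 ones out of 16 bits (25% ones)
skewed_inputs = [x for x in range(1 << n) if bin(x).count("1") == 4]

# typical outputs are nonetheless balanced: about n/2 = 8 ones on average
avg_out_ones = sum(bin(perm[x]).count("1") for x in skewed_inputs) / len(skewed_inputs)
print(round(avg_out_ones, 1))   # close to 8.0, not 4.0
```

Some individual outputs in this set are still skewed (some input even maps to all zeros), but such cases are exponentially rare - exactly the typical-set behavior described above.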

17. Yes, very rare exceptions.

If the input symbol ratio is 30 vs 70 (or 50 vs 50), then overall, in general, the output symbol ratios will be exactly the same 30 vs 70 (or 50 vs 50)?
