
Thread: Side channel information leakage

  1. #1
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts

    Side channel information leakage

    Well, compression is a major weakness when put before encryption.
    There's inherent information leak in the form of compression ratio. Timing attacks work too.
    I wonder if there's some work to mitigate them.

Bulat's branchless entropy codec looks like a full solution to the second form of leakage, though alone it will produce a rather bad compression ratio. Has anyone considered branchless modelling?

    Is there any research known to you on reducing the first form of leakage? Randomised compression?
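A minimal sketch of the first form of leakage. Even if the attacker sees only ciphertext lengths, compressing before encrypting turns plaintext content into observable size; this is the CRIME-style setup. zlib is just a stand-in for any LZ codec, and the secret string and `oracle` function are invented for illustration:

```python
import zlib

# Hypothetical secret that gets compressed together with attacker input.
secret = b"session_token=hunter2;"

def oracle(attacker_input: bytes) -> int:
    # The attacker observes only the length of compress(secret + input);
    # encryption does not hide the length.
    return len(zlib.compress(secret + attacker_input))

# A guess that matches a substring of the secret compresses better,
# so the output is shorter -- the ratio leaks plaintext content.
matching = oracle(b"session_token=hunter2")
wrong    = oracle(b"session_token=XXXXXXX")
print(matching, wrong)  # the matching guess yields the shorter output
```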

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    1. it was not an entropy coder of any kind but putbits() implementation
    2. i think that since compression ratio affects timing, this problem is just unsolvable

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    1. it was not an entropy coder of any kind but putbits() implementation
Oh, I haven't taken the time to look into the thread deeper than the surface. Thanks for the clarification.
    Quote Originally Posted by Bulat Ziganshin View Post
    2. i think that since compression ratio affects timing, this problem is just unsolvable
The timing problem is trivial to solve in a dumb way: estimate the worst-case time, measure the actual time, and wait until the worst-case time has passed. But, obviously, it's not a great solution.
But the binding between ratio and time? There are multiple bindings; some make the codec faster on compressible data and some make it slower.
The only part that's inherent to output size is what we mentioned already: putbits ( / getbits). And for many codecs that's a small part of the total compression time.

ADDED: And is it really inherent? If it had constant time per symbol and we always produced the same number of symbols, the total time would be constant too.
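The "dumb" padding mitigation described above can be sketched in a few lines. The worst-case bound here is an assumed constant that would have to be measured offline, and `time.monotonic`/`time.sleep` stand in for whatever clock the real system uses:

```python
import time

WORST_CASE_SECONDS = 0.05  # assumed worst-case bound, measured offline

def compress_padded(compress, data):
    """Run compress(data), then wait out the rest of the worst-case budget,
    so total wall time is (roughly) independent of the input contents."""
    deadline = time.monotonic() + WORST_CASE_SECONDS
    out = compress(data)
    remaining = deadline - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)  # burn the leftover budget
    return out
```

The obvious downsides are the ones m^2 names: every call costs worst-case time, and the bound itself must be conservative or fast inputs still finish "early".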

  4. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    The model in HPACK seems to be specially dedicated to avoid such attacks.
    And from an external look, it seems it's getting reasonably close to its objective.
    In particular, there is no "cross field" compression possible, so no way to get information from a field by modifying another one.

    The cost is a reduced compression performance, but it's a secondary target...

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    I think that:
    - any form of static model eliminates the cross-field compression,
    - some fast compressors can be made branchless, e.g. SHARC's chameleon encoding algorithm (i.e. the one without predictions); if we shrink the chunk table to fit in L1 cache then we should have a constant-speed compressor,
    - also CM with all data structures fitting in L1 should be capable of constant speed, but I wonder what the efficiency would be,
    - BWT's and LZ's speed always depends on compressibility,
    - the Schindler Transform's speed should be constant when everything fits in L1 cache; after that we can do FSE coding, which also has constant speed,
    - but ST isn't well suited to a static model (same for BWT and LZ),
    - OTOH there's no problem with making SHARC's chameleon algorithm work with a static model,

Or maybe I don't fully understand the problems presented in this thread
    Last edited by Piotr Tarsa; 21st January 2014 at 21:55.
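The chameleon point above can be illustrated with a toy, branchless word coder. This is a hypothetical sketch, not SHARC's actual algorithm: the hash constant and table size are made up, and a zero word would falsely "hit" a fresh table. The match/literal decision becomes an arithmetic mask, so the same instructions run for both outcomes:

```python
def encode_word(table, word):
    """Branchless chameleon-style step: emit a 2-byte hash on a dictionary
    hit, the 4-byte literal on a miss, selected by a mask instead of an if."""
    h = ((word * 0x9E3779B1) >> 16) & 0xFFFF  # toy 16-bit multiplicative hash
    mask = -(table[h] == word)                # -1 (all ones) on a hit, else 0
    table[h] = word                           # unconditional table update
    payload = (h & mask) | (word & ~mask)     # hash on hit, literal on miss
    nbytes = 4 - 2 * (mask & 1)               # 2 bytes for a hit, 4 for a miss
    return mask & 1, payload, nbytes          # (flag, payload, length)

table = [0] * (1 << 16)                       # toy chunk table
flag1, _, n1 = encode_word(table, 0xDEADBEEF) # first sight: literal
flag2, _, n2 = encode_word(table, 0xDEADBEEF) # repeat: dictionary hit
```

Python itself is not branchless, of course; the point is the structure, which maps directly onto cmov/mask code in C.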

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
SHARC without a static model may suffer from a dictionary read-after-write delay that, depending on the CPU, may be less or more than a pure L1 cache read. Also, it may suffer from overlapped-write delays, considering the writing of compressed data

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    What are the scenarios where a timing attack would be a concern for an encrypted archive?

    ZPAQ encryption provides privacy (but not authentication) for an archive sitting on a disk or a website. It is not designed to provide protection from an attacker who has access to the computer (ability to run programs) where zpaq is reading or writing the archive. Are there any file encryption products with this type of security?

  8. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
Please explain how read-after-write delays could affect performance. From what I understand, the delays appear in situations where a read is of bigger size than the preceding write, but in SHARC's case reads and writes have equal size.

As to overlapped writes: the writes could be buffered in registers and dumped in a branchless manner (i.e. dumped even when unnecessary).

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    when the following code executes:

    dict[x] := a
    ...
    b := dict[y]

the second operation has a 4-tick delay when x and y are different, but when they are the same, the result of the write may be forwarded to the read operation. So if the forwarded data are delivered in fewer or more than 4 clocks, the execution time depends on the data


    it seems that you are right about write buffering

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
If that's the case then we can e.g. use 2 dictionaries: one for reading and one for writing, and regularly swap their roles so that we keep adaptation. That would further decrease the compression ratio but would mostly eliminate store forwarding.


I wonder, however, how the read-after-write delays could reveal any information. Those delays would happen irrespective of any data further than one chunk away.



    Matt:
Well, suppose that the Deflate algorithm is faster when we have long matches. Then if an attacker injects various chunks of data and observes that for some chunks the response time is quicker than for others, he can assume that the data contains more matches with the chunks that lead to a faster response - and that's without decrypting the response. That's what I understood.

    Edit:
I've just realized: the common matches could also be determined by the increased compression ratio. So the timing attack becomes more of a mystery
    Last edited by Piotr Tarsa; 22nd January 2014 at 00:08.

  11. #11
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    Isn't this mainly a problem with encryption of streams going over the internet? CRIME for example has to do with HTTP/S.

    Encryption of compressed data locally is probably fine. Timing attacks could be exploited if malware was monitoring CPU usage and whatnot but at that point, there are bigger things to worry about.

    You'd also need to perform a statistical attack on the compression ratio but that depends on knowing more or less what the compressed data is and how much of it.

Actually, a way to mitigate the first concern is to use AES-NI, as it is immune to software timing attacks and is faster. edit: Another is to use a cipher like Salsa20 (or ChaCha20 if speed is important), which is secure against timing attacks even in software.

  12. #12
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    I've just realized: the common matches could also be determined by increased compression ratio. So the timing attack become more a mystery
    Consider 2 strings:
    aaaaaa(...)
    ababab(...)

    They'll be compressed with basically the same ratio.
    But for LZ parser they are quite different.

    Though certainly, compression ratio is the bigger side leak.
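The two strings above can be checked directly with zlib (a hypothetical stand-in for any LZ codec): both inputs are equally long and almost perfectly compressible, so their compressed sizes come out nearly identical, even though a match finder does quite different work on a single run versus an alternating pair:

```python
import zlib

# Same input length, near-identical ratio, different parsing work.
la = len(zlib.compress(b"a"  * 4096))   # one long run of a single byte
lb = len(zlib.compress(b"ab" * 2048))   # alternating two-byte pattern
print(la, lb)  # both tiny, and close to each other
```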

  13. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Mangix View Post
    Isn't this mainly a problem with encryption of streams going over the internet? CRIME for example has to do with HTTP/S.
Not really. Any case where your encrypted data is visible to an attacker who has some knowledge about the structure of what's inside. I haven't seen attacks with a passive attacker yet, though I believe they are possible.

    Quote Originally Posted by Mangix View Post
    Actually a way to mitigate the first concern is to use AES-NI as it has no software timing attacks and is faster. edit: Another is to use a cipher like Salsa20(or ChaCha20 if speed is important) which is secure against timing attacks even in software.
    No. AES-NI and ChaCha ciphers are resistant to timing attacks. But a combination (compressor+cipher) is not.
    Last edited by m^2; 22nd January 2014 at 06:27.

  14. #14
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
Data at rest should (in theory) be safe. Even if you know the structure, unless the data is being compressed or decompressed, very little can be determined.

    Any cases to the contrary IMO reflect a failure of the encryption system.

  15. #15
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    358
    Thanks
    12
    Thanked 36 Times in 30 Posts
    Quote Originally Posted by m^2 View Post
    Consider 2 strings:
    aaaaaa(...)
    ababab(...)

    They'll be compressed with basically the same ratio.
    But for LZ parser they are quite different.

    Though certainly, compression ratio is the bigger side leak.
    Please explain more.
Because, AFAIK, timing attacks work only with a "man-in-the-middle", say during a TLS session, when you can capture millions of fake encrypted-compressed packets (in theory).

Second question: why not add a "brutal" pseudorandom dummy cycle to hide the decode time in background noise?

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    The most important information leaked is who is sending and receiving, when, and how much. This is why the NSA is more interested in phone and email records than their content.

    The second most important fact is that the data is encrypted. It means that they should pay attention to it and the people using it. To protect from this, you should use steganography.

  17. #17
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
There's no theoretical difference between data at rest and data in motion. A record of an attack allows key retrieval just as easily as the attack itself. Input manipulation is used to make sure that the data has certain properties. I haven't seen any such attack yet, but data may have such properties naturally. So in theory, there's no difference. In practice, there's a big one.

Timing attacks may work in various cases where the attacker can observe the time that various operations take. It may be, for example, that the attacker sends a request, which leads to the server contacting another machine with some compressed data that depends on attacker input, and after having received a reply, it answers the attacker. Timing differences may indicate differences in the server dialogues. The attacker is on the side, yet gains some information. Actually, they may not even need to know the exact data being sent. There used to be a way to tell if a Unix user existed by measuring the time the password check took during login. Knowing that a user exists is a good start for a brute-force attack. Compression may be an enabler for different vulnerabilities of this kind.

    Adding noise (i.e. random delays) to a system makes timing attacks harder and in some cases is enough to push boundaries between exploitable and unexploitable. Though sometimes noise removal methods work surprisingly well ( http://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf ).

Matt - machines can't differentiate between compression and encryption unless they know the specific protocol / format. But yes, you can treat unrecognised, random-like data as suspicious. China does it.
    Last edited by m^2; 23rd January 2014 at 19:16.

  18. #18
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    358
    Thanks
    12
    Thanked 36 Times in 30 Posts
    Quote Originally Posted by m^2 View Post
There's no theoretical difference between data at rest and data in motion.
Are you sure?
AFAIK timing attacks rely on measuring different response times from "something" during authentication (example: TLS).
If you don't have something "dynamic", how can a timing attack be more efficient than brute force?
I think this is highly overestimated.
About the same for a man-in-the-middle attack over the internet: with WAN latency it's infeasible to gather millions of packets, as opposed to Wi-Fi.

    But yes, you can treat unrecognised, random-like data as suspicious. Chinese do it.
    USA and UK too.
    And Italy too for PGP signature in emails.

    On delays: all is about sigma, AFAIK
    Last edited by fcorbelli; 23rd January 2014 at 19:21.

  19. #19
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by fcorbelli View Post
Are you sure?
AFAIK timing attacks rely on measuring different response times from "something" during authentication (example: TLS).
If you don't have something "dynamic", how can a timing attack be more efficient than brute force?
I think this is highly overestimated.
About the same for a man-in-the-middle attack over the internet: with WAN latency it's infeasible to gather millions of packets, as opposed to Wi-Fi.

    USA and UK too.
    And Italy too for PGP signature in emails.

    On delays: all is about sigma, AFAIK
    Oh yeah, I'm talking about 2 very different things w/out making a clear distinction which is which. It is confusing, sorry for that, I will do better.

    w.r.t. the leak introduced by compression ratio, there's no difference. That's what I had in mind when writing that sentence.
    For timing attacks there must be a way to measure time.

  20. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Here is an example of a specific attack on ZPAQ. Encrypted archives provide privacy but not authentication. The format is AES-256 in CTR mode using a 32 byte salt as prefix. The salt is used to strengthen the password using Scrypt and also as an IV for the counter.

    When listing, extracting, or adding to an archive on an NFS mounted remote drive, only encrypted data is sent over the network in both directions. But because there is no authentication, an attacker with access to the network who can guess any of the bits of the plaintext can flip those bits by injecting packets. An archive normally has a fixed header. Also, each operation (add, list, extract) starts by reading the index (version headers, fragment tables, file names) skipping over the data blocks. This reveals the location of the start of each version, which generally has a fixed or easily guessed header.

    One possibility is to truncate an archive, causing old versions of files to be extracted. If the attacker knows the time that an update occurs, then he also knows the location where the date and time is stored and its value. He can flip these bits to mark the end of the archive, or flip bits in the header to skip over versions. There are other possibilities if he can guess what files are stored.

  21. #21
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Some thoughts:
1. Sorting can be done in data-independent time. So can BWT.
2. MTF too.
3. Add RLE to the mix and you're very close to total data independence, since RLE is by far the fastest part.
4. What if RLE also copied data while counting? Just copy the same thing over and over. Speed still depends on the data, but much less.
5. One could replace RLE with ANS to get something much stronger while still quite invariant.
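Idea 4 can be sketched as an RLE loop that does identical work per input byte, rewriting the current (count, byte) pair on every step whether or not the run continues. This is a Python model of the structure only: a genuinely branchless version would use masks in C, and this toy ignores the usual 255-run-length cap:

```python
def rle_constant_work(data: bytes) -> bytes:
    """RLE with the same per-byte work regardless of run structure:
    every iteration writes a (count, byte) pair; the write position
    advances only when the run breaks."""
    out = bytearray(2 * len(data))   # worst case: no runs at all
    pos, prev, count = -2, -1, 0     # prev = sentinel, never a real byte
    for b in data:
        same = int(b == prev)        # 1 if the run continues, else 0
        pos += 2 * (1 - same)        # advance only on a run break
        count = count * same + 1     # extend the run or restart at 1
        out[pos] = count & 0xFF      # write every iteration, hit or not
        out[pos + 1] = b
        prev = b
    return bytes(out[:pos + 2])
```

For example, `b"aaab"` encodes to the pairs (3, 'a'), (1, 'b'). The data-dependence that remains is the one the thread discusses: which cache lines of `out` get dirtied still depends on the run structure.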

  22. #22
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I tried idea 4 today. It didn't work.
    But there's something better.
    A fast RLE is limited by memory bandwidth, but can't saturate cache bandwidth (at least I can't on my CPU).
    If we evicted RLE input from the cache, we would make read bandwidth the limiting factor. This should make the RLE speed constant.

  23. #23
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
I'm not sure which question you are answering. Normally you compress before encrypting (because the other way doesn't work), so the compression algorithm should not matter. The issue is using a compressed format that inserts known plaintext (biject will pipe in here)

    About BWT in zpaq, I experimented with BWT+MTF+RLE+CM, BWT+RLE+CM, and BWT+CM. The MTF step always made compression worse with no speed benefit so I dropped it. RLE also makes compression worse but can sometimes save time by reducing the input to the CM stage. zpaq uses BWT+CM with either 1 or 2 components (ICM or ICM+ISSE) but you can use a longer ISSE chain using the advanced options for better compression.

  24. #24
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I was talking about resistance to timing attacks. Ideally, compression time should be a function of only input size, not of input contents. I'd like to find something that meets this goal or at least is close.

  25. #25
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
Allocate a time budget and periodically check whether you have a chance of completing before the timeout. If the chances go down, switch to faster compression strategies. Or develop a compression algorithm with iterative refinement.

If you're encrypting after compression then remember that encryption also takes time, so as the compression ratio improves, less data needs to be encrypted.

  26. #26
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    OK I understand about timing attacks. If an attacker with non-root access to a shared server can monitor events like cache timing while your compression algorithm is running, he can guess some plaintext even if he does not have read permission on the input or output. There are similar attacks on AES, so far only theoretical, but it was enough for Intel to consider AES instructions that would resist such attacks.

    Well, I'm not sure it is possible to prevent such attacks, or that it is something we should even try to prevent. I guess it depends on the application. Usually there are other layers where you should apply security instead. If an attacker already has user level access, there are probably lots of other doorways that are easier to open.

  27. #27
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    OK I understand about timing attacks. If an attacker with non-root access to a shared server can monitor events like cache timing while your compression algorithm is running, he can guess some plaintext even if he does not have read permission on the input or output. There are similar attacks on AES, so far only theoretical, but it was enough for Intel to consider AES instructions that would resist such attacks.

    Well, I'm not sure it is possible to prevent such attacks, or that it is something we should even try to prevent.
    It's probably possible, but since it seems to require making code slower, it seems to be at cross purposes with what we do here.
    Last edited by nburns; 29th May 2014 at 18:38.

  28. #28
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    Quote Originally Posted by Matt Mahoney View Post
    There are similar attacks on AES, so far only theoretical, but it was enough for Intel to consider AES instructions that would resist such attacks.

    Well, I'm not sure it is possible to prevent such attacks, or that it is something we should even try to prevent. I guess it depends on the application. Usually there are other layers where you should apply security instead. If an attacker already has user level access, there are probably lots of other doorways that are easier to open.
    Not so theoretical... http://cr.yp.to/antiforgery/cachetiming-20050414.pdf

    As for preventing timing attacks with AES, Peter Schwabe and Emilia Käsper developed a bit-sliced implementation of AES-128 which is immune from timing attacks. Slower than OpenSSL's implementation but still fast: https://eprint.iacr.org/2009/129 . AES-NI also solves this as you mentioned.

  29. Thanks:

    Matt Mahoney (31st May 2014)

  30. #29
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by m^2 View Post
    Some thoughts:
    1. Sorting can be done in data-independent time. So can BWT.
    2. MTF too.
    3. Add RLE to the mix and you're very close to total data independence, since RLE is by far the fastest part.
    4. What if RLE also copied data while counting? Just copy the same thing over and over. Speed still depends on the data, but much less.
    5. One could replace RLE with ANS to get something much stronger while still quite invariant.
The way to make sorting data-independent (or, at least, more so) is to randomly shuffle before sorting, I think. Of course, most of the time that would make it slower. The most popular (AFAIK) MTF algorithm has performance that's tightly bound up with the data. It has very poor behavior in the worst case, but on real-world BWT data, with lots of redundancy, it performs very well. It seems to me that requiring sameness means eliminating all data-dependent optimizations and demanding worst-case behavior always.

    Crypto strength isn't something that you can really measure, so if you're paranoid, what you want is really to feel confident in the strength of your countermeasures. If compression might give you away, then why compress at all? I'd feel much safer just skipping compression rather than taking the word of somebody that a compressor has been made safe.
    Last edited by nburns; 30th May 2014 at 10:48.

  31. #30
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Optimising algorithm and bitslicing - I like both ideas. Has anyone done a bitsliced compression algo?

    Quote Originally Posted by nburns View Post
    The way to make sorting data-independent (or, at least, moreso) is to randomly shuffle before sorting, I think.
Not really. People have fooled randomised qsort based on knowledge of the pseudo-random generator. Merge sort and sorting networks are the way to do data-independent sorting.

    Quote Originally Posted by nburns View Post
    It seems to me that requiring sameness means eliminating all data-dependent optimizations and demanding worst-case behavior always.
Indeed. The classic example is strcmp. You can't quit at the first difference found; you have to keep iterating until the very last character.
It's not only slow, but it also requires you to make sure that your compiler is not smart enough to optimise it out.
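The strcmp point above, in constant-time form. This is the standard construction (the same idea as Python's `hmac.compare_digest`): XOR every byte pair and OR the results together, so the loop always runs to the end no matter where the first mismatch is:

```python
def ct_equal(a: bytes, b: bytes) -> bool:
    """Compare two byte strings without early exit on the first mismatch."""
    if len(a) != len(b):      # length is usually considered public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y         # accumulate differences, never break out
    return diff == 0
```

In C, exactly as noted above, one must also stop the optimiser from short-circuiting this loop (e.g. by using volatile accesses); in practice you would just call a vetted primitive like `hmac.compare_digest` rather than roll your own.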

    Quote Originally Posted by nburns View Post
    Crypto strength isn't something that you can really measure, so if you're paranoid, what you want is really to feel confident in the strength of your countermeasures. If compression might give you away, then why compress at all? I'd feel much safer just skipping compression rather than taking the word of somebody that a compressor has been made safe.
Like everywhere, there are tradeoffs. At the most critical level, compression may be unsafe both by potentially enabling algorithmic attacks and by increasing complexity, and therefore the risk of bugs or the cost of formal verification. For most encryption users, base64 encoding would work well enough; fear of timing attacks is unfounded. But there's a lot of middle ground where a less leaky algorithm might find its place.
