Its called hash collision, you can get "desired output" from any hash function, simply because it has limited output size, but unlimited input.
Of course, for cryptographic hashes its quite expensive. (See http://shattered.io/)
But any of the "fast hashes" can be cracked.
As to the method, its like this:
1) See how https://github.com/Shelwien/sudoku-cbmc works
2) Replace condition which has to be satisfied to reach "match" label with hash function of your choice
3) Run CBMC
I don't mean brute forcing. The above refs on CRC are about directly calculating the needed data modification.
SAT is not bruteforcing.
https://en.wikipedia.org/wiki/Boolea...bility_problem
As a demo, here I did xxhash64: http://nishi.dreamhosters.com/u/xxhash64_0.rar
Its tricky, because CBMC is not very stable, but its possible.Code:Z:\xxhash64_0>test.bat aaaaaaaaaaaaaaaa sample1 8 bytes 0.828s deadbeef0badf00d sample2 8 bytes 0.813s deadbeef0badf00d sample3 12 bytes charset=[A-z] 10.515s 0123456789abcdef sample4 16 bytes charset=[0-Z] 8670.453s
For example, finding 8 bytes of data for a given hash value without charset limit takes 0.8s here.
DernmIl (4th April 2019)
Interesting, thanks! I'll have to study this.
How would the the real data stream be fed in? (In the middle of which the calculated bytes should be inserted.)
If a given hash algorithm can be solved, does this imply it's solvable for any input data and target hash?
Will solve time remain similar regardless of inputs?
Found this example of using also an SMT solver:
https://blog.lse.epita.fr/articles/2...g-algorit.html
> How would the the real data stream be fed in?
> (In the middle of which the calculated bytes should be inserted.)
CBMC's intended use is something like a fuzzer.
You write whatever algorithm, add some assert directives,
and it finds conditions where these asserts fail.
So in theory you can simply calculate whatever you want
(have to include all the known data in the source though),
use some of CBMC ways to leave some unknown variables,
then let it work... or even just generate a CNF system for C/C++ source
and feed it to an external SAT solver.
But in practice CBMC is quite unstable... for example
I had to get rid of memset,memcpy and _rotl64 in xxhash64 - CBMC
accepts them, but ends up working incorrectly for some reason.
On other hand, its still easier than manually generating
input for a solver.
> If a given hash algorithm can be solved,
> does this imply it's solvable for any input data and target hash?
That depends on specific hash implementation.
Some fast hashes are lossy, so it might be not possible to find data
for a given hash value.
But in general any hash is solvable, only complexity is different.
So for fast hashes with relatively low complexity its much easier
to do than for cryptographic hashes.
> Will solve time remain similar regardless of inputs?
I tried a few xxhash values with 8-byte unlimited input and time seems the same.
But it goes much slower with a limited charset.
Still, in general, the answer is no.
For example, you can make a composite hash which uses either crc64 or xxhash64
based on input parity.
Hash values that are mapped via crc64 would be then faster to crack.
By "target hash" I didn't mean different hash functions. I meant the same hash function, just different desired output values.That depends on specific hash implementation.
Lossy in what sense? Aren't they all?Some fast hashes are lossy
That's the problem.
Assuming solvers run through the whole input multiple times, that's not suitable for large inputs, no? Say 1GB of data.
Suppose the source data is ordered as follows:
[X: a specific piece of data]
[Y: n bytes reserved, fixed length, for modification by the solver]
[Z: a specific piece of data]
In the case of xxhash, I think it should be possible to precalculate X by hashing the bytes, and keeping the internal state instead of the final mixed hash. Then, that could serve as the initial state for solver. Though you might have to assure the length of X is a multiple of the stripe size, which is doable, but less desirable.
But what about Z? If you precalculate the hash for it, wouldn't the input to the solver need to include a way to go from xxh(XY) and xxh(Z) to xxh(XY+Z)?
(related: https://github.com/Cyan4973/xxHash/issues/125)
>> That depends on specific hash implementation.
> By "target hash" I didn't mean different hash functions.
> I meant the same hash function, just different desired output values.
I meant that too.
Sure, in cases that I actually tested, solver time was the same for different hash values.
But specifically for fast hashes its quite possible that some values would be
less frequent (or totally impossible).
>> Some fast hashes are lossy
> Lossy in what sense? Aren't they all?
For datasize<=hashsize a good hash should be lossless.
But for example, suppose a hash contains a step that looks like this: "x-=x>>5;"
It maps 0..0xFFFFFFFF to 0..0xF8000000, so its lossy.
> Suppose the source data is ordered as follows:
> [X: a specific piece of data]
> [Y: n bytes reserved, fixed length, for modification by the solver]
> [Z: a specific piece of data]
> In the case of xxhash, I think it should be possible to precalculate X by
> hashing the bytes, and keeping the internal state instead of the final
> mixed hash. Then, that could serve as the initial state for solver.
Yes
> Though you might have to assure the length of X is a multiple of the stripe
> size, which is doable, but less desirable.
You can just move unaligned part of X to Y.
It would still have to be processed by hash function,
but we can declare it as known data for SAT.
> But what about Z?
> If you precalculate the hash for it, wouldn't the input
> to the solver need to include a way to go from xxh(XY) and xxh(Z) to xxh(XY+Z)?
Yes.
Its possible for crc, but not for most other hashes.
So in general, processing of Y and Z parts has to be included in the SAT script.
There're two known optimizations though:
1) Only steps depending on data from Y have to be performed in SAT script.
So if hash function independently computes a block hash first, then combines
it with accumulated hash value, it means we can skip this part in SAT script
and just provide precomputed block hash value.
This method should be applicable for xxhash, for example.
Actually SAT tools would be usually able to optimize away independent parts on their own,
but it makes sense as a speed optimization too.
2) Sometimes reverse calculation is possible.
That is, since hash value and data are both known, we might be able to
process hash function steps in reverse order.
Its usually possible to do at least for a small number of steps even with cryptohashes.
But for some hash functions it may be quite as well possible to reach Y from the other side.
In particular, its probably possible with xxhash, since steps like these are reversible:
Solve[Mod[11400714785074694791*x,2^64]==1,Integers] -> x=614540362697595703+(2^64)*n
Mod[11400714785074694791*614540362697595703, 2^64] == 1
f[x_] := Mod[x*11400714785074694791 + 9650029242287828579, 2^64]
g[x_] := Mod[x*614540362697595703 + 8565294228366677947, 2^64]
g[f[123]]==123
To avoid manual work it might make sense to resolve this by multiple runs of the same SAT.
That is, we can solve hash(state,data)==value for "state" multiple times
(to avoid feeding GBs of data at once to SAT), until we get "state" for a point close to Y.
If a hash gets 10 in SMHasher, isn't it safe to rule out some output bits never getting used?
But in the current case the problem isn't that the bulk hashing is called thru the SAT solver (it's not going to insert itself in the middle the tight loop), it's that there's a whole lot of bulk hashing to do.1) Only steps depending on data from Y have to be performed in SAT script.
What tool did you use for this Solve[]?In particular, its probably possible with xxhash, since steps like these are reversible:
Solve[Mod[11400714785074694791*x,2^64]==1,Integers] -> x=614540362697595703+(2^64)*n
Mod[11400714785074694791*614540362697595703, 2^64] == 1
f[x_] := Mod[x*11400714785074694791 + 9650029242287828579, 2^64]
g[x_] := Mod[x*614540362697595703 + 8565294228366677947, 2^64]
g[f[123]]==123
Could you elaborate?To avoid manual work it might make sense to resolve this by multiple runs of the same SAT.
That is, we can solve hash(state,data)==value for "state" multiple times
(to avoid feeding GBs of data at once to SAT), until we get "state" for a point close to Y.
DernmIl (19th April 2019)
> If a hash gets 10 in SMHasher, isn't it safe to rule out some output bits never getting used?
Not really.
For example, what would happen if we'd feed a modified cryptohash to SMHasher?
Something like "md5a(x) = (md5(x)==0) ? 1 : md5(x);"
I don't think its possible to notice without actually checking the whole input range,
and statistics would be good otherwise.
> But in the current case the problem isn't that the bulk hashing is called thru the SAT solver
Its actually a problem, because CBMC is not optimized for this kind of tasks.
It theory, adding more operations on constants shouldn't affect SAT system complexity,
but in practice, CBMC is not that efficient.
> What tool did you use for this Solve[]?
Wolfram Mathematica: https://www.wolframalpha.com/input/?...D1,Integers%5D
But you actually just as well can use CBMC for this (see attach).
The trick with byte array is necessary because CBMC won't print the solution otherwise.
> Could you elaborate?
Well, I'm just proposing to split the task.
Suppose we have a hash function:
state' = hash(state,block);
And the system we're trying to solve is something like this:
sX=hash(init_state,X);
sY=hash(sX,Y);
sZ=hash(sY,Z);
Here sX can be simply precalculated.
And sZ,Z are also known, but we normally don't have an inverse function for the hash.
But we still can solve "sZ=hash(sY,Z);" with SAT separately, just to find the sY value.
And then solve "sY=hash(sX,Y);", where Y is unknown, which makes the dependency much more complex.
Comparing to that, solving "sZ=hash(sY,Z);" for sY should be mostly immediate,
even though we may have to split it further, depending on CBMC capability,
but in any case, the complexity should be linear on Z size.
DernmIl (19th April 2019)