# Thread: FARSH: hashing 30 GB/s!

1. ## FARSH: hashing 30 GB/s!

FARSH stands for fast and reliable (but not secure) hash, and I just released version 0.1

I'm interested in results of running benchmark executables (farsh*.exe) on your computers and, of course, in your versions with better bit mixing.

2. ## Thanks (2):

Cyan (29th May 2015),VoLT (29th May 2015)

3. I'm working on my side on an AVX2 version of xxHash.
It's not published yet because, as farsh 0.1, its quality is not yet good enough.

xxh_avx is quite fast, but not as fast as farsh numbers.
Plus, I like its uhash-like ability to generate any key size up to 1024 bits.
So it's a pretty good start.

4. The universal hashing formula used here (and derived from UMAC) is as simple as

for (i=0; i < elements; i+=2)
sum += (data[i] + key[i]) * (ULONG)(data[i+1] + key[i+1]);
Universal hashing mentions shifting result to the right.

I've made an experiment and made histograms of multiplication results after shifting to the right. Here's Scala code:
Code:
```object Main {
val factorSize = 8
val factorsNum = 1 << factorSize
val factors = 0 until factorsNum
val shifts = 0 to factorSize
val resultSize = factorSize
val resultsNum = 1 << resultSize
val resultMask = (1 << resultSize) - 1

val countsForShifts = Array.ofDim[Int](resultsNum, shifts.length)

val ideal = factorsNum * factorsNum / resultsNum

def tabS(s: String): String = f"\$s%12s "
def tabI(i: Int): String = tabS(i.toString)
def countErrorsSum(shift: Int): Int = {
countsForShifts.map(row => math.abs(ideal - row(shift))).sum
}

def main(args: Array[String]): Unit = {
for (factor1 <- factors; factor2 <- factors; shift <- shifts) {
val result = ((factor1 * factor2) >> shift) & resultMask
countsForShifts(result)(shift) += 1
}
println(tabS("Value") + tabS("Ideal") + shifts.map(tabI).mkString)
println("Rows:")
for ((countsRow, i) <- countsForShifts.zipWithIndex) {
println(tabI(i) + tabI(ideal) + countsRow.map(tabI).mkString)
}
println("Total error sums:")
println(tabS("") + tabI(0) + shifts.map(shift =>
println(s"Best shift is \$bestShift by \$timesBetter times than default one")
}
}```
The output is:
Code:
```       Value        Ideal            0            1            2            3            4            5            6            7            8
Rows:
0          256         1280          894          765          730          729          767          905         1226         1968
1          256          128          208          224          249          276          366          528          910         1301
2          256          256          268          279          288          318          366          549          713         1173
3          256          128          186          226          251          305          388          566          757         1100
4          256          384          334          291          268          271          364          441          659         1032
5          256          128          204          244          282          328          421          495          693          976
6          256          256          272          277          287          315          377          458          622          937
7          256          128          206          240          284          330          420          489          640          909
8          256          512          362          303          283          308          336          440          593          870
9          256          128          184          218          233          277          329          441          614          841
10          256          256          268          279          309          339          355          437          562          821
11          256          128          190          248          277          330          360          448          575          784
12          256          384          330          299          276          299          348          412          551          772
13          256          128          196          250          282          308          339          411          544          750
14          256          256          256          249          271          318          355          421          530          734
15          256          128          186          246          289          357          377          445          552          732
16          256          640          458          365          350          296          328          377          499          696
17          256          128          184          204          214          271          320          396          522          682
18          256          256          260          263          252          249          318          407          504          674
19          256          128          214          248          295          321          357          417          499          654
20          256          384          334          305          288          285          326          366          472          646
21          256          128          204          256          307          322          337          397          510          636
22          256          256          264          271          284          302          361          396          463          623
23          256          128          206          232          266          290          303          366          466          602
24          256          512          378          319          303          313          338          393          481          609
25          256          128          184          204          252          273          298          361          449          580
26          256          256          252          257          276          278          322          375          444          584
27          256          128          214          250          274          289          320          367          457          565
28          256          384          306          273          296          331          329          375          445          562
29          256          128          188          226          232          242          306          346          434          549
30          256          256          264          271          314          336          353          392          439          550
31          256          128          214          266          307          292          322          361          447          549
32          256          768          518          397          292          303          307          340          404          513
33          256          128          200          256          260          266          283          348          431          528
34          256          256          252          253          281          305          318          356          407          497
35          256          128          190          218          228          246          288          352          420          517
36          256          384          326          301          304          337          337          349          405          496
37          256          128          196          210          216          236          312          355          413          493
38          256          256          256          265          273          278          301          336          388          475
39          256          128          198          240          253          297          326          349          407          478
40          256          512          390          315          287          283          294          334          391          473
41          256          128          192          236          251          272          284          323          388          467
42          256          256          252          265          267          265          290          339          387          455
43          256          128          194          244          276          294          320          359          385          456
44          256          384          318          293          292          306          293          303          361          437
45          256          128          212          268          289          326          336          354          405          457
46          256          256          256          263          273          268          294          313          359          429
47          256          128          178          226          252          262          291          337          370          433
48          256          640          450          369          310          280          290          328          379          425
49          256          128          192          210          229          259          283          319          362          420
50          256          256          252          267          276          269          276          306          354          415
51          256          128          194          238          256          286          312          335          359          405
52          256          384          314          301          290          299          316          314          351          405
53          256          128          188          232          248          258          287          306          360          399
54          256          256          264          271          270          274          292          338          352          399
55          256          128          198          232          251          254          273          298          339          381
56          256          512          382          323          315          301          297          311          340          387
57          256          128          200          250          265          271          293          318          351          383
58          256          256          268          267          268          295          292          303          328          375
59          256          128          186          212          212          249          265          303          344          373
60          256          384          334          317          294          305          318          311          331          371
61          256          128          196          242          278          280          290          310          335          367
62          256          256          264          277          254          254          274          305          335          365
63          256          128          198          262          258          278          281          311          335          365
64          256          896          582          347          269          250          264          284          315          339
65          256          128          200          246          303          308          306          309          321          345
66          256          256          260          249          239          252          278          301          315          345
67          256          128          206          212          236          250          264          288          323          351
68          256          384          314          289          282          277          292          280          308          323
69          256          128          204          248          275          272          278          302          306          327
70          256          256          264          251          243          267          288          300          323          339
71          256          128          190          220          237          242          266          282          304          331
72          256          512          390          329          314          302          273          290          300          311
73          256          128          200          242          270          268          280          293          317          327
74          256          256          252          241          240          264          280          283          296          313
75          256          128          186          210          239          270          286          298          306          313
76          256          384          322          301          282          265          250          276          285          305
77          256          128          188          214          246          269          292          279          298          301
78          256          256          256          267          275          289          279          295          302          305
79          256          128          194          208          245          264          270          274          279          299
80          256          640          434          327          289          250          253          277          292          294
81          256          128          192          220          244          262          281          294          301          289
82          256          256          276          269          278          268          266          263          280          295
83          256          128          190          226          231          240          272          293          286          281
84          256          384          322          295          282          265          263          256          277          295
85          256          128          188          214          236          253          257          271          275          267
86          256          256          256          249          257          269          282          291          291          285
87          256          128          198          240          242          269          285          266          265          271
88          256          512          390          333          310          259          250          265          275          268
89          256          128          192          238          250          253          234          256          267          265
90          256          256          260          263          287          303          300          290          272          267
91          256          128          198          228          259          269          282          282          286          271
92          256          384          318          283          268          260          259          247          253          253
93          256          128          188          220          227          240          235          260          270          257
94          256          256          264          257          278          283          283          278          265          260
95          256          128          198          232          235          226          249          256          261          249
96          256          768          534          399          283          264          256          260          267          247
97          256          128          200          220          239          252          272          273          252          247
98          256          256          252          243          257          273          270          256          258          241
99          256          128          178          220          245          250          251          260          257          244
100          256          384          342          299          263          250          247          247          246          231
101          256          128          188          228          234          246          254          266          264          249
102          256          256          248          251          254          275          271          256          238          225
103          256          128          186          204          226          238          255          250          253          231
104          256          512          398          349          320          295          263          250          238          224
105          256          128          200          216          229          240          255          266          256          233
106          256          256          268          279          280          281          260          269          259          231
107          256          128          206          234          237          243          232          231          222          209
108          256          384          342          319          287          256          270          256          258          227
109          256          128          196          228          255          276          275          256          230          206
110          256          256          248          253          253          235          238          237          233          215
111          256          128          202          220          227          252          245          252          239          211
112          256          640          442          339          277          242          243          241          233          223
113          256          128          192          248          259          255          264          245          232          198
114          256          256          260          253          250          248          240          245          235          207
115          256          128          202          238          260          264          259          249          227          197
116          256          384          318          279          265          246          270          258          237          205
117          256          128          204          236          255          269          231          217          218          192
118          256          256          248          225          223          235          251          249          236          207
119          256          128          170          196          214          222          232          242          216          181
120          256          512          398          349          306          289          263          252          237          204
121          256          128          208          216          248          262          244          229          212          181
122          256          256          268          269          241          239          251          244          218          189
123          256          128          206          268          276          275          250          237          230          187
124          256          384          314          259          255          252          259          238          211          182
125          256          128          204          218          248          251          242          239          227          187
126          256          256          272          263          270          256          238          235          207          179
127          256          128          190          192          212          207          243          242          228          187
128          256         1024          514          337          259          250          238          228          207          168
129          256          128          176          230          256          263          257          226          205          169
130          256          256          244          259          247          232          233          236          223          179
131          256          128          198          252          266          266          257          231          190          162
132          256          384          306          255          253          245          221          224          222          175
133          256          128          180          214          245          258          248          220          192          161
134          256          256          240          249          262          243          230          229          201          158
135          256          128          178          202          211          230          255          243          226          173
136          256          512          406          339          286          256          231          208          180          151
137          256          128          200          256          278          270          241          234          210          162
138          256          256          244          259          252          256          252          226          201          161
139          256          128          194          222          226          220          232          216          187          145
140          256          384          310          263          247          256          238          224          203          158
141          256          128          188          216          225          242          228          231          198          151
142          256          256          256          245          228          210          218          214          191          145
143          256          128          198          232          248          264          251          230          201          154
144          256          640          438          353          303          256          241          213          189          139
145          256          128          200          238          257          238          231          210          186          145
146          256          256          252          259          259          271          255          242          200          144
147          256          128          170          206          210          205          218          210          186          141
148          256          384          306          277          263          253          229          212          185          133
149          256          128          180          190          202          221          222          213          188          136
150          256          256          248          259          277          279          271          233          194          143
151          256          128          178          202          223          235          227          219          184          127
152          256          512          390          323          276          226          201          180          166          132
153          256          128          200          242          253          259          257          227          192          133
154          256          256          260          245          257          260          235          228          176          127
155          256          128          170          212          229          232          208          198          179          126
156          256          384          334          309          277          249          243          215          184          129
157          256          128          196          228          247          238          238          211          172          123
158          256          256          248          259          245          236          232          216          182          116
159          256          128          170          196          232          220          210          194          169          121
160          256          768          506          385          301          269          251          230          189          123
161          256          128          184          210          201          209          219          194          158          110
162          256          256          260          265          256          252          231          214          175          117
163          256          128          194          236          243          234          221          202          162          110
164          256          384          314          289          267          247          233          204          178          121
165          256          128          188          228          235          243          222          211          168          109
166          256          256          256          245          242          250          233          206          167          108
167          256          128          186          210          222          223          227          200          162          103
168          256          512          378          311          260          250          218          200          180          120
169          256          128          192          230          244          237          224          215          168           99
170          256          256          260          245          242          253          228          181          140           97
171          256          128          190          210          213          205          218          208          172          100
172          256          384          322          289          279          254          219          181          160          105
173          256          128          172          190          202          220          236          228          167          101
174          256          256          256          243          230          227          206          181          148           94
175          256          128          206          236          237          251          218          195          164           93
176          256          640          446          365          309          242          226          200          154           94
177          256          128          192          220          234          243          240          202          154           95
178          256          256          260          263          251          228          220          195          159           95
179          256          128          190          196          199          194          201          191          147           86
180          256          384          326          297          275          250          219          204          165           89
181          256          128          196          242          265          259          237          186          140           82
182          256          256          248          267          277          268          240          207          160           89
183          256          128          186          218          222          226          214          185          150           91
184          256          512          386          311          272          235          200          196          146           78
185          256          128          184          212          226          239          219          181          143           79
186          256          256          244          255          247          225          223          198          147           80
187          256          128          198          206          209          188          192          180          147           79
188          256          384          306          289          271          249          233          196          150           78
189          256          128          188          232          259          237          224          191          146           81
190          256          256          248          241          223          249          229          196          142           73
191          256          128          186          200          197          204          182          170          142           70
192          256          896          570          343          298          265          249          211          139           73
193          256          128          184          196          208          205          192          181          146           68
194          256          256          252          237          230          246          225          168          129           69
195          256          128          178          206          215          223          209          197          149           76
196          256          384          326          317          313          287          234          192          134           67
197          256          128          180          190          206          206          198          182          136           61
198          256          256          248          247          236          205          210          166          132           62
199          256          128          194          234          240          234          208          187          140           61
200          256          512          378          309          277          258          221          182          123           64
201          256          128          184          180          195          201          196          180          135           59
202          256          256          260          245          227          201          194          168          129           60
203          256          128          198          216          222          230          228          200          146           65
204          256          384          318          289          283          264          235          182          122           53
205          256          128          196          216          211          217          200          164          128           54
206          256          256          256          263          262          235          208          186          124           51
207          256          128          190          210          210          213          186          168          131           52
208          256          640          462          363          306          252          225          193          132           53
209          256          128          192          234          257          237          205          158          115           48
210          256          256          236          233          239          234          224          192          135           51
211          256          128          194          216          224          224          201          167          120           48
212          256          384          318          275          259          239          224          186          138           53
213          256          128          196          236          219          202          201          175          114           45
214          256          256          256          245          222          212          192          164          118           42
215          256          128          186          222          237          198          193          158          111           41
216          256          512          378          305          275          284          235          201          127           40
217          256          128          192          212          209          197          188          165          119           41
218          256          256          252          259          252          237          209          166          118           38
219          256          128          186          206          226          230          212          172          106           39
220          256          384          322          287          255          238          215          172          124           36
221          256          128          196          222          234          216          178          155          108           37
222          256          256          248          237          243          247          203          179          122           36
223          256          128          186          202          206          194          204          155          105           35
224          256          768          490          355          256          242          214          181          125           38
225          256          128          184          210          204          193          197          161          113           31
226          256          256          260          263          250          246          223          176          116           29
227          256          128          206          222          236          226          188          158           96           28
228          256          384          298          263          226          222          212          170          116           27
229          256          128          196          230          227          203          177          156          104           26
230          256          256          264          263          251          219          210          179          108           25
231          256          128          198          242          247          234          193          144          101           24
232          256          512          370          305          277          268          226          180          113           23
233          256          128          184          214          220          223          203          159          103           22
234          256          256          244          235          231          197          181          156          102           21
235          256          128          178          208          230          216          178          154          100           20
236          256          384          298          251          230          228          211          167          108           19
237          256          128          188          210          210          202          192          165          108           18
238          256          256          264          265          256          254          216          156           94           17
239          256          128          182          214          212          197          196          155           95           16
240          256          640          454          335          288          218          192          170          109           15
241          256          128          192          218          226          231          208          166          102           14
242          256          256          252          241          243          205          185          144           91           13
243          256          128          182          224          245          234          194          154           96           12
244          256          384          322          275          254          251          221          171          102           11
245          256          128          180          186          188          192          182          142           92           10
246          256          256          264          261          238          205          179          154           96            9
247          256          128          214          250          265          246          219          170           95            8
248          256          512          370          297          247          225          186          148           99            7
249          256          128          176          218          217          226          195          150           86            6
250          256          256          244          233          220          203          192          159           95            5
251          256          128          178          210          235          228          201          161           94            4
252          256          384          326          287          260          212          192          143           87            3
253          256          128          180          204          195          204          189          154           93            2
254          256          256          240          243          221          199          178          146           90            1
255          256          128          194          242          273          287          243          179           97            0
Total error sums:
0        32768        16832         8970         6498         7128        10694        17826        29832        48550
Best shift is 3 by 5.042782394582948 times than default one```
Each column is for different shift. Shift 0 is the default one and it's the shift that Bulat used (shift 0 is no shift).

If I change factorSize and comment out printing of rows, ie:
Code:
```object Main {
val factorSize = 15
val factorsNum = 1 << factorSize
val factors = 0 until factorsNum
val shifts = 0 to factorSize
val resultSize = factorSize
val resultsNum = 1 << resultSize
val resultMask = (1 << resultSize) - 1

val countsForShifts = Array.ofDim[Int](resultsNum, shifts.length)

val ideal = factorsNum * factorsNum / resultsNum

def tabS(s: String): String = f"\$s%12s "
def tabI(i: Int): String = tabS(i.toString)
def countErrorsSum(shift: Int): Int = {
countsForShifts.map(row => math.abs(ideal - row(shift))).sum
}

def main(args: Array[String]): Unit = {
for (factor1 <- factors; factor2 <- factors; shift <- shifts) {
val result = ((factor1 * factor2) >> shift) & resultMask
countsForShifts(result)(shift) += 1
}
println(tabS("Value") + tabS("Ideal") + shifts.map(tabI).mkString)
//    println("Rows:")
//    for ((countsRow, i) <- countsForShifts.zipWithIndex) {
//      println(tabI(i) + tabI(ideal) + countsRow.map(tabI).mkString)
//    }
println("Total error sums:")
println(tabS("") + tabI(0) + shifts.map(shift =>
println(s"Best shift is \$bestShift by \$timesBetter times than default one")
}
}```
Then I get the following:
Code:
```       Value        Ideal            0            1            2            3            4            5            6            7            8            9           10           11           12           13           14           15
Total error sums:
0    536870912    269083040    135153166     68433600     35241912     19274406     13546862     13617772     18553286     30496378     53392724     94561504    166028768    286628836    483389300    790058434
Best shift is 6 by 39.63064745178625 times than default one```
With growing factor size there are growing differences. They certainly will cause problems when there's only one pair of inputs to hash and the shift is bad. I don't know wheter the spread would be bad if there are many pairs to be hashed, but there's SMHasher and its reports.

5. Piotr: yes, the hash value produced directly by this formula, should be shifted by 32 for correct result, since lower bits aren't 100% random. but since we are using this value only as entropy source, saving entire 64 bits can't make results worser. from the practical grounds, my first version saved only higher 32 bits and i've quickly discovered that similar short source data (such as {1} and {2}) leads to the similar hashes

uhash saves entire 64 bit results for 1024-byte blocks, concatenates them and then runs more complex polynomial hashing over these data. i just tried to calculate crc32 of the same data (sequence of 64-bit block hashes) and it made much better smhasher results - now only Avalanche and Zeroes tests are fail

6. i7-4770/16Gb/W7C x64
farsh-x86.exe | Hashing 100 GiB (b004ee2e): 16754.775 milliseconds = 6.409 GB/s = 5.968 GiB/s
farsh-x86-sse2.exe | Hashing 100 GiB (b004ee2e): 3448.710 milliseconds = 31.135 GB/s = 28.996 GiB/s
farsh-x86-avx2.exe | Hashing 100 GiB (b004ee2e): 1853.249 milliseconds = 57.938 GB/s = 53.959 GiB/s
farsh-x64.exe | Hashing 100 GiB (b004ee2e): 3960.820 milliseconds = 27.109 GB/s = 25.247 GiB/s
farsh-x64-avx2.exe | Hashing 100 GiB (b004ee2e): 1891.681 milliseconds = 56.761 GB/s = 52.863 GiB/s

7. hey, 404, i have exactly the same cpu! i would like to see other cpu results but it seems that only haswellers are brave enough to run these executables

8. Originally Posted by Bulat Ziganshin
i would like to see other cpu results

Code:
```farsh-x86.exe
Hashing 100 GiB (b004ee2e): 32914.037 milliseconds = 3.262 GB/s = 3.038 GiB/s

farsh-x86-sse2.exe
Hashing 100 GiB (b004ee2e): 15302.369 milliseconds = 7.017 GB/s = 6.535 GiB/s

farsh-x64.exe
Hashing 100 GiB (b004ee2e): 15822.045 milliseconds = 6.786 GB/s = 6.320 GiB/s```
C2D E6550 @2.33 GHz, 6GB DDR2, W7x64

9. PAQer, thanks, you are reading my thoughts! i expected good perfromance on Core2 since it got 128-bit SIMD engines, but your results are discouraging. i looked into the problem and it seems that CPU spends half of the time on unaligned loads. so i prepared executables with all data aligned and, optionally, aligned loads are used. can you please run at least all 4 farsh-x86-sse2.exe programs from the attached archive. even better - run all non-avx executables and give me a console output (i can format it myself )

these executables should tell a lot for any pre-haswell cpu, since only haswell finally get rid of unaligned penalty. if it will show the difference, i will probably need to provide 2-4 variants of SSE2 code with dynamic selection on the alignment of key+data

10. Phenom II 945, 3 GHz, W7x64, 4 GB RAM

farsh0.1\benchmark>farsh-x86.exe
Hashing 100 GiB (b004ee2e): 24995.086 milliseconds = 4.296 GB/s = 4.001 GiB/s

farsh0.1\benchmark>farsh-x64.exe
Hashing 100 GiB (b004ee2e): 7903.949 milliseconds = 13.585 GB/s = 12.652 GiB/s

farsh0.1\benchmark>farsh-x86-sse2.exe
Hashing 100 GiB (b004ee2e): 7244.024 milliseconds = 14.822 GB/s = 13.804 GiB/s

---

farsh-alignment\align-both>farsh-x86.exe
Hashing 100 GiB (fe82e924): 30476.906 milliseconds = 3.523 GB/s = 3.281 GiB/s

farsh-alignment\align-both>farsh-x64.exe
Hashing 100 GiB (fe82e924): 8754.846 milliseconds = 12.265 GB/s = 11.422 GiB/s

farsh-alignment\align-both>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 7384.217 milliseconds = 14.541 GB/s = 13.542 GiB/s

---

farsh-alignment\align-data>farsh-x86.exe
Hashing 100 GiB (fe82e924): 30467.067 milliseconds = 3.524 GB/s = 3.282 GiB/s

farsh-alignment\align-data>farsh-x64.exe
Hashing 100 GiB (fe82e924): 8760.540 milliseconds = 12.257 GB/s = 11.415 GiB/s

farsh-alignment\align-data>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 7394.061 milliseconds = 14.522 GB/s = 13.524 GiB/s

---

farsh-alignment\align-key>farsh-x86.exe
ashing 100 GiB (fe82e924): 30472.935 milliseconds = 3.524 GB/s = 3.282 GiB/s

farsh-alignment\align-key>farsh-x64.exe
Hashing 100 GiB (fe82e924): 8758.215 milliseconds = 12.260 GB/s = 11.418 GiB/s

farsh-alignment\align-key>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 7382.260 milliseconds = 14.545 GB/s = 13.546 GiB/s

---

farsh-alignment\no-align>farsh-x86.exe
Hashing 100 GiB (fe82e924): 30452.065 milliseconds = 3.526 GB/s = 3.284 GiB/s

farsh-alignment\no-align>farsh-x64.exe
Hashing 100 GiB (fe82e924): 8966.305 milliseconds = 11.975 GB/s = 11.153 GiB/s

farsh-alignment\no-align>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 7545.100 milliseconds = 14.231 GB/s = 13.254 GiB/s

11. Code:
```align-both.farsh-x64-avx2.exe | Hashing 100 GiB (fe82e924):  2037.786 milliseconds = 52.692 GB/s = 49.073 GiB/s
align-both.farsh-x64.exe      | Hashing 100 GiB (fe82e924):  3565.271 milliseconds = 30.117 GB/s = 28.048 GiB/s
align-both.farsh-x86-avx2.exe | Hashing 100 GiB (fe82e924):  2227.479 milliseconds = 48.204 GB/s = 44.894 GiB/s
align-both.farsh-x86-sse2.exe | Hashing 100 GiB (fe82e924):  3522.907 milliseconds = 30.479 GB/s = 28.386 GiB/s
align-both.farsh-x86.exe      | Hashing 100 GiB (fe82e924): 18804.064 milliseconds =  5.710 GB/s =  5.318 GiB/s
align-data.farsh-x64-avx2.exe | Hashing 100 GiB (fe82e924):  2040.561 milliseconds = 52.620 GB/s = 49.006 GiB/s
align-data.farsh-x64.exe      | Hashing 100 GiB (fe82e924):  3577.289 milliseconds = 30.016 GB/s = 27.954 GiB/s
align-data.farsh-x86-avx2.exe | Hashing 100 GiB (fe82e924):  2245.868 milliseconds = 47.810 GB/s = 44.526 GiB/s
align-data.farsh-x86-sse2.exe | Hashing 100 GiB (fe82e924):  3559.210 milliseconds = 30.168 GB/s = 28.096 GiB/s
align-data.farsh-x86.exe      | Hashing 100 GiB (fe82e924): 18721.620 milliseconds =  5.735 GB/s =  5.341 GiB/s
align-key.farsh-x64-avx2.exe  | Hashing 100 GiB (fe82e924):  2030.425 milliseconds = 52.883 GB/s = 49.251 GiB/s
align-key.farsh-x64.exe       | Hashing 100 GiB (fe82e924):  3581.340 milliseconds = 29.982 GB/s = 27.923 GiB/s
align-key.farsh-x86-avx2.exe  | Hashing 100 GiB (fe82e924):  2230.221 milliseconds = 48.145 GB/s = 44.839 GiB/s
align-key.farsh-x86-sse2.exe  | Hashing 100 GiB (fe82e924):  3510.804 milliseconds = 30.584 GB/s = 28.484 GiB/s
align-key.farsh-x86.exe       | Hashing 100 GiB (fe82e924): 18792.595 milliseconds =  5.714 GB/s =  5.321 GiB/s
no-align.farsh-x64-avx2.exe   | Hashing 100 GiB (fe82e924):  2034.917 milliseconds = 52.766 GB/s = 49.142 GiB/s
no-align.farsh-x64.exe        | Hashing 100 GiB (fe82e924):  3997.399 milliseconds = 26.861 GB/s = 25.016 GiB/s
no-align.farsh-x86-avx2.exe   | Hashing 100 GiB (fe82e924):  2217.646 milliseconds = 48.418 GB/s = 45.093 GiB/s
no-align.farsh-x86-sse2.exe   | Hashing 100 GiB (fe82e924):  3588.777 milliseconds = 29.919 GB/s = 27.865 GiB/s
no-align.farsh-x86.exe        | Hashing 100 GiB (fe82e924): 18802.944 milliseconds =  5.710 GB/s =  5.318 GiB/s```

12. Intel i7 4960X OC 4.5GHz, 32GB DDR3 2000MHz, Win 8.1 64bit:

farsh-x64.exe
Hashing 100 GiB (b004ee2e): 3311.381 milliseconds = 32.426 GB/s = 30.199 GiB/s

benchmark>farsh-x64-avx2.exe
Crash

farsh-x86.exe
Hashing 100 GiB (b004ee2e): 14734.266 milliseconds = 7.287 GB/s = 6.787 GiB/s

farsh-x86-sse2.exe
Hashing 100 GiB (b004ee2e): 3302.627 milliseconds = 32.512 GB/s = 30.279 GiB/s

farsh-x86-avx2.exe
Crash

13. Intel i7 4960X is an Ivy Bridge and it doesn't support AVX2 so the crash is rather expected although I think printing error message would be a lot better than just plainly crashing :]

14. Originally Posted by choochootrain
Phenom II 945, 3 GHz, W7x64, 4 GB RAM
thank you, that's interesting - seems that phenom2 doesn't have any problems with misaligned data

15. i've fixed Zeroes - it was just a bug in my code. now i have two versions that fails only on Avalanche test:
- version 1 with report
- version 2 with report

16. Intel Pentium M 1,5 GHz + Win XP SP3 32 bit Czech + 512 MB RAM

align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 74240.206 milliseconds = 1.446 GB/s = 1.347 GiB/s
align-both\farsh-x32-avx2.exe Not supported
align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 40271.890 milliseconds = 2.666 GB/s = 2.483 GiB/s

align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 73956.563 milliseconds = 1.452 GB/s = 1.352 GiB/s
align-data\farsh-x32-avx2.exe Not supported
align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 46726.115 milliseconds = 2.298 GB/s = 2.140 GiB/s

align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 74018.630 milliseconds = 1.451 GB/s = 1.351 GiB/s
align-key\farsh-x32-avx2.exe Not supported
align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 46018.345 milliseconds = 2.333 GB/s = 2.173 GiB/s

no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 73991.978 milliseconds = 1.451 GB/s = 1.351 GiB/s
no-align\farsh-x32-avx2.exe Not supported
no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 51969.406 milliseconds = 2.066 GB/s = 1.924 GiB/s

17. thanks. Pentium M was the last Intel CPU with 64-bit SSE engine, so it shows less speed and less difference between aligned and non-aligned versions. but aligned is still 30% faster

i feel that on older CPUs xxHash may be faster than any Farsh variants, just because SSE was too slow and pure x86 version is faster in xxHash

18. Originally Posted by Bulat Ziganshin
PAQer, thanks, you are reading my thoughts! i expected good perfromance on Core2 since it got 128-bit SIMD engines, but your results are discouraging. i looked into the problem and it seems that CPU spends half of the time on unaligned loads.
Yes, aligned loads almost 1.6 times faster here compared to unaligned.
BTW: Conroe has slow shuffle unit (Penryn doubled shuffle perfomance).

Code:
```T:\farsh-alignment\align-both>farsh-x86.exe
Hashing 100 GiB (fe82e924): 43673.399 milliseconds = 2.459 GB/s = 2.290 GiB/s

T:\farsh-alignment\align-both>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 10475.474 milliseconds = 10.250 GB/s = 9.546 GiB/s

T:\farsh-alignment\align-both>farsh-x64.exe
Hashing 100 GiB (fe82e924): 12140.137 milliseconds = 8.845 GB/s = 8.237 GiB/s

T:\farsh-alignment\align-data>farsh-x86.exe
Hashing 100 GiB (fe82e924): 43610.016 milliseconds = 2.462 GB/s = 2.293 GiB/s

T:\farsh-alignment\align-data>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 13461.910 milliseconds = 7.976 GB/s = 7.428 GiB/s

T:\farsh-alignment\align-data>farsh-x64.exe
Hashing 100 GiB (fe82e924): 15137.458 milliseconds = 7.093 GB/s = 6.606 GiB/s

T:\farsh-alignment\align-key>farsh-x86.exe
Hashing 100 GiB (fe82e924): 43140.574 milliseconds = 2.489 GB/s = 2.318 GiB/s

T:\farsh-alignment\align-key>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 14095.930 milliseconds = 7.617 GB/s = 7.094 GiB/s

T:\farsh-alignment\align-key>farsh-x64.exe
Hashing 100 GiB (fe82e924): 15810.221 milliseconds = 6.791 GB/s = 6.325 GiB/s

T:\farsh-alignment\no-align>farsh-x86.exe
Hashing 100 GiB (fe82e924): 43529.906 milliseconds = 2.467 GB/s = 2.297 GiB/s

T:\farsh-alignment\no-align>farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 17283.675 milliseconds = 6.212 GB/s = 5.786 GiB/s

T:\farsh-alignment\no-align>farsh-x64.exe
Hashing 100 GiB (fe82e924): 17802.779 milliseconds = 6.031 GB/s = 5.617 GiB/s```

19. Originally Posted by PAQer
BTW: Conroe has slow shuffle unit (Penryn doubled shuffle perfomance).
UMAC employs PSRLDQ instead of PSHUFD, that is 2x faster on early CPUs. On newer ones, though, it uses the same port as PMUL, so it may limit performance to 2 cycles per one SIMD word. Probably, interleaving them will provide better perfromance for SandyBridge+ CPUs (which can load 2 SIMD words per cycle), while PSRLDQ-only cycle is optimum for older Intel CPUs

although in the case of Core2, the speed may be limited by the code size - Core2 can't decode more than 16 instruction bytes per cycle and has only 64-byte loop buffer. At least, it may be the reason why x64 version is slower than x86-sse2 one. And it may be the reason why aligned version is faster - it should have smaller code

20. Impressive work !

I've run all test on my i7-4700MQ (running @ ~3.2 GHz).

Order is
farsh-x64
farsh-x64-avx2
farsh-x86
farsh-x86-avx2
farsh-x86-sse2
for all tests.
Code:
```align-both
Hashing 100 GiB (fe82e924): 4081.477 milliseconds = 26.308 GB/s = 24.501 GiB/s
Hashing 100 GiB (fe82e924): 2304.709 milliseconds = 46.589 GB/s = 43.389 GiB/s
Hashing 100 GiB (fe82e924): 21166.664 milliseconds = 5.073 GB/s = 4.724 GiB/s
Hashing 100 GiB (fe82e924): 2501.702 milliseconds = 42.920 GB/s = 39.973 GiB/s
Hashing 100 GiB (fe82e924): 3968.424 milliseconds = 27.057 GB/s = 25.199 GiB/s

align-data
Hashing 100 GiB (fe82e924): 4070.672 milliseconds = 26.378 GB/s = 24.566 GiB/s
Hashing 100 GiB (fe82e924): 2298.015 milliseconds = 46.725 GB/s = 43.516 GiB/s
Hashing 100 GiB (fe82e924): 21197.086 milliseconds = 5.066 GB/s = 4.718 GiB/s
Hashing 100 GiB (fe82e924): 2515.694 milliseconds = 42.682 GB/s = 39.750 GiB/s
Hashing 100 GiB (fe82e924): 3994.229 milliseconds = 26.882 GB/s = 25.036 GiB/s

align-key
Hashing 100 GiB (fe82e924): 4070.239 milliseconds = 26.380 GB/s = 24.569 GiB/s
Hashing 100 GiB (fe82e924): 2300.552 milliseconds = 46.673 GB/s = 43.468 GiB/s
Hashing 100 GiB (fe82e924): 21193.883 milliseconds = 5.066 GB/s = 4.718 GiB/s
Hashing 100 GiB (fe82e924): 2499.901 milliseconds = 42.951 GB/s = 40.002 GiB/s
Hashing 100 GiB (fe82e924): 3969.799 milliseconds = 27.048 GB/s = 25.190 GiB/s

no-align
Hashing 100 GiB (fe82e924): 4563.363 milliseconds = 23.530 GB/s = 21.914 GiB/s
Hashing 100 GiB (fe82e924): 2315.089 milliseconds = 46.380 GB/s = 43.195 GiB/s
Hashing 100 GiB (fe82e924): 21247.594 milliseconds = 5.053 GB/s = 4.706 GiB/s
Hashing 100 GiB (fe82e924): 2503.921 milliseconds = 42.882 GB/s = 39.937 GiB/s
Hashing 100 GiB (fe82e924): 4055.867 milliseconds = 26.474 GB/s = 24.656 GiB/s```
I could run tests on a i7-4500U ULV processor as well. Sadly, I don't have a non-Haswell CPU anymore..

21. Intel Core2 Duo CPU T7100 1,8 GHz + Win XP SP3 32 bit Czech + 1024 MB RAM

C:\Data\x>.\align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 48604.204 milliseconds = 2.209 GB/s = 2.057 GiB/s
C:\Data\x>Rem .\align-both\farsh-x32-avx2.exe Not supported
C:\Data\x>.\align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 11917.074 milliseconds = 9.010 GB/s = 8.391 GiB/s
C:\Data\x>.\align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 48586.506 milliseconds = 2.210 GB/s = 2.058 GiB/s
C:\Data\x>Rem .\align-data\farsh-x32-avx2.exe Not supported
C:\Data\x>.\align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 15099.395 milliseconds = 7.111 GB/s = 6.623 GiB/s
C:\Data\x>.\align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 48682.345 milliseconds = 2.206 GB/s = 2.054 GiB/s
C:\Data\x>Rem .\align-key\farsh-x32-avx2.exe Not supported
C:\Data\x>.\align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 15657.168 milliseconds = 6.858 GB/s = 6.387 GiB/s
C:\Data\x>.\no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 48682.790 milliseconds = 2.206 GB/s = 2.054 GiB/s
C:\Data\x>Rem .\no-align\farsh-x32-avx2.exe Not supported
C:\Data\x>.\no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 19191.652 milliseconds = 5.595 GB/s = 5.211 GiB/s

22. 1. i've found a way to make keys always aligned at 16 byte boundary - FARSH_KEYS by itself is aligned and each next (32-bit) hash will use key material at +16 offset compared to previous one. this means that overall key size for 4*n bytes of hash will become 1024+(n-1)*16 bytes instead of 1024+(n-1)*4 in the first version. in particular, 1024-bit hash result will require 1520 bytes of key material instead of 1148 bytes

2. and algnment of data being hashed will be checked in farsh_fast(), so the price will be one always-taken branch - very low for any CPU with branch-prediction capabilities

3. now i translate UHASH higher-level hashes (i.e bit mixers) into my code. so the whole FARSH becomes UHASH with prebuilt key material. The UHASH mixers are ~100 lines of code, so it may be not much price for guaranteed quality

UPDATE: i copied UHASH bit mixing algorithm for blocks up to 1024 bytes: code & report. But either my copy is wrong or it doesn't make SMHasher happy. Now i need to add vanilla UHASH to my SMHasher tests to find on what side is problem

23. Intel Core i3-3220 CPU 3,3 GHz + Win 7 32 bit Czech + 4 GB RAM
.\align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 25963.731 milliseconds = 4.136 GB/s = 3.852 GiB/s
.\align-both\farsh-x32-avx2.exe Not supported
.\align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4696.721 milliseconds = 22.862 GB/s = 21.291 GiB/s
.\align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 25863.286 milliseconds = 4.152 GB/s = 3.866 GiB/s
.\align-data\farsh-x32-avx2.exe Not supported
.\align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4941.686 milliseconds = 21.728 GB/s = 20.236 GiB/s
.\align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 25962.321 milliseconds = 4.136 GB/s = 3.852 GiB/s
.\align-key\farsh-x32-avx2.exe Not supported
.\align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4883.320 milliseconds = 21.988 GB/s = 20.478 GiB/s
.\no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 25708.534 milliseconds = 4.177 GB/s = 3.890 GiB/s
.\no-align\farsh-x32-avx2.exe Not supported
.\no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4999.023 milliseconds = 21.479 GB/s = 20.004 GiB/s

24. meanwhile, i've found interesting suite of PRNGs. among other things, their xorshift128+ may be used to initialize the FARSH_KEYS[]

25. i added uhash, vhash and poly-1305 algos to my SMHasher suite. of those, only VHASH succesfully passed all tests

i also extended information printed by the AvalancheTest and made conclusion that block hashes in the Farsh return enough entropy, only their mixing need to be improved. so i tried mixing from xxHash64 which, finally, made all SMHasher tests happy. also i added 64-bit 'seed' in the same way as it is implemented in xxHash64, in order to satisfy both SMHasher and most demanding users

the mixing algo kidnapped from xxHash64 made computations up to 40% slower, so i will look further into balancing speed&quality. but at least Farsh idea works, now passing all SMHasher tests while being faster than any other hashes. although numeric quality measures made by SMHasher tests are slightly worse than those for murmur3a and probably xxHash too

26. ## Thanks:

Cyan (8th June 2015)

27. Timings from my old netbook, it may seem a bit derogatory, but I am surprised it got so fast there!

(Netbook Aspire One Intel(R) Atom(TM) CPU N2600 @ 1.60GHz, 2 cores Ubuntu 64-bit )
Code:
```~/farsh/farsh-alignment/no-align \$ wine farsh-x64.exe
Hashing 100 GiB (fe82e924): 99102.602 milliseconds = 1.083 GB/s = 1.009 GiB/s
~/farsh/farsh-alignment/no-align \$ wine farsh-x86.exe
Hashing 100 GiB (fe82e924): 132066.573 milliseconds = 0.813 GB/s = 0.757 GiB/s
~/farsh/farsh-alignment/no-align \$ wine farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 87109.251 milliseconds = 1.233 GB/s = 1.148 GiB/s

~/farsh/farsh-alignment/align-key \$ wine farsh-x64.exe
Hashing 100 GiB (fe82e924): 65907.654 milliseconds = 1.629 GB/s = 1.517 GiB/s
~/farsh/farsh-alignment/align-key \$ wine farsh-x86.exe
Hashing 100 GiB (fe82e924): 131712.031 milliseconds = 0.815 GB/s = 0.759 GiB/s
~/farsh/farsh-alignment/align-key \$ wine farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 61781.707 milliseconds = 1.738 GB/s = 1.619 GiB/s

~/farsh/farsh-alignment/align-data \$ wine farsh-x64.exe
Hashing 100 GiB (fe82e924): 65907.564 milliseconds = 1.629 GB/s = 1.517 GiB/s
~/farsh/farsh-alignment/align-data \$ wine farsh-x86.exe
Hashing 100 GiB (fe82e924): 131792.882 milliseconds = 0.815 GB/s = 0.759 GiB/s
~/farsh/farsh-alignment/align-data \$ wine farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 61750.769 milliseconds = 1.739 GB/s = 1.619 GiB/s

~/farsh/farsh-alignment/align-both \$ wine farsh-x64.exe
Hashing 100 GiB (fe82e924): 43861.363 milliseconds = 2.448 GB/s = 2.280 GiB/s
~/farsh/farsh-alignment/align-both \$ wine farsh-x86.exe
Hashing 100 GiB (fe82e924): 131727.429 milliseconds = 0.815 GB/s = 0.759 GiB/s
~/farsh/farsh-alignment/align-both \$ wine farsh-x86-sse2.exe
Hashing 100 GiB (fe82e924): 46215.390 milliseconds = 2.323 GB/s = 2.164 GiB/s```

28. it has speeds close to Pentium M. effect of alignment is up to 2.3x speedup that is absolute record. thanks!

29. Bulat, I have found minor issues in the docs:
1. https://github.com/Bulat-Ziganshin/FARSH#the-main-loop
The link to the source code is outdated
2.
OLD NOTE: FARSH isn't yet ready for practical use since SMHasher shows a lot of problems in the current implementation. But FARSH main loop implements universal hashing scheme that's mathematically proven to guarantee ideal hashing (as far as key material is random) and it employs the formula successfully used in cryptographic UMAC/VMAC algorithms.
This is not just old, but outdated. I think it's best to remove it, otherwise please be clear that it doesn't apply anymore.
The following paragraph (maybe two) is outdated too and it's not indicated at all.
3. Is the bitstream stable already? I don't see it mentioned and it's important for the person I'm trying to convert to FARSH.

Also, it would be good to compare the algorithm with its competitors.

30. Well, it was started as experiment - i want to check whether it will be possible to use UHash construction to provide non-crypto hash that is both very fast and has a high quality.

FARSH 0.1 reached the high-speed goal, but not passed some SMHasher tests. So i added an extra bit-mixing code from xxHash and produced FARSH 0.2. It passes all SMHasher tests and only ~10% slower than 0.1 version. So the code for 0.2 is finished, but i should finish updating the docs in order to release it.

Overall, this project wasn't finished, so i hardly can talk about bitstream stability. Even if i will update the docs and release 0.2, i still doesn't have answer to the question - is it really as high-quality as for example xxHash? It's possible that by adding extra mixing in 0.2 i just have hidden dust under a carpet.

I had a plan to check the same extra-mixing approach on other lower-quality hashes like CRC - if their SMHasher rank also can be improved this way, it would be a strong signal (against quality of FARSH 0.2 and usability of SMHasher overall). But i never implemented that plan, too.

Anyway, little may be done to improve this construction quality. Regarding speed, it may be improved by 10-20%, but no more. Overall, you can employ the current code as hash of unknown quality, but definitely better than CRC32 and faster than CRC32 in most configs. xxHash64 is a faster way to produce 64-bit hash result on 64-bit pre-AVX2 CPU, in all remaining situations FARSH should be faster. Bitstream most probably will be changed in future versions.

31. Analyzis of strong and weak points of FARSH and xxHash led me to the hash construction that should be best of both worlds - xxHash quality at FARSH speed.

That is the FARSH weak point? Each 4-byte word inside 4 KB block is processed independent of other words and results are just summed up. So the data from different words are mixed up in the rather simple and predictable way. This potentially simplifies bit flips that may remain undetected by the hash. xxHash looks stronger here - each input word is mixed up into 128-bit intermediate value whoise bits are continued to mix up on the every computation step. I feel more confident with such scheme since it better differentiates influence of different input words on the final hash value.

Also, FARSH intermediate value is only 64-bit, and i feel that for 32-bit final value 128-bit intermediate value would be preferable.

xxHash weak point is that you need to finish processing of current 128 bits prior to going to the next chunk. It's strongly optimized for x86 cpus, but doesn't leave room for further SIMD optimizations (aside of calculation miltiple hashes simultaneously). FARSH scheme is ideal here - you can run everuthing parallel and overall speed is limited only by serial (and rather slow) process of combining 8-byte intermediate values.

So, the ideal hash should split input data into fixed size blocks (say 4 KB) and process each block independently, returning 128 bits of intermediate hash. On the next level, those 128-bit values from each block should be combined with secondary hashing to produce the final result. Secondary hashing process may be stronger (up to cryptohashes), since it needs to process much less data.

The overall hash speed will be limited to the speed of secondary hashing (let's conservatively say 1 GB/s) multiplied by the primary hash reduction coeffecient (4KB/128bit=256), i.e. 256 GB/s even for this conservative estimation. And by adding one more level of secondary hashing, we may change the limit to 256*256 GB/s = 64 TB/s. Of course, it's just the upper limit for infinite computation resources, but f.e. GPUs will reach this limit in the next 5 years.

Overall, the idea of multi-level hashing is well known in cryptography as tree hahsing, and you can find in http://eprint.iacr.org/2009/210.pdf how to build cryptohash this way. While we don't have such high goals, obeying these rules may be useful to improve our hash quality.

While 128-bit blocks are ideal fit for SSE registers, they are sub-optimal for avx2/avx-512 and especially GPUs (nvidia/amd has 1024/2048-bit words while Intel GPUs support multiple widths but 512 bits is optimum). I think that 512-bit blocks would be the best choice, summing up all platforms.

Inside 512-bit chunk, we should have four 128-bit streams, processed independent on each other. This means that older platforms like i386 will process only one stream simultaneously, and then switch to the next stream. SSE2-capable cpus will process all 4 streams simultaneously - it will provide enough ILP to hide dealys. AVX2 cpus will process 2 streams per AVX register, 8 streams from 2 blocks simultaneously in order to hide delays. And AVX-512 devices will process 4 blocks simultaneously. So any device will get enough ILP to hide operation delays and fill all its computation resources.

nvidia/amd GPUs will need to process 2/4 blocks in the single register but their LD engines should easily handle such non-coalescing access (or extra warp shuffle operation). Overall, they can reach multi-terabyte/s speed, much faster than their on-board DRAM speeds.

We haven't discussed yet the internal hashing loop, but it can be any function squeezing two 128-bit values into one: hash128=update(hash128,data128 ). One possibility is the xxHash code. Another one is slightly modified FARSH code:

Code:
```uint32 data[4]
uint32 hash[4] === uint64 hash64[2] === uint128 hash128

hash64[0] = (data[0]+hash[0]) * (data[1]+hash[1])
hash64[1] = (data[2]+hash[2]) * (data[3]+hash[3])
ROTATE hash128,1   // rotate the entire 128-bit register by 1 byte```
This algorithm mixes the input data into all 128 hash bits, but it's still simpler than xxHash inner cycle. It's just 4 SIMD operations per 16 input bytes:

PSHUFD xmm1,xmm0,xxx
PMULUD xmm0,xmm1
PALIGNR xmm0,xmm0,1

so we may expect the same speed as with FARSH on CPUs with SSSE3. Since AVX operations process each 128-bit lane of AVX register independently, the same sequence will work with AVX2 too, doubling the speed compared to SSE2. AVX-512 will probably require more commands to rotate each 128-bit lane independently. CPUs with SSE2 but no SSSE3 (Pentium4,PenitumM,Core1,K8..K10) require 3 operations - PSLLDQ+PSRLDQ+POR. Pre-SSE2 cpus will need 4 SHRD operations.

And finally, like FARSH we can use multiple initial hash128 values to produce multiple independent 32-bit hashes (and combine them into 64+ bit hashes), and XOR initvals with user-supplied SEED value in order to provide control to the user.

32. ## Thanks (3):

Cyan (26th June 2016),Mike (26th June 2016),SolidComp (28th June 2016)

33. I haven't done QA for FARSH, but it's core looks more trustworthy than the permutation+mul+rol of xxHash. Which I definitely don't consider to be a good quality.

Page 1 of 3 123 Last