> The way your scheme "encodes" bytes into branch decisions destroys most of
> the useful information in there, because you're essentially forcing the
> branch predictor to predict the positions of byte value cc, without having
> any actual context.
Yes, but it's not totally wrong even completely without context - it could work
as an order-0 model at least.
Also, sure, maybe half of the bits of the global branch history are taken up
by the loop branch, but then the other half still remains?
Also, I'm pretty sure that there's also a local history for each specific branch now.
> It *will* be able to pick up patterns when say a given byte value occurs
> every 4 positions, but not the more interesting case of characters that
> tend to occur grouped together etc.
Somehow it's pretty bad in the "every 4 positions" case actually, but gets better
at "every 16+ positions" - see the image http://encode.su/threads/?p=51181&pp=1
> I'm not saying it wouldn't work at all - it obviously does - just that it's
> not a useful way to gauge the capabilities of the hardware.
Yes, it's unlikely to be better than a real order-0 coder - though it's still much
worse than that at the moment.
Atm I have 4 ideas for improving it:
1. Generating the code for estimate(), which would have a single branch per page,
and NOP padding in-between.
2. Converting bytewise data to 255 bitstreams first.
3. Adding CPU load in the c!=cc case:
Code:
volatile uint f0,f1;
for( i=0,f0=0x12345678,f1=0x12345678; i<l; i++ ) {
if( p[i]!=cc ) f0=f1+f0/f1,f0=f1+f0/f1,f0=f1+f0/f1,f0=f1+f0/f1,++f1; // else ++f1;
}
4. Using some better method to convert timings to probabilities,
likely based on a logistic function; currently it's just
Code:
for( i=1; i<=256; i++ ) tim[i] = qword(t_max+1-t_min)*64/(tim[i]+1-t_min)+1; // (tim[i]-t_min)/(winsize/32)+1;
> A branch predictor isn't generating probability estimates for an arithmetic
> coder. Its end result is a single bit, "taken" or "not taken", which feeds
> into a bunch of machinery that ultimately decides which address to continue
> fetching instructions at.
Even for a single bit, it's likely maintaining at least 2 bits of state - see https://en.wikipedia.org/wiki/Branch...rating_counter
Also, if there was no prediction at all, just more clocks to take a branch,
it would already be the equivalent of an occurrence counter.
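The 2-bit saturating counter from that wiki page is simple enough to sketch; note this is the textbook scheme, not necessarily what any particular CPU implements:

```c
// Textbook 2-bit saturating counter: states 0..3, where
// 0-1 predict "not taken" and 2-3 predict "taken".
// The 2 bits carry more than the 1-bit outcome: "strongly" vs
// "weakly" taken, i.e. a crude confidence level.
typedef struct { int state; } bp2;

int bp2_predict( const bp2* b ) { return b->state >= 2; }

void bp2_update( bp2* b, int taken ) {
  if( taken ) { if( b->state < 3 ) b->state++; }
  else        { if( b->state > 0 ) b->state--; }
}
```

The hysteresis means one stray outcome doesn't flip a strongly-biased prediction - which is also why a single mispredicted byte shouldn't completely reset the measured timings.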
And again, look at the graph - compression smoothly improves with fewer
occurrences of '1'. http://encode.su/threads/?p=51181&pp=1
> So for those branches, you keep track simply of whether they're biased one
> way or the other, mostly ignore the context, and always predict the most
> likely (in the past) outcome.
Uh, not quite.
1. Remember, it takes into account the clocks of _all_ branches for a given symbol.
So the timings also depend on the prediction quality for previous
symbols in the window. As I said, ideally it could perform like order-0, or a little better.
2. It's not a given that there are only 2 outcomes for every branch.
There can also be "speculative execution", where both cases are processed at once
and one is then discarded - it could be enabled based on the (un)certainty of the BP's prediction.
> I agree that yes, this is a thing you can measure, but very much disagree
> that it's a useful thing to measure, or that it tells you anything relevant
> about the branch predictor (or how well it does context modelling).
And I agree that this coder can be improved, but disagree that it's useless at the moment -
let's just wait for the results of the test with loop files from @encode.
Btw, files are here: http://nishi.dreamhosters.com/u/bpc_testfiles.rar
Run
Code:
for %a in (*.txt) do BPC.exe c %a %~na.bpc
Then
Code:
for %a in (*.bpc) do echo %~za >>results.txt
to store results.