@lauriewired If you only modify the taken hit bit of the instruction, you still need to wait for the next cache-line clock cycle when instruction fetching encounters a conditional boundary. Therefore, you must modify both the speculative bit and the instruction distribution.
I love looking at old crazy ideas in expired computer patents.
Modern CPU branch predictors are invisible. You, as a dev, can’t really see *what* paths the CPU is guessing…it’s all a bunch of AMD/Intel/Apple secret sauce.
For a brief moment in the 80s, there was this wacky proposition of storing the prediction IN the opcodes themselves.
If you don’t understand why that’s insane, bear with me for a second.
Imagine your binary has an ordinary if statement that compiles down to a jump-if-equal instruction (JE). You run your program, and let’s say the if statement evaluates as true 99% of the time.
This patent suggested the CPU would then EDIT the running binary in memory, turning that JE into a new “JE-probably-taken” instruction, which on subsequent executions would just assume true.
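To make the mechanism concrete, here’s a toy Python simulation. It’s a sketch under loose assumptions, not the patent’s actual design: the opcode names `JE`/`JE_LIKELY`, the `Instr` record, and the 1-bit update rule (rewrite the opcode to match whatever the branch just did) are all hypothetical. The key point it shows is that the “prediction” is nothing but the opcode sitting in memory, so dumping memory shows you the hints, the way a debugger would.

```python
from dataclasses import dataclass

# Hypothetical opcodes: "JE" = plain jump-if-equal (treated as predicted
# not taken), "JE_LIKELY" = the rewritten "probably taken" form.
JE, JE_LIKELY = "JE", "JE_LIKELY"


@dataclass
class Instr:
    opcode: str
    target: int


def run_branch(memory: list[Instr], addr: int, outcomes: list[bool]) -> tuple[int, int]:
    """Execute the branch at `addr` once per outcome.

    Returns (mispredictions, rewrites). The prediction lives in the opcode
    itself; after each execution the simulated CPU stores the opcode back
    into memory in the form matching what just happened (an assumed
    1-bit scheme, for illustration only).
    """
    mispredicts = rewrites = 0
    for taken in outcomes:
        predicted_taken = memory[addr].opcode == JE_LIKELY
        if predicted_taken != taken:
            mispredicts += 1
        new_opcode = JE_LIKELY if taken else JE
        if memory[addr].opcode != new_opcode:
            memory[addr].opcode = new_opcode  # self-modifying store into code
            rewrites += 1
    return mispredicts, rewrites


def disassemble(memory: list[Instr]) -> list[str]:
    """What a debugger would show: the hints are visible in the opcodes."""
    return [f"{i:04x}: {ins.opcode} {ins.target:#06x}" for i, ins in enumerate(memory)]


if __name__ == "__main__":
    memory = [Instr(JE, 0x10)]
    print(disassemble(memory))  # ['0000: JE 0x0010']
    # Taken 99 times out of 100, with one not-taken outcome in the middle.
    outcomes = [True] * 50 + [False] + [True] * 49
    print(run_branch(memory, 0, outcomes))  # (3, 3)
    print(disassemble(memory))  # ['0000: JE_LIKELY 0x0010']
```

With the 1-bit rule assumed here, every misprediction is also a store into code memory: one to learn “taken”, then two more around the single not-taken outcome (flip to `JE`, then back). On a CPU with an instruction cache, each of those stores would also have to be reconciled with the cached copy of that line.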
That might not sound that wild, until you realize the entire structure relies on the branch predictor itself being self-modifying code, which you’d then be able to see / evaluate with a debugger! In other words, you’d have a compiled binary that would radically change at runtime, with all the hints right there for you to see!
The idea ended up not working; a few years later CPUs started gaining instruction caches, and the round trip back and forth to rewrite the binary in memory would be much too slow. Weird to think about, though. To me it feels like it would have been kinda JVM-y / V8ish, but at a much, much lower level of abstraction.