I usually write and optimize code that runs on the CPU; however, I’m currently trying to write shaders for light scattering.
I know CPUs use certain optimizations to approach (and, on superscalar designs, exceed) one instruction per cycle: instruction pipelining, branch prediction, etc.
However, I don’t know much about how GPU cores handle this. From what I’ve read, they tend not to do branch ‘prediction’ per se, so I wonder: do they pipeline instructions at all?
The answer might affect how I split up an algorithm: I can either use several independent calculations with more operations in total, or one sequential, data-dependent calculation with fewer operations overall.
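To make the trade-off concrete, here is a minimal sketch in plain C (not shader code, though the dependency structure is the same in GLSL or HLSL). It uses two standard ways of evaluating a degree-7 polynomial that I picked purely as an illustration: Horner's method (fewest operations, one long dependency chain) and Estrin's scheme (a few more operations, split into independent sub-expressions). The function names `horner7` and `estrin7` and the coefficient array layout are hypothetical.

```c
/* Horner's method: p(x) = c0 + x*(c1 + x*(c2 + ... + x*c7)).
   7 multiplies + 7 adds = 14 operations, but every step needs the
   previous result, so all 14 form a single dependency chain
   (7 steps if the compiler fuses each multiply-add). */
static double horner7(const double c[8], double x) {
    double r = c[7];
    for (int i = 6; i >= 0; --i) {
        r = r * x + c[i];
    }
    return r;
}

/* Estrin's scheme: p(x) = (c0 + c1 x) + (c2 + c3 x) x^2
                         + ((c4 + c5 x) + (c6 + c7 x) x^2) x^4.
   9 multiplies + 7 adds = 16 operations (more than Horner), but the
   longest dependency chain is only 6 operations, because the terms
   at each level do not depend on one another. */
static double estrin7(const double c[8], double x) {
    double x2 = x * x;
    double x4 = x2 * x2;

    /* Level 1: four mutually independent linear terms. */
    double a = c[0] + c[1] * x;
    double b = c[2] + c[3] * x;
    double d = c[4] + c[5] * x;
    double e = c[6] + c[7] * x;

    /* Level 2: two independent combinations. */
    double lo = a + b * x2;
    double hi = d + e * x2;

    /* Level 3: the only step that needs everything. */
    return lo + hi * x4;
}
```

Both functions compute the same polynomial (up to floating-point rounding differences from the different evaluation order). On a pipelined core, the independent terms in `estrin7` can be in flight at the same time, while each step of `horner7` has to wait for the previous one; whether that shorter chain actually pays off on a given GPU is exactly the question above.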