Why does loop order affect the efficiency of GEMM code?
I'm writing a GEMM routine in C++, but the loop order has a large effect on its performance. Code:
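The two loop nests below are a hypothetical sketch, not the code from this question. They illustrate the usual explanation under these assumptions: square `n x n` matrices stored row-major in a flat `std::vector<double>`, computing `C += A * B`. The names `Matrix`, `gemm_ijk`, and `gemm_ikj` are made up for illustration. Both functions compute the same result. In the i-j-k order, the innermost loop reads `B` down a column, which means a stride of `n` elements, so for large `n` almost every access touches a different cache line. In the i-k-j order, the innermost loop reads `B` and updates `C` along a row with stride 1, which makes better use of each cache line and lets the compiler vectorize the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type: an n x n matrix stored row-major in a flat vector,
// so element (r, c) lives at index r * n + c.
using Matrix = std::vector<double>;

// i-j-k order: the inner loop walks down a column of B (stride n),
// so for large n nearly every access to B lands on a new cache line.
void gemm_ijk(const Matrix& A, const Matrix& B, Matrix& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] += sum;
        }
    }
}

// i-k-j order: A(i, k) is constant in the inner loop, and both B and C
// are traversed contiguously along a row (stride 1), which is
// cache-friendly and easy for the compiler to vectorize.
void gemm_ikj(const Matrix& A, const Matrix& B, Matrix& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```

If you time both versions for a large `n` (for example 1024) with optimizations enabled, the i-k-j version is typically noticeably faster. The exact ratio depends on the CPU, the cache sizes, and the compiler. Blocked (tiled) loops and tuned BLAS libraries go further by also reusing data that is already in cache across iterations of the outer loops.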