A dirty way to measure GFLOPS
In *Writing an AVX2 DGEMM kernel* there was a useful metric for estimating GFLOPS:
```c
// Timing via clock() from <time.h>; start and end are clock_t values
// captured around the kernel call (note: clock() measures CPU time).
double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC;
double flops = 2.0 * m * n * k;              // 1 mul + 1 add per (i, j, l)
double gflops = flops / (time_taken * 1e9);  // 1e9 flops = 1 GFLOP
```
What is actually going on here?
The expression `2.0 * m * n * k` counts the base operations in a matmul of an $m \times k$ matrix by a $k \times n$ matrix: one multiplication and one addition per element pair.
For each element $c_{ij}$ in the result matrix:
- Multiply $a_{il} \cdot b_{lj}$ for each $l$ ($k$ multiplications)
- Add the products together ($k - 1$ additions $\approx k$ additions for large $k$)

This is roughly $2k$ operations per element, across $m \times n$ elements, i.e. $2mnk$ in total.
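Put together, a minimal self-contained sketch might look like the following; the naive triple loop and the 512-sized matrices are placeholder assumptions standing in for the actual AVX2 kernel and problem sizes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Naive reference matmul: C (m x n) = A (m x k) * B (k x n), row-major.
// A stand-in for the real kernel under test.
static void dgemm_naive(int m, int n, int k,
                        const double *A, const double *B, double *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int l = 0; l < k; l++)
                acc += A[i * k + l] * B[l * n + j];  // 1 mul + 1 add
            C[i * n + j] = acc;
        }
}

int main(void) {
    int m = 512, n = 512, k = 512;  // arbitrary sizes for illustration
    double *A = malloc(sizeof(double) * m * k);
    double *B = malloc(sizeof(double) * k * n);
    double *C = malloc(sizeof(double) * m * n);
    for (int i = 0; i < m * k; i++) A[i] = (double)rand() / RAND_MAX;
    for (int i = 0; i < k * n; i++) B[i] = (double)rand() / RAND_MAX;

    clock_t start = clock();
    dgemm_naive(m, n, k, A, B, C);
    clock_t end = clock();

    double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC;
    double flops = 2.0 * m * n * k;
    double gflops = flops / (time_taken * 1e9);
    printf("%.3f s, %.2f GFLOPS\n", time_taken, gflops);

    free(A); free(B); free(C);
    return 0;
}
```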
Keep in mind this is an estimate that ignores floating-point nuances, e.g. an FMA executes as a single instruction yet counts as 2 ops here, and there is also some non-negligible loop overhead.
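As a small standalone illustration (not from the original kernel), C99's `fma` from `<math.h>` fuses the multiply and the add into one operation, but the $2mnk$ count still credits it with two flops:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 1.5, b = 2.0, acc = 0.0;
    // On hardware with FMA support this compiles to a single fused
    // instruction, yet the 2*m*n*k metric counts it as two flops.
    acc = fma(a, b, acc);  // acc = a * b + acc
    printf("%f\n", acc);   // prints 3.000000
    return 0;
}
```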
The $2mnk$ convention is what CPU/GPU manufacturers use when reporting theoretical peak FLOPS; it is not by definition the standard industry heuristic for reporting D/SGEMM performance, which would mean actually profiling the kernels. There are more details in Intel's and NVIDIA's developer forums.
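To see where a theoretical peak number comes from, here is a worked example with purely illustrative hardware figures: a hypothetical 4-core CPU at 3.0 GHz, where each core has two AVX2 FMA units, each handling 4 doubles per cycle:

$$
\underbrace{4}_{\text{cores}} \times \underbrace{3.0 \times 10^{9}}_{\text{cycles/s}} \times \underbrace{2}_{\text{FMA units}} \times \underbrace{4}_{\text{doubles/vector}} \times \underbrace{2}_{\text{flops/FMA}} = 192 \text{ GFLOPS}
$$

Comparing the measured GFLOPS from the snippet above against this kind of peak gives a rough efficiency figure for the kernel.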