Abdi Moalim

A dirty way to measure GFLOPS

In Writing an AVX2 DGEMM kernel, a valuable metric was the kernel's estimated GFLOPS:

double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC; /* start, end from clock() */
double flops = 2.0 * m * n * k;                               /* total FP ops in the matmul */
double gflops = flops / (time_taken * 1e9);                   /* scale to billions per second */
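
On its own the snippet won't compile; start and end are assumed to come from clock() wrapped around the kernel call. Here's a minimal self-contained sketch, where dgemm_naive is a hypothetical stand-in for the actual AVX2 kernel:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Naive triple-loop DGEMM stand-in: C += A * B, row-major. */
static void dgemm_naive(int m, int n, int k,
                        const double *A, const double *B, double *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}

int main(void) {
    int m = 512, n = 512, k = 512;
    double *A = malloc((size_t)m * k * sizeof *A);
    double *B = malloc((size_t)k * n * sizeof *B);
    double *C = calloc((size_t)m * n, sizeof *C);
    for (size_t i = 0; i < (size_t)m * k; i++) A[i] = 1.0;
    for (size_t i = 0; i < (size_t)k * n; i++) B[i] = 1.0;

    clock_t start = clock(); /* clock() measures CPU time, fine for a single-threaded kernel */
    dgemm_naive(m, n, k, A, B, C);
    clock_t end = clock();

    double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC;
    double flops = 2.0 * m * n * k;
    double gflops = flops / (time_taken * 1e9);
    printf("%.2f GFLOPS\n", gflops);

    free(A); free(B); free(C);
    return 0;
}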

What is actually going on here?

The expression 2.0 * m * n * k counts the base operations in matmul, i.e. one multiplication and one addition per element pair.

For each element of the result matrix C(i,j), you'd compute:

C(i,j) = A(i,0)*B(0,j) + A(i,1)*B(1,j) + ... + A(i,k-1)*B(k-1,j)

This is roughly 2k ops per output element: k multiplications and k additions in the accumulation. With m×n output elements, the total is 2×m×n×k ops.
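
To put numbers on it: for m = n = k = 1024, that's 2 × 1024³ ≈ 2.15 × 10⁹ flops, so a kernel that hypothetically finishes in 50 ms lands at 2.15e9 / (0.05 × 1e9) ≈ 43 GFLOPS.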

Keep in mind this is an estimate that ignores floating-point nuances: an FMA counts as 2 ops even though the hardware retires it as a single instruction, and there is also some non-negligible loop overhead.
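
To make the FMA point concrete, here's a sketch using the AVX2/FMA intrinsic a DGEMM kernel would lean on. Under the 2mnk convention, one _mm256_fmadd_pd counts as 8 double-precision flops (fmadd4 is just an illustrative wrapper; compile with -mfma):

#include <immintrin.h>

/* c = a * b + c on 4 packed doubles: one instruction,
 * but 8 flops (4 muls + 4 adds) under the 2mnk counting. */
static inline __m256d fmadd4(__m256d a, __m256d b, __m256d c) {
    return _mm256_fmadd_pd(a, b, c);
}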

This 2mnk count is what CPU/GPU manufacturers use when reporting theoretical peak FLOPS; it is not by definition the standard industry heuristic for reporting DGEMM/SGEMM performance, which would be profiling the kernels themselves. There are more details in Intel's and NVIDIA's developer forums.