Abdi Moalim

A dirty way to measure GFLOPS

In Writing an AVX2 DGEMM kernel, a valuable metric was the kernel's estimated GFLOPS:

double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC; /* start, end from clock() */
double flops = 2.0 * m * n * k;                               /* total FP ops in the matmul */
double gflops = flops / (time_taken * 1e9);                   /* scale to billions per second */
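
On its own the snippet won't compile; start and end are assumed to come from clock() wrapped around the kernel call. Here's a minimal self-contained sketch, where dgemm_naive is a hypothetical stand-in for the actual AVX2 kernel:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Naive triple-loop DGEMM stand-in: C += A * B, row-major. */
static void dgemm_naive(int m, int n, int k,
                        const double *A, const double *B, double *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}

int main(void) {
    int m = 512, n = 512, k = 512;
    double *A = malloc((size_t)m * k * sizeof *A);
    double *B = malloc((size_t)k * n * sizeof *B);
    double *C = calloc((size_t)m * n, sizeof *C);
    for (size_t i = 0; i < (size_t)m * k; i++) A[i] = 1.0;
    for (size_t i = 0; i < (size_t)k * n; i++) B[i] = 1.0;

    clock_t start = clock(); /* clock() measures CPU time, fine for a single-threaded kernel */
    dgemm_naive(m, n, k, A, B, C);
    clock_t end = clock();

    double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC;
    double flops = 2.0 * m * n * k;
    double gflops = flops / (time_taken * 1e9);
    printf("%.2f GFLOPS\n", gflops);

    free(A); free(B); free(C);
    return 0;
}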

What is actually going on here?

The expression 2.0 * m * n * k counts the base operations in matmul, i.e. one multiplication and one addition per element pair.

For each element of the result matrix C(i,j), you'd compute:

C(i,j) = A(i,0)*B(0,j) + A(i,1)*B(1,j) + ... + A(i,k-1)*B(k-1,j)

This is roughly 2k ops per output element: k multiplications and k additions in the accumulation. With m×n output elements, the total is 2×m×n×k ops.
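
To put numbers on it: for m = n = k = 1024, that's 2 × 1024³ ≈ 2.15 × 10⁹ flops, so a kernel that hypothetically finishes in 50 ms lands at 2.15e9 / (0.05 × 1e9) ≈ 43 GFLOPS.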

Keep in mind this is an estimate that ignores floating-point nuances: an FMA counts as 2 ops even though the hardware retires it as a single instruction, and there is also some non-negligible loop overhead.
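
To make the FMA point concrete, here's a sketch using the AVX2/FMA intrinsic a DGEMM kernel would lean on. Under the 2mnk convention, one _mm256_fmadd_pd counts as 8 double-precision flops (fmadd4 is just an illustrative wrapper; compile with -mfma):

#include <immintrin.h>

/* c = a * b + c on 4 packed doubles: one instruction,
 * but 8 flops (4 muls + 4 adds) under the 2mnk counting. */
static inline __m256d fmadd4(__m256d a, __m256d b, __m256d c) {
    return _mm256_fmadd_pd(a, b, c);
}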

This 2mnk count is what CPU/GPU manufacturers use when reporting theoretical peak FLOPS; it is not by definition the standard industry heuristic for reporting DGEMM/SGEMM performance, which would be profiling the kernels themselves. There are more details in Intel's and NVIDIA's developer forums.