Skip to content Skip to sidebar Skip to footer

PyCUDA Precision Of Matrix Multiplication Code

I am trying to learn CUDA and using PyCUDA to write a simple matrix multiplication code. For two 4x4 randomly generated matrices I get the following solution: Cuda: [[ -5170.861816

Solution 1:

Just adding to what Warren has said. I don't think there's anything wrong here. You're comparing the floating point results generated by two different machines (CPU and GPU). They are not guaranteed to be bitwise identical for the operations at the level you are thinking about, partly because the order of operations on the GPU is not necessarily the same as the order of operations on the GPU. As you increase the size of the matrices, you are increasing the number of values summed together, and your absolute error increases, since you are adding a bunch of small bit errors together.

In general, these types of considerations should always come into play when comparing floating point results. Bitwise identical results are rarely to be expected from two different computations. And even something as simple as changing the order of operations can make it a different calculation, for floating point operations. You may want to read this paper especially section 2.2.


Post a Comment for "PyCUDA Precision Of Matrix Multiplication Code"