Skip to content

Multiplying two matrices - not working in 3D range #173

@ZahariStoyanov

Description

@ZahariStoyanov

Currently working on a Java library for matrix operations and ML. I use Aparapi to utilize the GPU.
I've written this code to multiply two matrices:

   public static NDmatrix matMul(float[][] a, float[][] b) {
        int[] aDim = new int[]{a.length, a[0].length};
        int[] bDim = new int[]{b.length, b[0].length};
        if(aDim[1] == bDim[0]){
            int[] Dim = new int[]{aDim[0], bDim[1]};
            int aVSize = aDim[0] * aDim[1];
            float[] aVector = new float[aVSize];
            for(int i = 0; i < aDim[0]; i++)
                System.arraycopy(a[i], 0, aVector, i * aDim[1], aDim[1]);
            int bVSize = bDim[0] * bDim[1];
            float[] bVector = new float[bVSize];
            for(int i = 0; i < bDim[0]; i++)
                System.arraycopy(b[i], 0, bVector, i * bDim[1], bDim[1]);
            int resVSize = Dim[0] * Dim[1];
            float[] resVector = new float[resVSize];
            int d[] = new int[]{aDim[1]};
            Kernel mKernel = new Kernel() {
                final int ht = Dim[0];
                final int wt = Dim[1];
                final int dpt = d[0];
                public void run() {
                    int c = getGlobalId(0);
                    int r = getGlobalId(1);
                    int l = getGlobalId(2); // used with 3D range
                    localBarrier(); // used with 3D range
                    //for(int l = 0; l < dpt; l++) // used with 2D range
                    resVector[r * wt + c] += aVector[r * dpt + l] * bVector[l * wt + c]; // in 3D range, this only works for l = 0
                }
            };
            mKernel.setExplicit(true);
            mKernel.put(aVector);
            mKernel.put(bVector);
            mKernel.put(resVector);
            mKernel.put(Dim);
            mKernel.put(d);
            mKernel.execute(Range.create3D(Dim[1], Dim[0], d[0]));
            //mKernel.execute(Range.create2D(Dim[1], Dim[0]));
            mKernel.get(resVector);
            mKernel.dispose();
            return new NDmatrix(Dim, resVector, null);
        }
        System.out.println("The number of columns in left matrix and number of rows in right matrix do not match.");
        System.out.println();
        return null;
    }

However, it looks like resVector[some_index] only gets updated once(r + aVector[0 * dpt + 0] * bVector[0 * wt + c]). If, instead, I use a 2D range and a loop(the commented bits in the code), it works correctly. What could be the reason for such behavior and how can I force it to work fully parallel?

Interestingly, I tried one thing that "worked" - after updating resVector[some_index], I called this.put(resVector). However, it then couldn't compile in OpenCL and ended up using Java's multi-threading instead, eventually resulting in a correct result.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions