Question

I have a compute shader and the C# script that goes with it, used to modify an array of vertices along the y axis (kept simple enough to be clear).

But although it runs without errors, the shader seems to skip the first vertex of my shape (except, apparently, when the shape is a closed volume?).

Here is the C# class:

Mesh m;
//public bool stopProcess = false; //unused in this version of the example
MeshCollider coll;
public ComputeShader csFile; //the compute shader file, assigned the Unity way (Inspector)
Vector3[] arrayToProcess; //an array of vectors I'll use to store data
ComputeBuffer cbf; //the buffer CPU->GPU (an earlier version, with exactly
                   //the same result, had only this one)
ComputeBuffer cbfOut; //the buffer GPU->CPU
int vertexLength;

void Awake() { //assigning my stuff
    coll = gameObject.GetComponent<MeshCollider>();
    m = GetComponent<MeshFilter>().sharedMesh;
    vertexLength = m.vertices.Length;
    arrayToProcess = m.vertices; //first version of the vertex array (a copy of the mesh's)
}

void Start () {
    cbf = new ComputeBuffer(vertexLength,32); //buffer in
    cbfOut = new ComputeBuffer(vertexLength,32); //buffer out
    csFile.SetBuffer(0,"Board",cbf);
    csFile.SetBuffer(0,"BoardOut",cbfOut);
}

void Update () {
    csFile.SetFloat("time",Time.time);
    cbf.SetData(m.vertices);
    csFile.Dispatch(0,vertexLength,vertexLength,1); //dispatching (I think my mistake is here)
    cbfOut.GetData(arrayToProcess); //getting back my processed vertices
    m.vertices = arrayToProcess; //assigning them to the mesh
    //coll.sharedMesh = m; //collider stuff, unused in this demo
}

And my compute shader:

#pragma kernel CSMain

RWStructuredBuffer<float3> Board : register(s[0]);
RWStructuredBuffer<float3> BoardOut : register(s[1]);

float time;

[numthreads(1,1,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    float valx = (sin((time*4)+Board[id.x].x));
    float valz = (cos((time*2)+Board[id.x].z));
    Board[id.x].y = (valx + valz)/5;
    BoardOut[id.x] = Board[id.x];
}

At first I was reading from and writing to the same buffer; when the issue appeared I tried separate buffers, with no success. I still have the same problem.

Maybe I have misunderstood how compute shaders are supposed to be used (I know I could use a vertex shader, but I want to try compute shaders for further improvements).

I suppose it is related to the way vertices are indexed in the Mesh.vertices array.

I tried a LOT of different block/thread configurations, but nothing seems to solve the issue. Combinations tried:

Blocks (Dispatch)   Threads (numthreads)
60,60,1             1,1,1
1,1,1               60,60,3
10,10,3             3,1,1

and some others I don't remember. I think the best configuration would be something well balanced, like:

Blocks: VertexCount,1,1   Threads: 3,1,1
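A quick way to see why none of these combinations behaved as expected: the number of threads launched is (groups in x·y·z) × (threads per group in x·y·z), but because the kernel indexes only with id.x, which vertices get touched depends only on the x extent, groupsX × threadsX. A standalone C# sketch of that arithmetic, assuming the kernel reads nothing but id.x as in the shader above (`DispatchMath` and its methods are hypothetical helpers, not Unity API):

```csharp
public static class DispatchMath
{
    // Total threads launched by Dispatch(gx, gy, gz) on a [numthreads(tx, ty, tz)] kernel.
    public static long TotalThreads(int gx, int gy, int gz, int tx, int ty, int tz)
        => (long)gx * gy * gz * tx * ty * tz;

    // SV_DispatchThreadID.x takes the values 0 .. gx*tx - 1, so this is how many
    // distinct elements a kernel that indexes only by id.x can reach.
    public static int DistinctIdX(int gx, int tx) => gx * tx;

    // How many threads share each id.x value (they all write the same element).
    public static long ThreadsPerIdX(int gy, int gz, int ty, int tz)
        => (long)gy * gz * ty * tz;
}
```

Applied to the table: 60,60,1 / 1,1,1 launches 3,600 threads but reaches only 60 distinct id.x values, each written by 60 threads; 1,1,1 / 60,60,3 launches 10,800 threads over the same 60 id.x values (180 threads each); 10,10,3 / 3,1,1 launches 900 threads but reaches only 30 id.x values.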

About the closed volume: I'm not sure about that, because with a cube (8 vertices) everything seems to move as expected, but with a shape with an odd number of vertices, the first (or the last; I haven't checked yet) vertex doesn't seem to be processed.

I tried it with many different shapes; subdivided planes show it most clearly: one corner never moves.

EDIT:

After further study I found that the compute shader simply does not process the last vertices of the mesh (not the first; I checked). It seems related to the buffer type, but I still don't understand why RWStructuredBuffer would be an issue, or what I'm doing wrong with it. Is it reserved for streams? I can't make sense of the MSDN documentation on this one.

EDIT: After resolution

The C# script:

using UnityEngine;
using System.Collections;


public class TreeObject : MonoBehaviour {
    Mesh m;
    public bool stopProcess = false;
    MeshCollider coll;
    public ComputeShader csFile;
    Vector3[] arrayToProcess;
    ComputeBuffer cbf;
    ComputeBuffer cbfOut;
    int vertexLength;
    // Use this for initialization
    void Awake() {
        coll = gameObject.GetComponent<MeshCollider>();
        m = GetComponent<MeshFilter>().mesh;
        vertexLength = m.vertices.Length+3; //I add 3 because apparently the vertex count is odd
        //arrayToProcess = new Vector3[vertexLength];
        arrayToProcess = m.vertices;
    }

    void Start () {

        cbf = new ComputeBuffer(vertexLength,12);
        cbfOut = new ComputeBuffer(vertexLength,12);
        csFile.SetBuffer(0,"Board",cbf);
        csFile.SetBuffer(0,"BoardOut",cbfOut);

    }

    // Update is called once per frame
    void Update () {
        csFile.SetFloat("time",Time.time);
        cbf.SetData(m.vertices);
        csFile.Dispatch(0,vertexLength,1,1);
        cbfOut.GetData(arrayToProcess);
        m.vertices = arrayToProcess;
        coll.sharedMesh = m;
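    }

    // Release the GPU buffers when the object is destroyed; otherwise they are
    // only cleaned up by the garbage collector, which Unity warns about.
    void OnDestroy () {
        if (cbf != null) cbf.Release();
        if (cbfOut != null) cbfOut.Release();
    }

}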

I had already gone back to Blocks VCount,1,1 before your answer, because it was logical: with VCount*VCount I was processing the vertices VCount times more than needed.

Also, you were absolutely right: the stride was the problem. Could you add to your answer a link to documentation about the stride parameter? (From anywhere, because the Unity docs say nothing about it and MSDN didn't help me understand why it should be 12 and not 32; I thought 32 was the size of a float3.)

So, documentation needed, please.

In the meantime I'll try to write a flexible (generic?) version of this to make it more robust, and start adding some useful array-processing functions to my shader...


Solution

I'm familiar with compute shaders but have never touched Unity; having looked over the documentation for compute shaders in Unity, though, a couple of things stand out.

The cbf and cbfOut ComputeBuffers are created with a stride of 32 (bytes?). Both your StructuredBuffers contain float3s, which have a stride of 12 bytes, not 32. Where did 32 come from?
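To make the 12 concrete: a float is 4 bytes, so a float3 (or Unity's Vector3, which is three floats) is 3 × 4 = 12 bytes. A minimal standalone check in plain C#, assuming a hypothetical `Float3` struct laid out like Vector3 (Unity is not needed to run it):

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for Unity's Vector3 / HLSL float3: three 32-bit floats, no padding.
[StructLayout(LayoutKind.Sequential)]
public struct Float3
{
    public float X, Y, Z;
}

public static class StrideCheck
{
    // sizeof(float) is 4 bytes; three components give the element size.
    public static int ByHand() => sizeof(float) * 3;

    // Size the marshaller reports for the struct, also in bytes.
    public static int ByMarshal() => Marshal.SizeOf(typeof(Float3));
}
```

Both methods return 12. In the Unity script, the same value could be written as `sizeof(float) * 3` when constructing the ComputeBuffer.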

When you dispatch your compute shader, you're requesting a two-dimensional dispatch (vertexLength, vertexLength, 1), but you're operating on a 1D array of float3s. You will end up with a race condition where many different threads think they're responsible for updating each element of the array. Although awful for performance, if you want a thread group size of [numthreads(1,1,1)], you should dispatch (vertexLength, 1, 1) thread groups when calling Dispatch (i.e., Dispatch(60,1,1) with numthreads(1,1,1) for 60 vertices).
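To see what a 1D dispatch does, here is a CPU reference of the question's kernel in plain C#, assuming numthreads(1,1,1) and Dispatch(vertexCount, 1, 1): group g holds a single thread whose id.x is g, so each vertex is visited exactly once. `Vec3` and `KernelReference` are hypothetical names standing in for Vector3 and the GPU pass; comparing GPU output against something like this is one way to spot skipped vertices.

```csharp
using System;

// Hypothetical stand-in for Unity's Vector3.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class KernelReference
{
    // Emulates Dispatch(vertices.Length, 1, 1) on a [numthreads(1,1,1)] kernel.
    public static Vec3[] Run(Vec3[] vertices, float time)
    {
        var result = new Vec3[vertices.Length];
        for (int groupX = 0; groupX < vertices.Length; groupX++)
        {
            // SV_DispatchThreadID.x = groupID.x * numthreads.x + groupThreadID.x
            int idX = groupX * 1 + 0;
            Vec3 v = vertices[idX];
            float valx = (float)Math.Sin(time * 4 + v.X);
            float valz = (float)Math.Cos(time * 2 + v.Z);
            v.Y = (valx + valz) / 5; // same formula as the shader's Board[id.x].y
            result[idX] = v;
        }
        return result;
    }
}
```

With time = 0, a vertex at x = 0, z = 0 gets y = (sin 0 + cos 0) / 5 = 0.2, and x and z pass through unchanged.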

For better performance, the number of threads in your thread group should be a multiple of 64 for best efficiency on AMD hardware. You then need only dispatch ceil(numVertices/64) thread groups, and add a check in the shader so that any thread whose id.x is out of bounds does nothing.
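A sketch of the 64-thread version's bookkeeping in plain C#: `GroupCount` is the integer form of ceil(numVertices/64), and `Run` emulates every launched thread, with out-of-range threads skipping their work as the shader-side bounds check would. `GroupedDispatch`, `ThreadsPerGroup`, `GroupCount` and `Run` are hypothetical names; on the GPU the kernel would be declared [numthreads(64,1,1)] and the element count passed in, for example through a hypothetical `vertexCount` uniform set with `ComputeShader.SetInt`.

```csharp
public static class GroupedDispatch
{
    // Matches a hypothetical [numthreads(64,1,1)] kernel.
    public const int ThreadsPerGroup = 64;

    // Integer ceiling of count / ThreadsPerGroup, without floating point.
    public static int GroupCount(int count) => (count + ThreadsPerGroup - 1) / ThreadsPerGroup;

    // Emulates all GroupCount(count) * 64 threads; returns how often each element was processed.
    public static int[] Run(int count)
    {
        var processed = new int[count];
        int groups = GroupCount(count);
        for (int g = 0; g < groups; g++)
        {
            for (int t = 0; t < ThreadsPerGroup; t++)
            {
                int idX = g * ThreadsPerGroup + t;
                if (idX >= count) continue; // bounds check: surplus threads do nothing
                processed[idX]++;
            }
        }
        return processed;
    }
}
```

For 1,000 vertices this gives 16 groups (1,024 threads); the last 24 threads fail the check and touch nothing, and every vertex is processed exactly once.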

EDIT:

The documentation for the ComputeBuffer constructor is here: Unity ComputeBuffer Documentation. While it doesn't explicitly say that "stride" is in bytes, that's the only reasonable assumption.
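Taking the stride to be the size of one element in bytes, as above, one way to keep the C# side and the HLSL declaration from drifting apart is to derive it from the element type instead of hard-coding a number. A plain C# sketch (`StrideHelper`, `StrideOf`, `BufferBytes` and `Float4` are hypothetical); in a Unity script, `StrideOf<Vector3>()` should come out as 12 and could be passed as the second argument of the ComputeBuffer constructor:

```csharp
using System.Runtime.InteropServices;

public static class StrideHelper
{
    // Size in bytes of one element of type T (what the stride argument is assumed to mean).
    public static int StrideOf<T>() where T : struct => Marshal.SizeOf(typeof(T));

    // Total bytes occupied by `count` elements of the given stride.
    public static long BufferBytes(int count, int stride) => (long)count * stride;
}

// Hypothetical 4-component element, e.g. to mirror an HLSL float4.
[StructLayout(LayoutKind.Sequential)]
public struct Float4
{
    public float X, Y, Z, W;
}
```

`BufferBytes` is just the arithmetic behind the mismatch: 100 elements with stride 32 reserve 3,200 bytes, while 100 Vector3 values occupy only 1,200.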

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow