Question

I'm working on a genetic program in which I am porting some of the heavy lifting into CUDA. (Previously just OpenMP).

It's not running very fast, and I'm getting an error related to the recursion:

Stack size for entry function '_Z9KScoreOnePdPiS_S_P9CPPGPNode' cannot be statically determined

I've added a lump of the logic which runs on CUDA. I believe its enough to show how its working. I'd be happy to hear about other optimizations I could add, but I would really like to take the recursion if it will speed things up.

Examples on how this could be achieved are very welcome.

__device__ double Fadd(double a, double b)   {
    return a + b;
};


__device__ double Fsubtract(double a, double b)   {
        return a - b;
};

__device__ double action (int fNo, double aa , double bb, double cc, double dd) {
    switch (fNo) {
    case 0 :
        return Fadd(aa,bb);
    case 1 :
        return Fsubtract(aa,bb);
    case 2 :
        return Fmultiply(aa,bb);
    case 3 :
        return Fdivide(aa,bb);
    default:
        return 0.0;
    }

}

__device__ double solve(int node,CPPGPNode * dev_m_Items,double * var_set) {
    if (dev_m_Items[node].is_terminal) {
        return var_set[dev_m_Items[node].tNo];
    } else {
        double values[4];
        for (unsigned int x = 0; x < 4; x++ ) {
            if (x < dev_m_Items[node].fInputs) {
                values[x] = solve(dev_m_Items[node].children[x],dev_m_Items,var_set);
            } else {
                values[x] = 0.0;
            }
        }
        return action(dev_m_Items[node].fNo,values[0],values[1],values[2],values[3]);
    }
}

__global__ void KScoreOne(double *scores,int * root_nodes,double * targets,double * cases,CPPGPNode * dev_m_Items) {
    int pid = blockIdx.x;

    // We only work if this node needs to be calculated
    if (root_nodes[pid] != -1) {
        for (unsigned int case_no = 0; case_no < FITNESS_CASES; case_no ++) {
            double result = solve(root_nodes[pid],dev_m_Items,&cases[case_no]);
            double target = targets[case_no];
            scores[pid] += abs(result - target);
        }
    }
}

I'm having trouble making any stack examples work for a large tree structure, which is what this solves.

Was it helpful?

Solution

I've solved this issue now. It was not quite a case of placing the recursive arguments into a stack but it was a very similar system.

As part of the creation of the node tree, I append each node each to into a vector. I now solve the problem in reverse using http://en.wikipedia.org/wiki/Reverse_Polish_notation, which fits very nicely as each node contains either a value or a function to perform.

It's also ~20% faster than the recursive version, so I'm pleased!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top