Stop integration with odeint used with thrust

Question 1

As already mentioned, there is no straight-forward solution to this problem.

One possible solution is to provide a

A vector of ints reporting which ODE has already stopped. So, if the i-th element of this vector is zero means that the i-th ODE is still running.
A custom error checker in runge_kutta_dopri5, which sets the error to 0 if the current system is already stopped. Thereby you avoid that this error influences the step size control mechanism, at least it does not affect the step size control negatively. See below for a sketch how this could be implemented
A function checking if the integration is stopped. This can be placed in principle into the observer.

A sketch for the custom error checker could be (it is copied from the default_error_checker):

class custom_error_checker
{
public:

    typedef double value_type;
    typedef thrust_algebra algebra_type;
    typedef thrust_operations operations_type;

    default_error_checker(
            const thrust::device_vector< int > &is_stopped ,
            value_type eps_abs = static_cast< value_type >( 1.0e-6 ) ,
            value_type eps_rel = static_cast< value_type >( 1.0e-6 ) ,
            value_type a_x = static_cast< value_type >( 1 ) ,
            value_type a_dxdt = static_cast< value_type >( 1 ) )
    : m_is_stopped( is_stopped ) ,
      m_eps_abs( eps_abs ) , m_eps_rel( eps_rel ) ,
      m_a_x( a_x ) , m_a_dxdt( a_dxdt )
    { }


    template< class State , class Deriv , class Err , class Time >
    value_type error( const State &x_old ,
                      const Deriv &dxdt_old ,
                      Err &x_err , Time dt ) const
    {
        return error( algebra_type() , x_old , dxdt_old , x_err , dt );
    }

    template< class State , class Deriv , class Err , class Time >
    value_type error( algebra_type &algebra ,
                      const State &x_old ,
                      const Deriv &dxdt_old ,
                      Err &x_err , Time dt ) const
    {
        // this overwrites x_err !
        algebra.for_each3( x_err , x_old , dxdt_old ,
            typename operations_type::template rel_error< value_type >(
                m_eps_abs , m_eps_rel , m_a_x ,
                m_a_dxdt * get_unit_value( dt ) ) );

        thrust::replace_if( x_err.begin() ,
                            x_err.end() ,
                            m_is_stopped.begin() ,
                            thrust::identity< double >()
                            0.0 );

        value_type res = algebra.reduce( x_err ,
                typename operations_type::template maximum< value_type >() ,
                    static_cast< value_type >( 0 ) );
        return res;
    }

private:

    thrust::device_vector< int > m_is_stopped;
    value_type m_eps_abs;
    value_type m_eps_rel;
    value_type m_a_x;
    value_type m_a_dxdt;
};

You can use such a checker in the controlled runge kutta via

typedef runge_kutta_dopri5< double , state_type , double , state_type ,
    thrust_algebra , thrust_operation > stepper;
typedef controlled_runge_kutta< stepper ,
    custom_error_checker > controlled_stepper ;
typedef dense_output_runge_kutta< controlled_stepper > dense_output_stepper;

thrust::device_vector< int > is_stopped;
// initialize an is_finished
dense_output_stepper s(
    controlled_stepper(
        custom_controller( is_stopped , ... )));

Finally you need a function checking which ODE is already stopped. Let's call this function check_finish_status:

void check_finish_status(
    const state_type &x ,
    thrust::device_vector< int > &is_stopped );

You can call this function inside the observer, and you need to pass is_stopped to the observer.

We have also an experimental and dirty branch where the step size control runs directly on the GPU and controls each ODE separately. This is really powerful and highly performant. Unfortunately this functionality can not easily be integrated into the main branch since a lot of the __device__ __host__ specifiers need to be added to odeint's methods. If you like, you can check the cuda_controlled_stepper branch in the odeint repository and/or drop me a message for further instructions.

Question 2

I think this is a quite hard problem to be implemented on a GPU with thrust.

I once did a similar simulation, where I had to integrate many initial conditions of the same system, but each integration stopped after different numbers of steps. Not just slight variations but orders of magnitude, like between 1000 and 10^6 steps. I wrote a parallelization for that using OpenMP that ran on 48 cores. What I did was very simple for OpenMP parallelization: whenever one initial condition is finished the next one starts. This is reasonably efficient as long as the total number of trajectories is much larger than the parallel threads. In principle, you could implement it this way on the GPU as well. As soon as one trajectory finishes, you replace it with a new initial condition. You have to do some book keeping of course, especially if your system is time dependent. But in general this might work, I think.