By my current understanding, each thread gets its own dlist stored in local memory. Is this true?
That is correct. Local variables are created per thread. They are stored either in registers or in local memory; where a given variable ends up depends mostly on the compiler.
If that is the case, would there be any way at the end of the kernel's execution to grab each of the dlist objects (from another kernel), or should I be using a __shared__ array of dynamic lists allocated by the first thread?
Local memory is private to the thread, so you would need to copy the local variable to a global memory variable if you need its value outside the kernel. (One exception: starting with compute capability 3.0 there are intra-warp shuffle instructions that let the threads within a warp exchange thread-local variables.)
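As a rough sketch of both points, the kernel below has each thread read its neighbour's private value with a warp shuffle and then copy the result to global memory so the host can see it. It assumes a single warp of 32 threads and CUDA 9 or later, where `__shfl_sync` replaces the original `__shfl`. The names `exchangeKernel` and `runExchange` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread holds a private value, reads its neighbour's value with a
// warp shuffle, and writes the result to global memory.
__global__ void exchangeKernel(int *out)
{
    int lane = threadIdx.x;      // assumption: exactly one warp (32 threads) is launched
    int mine = lane * 10;        // thread-local value (register or local memory)

    // Read `mine` from the next lane, wrapping around within the warp.
    int neighbour = __shfl_sync(0xffffffffu, mine, (lane + 1) % 32);

    out[lane] = neighbour;       // copy to global memory so it outlives the kernel
}

// Hypothetical host helper: launches the kernel and returns the 32 results.
std::vector<int> runExchange()
{
    std::vector<int> host(32, -1);
    int *dev = nullptr;
    cudaMalloc(&dev, 32 * sizeof(int));
    exchangeKernel<<<1, 32>>>(dev);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), dev, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    std::vector<int> r = runExchange();
    printf("lane 0 got %d, lane 31 got %d\n", r[0], r[31]);
    return 0;
}
```

Note that the shuffle only works between threads of the same warp; anything that has to be visible to other warps, other blocks, or the host still has to go through shared or global memory.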
__shared__ memory is allocated per thread block and is only accessible within that thread block, so again you would need to copy the value to a global memory location.
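To illustrate, here is a minimal sketch in which every block fills its own `__shared__` array and thread 0 of each block copies a single result out to global memory. The names `blockSum`, `runBlockSum` and `kThreads` are assumptions made for this example, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 64;   // threads per block (illustrative choice)

// Each block has its own copy of `partial`; other blocks cannot see it.
__global__ void blockSum(int *blockTotals)
{
    __shared__ int partial[kThreads];

    partial[threadIdx.x] = threadIdx.x + blockIdx.x;
    __syncthreads();           // make all writes visible within the block

    if (threadIdx.x == 0) {
        int total = 0;
        for (int i = 0; i < blockDim.x; ++i)
            total += partial[i];
        blockTotals[blockIdx.x] = total;   // publish to global memory
    }
}

// Hypothetical host helper: returns one total per block.
std::vector<int> runBlockSum(int blocks)
{
    std::vector<int> host(blocks, -1);
    int *dev = nullptr;
    cudaMalloc(&dev, blocks * sizeof(int));
    blockSum<<<blocks, kThreads>>>(dev);
    cudaMemcpy(host.data(), dev, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    std::vector<int> totals = runBlockSum(2);
    printf("block 0: %d, block 1: %d\n", totals[0], totals[1]);
    return 0;
}
```

The contents of `partial` are gone once the block finishes, so a later kernel can only see what was written to `blockTotals`.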
What you probably need is something like a global array of lists that you pass around to your kernels as a parameter.
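Something along these lines, as a sketch only: I don't know what your dlist looks like, so `DList` here is a hypothetical fixed-capacity stand-in, and `fillLists`, `sumLists` and `runPipeline` are made-up names. The first kernel builds each thread's list locally and copies it into a global array; the second kernel receives the same array as a parameter and reads the lists back. Error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's dlist: a fixed-capacity list.
constexpr int kCapacity = 8;
struct DList {
    int count;
    int items[kCapacity];
};

// First kernel: each thread builds a private list, then publishes it.
__global__ void fillLists(DList *lists, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    DList local;                         // private to this thread
    local.count = 0;
    for (int i = 0; i <= tid % kCapacity; ++i)
        local.items[local.count++] = tid;

    lists[tid] = local;                  // copy to the global array
}

// Second kernel: reads the lists written by the first kernel.
__global__ void sumLists(const DList *lists, int *sums, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int total = 0;
    for (int i = 0; i < lists[tid].count; ++i)
        total += lists[tid].items[i];
    sums[tid] = total;
}

// Hypothetical host helper: runs both kernels and returns the per-thread sums.
std::vector<int> runPipeline(int n)
{
    const int threads = 8;
    const int blocks = (n + threads - 1) / threads;

    DList *lists = nullptr;
    int *sums = nullptr;
    cudaMalloc(&lists, n * sizeof(DList));
    cudaMalloc(&sums, n * sizeof(int));

    // Both launches use the default stream, so sumLists starts only after
    // fillLists has finished.
    fillLists<<<blocks, threads>>>(lists, n);
    sumLists<<<blocks, threads>>>(lists, sums, n);

    std::vector<int> host(n, -1);
    cudaMemcpy(host.data(), sums, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(lists);
    cudaFree(sums);
    return host;
}

int main()
{
    std::vector<int> sums = runPipeline(16);
    printf("sums[5] = %d, sums[9] = %d\n", sums[5], sums[9]);
    return 0;
}
```

A fixed capacity keeps the example simple. A list that really grows at run time would need device-side allocation or a preallocated pool, which is a separate design decision.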