Question

I read two posts on Stack Overflow, namely "Will the cublas kernel functions automatically be synchronized with the host?" and "CUDA Dynamic Parallelism; stream synchronization from device", and both recommend using some synchronization API, e.g. cudaDeviceSynchronize(), after invocations of cuBLAS functions. I'm not sure it makes sense to use such a general-purpose function.

Would it be better to do as follows? [Correct me if I'm wrong]:

cublasHandle_t cublas_handle;
cudaStream_t stream;
CUBLAS_CALL(cublasCreate(&cublas_handle));
// ... allocate and initialize d_A, d_B, d_C on the device ...
CUBLAS_CALL(
  cublasDgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, M, M,
    M, &alpha, d_A, M, d_B, M, &beta, d_C, M));
// cublasDgemm is non-blocking!
CUBLAS_CALL(cublasGetStream(cublas_handle, &stream));
cudaStreamSynchronize(stream);
// Now it is safe to copy the result (d_C) from the device
// to the host and use it

On the other hand, cudaDeviceSynchronize() might be preferable if many streams/handles are used to perform cuBLAS operations in parallel. What are the "best practices" for the synchronization of cuBLAS handles? Can cuBLAS handles be thought of as wrappers around streams, in the sense that they serve the same purpose from the point of view of synchronization?


Solution

If you are using a single stream, it doesn't make a difference whether you synchronize that one stream or call cudaDeviceSynchronize(); in terms of performance and effect they should be exactly the same. Note that when using events to time part of your code (e.g., a cublas call), it's good practice to call cudaDeviceSynchronize() before reading the measurements so that they are meaningful. In my experience it doesn't impose any significant overhead and, besides, it's safer to time your kernels with it.
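
As a minimal sketch of that timing pattern (assuming the handle, d_A, d_B, d_C, alpha, beta and M from the question have already been set up), it could look like this:

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
cublasDgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, M, M, M,
            &alpha, d_A, M, d_B, M, &beta, d_C, M);
cudaEventRecord(stop, 0);

// Block the host until all preceding work (the cublas call and the
// stop event) has completed, then read the elapsed time.
cudaDeviceSynchronize();
float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);

cudaEventDestroy(start);
cudaEventDestroy(stop);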

If your application uses multiple streams, then it makes sense to synchronize only the stream you care about. I believe that this question will be helpful to you. Also, you can read the CUDA C Programming Guide, Section 3.2.5.5.
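
As a rough sketch of the multi-stream case, each handle could be bound to its own stream with cublasSetStream() and only the stream whose result is needed gets synchronized; the per-stream buffers d_A[i], d_B[i], d_C[i] below are hypothetical:

const int NSTREAMS = 2;
cudaStream_t streams[NSTREAMS];
cublasHandle_t handles[NSTREAMS];
for (int i = 0; i < NSTREAMS; ++i) {
  cudaStreamCreate(&streams[i]);
  cublasCreate(&handles[i]);
  cublasSetStream(handles[i], streams[i]);  // bind handle i to stream i
}
for (int i = 0; i < NSTREAMS; ++i) {
  cublasDgemm(handles[i], CUBLAS_OP_N, CUBLAS_OP_N, M, M, M,
              &alpha, d_A[i], M, d_B[i], M, &beta, d_C[i], M);
}
// Wait only for the stream whose result is needed right now;
// work in the other streams keeps running.
cudaStreamSynchronize(streams[0]);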

OTHER TIPS

It's not clear from your example that you need explicit synchronization at all, or why you would want to use it.

CUDA operations issued to the same stream are serialized. If you launch a kernel, or a cublas call, and then follow that kernel or cublas call with a cudaMemcpy operation (or cublasGetVector/Matrix, etc.), the copy operation is guaranteed not to start until all previous CUDA activity issued to the same stream is complete.

The best practice for general cases is not to use explicit synchronization at all. Place activities which must be serially dependent in the same stream. Place activities which have no dependency on each other in separate streams.

There are many CUDA codes, using cuBLAS or otherwise, that don't use explicit synchronization at all. Your example has no particular need of it. Note that in the first answer you linked, talonmies said:

you need to call a blocking API routine like a synchronous memory transfer or...

In your example, that is exactly what you would do. You would call a memory transfer, either issued to the same stream (e.g. cudaMemcpyAsync) or a default blocking transfer (like cudaMemcpy), and it would work just fine. No explicit sync is needed.
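
For instance, a minimal sketch of that approach, where h_C is an assumed host buffer of M*M doubles, could be:

cublasDgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, M, M, M,
            &alpha, d_A, M, d_B, M, &beta, d_C, M);
// The blocking copy is issued to the same (default) stream as the GEMM,
// so it cannot start until the GEMM has finished, and it returns only
// when the copy itself is done.
cudaMemcpy(h_C, d_C, M * M * sizeof(double), cudaMemcpyDeviceToHost);
// h_C is now safe to use on the host.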

You may wish to read the appropriate programming guide section.

Licensed under: CC-BY-SA with attribution