What redistributable is required from end-users to run OpenCL applications?

Question 1

They need up-to-date GPU drivers, which include the vendor's OpenCL runtime. For Intel CPUs, they may have to download and install the necessary binaries manually.

AMD's device compiler can take a long time to compile kernels, while Nvidia's compiles quickly. Compile time is very low when you target a CPU. I converted a basic C++ fluid and raytracer simulation to OpenCL, and it took 3 minutes to compile (I mean the device-side compilation of the OpenCL C kernels). If you want to give people an already-compiled project, you would need access to every single type of card, and you would have to compile and save binaries for all of them.
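
Since a compiled kernel binary is only usable on the kind of device (and usually the driver) it was built for, any saved binary has to be stored per device. Here is a minimal sketch of that bookkeeping, written in Python for brevity; C or C++ host code would follow the same logic. The function name `kernel_cache_key` and its parameters are hypothetical. The device name and driver version are assumed to come from `clGetDeviceInfo` (`CL_DEVICE_NAME`, `CL_DRIVER_VERSION`), and the binary itself from `clGetProgramInfo` with `CL_PROGRAM_BINARIES`.

```python
import hashlib


def kernel_cache_key(device_name: str, driver_version: str,
                     source: str, options: str = "") -> str:
    """Return a file-name-safe key for one compiled kernel binary.

    The key changes whenever the device, the driver, the build options,
    or the kernel source changes, so a stale binary is never reused.
    (Hypothetical helper, not part of any OpenCL API.)
    """
    h = hashlib.sha256()
    for part in (device_name, driver_version, options, source):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()
```

The same key can also name binaries compiled on the user's machine at first run, so the slow compile step happens only once per device.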

Some OpenGL/OpenCL/DirectX sharing (interop) operations can be incompatible between vendors.

Don't use platform-specific constants; they may not be fully mapped on other platforms.

Tell people which OpenCL version you target.

Don't use a local work group size larger than 256 for GPU computing. The maximum local work group size on AMD GPUs is 256, while on Nvidia GPUs it is 1024.
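
Rather than hard-coding a size, the host code can clamp its preferred size to what the device reports. This is a minimal sketch in Python (the arithmetic is the same in C or C++ host code). The helper `pick_work_sizes` is hypothetical, and `device_max_wg` is assumed to be the value returned by `clGetDeviceInfo` for `CL_DEVICE_MAX_WORK_GROUP_SIZE`, or by `clGetKernelWorkGroupInfo` for `CL_KERNEL_WORK_GROUP_SIZE`.

```python
def pick_work_sizes(total_items: int, device_max_wg: int,
                    preferred_wg: int = 256):
    """Return (local_size, global_size) for a 1D kernel launch.

    local_size never exceeds the device limit, and global_size is
    rounded up to a multiple of local_size.
    (Hypothetical helper, not part of the OpenCL API.)
    """
    if total_items <= 0 or device_max_wg <= 0 or preferred_wg <= 0:
        raise ValueError("all sizes must be positive")
    local_size = min(preferred_wg, device_max_wg)
    # Round up so every work item belongs to a full work group.
    groups = (total_items + local_size - 1) // local_size
    return local_size, groups * local_size
```

Because the global size is rounded up, the kernel needs a bounds check such as `if (get_global_id(0) >= n) return;` so the padding work items do nothing.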

Don't spill private registers; reduce the depth of pseudo-recursive functions if you really need them. Sometimes the AMD compiler tries to optimize so aggressively that it blows up at native device compile time.

Use a "platform and device query wrapper" of your own that finds a suitable GPU; don't just take platform[0] or device[0]. Users may have multiple platforms, such as Intel's for the CPU and AMD's for the GPU, or even all of them. The GPUs integrated in APUs may be reported as ACC (accelerator) instead of GPU (I'm not sure about this).
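
The selection logic of such a wrapper could look roughly like this. It is a sketch in Python that works on plain records, so it does not call OpenCL itself. The `DeviceInfo` record and `pick_device` function are hypothetical; the records are assumed to be filled from `clGetPlatformIDs`, `clGetDeviceIDs`, and `clGetDeviceInfo` across all platforms. The device type bit values are the ones defined in the OpenCL headers. Ranking by compute units is only a crude tie-breaker, since the numbers are not comparable between vendors.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Bitfield values of cl_device_type from the OpenCL headers.
CL_DEVICE_TYPE_CPU = 1 << 1
CL_DEVICE_TYPE_GPU = 1 << 2
CL_DEVICE_TYPE_ACCELERATOR = 1 << 3


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical record describing one device of one platform."""
    platform_name: str
    device_name: str
    device_type: int
    compute_units: int


def pick_device(devices: Iterable[DeviceInfo]) -> Optional[DeviceInfo]:
    """Prefer GPUs, then accelerators, then CPUs; None if nothing usable."""
    def rank(d: DeviceInfo):
        if d.device_type & CL_DEVICE_TYPE_GPU:
            tier = 2
        elif d.device_type & CL_DEVICE_TYPE_ACCELERATOR:
            tier = 1
        elif d.device_type & CL_DEVICE_TYPE_CPU:
            tier = 0
        else:
            tier = -1  # unknown type: skip
        return (tier, d.compute_units)

    candidates = [d for d in devices if rank(d)[0] >= 0]
    return max(candidates, key=rank, default=None)
```

Accepting accelerators as a second choice covers the case where an integrated GPU is not reported as a GPU.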

Implicit synchronization of kernels and buffer transfers that works on your system may fail on other systems.

Check that your DLLs and application have the same bitness as the user's machine and OS. If you target 64-bit and they have a 32-bit OS, it will not work.

Question 2

Recent Catalyst drivers from AMD should already provide OpenCL support. Of course, when someone has an old card without OpenCL support or has not installed recent drivers, the application might fall back to CPU OpenCL or might not work at all, respectively. I'm not sure which assumptions you can make (regarding the system requirements that you state for your program), but at least there should be no need for your own dedicated "redistributables" when the target system has up-to-date drivers.

Question 3

You should do this:

Load "OpenCL.dll/so" dynamically; this way your users do not need to have OpenCL installed at all. (Optional, but very helpful: you can even fall back to CPU mode. In addition, it forces you to use only pure OpenCL calls.)
Use only common OpenCL methods; never use vendor-tuned methods that depend on additional DLLs. nVIDIA, for example, ships a lot of tools with its OpenCL library that force you to use it with nVIDIA libraries and drivers. These are, for example, calls that start with oclXXX().
Write the kernels and the host code in a generic way; do not expect everyone to have the same work sizes, memory, etc. You should be able to detect these situations and adapt your kernel to them. Compile the kernels in place (at run time) and add #defines that control the behavior inside the kernel. Then you can set them from the outside according to the detected hardware features. (Only if you don't really care about the security of your code.)
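
The last point, setting #defines from the outside, can be sketched as a small helper that turns detected features into the options string passed to `clBuildProgram`. The sketch is in Python for brevity. The function `build_options` and the macro names used with it are hypothetical; only the `-D name` and `-D name=definition` option forms come from the OpenCL specification.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_options(defines: dict) -> str:
    """Turn {"NAME": value} into an OpenCL build options string.

    A value of None yields a bare "-D NAME"; booleans become 1 or 0.
    Values are assumed to contain no spaces. (Hypothetical helper.)
    """
    parts = []
    for name, value in sorted(defines.items()):
        if not _IDENT.match(name):
            raise ValueError(f"not a valid macro name: {name!r}")
        if value is None:
            parts.append(f"-D {name}")
        else:
            if isinstance(value, bool):
                value = int(value)
            parts.append(f"-D {name}={value}")
    return " ".join(parts)
```

For example, `build_options({"WG_SIZE": 128, "USE_FP64": None})` gives `-D USE_FP64 -D WG_SIZE=128`, and the kernel source can then branch on `#ifdef USE_FP64` or size its local arrays with `WG_SIZE`.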

Answering your questions:

You need to redistribute, if needed, the DLL from the vendor that you use (e.g. nVIDIA). The client has to have a proper OpenCL installation enabled; even if it is from another manufacturer, it should work.
The best practice is to write clean (vendor-neutral) OpenCL code, so you don't force the client to install any specific library, and you don't need to install or distribute one with your app.
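
The dynamic loading recommended in this answer can be sketched as follows. The sketch uses Python's `ctypes`; in C or C++ the equivalent would be `LoadLibrary`/`dlopen` plus function pointers. The helper names `load_opencl` and `count_platforms` are hypothetical, and the library file names are the usual ones for the OpenCL loader, listed here as assumptions rather than an exhaustive set. `clGetPlatformIDs` is the real entry point, declared with its documented signature.

```python
import ctypes
import ctypes.util
import sys


def load_opencl():
    """Return a handle to the OpenCL library, or None if it is missing."""
    if sys.platform == "win32":
        loader = ctypes.WinDLL  # OpenCL uses __stdcall on Windows
        candidates = ["OpenCL.dll"]
    elif sys.platform == "darwin":
        loader = ctypes.CDLL
        candidates = ["/System/Library/Frameworks/OpenCL.framework/OpenCL"]
    else:
        loader = ctypes.CDLL
        candidates = ["libOpenCL.so.1", "libOpenCL.so"]
    found = ctypes.util.find_library("OpenCL")
    if found:
        candidates.append(found)
    for name in candidates:
        try:
            return loader(name)
        except OSError:
            continue  # not installed under this name, try the next one
    return None


def count_platforms(lib) -> int:
    """Number of OpenCL platforms, or 0 if OpenCL is unavailable."""
    if lib is None:
        return 0
    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id*, cl_uint*)
    fn = lib.clGetPlatformIDs
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint32)]
    fn.restype = ctypes.c_int32
    num = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(num))
    return num.value if status == 0 else 0  # 0 is CL_SUCCESS
```

If `count_platforms(load_opencl())` returns 0, the application can switch to its plain CPU code path instead of failing at startup.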