When you have a context that contains multiple devices, any buffer that you create within that context is visible to all of its devices. This means that any device in the context can read from any buffer in the context, and the OpenCL implementation is in charge of making sure the data is actually moved to the correct device as and when that device needs it. There are some grey areas around what should happen if multiple devices try to access the same buffer at the same time, but this kind of behaviour is generally avoided anyway.
Although all of the buffers are visible to all of the devices, this doesn't necessarily mean that they will be allocated on all of the devices. All of the OpenCL implementations that I've worked with use an 'allocate-on-first-use' policy, whereby a buffer is allocated on a device only when that device first needs it. So in your particular case, you should end up with one buffer per device, as long as each buffer is only ever used by one device.
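To make the 'allocate-on-first-use' idea concrete, here is a small toy model in plain C. This is not OpenCL API code and not how any particular implementation works internally; the names (lazy_buffer, lazy_buffer_use and so on) are hypothetical, and malloc simply stands in for a device-side allocation. It only illustrates the bookkeeping: creating a buffer allocates nothing, and a per-device copy appears the first time that device touches the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_DEVICES 2  /* assumed number of devices in the context */

/* Hypothetical model of a buffer that is visible to every device. */
typedef struct {
    size_t size;
    void *device_copy[NUM_DEVICES];  /* NULL until that device first uses it */
} lazy_buffer;

/* Creating the buffer records its size but allocates nothing on any device. */
lazy_buffer lazy_buffer_create(size_t size)
{
    lazy_buffer b = { size, { NULL } };
    return b;
}

/* Called when device `dev` is about to use the buffer. Allocates the
 * device-side copy only on first use (size is assumed to be non-zero).
 * Returns true if an allocation happened during this call. */
bool lazy_buffer_use(lazy_buffer *b, int dev)
{
    assert(dev >= 0 && dev < NUM_DEVICES);
    if (b->device_copy[dev] != NULL)
        return false;  /* already resident on this device */
    b->device_copy[dev] = malloc(b->size);  /* stand-in for a device allocation */
    assert(b->device_copy[dev] != NULL);
    return true;
}

/* Number of devices that currently hold an allocation for this buffer. */
int lazy_buffer_allocation_count(const lazy_buffer *b)
{
    int count = 0;
    for (int d = 0; d < NUM_DEVICES; ++d)
        if (b->device_copy[d] != NULL)
            ++count;
    return count;
}

/* Frees every per-device copy and marks the buffer as unallocated. */
void lazy_buffer_release(lazy_buffer *b)
{
    for (int d = 0; d < NUM_DEVICES; ++d) {
        free(b->device_copy[d]);
        b->device_copy[d] = NULL;
    }
}
```

In this model, if buffer A is only ever passed to lazy_buffer_use with device 0 and buffer B only with device 1, each ends up with an allocation count of 1, which mirrors the "one buffer per device" outcome described above.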
In theory, an OpenCL implementation might pre-allocate all of the buffers on all of the devices just in case they are needed, but I wouldn't expect this to happen in reality (and I've certainly never seen it happen). If you are running on a platform that has a GPU profiler available, you can often use the profiler to see when and where buffer allocations and data movement are actually occurring, to convince yourself that the system isn't doing anything undesirable.