I am pretty sure there is no guarantee about the maximum time between one CPU writing to a memory location and another CPU seeing the new value. On a NUMA system the coherency protocol can take noticeably longer, for example when the cache line has to move between sockets. In practice the hardware propagates the write as quickly as it can, but I doubt any upper bound is guaranteed.
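Why do you need to know this, though? When you're writing synchronization primitives you only need to think about ordering, not latency. x86-64 has a fairly strong memory model (total store order): stores become visible to other CPUs in the order they were executed, and loads are not reordered with other loads. The one exception is that a later load can complete before an earlier store to a different location becomes visible (because of the store buffer), so algorithms like Peterson's or Dekker's still need a full fence (mfence or a lock-prefixed instruction). You also have to stop the compiler from reordering your accesses, for example by using C11/C++11 atomics.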
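To illustrate "ordering, not timing", here is a minimal sketch of the usual message-passing pattern, assuming C11 <stdatomic.h> and POSIX threads (compile with -pthread). The names payload, ready, producer, consumer and message_passing_demo are hypothetical. The consumer has no bound on how long it spins, but once it sees the flag, the release/acquire pair guarantees it also sees the data. On x86-64 both the release store and the acquire load typically compile to plain mov instructions, because TSO already provides that ordering; the atomics are still needed so the compiler doesn't reorder or cache the accesses.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example state: ordinary data plus an atomic "published" flag. */
static int payload;
static atomic_int ready;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* 1: write the data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2: publish it */
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    /* Spin until the flag is set. There is no upper bound on how long this
       takes; the acquire load pairs with the release store above, so once
       the flag is seen, the write to payload is guaranteed to be visible. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* busy-wait */
    }
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer round and returns the value the consumer read. */
int message_passing_demo(void) {
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;  /* always 42 */
}
```

Note this pattern only relies on store-store and load-load ordering. If you need a store followed by a load of a different location to be ordered (the Dekker/Peterson case), use memory_order_seq_cst or an explicit atomic_thread_fence(memory_order_seq_cst), which is where the compiler emits mfence or a locked instruction on x86-64.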