One important thing to understand is that memory types have no guaranteed effect on the instruction stream as a whole: they affect only the ordering of memory accesses. (They may have a specific effect on a specific processor integrated in a specific way with a specific interconnect, but software can never rely on that.)
Another important thing to understand is that even Strongly-ordered memory provides implicit ordering guarantees only with regard to accesses to the same peripheral. Any stricter ordering requirement needs explicit barrier instructions.
A third important point is that any implicit memory access ordering that takes place due to memory types does not affect the ordering of accesses to other memory types. Again, if your application depends on ordering between accesses to different memory types, explicit barrier instructions are required.
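To make the last two points concrete, here is a minimal sketch of the usual pattern: fill a buffer in Normal memory, then kick off a peripheral through a Device-memory register, with an explicit barrier in between. Everything named here is hypothetical. `full_barrier`, `dma_buffer`, `fake_dma_start` and `start_transfer` are made up for illustration, and the "register" is an ordinary variable so the code can run on a host. The sketch assumes GCC or Clang and, for the ARM branch, ARMv7 or later. Whether a DMB would be enough instead of a DSB depends on your system, so treat the choice here as the conservative one rather than a recommendation.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical helper: on ARMv7 or later emit DSB SY; on any other target
   fall back to a C11 fence so the sketch still compiles and runs on a host. */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define full_barrier() __asm__ volatile("dsb sy" ::: "memory")
#else
#define full_barrier() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Stand-ins for illustration only: on real hardware dma_buffer would live in
   Normal memory and the start register would be mapped as Device memory. */
static uint8_t dma_buffer[16];
static volatile uint32_t fake_dma_start;

static void start_transfer(const uint8_t *src, unsigned len)
{
    /* 1. Fill the buffer (Normal memory accesses). */
    for (unsigned i = 0; i < len && i < sizeof dma_buffer; i++)
        dma_buffer[i] = src[i];

    /* 2. The memory types alone give no ordering between the buffer writes
          and the register write, so request it explicitly. */
    full_barrier();

    /* 3. Tell the peripheral to go (Device memory access). */
    fake_dma_start = 1u;
}
```

Without step 2, nothing in the memory type attributes stops the register write from becoming visible before the buffer contents.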
Now, against that background, a simpler way of describing the difference between Device and Strongly-ordered memory is that Device memory accesses can be buffered, either in the processor itself or in the interconnect. A buffered access can be signalled as complete to the processor before it has completed (or even been initiated) at the end point. This provides better performance at the cost of losing the synchronous reporting of any error condition.
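If software does need to know that a buffered write has actually arrived, one common idiom is to read back from the same peripheral afterwards, relying on the same-peripheral ordering mentioned earlier. The sketch below is only that, a sketch: `write_then_readback` is a hypothetical helper, and it assumes the register can be read without side effects, which is not true of every peripheral.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   reg is assumed to point at a register mapped as Device memory. */
static inline uint32_t write_then_readback(volatile uint32_t *reg, uint32_t value)
{
    /* This write may be acknowledged from a buffer before it reaches
       the peripheral. */
    *reg = value;

    /* A read of the same peripheral is ordered after the write, so it
       cannot return data until the write has arrived there. */
    return *reg;
}
```

A typical use would be clearing an interrupt source in the peripheral before re-enabling interrupts in the processor. Note that this only confirms arrival; it does not bring back the synchronous error reporting that buffering gives up.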