Thread-local storage is just that: storage per thread. Each thread has its own private copy of the data. A thread remains the same thread whichever processor it happens to run on. The OS doesn't schedule work WITHIN threads; it schedules which of the threads runs.
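To make "storage per thread" concrete, here is a minimal sketch using C++11 `thread_local`. The names `counter`, `bump` and `run_workers` are made up for illustration. Each worker increments what looks like a single global variable, but every thread only ever sees its own copy:

```cpp
#include <thread>
#include <vector>

thread_local int counter = 0;  // one independent copy per thread

// Increment the calling thread's copy n times and return its final value.
int bump(int n) {
    for (int i = 0; i < n; ++i) ++counter;
    return counter;
}

// Start `threads` workers; each bumps its own copy n times.
// Returns the final value each worker observed.
std::vector<int> run_workers(int threads, int n) {
    std::vector<int> results(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&results, t, n] { results[t] = bump(n); });
    for (auto& th : pool) th.join();
    return results;
}
```

No locking is needed around `counter`, because no two threads touch the same copy. The copy belonging to the thread that calls `run_workers` is not modified by the workers at all.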
Thread-local storage is accomplished through some sort of indirection, which is switched along with the thread itself. There are several ways to do this. For example, the OS may reserve a particular page at a fixed offset from the start of the process's virtual memory, and when a thread is scheduled, the page table is updated to map that thread's own page there.
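The indirection idea can be sketched as a toy model. This is not how any particular OS implements it: `ThreadBlock`, `current`, `switch_to` and `tls_slot` are hypothetical names, and the single pointer stands in for whatever the real mechanism is (a remapped page, a segment base, and so on).

```cpp
#include <array>
#include <cstddef>

// Hypothetical per-thread block; a real system keeps far more in here.
struct ThreadBlock {
    std::array<int, 4> slots{};  // zero-initialised TLS slots
};

// Stand-in for the indirection: the one thing the scheduler changes
// when it switches from one thread to another.
ThreadBlock* current = nullptr;

// Toy "context switch": repoint the indirection at the next thread's block.
void switch_to(ThreadBlock& next) { current = &next; }

// Every TLS access goes through `current`, never to a fixed address.
int& tls_slot(std::size_t i) { return current->slots[i]; }
```

The code that reads or writes `tls_slot(0)` is identical for every thread; which data it reaches depends only on where `current` points at that moment.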
On x86 processors, the FS or GS segment register is typically used for "per-thread" data, so the OS will switch the FS register when it switches threads [or, in the case of 64-bit processors, the base address associated with the register]. When accessing the TLS, the compiler will prefix the memory read/write operations with the FS or GS segment override, and thus you always get "your private data", not some other thread's.
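You can observe the effect of this from user code without looking at the generated assembly. In the sketch below (the names `slot`, `slot_address` and `slot_address_in_new_thread` are invented for the example), the same expression `&slot` yields a different address depending on which thread evaluates it. Which segment register is involved depends on the platform; for example, 64-bit Linux toolchains generally use FS, while 64-bit Windows uses GS.

```cpp
#include <cstdint>
#include <thread>

thread_local int slot = 0;  // each thread gets its own `slot`

// Address of the calling thread's copy of `slot`.
std::uintptr_t slot_address() {
    return reinterpret_cast<std::uintptr_t>(&slot);
}

// Address of `slot` as seen from a freshly started thread.
std::uintptr_t slot_address_in_new_thread() {
    std::uintptr_t addr = 0;
    std::thread t([&addr] { addr = slot_address(); });
    t.join();
    return addr;
}
```

Within one thread the address is stable across calls. The new thread's copy lives elsewhere, because the calling thread's copy is still alive at the same time.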
Of course, OSes may have bugs, but quite a few things rely on this, so if it were broken, it would show up pretty soon (unless it's very subtle, and you have to be standing in just the right place, with the moon in the right phase, wearing the right colour clothes, with the wind in the right direction, the date divisible by both 3 and 7, etc.).