Question

UPDATE: Microsoft have yet to fix it in Windows 8.1.

EDIT: This turned out to be a bug in WOW64: GetThreadContext() may return stale contents when the thread is suspended in long mode ring-3 (user mode). I've suggested that Microsoft use ring-2 to perform the translation. SuspendThread would then only suspend a thread in ring-3 (as it does now, so no changes are necessary), and a crash/fault/exploit in ring-2 wouldn't affect the kernel; it would only affect ring-2 and ring-3.

Such a fix would require changing a few WinAPI functions such as Wow64Get/SetThreadContext etc. This would break apps relying on undocumented features, but that's to be expected. Granted, translation would be slower, as it takes a few CPU cycles to transition from ring-3 to ring-2 (depending on the CPU family), but I'd think that the role of the OS is first and foremost to ensure correct operation. Translation already adds overhead to apps running under WOW64, so that's to be expected too.

I do hope that Microsoft will fix this issue; otherwise debuggers / Mono apps / Boehm GC / any apps that rely on GetThreadContext() under WOW64 will not work correctly (for starters, I've seen debuggers show a stale stack trace).

EDIT2: Bad news. From my conversation with Alexey from MSFT (here), it looks as though it may not get fixed at all, for fear that the fix would break apps that rely on undocumented features.


Original question

  • Some people seem to be confused about the following. I initially thought it was due to SuspendThread suspending a thread while in kernel-mode code. It wasn't. The following was merely my initial suspicion which turned out to have nothing to do with the actual root cause -- which was the stale contents returned by GetThreadContext().

From MSDN:

Suspending a thread causes the thread to stop executing user-mode (application) code.

What I've found, however, is that in my 32-bit app running under WOW64 on Windows 7, Thread A calling SuspendThread on Thread B can pause it while it's running 64-bit code (which I would expect is not user-mode code). EIP shows the suspended thread stopped at

wow64cpu!X86SwitchTo64BitMode:
00000000`759c31b0 ea27369c753300  jmp     0033:759C3627

with its ESP having changed (I know this because, while the ESP points into the same page as that thread's stack, it holds a much higher address than the current stack pointer). If I put a breakpoint at the instruction which the above returns to and then resume the thread, the ESP changes back to the value it had before the X86SwitchTo64BitMode call (which is the correct stack pointer). I also found that when single-stepping into the same function, I can never observe that higher ESP value at any point. In fact, when single-stepping, the ESP value is the same before and after the X86SwitchTo64BitMode call.
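For readers who want to check the disassembly by hand, here is a minimal sketch that decodes the 7-byte far jump shown above (opcode `EA`, `JMP ptr16:32`). The names `far_jmp`, `decode_far_jmp` and `selector_rpl` are made up for this illustration. The low two bits of a segment selector are its requested privilege level, so the sketch also extracts those. That selector 0x33 is the 64-bit user-mode code segment on x64 Windows is an assumption based on common knowledge, not something the disassembly itself states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded operands of a 32-bit far jump (opcode EA, "JMP ptr16:32"). */
typedef struct {
    uint16_t selector; /* target code segment selector */
    uint32_t offset;   /* target offset within that segment */
} far_jmp;

/* Decodes a 7-byte EA far jump.
   Returns 1 on success, 0 if the buffer is too short or is not an EA jump. */
static int decode_far_jmp(const uint8_t *bytes, size_t len, far_jmp *out)
{
    assert(bytes != NULL && out != NULL);
    if (len < 7 || bytes[0] != 0xEA)
        return 0;
    /* The 32-bit offset comes first, little-endian, then the 16-bit selector. */
    out->offset = (uint32_t)bytes[1]
                | ((uint32_t)bytes[2] << 8)
                | ((uint32_t)bytes[3] << 16)
                | ((uint32_t)bytes[4] << 24);
    out->selector = (uint16_t)(bytes[5] | (bytes[6] << 8));
    return 1;
}

/* Requested privilege level (ring number) encoded in a selector's low 2 bits. */
static unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}
```

Feeding it the bytes `ea 27 36 9c 75 33 00` yields offset 0x759C3627 and selector 0x0033, matching the `jmp 0033:759C3627` in the listing. The selector's privilege bits come out as 3, which is consistent with the mode switch happening in ring 3 rather than in the kernel.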

Also, I did make sure SuspendThread succeeded by checking its return value against (DWORD)-1.

All of this leads me to believe that the thread is suspended in kernel-mode code.

What could be causing the OS to suspend a thread while it's running non-user-mode code? How do I prevent that? This is basically preventing me from getting the actual current stack pointer of Thread B. Note that when the app runs outside of WOW64 (on native x86 OS), no such problem exists.
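For context, a minimal sketch of the suspend-and-read sequence described in the question might look like the following. It is not the asker's actual code: `suspend_failed` and `read_stack_pointer` are hypothetical helpers, the Windows part only compiles for 32-bit x86 Windows targets, and the thread handle is assumed to have been opened with THREAD_SUSPEND_RESUME and THREAD_GET_CONTEXT access.

```c
#include <assert.h>
#include <stdint.h>

/* SuspendThread returns the previous suspend count, or (DWORD)-1 on failure. */
static int suspend_failed(uint32_t result)
{
    return result == UINT32_MAX;
}

#if defined(_WIN32) && (defined(_M_IX86) || defined(__i386__))
#include <windows.h>

/* Hypothetical helper: suspends `thread`, reads its control registers,
   then resumes it. Returns 1 and fills *esp and *eip on success, 0 on failure. */
static int read_stack_pointer(HANDLE thread, DWORD *esp, DWORD *eip)
{
    CONTEXT ctx;
    int ok;

    assert(esp != NULL && eip != NULL);
    if (suspend_failed(SuspendThread(thread)))
        return 0;

    ZeroMemory(&ctx, sizeof ctx);
    ctx.ContextFlags = CONTEXT_CONTROL; /* requests Eip, Esp, Ebp, EFlags, SegCs, SegSs */
    ok = GetThreadContext(thread, &ctx) != 0;
    if (ok) {
        *esp = ctx.Esp; /* under WOW64 this is the value that may be stale */
        *eip = ctx.Eip;
    }
    ResumeThread(thread);
    return ok;
}
#endif
```

Both calls can succeed in this sequence and still produce the misleading ESP described above, so checking return values alone does not reveal the problem.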


Solution

I've confirmed that this is an OS issue returning stale contents when GetThreadContext is called under WOW64.

More info here.

Thanks to everyone who attempted to answer this question. I'm working with MS to resolve this.

OTHER TIPS

See this explanation: GetThreadContext in Wow64

This article explains that the transition between x86 and amd64 modes is done in user-mode.

What does your thread do in user-mode? It seems like it's already in kernel-mode when you call SuspendThread. Is it possible that it's executing a system function at the moment you suspend it?

What could be causing the OS to suspend a thread while it's running non-user-mode code?

Many system or library calls may result in a switch to kernel-mode. And because the Windows kernel is designed to be reentrant in most cases, switching from one thread to another while the first one is in kernel-mode is pretty normal.

How do I prevent that?

Just an idea: Create a thread that is just executing an empty loop (e.g. for(;;);) and suspend that thread. This one should not be suspended in kernel-mode.


Also, why is it important to you that the ESP registers etc. are correct? I hope you are writing some kind of debugger or something related, because that's what SuspendThread is for.

Technically, when a thread isn't running at all, it's running neither kernel-mode code nor user-mode code. So your observations do not contradict the statement.

Besides, you shouldn't be messing with this. It would be an OS bug if you (in user mode) could control whether kernel-mode code was executed.

Licensed under: CC-BY-SA with attribution