When is POSIX thread cancellation not immediate?

Question 1

The meaning is just what it says: It's not guaranteed to happen instantly. The reason for this is that a certain "liberty" for implementation details is needed and accounted for in the standard.

For example under Linux/NPTL, cancellation is implemented by sending signal nr. 32. The thread is cancelled when the signal is received, which usually happens at the next kernel-to-user switch, or at the next interrupt, or at the end of the time slice (which may accidentially be immediately, but usually is not). A signal is never received while the thread isn't running, however. So the real catch here is actually that signals are not necessarily received immediately.

If you think about it, it isn't even possible to do it much different, either. Since you can phtread_cleanup_push some handlers which the operating system must execute (it cannot just blast the thread out of existence!), the thread must necessarily run to be cancelled. There is no guarantee that any particular thread (including the one you want to cancel) is running at the exact time you cancel a thread, so there can be no guarantee that it is cancelled immediately.
Except of course, hypothetically, if the OS was implemented in a way as to block the calling thread and schedule the to-be-cancelled thread so it executes its handlers, and only unblocks pthread_cancel afterwards. But since pthread_cancel isn't specified as blocking, this would be an utterly nasty surprise. It would also be somewhat inacceptable because of interfering wtih execution time limits and scheduler fairness.

So, either your cancel type is "disable", then nothing happens. Or, it is "enable", and the cancel type is "deferred", then the thread cancels when calling a function that is listed as cancellation point in pthreads(7).
Or, it is "asynchronous", then as stated above, the OS will do "something" to cancel the thread as soon as it deems appropriate -- not at a precise, well-defined time, but "soon". In the case of Linux, by sending a signal.

Question 2

If you need to wonder when the asynchronous cancellation happen, you are doing something terribly wrong.

Following Standards: You are eating ground below your feet by deliberately creating or allowing code to exist whose correctness depends on assumptions about the platform (single core, particular implementation, whatever). It is almost always better, if possible, to follow the standards (and document clearly when it is not possible). The name PTHREAD_CANCEL_ASYNCHROUNOUS itself suggests the meaning asynchronous, which is different from immediate or even almost immediate. The original poster specifically states single core, but why should you allow code to exist that will break in non-deterministic ways, when your code is put to run in truly parallel machines (multiple cores or CPUs) where it is practically impossible to have guarantee of immediateness (this would require stopping the other cores from running or waiting for context switch or some other terrible hack which your OS/CPU is not going to support to support your unconventional wishes). Asynchronous thread cancellation mode is not meant for guaranteed immediate cancellation of a thread. Hence it is a terribly confusing hack to use them in this way even if it would work.
Async-Safeness: If you are concerned about the mechanism of asynchronous cancellation, it raises the suspicion that the threads in question (because of lack of independence) are maybe not purely computational or written in async-cancel-safe manner.

POSIX specifies only three functions as async-cancel safe: pthread_cancel(3), pthread_setcancelstate(3), and pthread_setcancelmode(3) - see IEEE Std 1003.1, 2013 Edition, 2.9.5. This cancellation mode is only suitable for purely computational tasks that do not call (other than purely computational) library functions; such code would not provide cancellation points if the threads were set to run in the default deferred cancellation mode. Hence the rationale for defining such mode.

It is possible to write async-cancel-safe code by disabling cancellation during critical sections. But library writers (including POSIX library implementors) in general should not care about async-safetyness by reasons of following general convention, avoiding complexity, and even avoiding performance overhead. Because the library writers should not care, you should never expect async-safetyness unless it is explicitly stated otherwise.

If your code is not async-safe (because for example calling other libraries, including POSIX/standard C libraries without temporarily disabling cancellation or changing cancellation mode) and asynchronous cancellation occurs, you might leak resources (memory, etc), leave behind inconsistent states and locked mutexes dead-locking other threads, and summon many other problems currently imaginable and non-imaginable. (If you are writing in C++, it seems you will have other issues to deal with due to POSIX thread cancellation's close association with exception handling.)