How can CAS simulators like PTLsim achieve cycle accurate simulation of x86 hardware?

Question 1

First Question: How do they achieve cycle accuracy when neither the number of cycles each instruction takes nor the CPU's branch prediction logic is known?

The simulator does provide cycle-accurate simulation given a sufficiently accurate CPU model, but it does not come with out-of-the-box models for Intel's or AMD's current offerings. Someone at Intel or AMD with access to the required information could create an RTL-level model and get cycle-accurate simulations of current processors. People outside Intel and AMD cannot. You can still feed publicly known information to the simulator and get reasonable results, but these results will not be identical to those of the real hardware.
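
To illustrate why the cycle count is only as good as the model, here is a toy sketch (not PTLsim's design or API). The `CpuModel` fields, the latencies and the static branch predictor are all made-up assumptions; the point is only that the same instruction trace yields different cycle counts under different models.

```python
from dataclasses import dataclass


@dataclass
class CpuModel:
    """Hypothetical timing model; every number used below is invented."""
    latency: dict            # cycles charged per opcode
    mispredict_penalty: int  # extra cycles when the branch guess is wrong
    predict_taken: bool      # static predictor: always guess this outcome


def simulate(trace, model):
    """Return total cycles for a trace of (opcode, branch_taken) pairs.

    branch_taken is None for non-branch instructions.
    """
    cycles = 0
    for opcode, taken in trace:
        cycles += model.latency[opcode]
        if taken is not None and taken != model.predict_taken:
            cycles += model.mispredict_penalty
    return cycles


# Same program, two guesses at what the hardware does.
trace = [("add", None), ("mul", None), ("jcc", True), ("add", None), ("jcc", False)]

model_a = CpuModel({"add": 1, "mul": 3, "jcc": 1}, mispredict_penalty=15, predict_taken=True)
model_b = CpuModel({"add": 1, "mul": 4, "jcc": 1}, mispredict_penalty=20, predict_taken=False)

cycles_a = simulate(trace, model_a)
cycles_b = simulate(trace, model_b)
print(cycles_a, cycles_b)
```

Both runs are exact with respect to their own model, yet they disagree with each other. Without the vendor's real parameters, neither can be assumed to match the silicon.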

If you are a software developer and want to benchmark real hardware, use real hardware! Simulators like PTLsim are designed for (academic) hardware developers who want to test new hardware features without spending hundreds of thousands of dollars on a new chip.

Second Question: Is it theoretically possible to implement a hard RTOS on x86-based hardware?

Of course it is theoretically possible. You would need to consider the absolute worst case for each code segment, for all inputs and under all circumstances. The practical problem is that processors like the Core 2 are very complex and the state of the processor is enormous. Additionally, these processors are not designed to behave deterministically with respect to timing, so a really hard RTOS would have to be extremely conservative. Finally, as you correctly observe, people outside Intel and AMD don't have access to all the information required to make those conservative assumptions. In practice it is reasonable to pass on the latest and greatest CPUs and instead use older, simpler CPUs that have deterministic timing.

On the other hand, if the RTOS does not have to be really hard real time, you can always just include some safety margin and hope for the best. ;-)
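
A minimal sketch of that "safety margin" approach, assuming you have measured execution times on the real hardware. The function names, the 50% default margin and the sample values are all hypothetical. Note that the largest observed time is not a true worst case, which is exactly why this is not hard real time.

```python
def timing_budget(samples_us, margin=0.5):
    """Worst observed time inflated by a safety margin (0.5 means +50%)."""
    if not samples_us:
        raise ValueError("need at least one measurement")
    return max(samples_us) * (1.0 + margin)


def meets_deadline(samples_us, deadline_us, margin=0.5):
    """True if the padded budget still fits within the deadline."""
    return timing_budget(samples_us, margin) <= deadline_us


# Made-up measurements, in microseconds.
samples = [102.0, 98.5, 110.0, 140.0, 101.2]

print(timing_budget(samples))          # 140.0 * 1.5
print(meets_deadline(samples, 250.0))  # budget fits
print(meets_deadline(samples, 200.0))  # budget does not fit
```

A larger margin makes a missed deadline less likely, but no margin derived from measurements alone turns it into a guarantee.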

Question 2

This is not an answer to both questions; I'm only going to answer the second one. Feel free to upvote, although Mackie's answer seems better in general.

A hard RTOS is hard to implement on x86. One special thing that can kill every promise made by an RTOS is SMM, or System Management Mode. The CPU enters it after a System Management Interrupt (SMI), which can fire for various reasons: a hardware failure, a write to some special MMIO location, or an OUT instruction to some special port. You cannot disable it, you cannot really predict when an SMI will happen, and SMI handlers can take a very long time to finish.

Essentially, you know nothing about when the CPU is in SMM until something fails in your OS because of the time the CPU spent handling an SMI. In some special cases this can become a problem even for non-real-time OSes, not to mention hard RTOSes.
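
Since the OS never sees the SMI itself, about the only thing software can do is notice the missing time afterwards. Below is a rough sketch of that idea: spin on a clock and record unexpectedly large gaps between consecutive reads. The function name, threshold and iteration count are assumptions for illustration. A gap could equally be an ordinary interrupt, scheduler preemption or interpreter overhead, so this cannot prove an SMI occurred. A serious version would be native code on an isolated core.

```python
import time


def find_gaps(clock=time.perf_counter_ns, iterations=100_000, threshold_ns=100_000):
    """Spin reading `clock`; return (start_ns, gap_ns) for each gap >= threshold_ns."""
    gaps = []
    prev = clock()
    for _ in range(iterations):
        now = clock()
        if now - prev >= threshold_ns:
            gaps.append((prev, now - prev))
        prev = now
    return gaps


# Deterministic demo with a fake clock: one 500 microsecond hole in the timeline.
fake_times = iter([0, 10, 20, 500_020, 500_030])
demo_gaps = find_gaps(clock=lambda: next(fake_times), iterations=4)
print(demo_gaps)
```

Calling `find_gaps()` with the default clock runs the same loop against the real timer. Passing the clock in as a parameter keeps the detection logic testable without depending on machine timing.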

There is also this thread that can provide you with some more points about running an RTOS on x86.

Question 3

On the site you linked there are statements such as:

PTLsim is a state of the art cycle accurate microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets.

and

It runs directly on the same platform it is simulating (an x86-64 or x86 machine running Linux)

It is not clear to me, then, how this differs from any other x86 virtual machine technology such as QEMU, VirtualBox, VMware or Virtual PC, which would be cycle accurate by virtue of actually running instructions directly on the hardware (as well as running at core speeds). Is it a simulator or a VM? In my mind they are not the same thing; Bochs, for example, is a simulator rather than a VM. Perhaps PTLsim is somewhere in between?