Question about cycle counting accuracy when emulating a CPU

https://stackoverflow.com/questions/7122979

06-01-2021
|

Question

I am planning on creating a Sega Master System emulator over the next few months, as a hobby project in Java (I know it isn't the best language for this but I find it very comfortable to work in, and as a frequent user of both Windows and Linux I thought a cross-platform application would be great). My question regards cycle counting;

I've looked over the source code for another Z80 emulator, and for other emulators as well, and in particular the execute loop intrigues me - when it is called, an int is passed as an argument (let's say 1000 as an example). Now I get that each opcode takes a different number of cycles to execute, and that as these are executed, the number of cycles is decremented from the overall figure. Once the number of cycles remaining is <= 0, the execute loop finishes.

My question is that many of these emulators don't take account of the fact that the last instruction to be executed can push the number of cycles to a negative value - meaning that between execution loops, one may end up with say, 1002 cycles being executed instead of 1000. Is this significant? Some emulators account for this by compensating on the next execute loop and some don't - which approach is best? Allow me to illustrate my question as I'm not particularly good at putting myself across:

public void execute(int numOfCycles) 
{ //this is an execution loop method, called with 1000.
   while (numOfCycles > 0)
   {
      instruction = readInstruction();
      switch (instruction)
      {
         case 0x40: dowhatever, then decrement numOfCycles by 5;
         break; 
         //lets say for arguments sake this case is executed when numOfCycles is 3.
      }
}

After the end of this particular looping example, numOfCycles would be at -2. This will only ever be a small inaccuracy but does it matter overall in peoples experience? I'd appreciate anyone's insight on this one. I plan to interrupt the CPU after every frame as this seems appropriate, so 1000 cycles is low I know, this is just an example though.

Many thanks, Phil

Solution

Was a quite interesting article on Arstechnica talking about console simulation recently, also links to quite a few simulators that might make for quite good research:

Accuracy takes power: one man's 3GHz quest to build a perfect SNES emulator

The relevant bit is that the author mentions, and I am inclined to agree, that most games will appear to function pretty correctly even with timing deviations of +/-20%. The issue you mention looks likely to never really introduce more than a fraction of a percent timing error, which is probably imperceptible whilst playing the final game. The authors probably didn't consider it worth dealing with.

OTHER TIPS

most emulators/simulators dealing just with CPU Clock tics

That is fine for games etc ... So you got some timer or what ever and run the simulation of CPU until CPU simulate the duration of the timer. Then it sleeps until next timer interval occurs. This is very easy to simulate. you can decrease the timing error by the approach you are asking about. But as said here for games is this usually unnecessary.

This approach has one significant drawback and that is your code works just a fraction of a real time. If the timer interval (timing granularity) is big enough this can be noticeable even in games. For example you hit a Keyboard Key in time when emulation Sleeps then it is not detected. (keys sometimes dont work). You can remedy this by using smaller timing granularity but that is on some platforms very hard. In that case the timing error can be more "visible" in software generated Sound (at least for those people that can hear it and are not deaf-ish to such things like me).
if you need something more sophisticated

For example if you want to connect real HW to your emulation/simulation then you need to emulate/simulate BUS'es. Also things like floating bus or contention of system is very hard to add to approach #1 (it is doable but with big pain).

If you port the timings and emulation to Machine cycles things got much much easier and suddenly things like contention or HW interrupts, floating BUS'es are solving themselves almost on their own. I ported my ZXSpectrum Z80 emulator to this kind of timing and see the light. Many things get obvious (like errors in Z80 opcode documentation, timings etc). Also the contention got very simple from there (just few lines of code instead of horrible decoding tables almost per instruction type entry). The HW emulation got also pretty easy I added things like FDC controlers AY chips emulations to the Z80 in this way (no hacks it really runs on their original code ... even Floppy formating :)) so no more TAPE Loading hacks and not working for custom loaders like TURBO

To make this work I created my emulation/simulation of Z80 in a way that it uses something like microcode for each instruction. As I very often corrected errors in Z80 instruction set (as there is no single 100% correct doc out there I know of even if some of them claim that they are bug free and complete) I come with a way how to deal with it without painfully reprogramming the emulator.

Each instruction is represented by an entry in a table, with info about timing, operands, functionality... Whole instruction set is a table of all theses entries for all instructions. Then I form a MySQL database for my instruction set. and form similar tables to each instruction set I found. Then painfully compared all of them selecting/repairing what is wrong and what is correct. The result is exported to single text file which is loaded at emulation startup. It sound horrible but In reality it simplifies things a lot even speedup the emulation as the instruction decoding is now just accessing pointers. The instruction set data file example can be found here What's the proper implementation for hardware emulation

Few years back I also published paper on this (sadly institution that holds that conference does not exist anymore so servers are down for good on those old papers luckily I still got a copy) So here image from it that describes the problematics:

a) Full throtlle has no synchronization just raw speed
b) #1 has big gaps causing HW synchronization problems
c) #2 needs to sleep a lot with very small granularity (can be problematic and slow things down) But the instructions are executed very near their real time ...
Red line is the host CPU processing speed (obviously what is above it take a bit more time so it should be cut and inserted before next instruction but it would be hard to draw properly)
Magenta line is the Emulated/Simulated CPU processing speed
alternating green/blue colors represent next instruction
both axises are time

[edit1] more precise image

The one above was hand painted... This one is generated by VCL/C++ program:

generated by these parameters:

const int iset[]={4,6,7,8,10,15,21,23}; // possible timings [T]
const int n=128,m=sizeof(iset)/sizeof(iset[0]); // number of instructions to emulate, size of iset[]
const int Tps_host=25;  // max possible simulation speed [T/s]
const int Tps_want=10;  // wanted simulation speed [T/s]
const int T_timer=500;  // simulation timer period [T]

so host can simulate at 250% of wanted speed and simulation granularity is 500T. Instructions where generated pseudo-randomly...

I guess that depends on how accurate you want your emulator to be. I do not think that it has to be that accurate. Think emulation of x86 platform, there are so many variants of processors and each has different execution latencies and issue rates.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow