Question

I'm trying to fully understand what a clock cycle is, so I've come up with a test example that I'd like someone to confirm or dispel, and to offer a better understanding of. If I have this simple piece of code, a while loop, running on a device:

while(true)
{
  int x = 5;
}

Does the command int x = 5; get executed once per clock cycle? In other words, is the clock speed the rate at which the device is able to read and execute commands per unit of time?


Solution 2

Where do I begin...

Any processor has a "clock" that ensures that bits of electronics have time to transition from one state to another before the next thing happens. At the speeds of modern devices, nothing is "instantaneous" - a "step" becomes a "slope", and even a very short trace will cause delays in the transmission of electrical signals.

Depending on the architecture of a CPU, it can do certain operations "in one clock cycle", while others take "multiple cycles". Think of long division - you do a series of subtract-and-shift operations, and you don't know what you need to do next until you have completed the previous part of the operation. For addition, it is easier to see how you could achieve a complete operation in one cycle.
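
To see why such an operation is inherently multi-step, here is a rough sketch of shift-and-subtract (restoring) division in C - purely illustrative, not how any particular CPU implements its divide instruction; the point is that each iteration depends on the result of the one before it, so the steps cannot be overlapped.

#include <stdint.h>

/* Illustrative restoring division: each loop iteration depends on the
   result of the previous one, so the steps cannot be overlapped.
   Assumes divisor != 0. */
static uint32_t divide_u32(uint32_t dividend, uint32_t divisor,
                           uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint32_t rem = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* shift in the next bit */
        if (rem >= divisor) {                       /* trial subtraction */
            rem -= divisor;
            quotient |= (1u << i);
        }
    }
    *remainder = rem;
    return quotient;
}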

When a particular "high level" instruction is translated into machine code, the resulting code can take one or more cycles - and a simple instruction can take one or more steps. Depending on the compiler, the target, and the optimizations chosen, any of the following could happen with the code above (there is a small sketch after this list):

  • the compiler realizes that the "while" condition is always true, and that nothing changes inside the loop. It further realizes that you never use the value of x, and it chooses not to implement the instruction at all.

  • the compiler decides to use a built-in register for the int variable x, and it initializes it at compile time. No time is taken during the execution of the loop.

  • the compiler loads '5' into a register, looks up the offset of x in a table, computes a pointer, and copies the register into the offset address. Could be any number of cycles.
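
As an aside, one way to see the first two cases for yourself is to declare the variable volatile, which tells the compiler that the store must actually be performed on every pass. A minimal sketch (the surrounding main is just scaffolding for illustration):

/* With a plain int, an optimizing compiler is free to delete the store
   entirely or keep x in a register; volatile forces a memory write on
   every iteration, so the loop body can no longer be optimized away. */
int main(void)
{
    volatile int x;

    while (1) {
        x = 5;   /* this store must now be emitted on every pass */
    }
}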

Not sure this really helped you - but the question is rather complicated...

OTHER TIPS

A clock cycle is simply a single cycle of the oscillator that drives a processor's logic; what a processor might be capable of achieving in that cycle depends on the processor architecture and other factors such as memory speed.

The code in your example is in a high-level language and, if translated directly, almost certainly becomes multiple machine-level instructions. In pseudo-machine code, for example:

loop:
   MOV addrx,#5   ; store the constant 5 into the memory location of x
   JMP loop       ; branch back to the top of the loop

That would be at least two machine cycles per loop. There is little or no deterministic relationship between high-level code and the generated machine instructions, although in this simple case it may seem so.
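
If you are curious what your own toolchain does with the loop, most compilers can emit the assembly listing for you - with GCC or Clang, for example, something like gcc -S -O0 yourfile.c versus gcc -S -O2 yourfile.c (the file name is just a placeholder). Comparing the two outputs will usually show the store, and with it most of the loop body, vanishing at the higher optimization level.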

The issue is further complicated by how an instruction set is implemented by a processor. A typical RISC processor executes an instruction in a single cycle, while on a CISC processor, different individual instructions each take a different number of cycles depending on their complexity.

Another consideration is memory bus latency. A processor is often capable of executing instructions faster than it can access memory; this is especially true of flash memory. An instruction accessing slower memory may introduce wait states, where the processor is stalled until the data arrives.

Some processors have the ability to execute instructions in parallel, allowing multiple instructions to complete in a single cycle. Others employ SIMD (single instruction, multiple data) instructions that can perform the same operation on different data at the same time.
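
As a rough sketch of the SIMD idea, here is what a vectorised add might look like using x86 SSE intrinsics (this assumes an x86 target with SSE support and an array length that is a multiple of four):

#include <xmmintrin.h>  /* x86 SSE intrinsics */

/* Add two float arrays four elements at a time: each _mm_add_ps is a
   single instruction operating on four pieces of data at once. */
void add_arrays(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);    /* load four floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vsum = _mm_add_ps(va, vb);   /* four additions in one instruction */
        _mm_storeu_ps(&out[i], vsum);       /* store four results */
    }
}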

Another technique that affects instruction throughput is pipelining, where an individual instruction may take multiple cycles, but a new instruction can be started on each cycle; so if, say, five four-cycle instructions are started one after the other, a result is produced every cycle once the pipeline has filled.
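
To put numbers on that: the first of the five instructions completes after 4 cycles, and each of the remaining four completes one cycle later, so all five finish after 8 cycles rather than the 20 cycles they would take if run strictly back to back.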

Some processors employ a Harvard architecture that uses separate buses to allow the simultaneous fetching of data and instructions.

Other techniques, such as branch prediction, are employed to maintain instruction throughput. A high-level language compiler will often generate code that will maximise the potential of all the techniques mentioned above.

A performance measure often given for a particular architecture is MIPS/MHz - an indication of the number of instructions typically executed per clock cycle (amortized over many clock cycles). An ARM Cortex-M3, for example, manages 1.25 MIPS/MHz, while a Renesas SH-4 achieves 1.8 MIPS/MHz.
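
As a rough worked example (the clock frequency here is just an illustrative figure), a Cortex-M3 clocked at 100 MHz would therefore average about 1.25 × 100 = 125 million instructions per second, even though individual instructions may each take one, two, or more cycles.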

Unfortunately it is not as simple as you suggest.

You might want to take a look at the related Wikipedia article: http://en.wikipedia.org/wiki/Sequential_logic

To put it simply, a clock cycle is the time it takes for a synchronously operating circuit to switch from one defined state to the next defined state.

As a matter of fact, it is often related to the time it takes the CPU to perform instructions. You might want to take a look at the manual of your CPU; execution times are often given as the number of clock cycles it takes to perform the desired instruction.

Unfortunately, in reality it's not that simple, as clever CPUs can execute instructions faster if they meet special requirements. E.g. by pushing instructions through an instruction pipeline, the CPU might already begin operating on the next instruction while part of a previous instruction taking more than one clock cycle is still being processed.

It's even possible that instructions are reordered, if the reordering doesn't change the program flow, or that instructions are calculated speculatively ahead of time.

On the other hand, a single instruction may take a lot longer in terms of clock cycles if it has to be fetched from memory, which takes far more time.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow