Question

I wrote some NEON code in assembly and was aiming for maximum optimization. The numbers seem satisfactory, but I wanted to understand whether it could be optimized further. I then came across an online tool that counts the cycles of each instruction.

Here is the link to my code: http://pulsar.webshaker.net/ccc/sample-115d4c29

The tool clearly marks the areas of concern, but I could not understand why those particular instructions incur the overhead.

The code is divided into 7 sections, labelled in the 'comment' area, to make it easier to refer to them.

Thanks in advance. :)


Solution

You can try this link:

http://pulsar.webshaker.net/ccc/beta-sample-115d4c29

This uses the beta version 0.9 of the cycle counter. The main difference is that the NEON simulator no longer uses 2 distinct pipelines, because the Cortex-A9 can't execute 2 NEON instructions in one cycle.
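To make the effect of dropping the second pipeline concrete, here is a minimal sketch of the two counting models. It is only a toy: the "ls"/"dp" tags, the pairing rule, and the function name issue_cycles are assumptions made for illustration, not the cycle counter's actual algorithm, and result latencies are ignored entirely.

```python
# Toy model only: the pipeline tags and the pairing rule are assumptions
# for illustration, not the cycle counter's actual algorithm.

def issue_cycles(pipes, dual_issue):
    """Count issue cycles for a sequence of NEON instructions.

    pipes: list of "ls" (load/store/permute) or "dp" (data processing) tags.
    dual_issue: if True, two adjacent instructions that use different
    pipelines share one cycle; if False, one instruction issues per cycle.
    """
    cycles = 0
    i = 0
    while i < len(pipes):
        can_pair = (
            dual_issue
            and i + 1 < len(pipes)
            and pipes[i] != pipes[i + 1]
        )
        # A paired instruction consumes two entries in a single cycle.
        i += 2 if can_pair else 1
        cycles += 1
    return cycles


# Hypothetical sequence: load, multiply, load, multiply, add, store
sequence = ["ls", "dp", "ls", "dp", "dp", "ls"]
print("two pipelines:", issue_cycles(sequence, dual_issue=True))
print("one pipeline: ", issue_cycles(sequence, dual_issue=False))
```

With a single pipeline every instruction costs at least one issue cycle, so a report built on that model has no "unpaired instruction" penalties left to show.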

I have started to update some parts of the cycle counter.

The result is:

- The cycle information is more accurate for the Cortex-A9.

- The result is easier to read, because most of the NEON latency information shown previously was due to unpaired instructions.

Orange means latency due to waiting for the pipeline.

Red means latency due to a register conflict.

The number specified next to the register is not the number of lost cycles. It is the maximum number of instructions you could place before this instruction.
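To see why placing independent instructions before a stalled one can hide a register conflict, here is a minimal sketch. It assumes a single-issue, in-order pipeline; the register names, the latencies, and the function total_cycles are all made up for illustration and are not Cortex-A9 figures or output from the cycle counter.

```python
# Toy model only: single issue, in order, with made-up result latencies.
# Register names and latencies are illustrative, not Cortex-A9 figures.

def total_cycles(program):
    """Return the number of cycles needed to issue every instruction.

    program: list of (dest, sources, latency) tuples. An instruction cannot
    issue until every source register it reads is ready.
    """
    ready_at = {}   # register -> first cycle its value can be read
    cycle = 0       # next free issue cycle
    for dest, sources, latency in program:
        # Stall until all source registers are available (register conflict).
        start = max([cycle] + [ready_at.get(r, 0) for r in sources])
        ready_at[dest] = start + latency
        cycle = start + 1
    return cycle


mul = ("q0", ("q1", "q2"), 4)   # hypothetical 4-cycle result latency
add = ("q3", ("q0", "q4"), 1)   # reads q0, so it depends on mul
load_a = ("q5", (), 1)          # independent of mul and add
load_b = ("q6", (), 1)          # independent of mul and add

stalled = [mul, add, load_a, load_b]     # add waits for q0
reordered = [mul, load_a, load_b, add]   # loads fill part of the wait

print("original order: ", total_cycles(stalled))
print("reordered:      ", total_cycles(reordered))
```

In this sketch the reordered sequence finishes sooner because the two loads issue while the multiply result is still in flight, which is the kind of rescheduling the number next to a red register points to.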

I hope that helps!

Licensed under: CC-BY-SA with attribution