ARMv4, ARMv5E, ARMv6 assembly usage on iOS devices with ARMv7 and ARMv8-A (arm64) instruction sets

StackOverflow https://stackoverflow.com/questions/23564612

  •  18-07-2023

Question

There are plenty of libraries written in C with assembly-optimized (ARMv7) versions of some functions, which boost performance significantly (some of them make use of NEON). In this case I know that I had better use the optimized versions.

Now I have a library written in C which has some functions written both in C and ARMv4, ARMv5E assembly. With the default build configuration, it doesn't try to use this assembly code on iOS devices. I wonder if I should bother trying to enable it.

Is it possible to use assembly source code for ARMv4, ARMv5E, ARMv6 instruction sets on iOS devices with ARMv7 and ARMv8-A (arm64) instruction sets?

If yes, does it give a performance boost compared to similar code written in C and compiled for ARMv7 and ARMv8-A (arm64)?

And another question: does one need ARMv8-A (arm64) optimized NEON assembly code? How does this compatibility work in general? Here I mean only AArch64, not AArch32. For this question, let's pretend that I have to build a binary for AArch64 only, which should be truly 64-bit and contain no 32-bit code.

I would appreciate it if someone could answer with a compatibility table or a link to one.

EDIT: I have slightly edited my question as suggested by Notlikethat.

EDIT2: I wanted to add some details after Notlikethat's answer. Maybe they will be useful for someone who reads this question.

  1. Now I have a library written in C which has some functions written both in C and ARMv4, ARMv5E assembly. [...] I wonder if I should bother trying to enable it. - The functions written in assembly are purely for performance and don't do anything that can't be done in C.

  2. Does one need ARMv8-A (arm64) optimized NEON assembly code? - If one has NEON-optimized code for ARMv7, does one need to adapt/change it for ARMv8-A?


Solution

OK, I'll bite. This isn't an answer so much as a random selection of details and opinions to illustrate why the question is still fundamentally unanswerable, but which may tangentially contain some useful information around the subject. And entertainingly excessive use of emphasis.

Now I have a library written in C which has some functions written both in C and ARMv4, ARMv5E assembly. [...] I wonder if I should bother trying to enable it.

"I have a plant here, I wonder if I should bother trying to eat it." Rather depends on whether it's a lettuce or a holly bush, doesn't it? I'm guessing the assembly was there as a pure performance thing, rather than to implement something that simply can't be expressed in a higher-level language. Whatever it does, does it make your program measurably faster/better if you do enable it? Hand-tuned assembly for a v4-era core isn't likely to be particularly optimal for a modern 15+ stage superscalar out-of-order pipeline anyway, so it's not unreasonable that the compiler might do a better job with access to newer instructions and suitable optimisation settings; it knows more about instruction scheduling and cycle timings than you or I do. On the other hand, maybe it is something awkward that the optimiser can't catch, but can be done efficiently with a handful of the more esoteric instructions. The only real way to make a judgement like that is to try it and see.

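"Try it and see" can be as crude as timing both versions on the same buffer. The following is only a sketch under assumptions: `sum_bytes_c` is a made-up stand-in for whatever routine the library actually provides, and `time_impl` is a hypothetical helper, not part of any real library. The results go through a `volatile` sink so the optimiser is less inclined to throw the calls away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the library routine: the plain C version. */
uint32_t sum_bytes_c(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += p[i];
    }
    return s;
}

/* Both the C and the assembly variant are assumed to share this signature. */
typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/* Calls fn `reps` times and returns the CPU time spent, in seconds.
 * The accumulated result is written to *sink so the work stays observable. */
double time_impl(sum_fn fn, const uint8_t *buf, size_t n, int reps,
                 volatile uint32_t *sink) {
    assert(reps > 0);
    uint32_t acc = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        acc += fn(buf, n);
    }
    clock_t t1 = clock();
    *sink = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Run it once with `sum_bytes_c` and once with the assembly entry point, on the actual device, with realistic buffer sizes and enough repetitions to get above the clock's resolution. Numbers from a simulator or a different core tell you very little.
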
Is it possible to use assembly source code for ARMv4, ARMv5E, ARMv6 instruction sets on iOS devices with ARMv7 [...] instruction sets?

In most cases. Except if you use deprecated instructions like SWP, which may or may not fault depending on how the device is set up. Or depend on the pre-v6 unaligned access behaviour. Or any implementation-defined features that just happened to be consistent across previous devices. Or any of the other features of the architecture which have subtly changed over the years. The v6 architecture was the most significant shift, but helpfully, Appendices L and O of the v7 ARM ARM consist of 102 pages detailing the changes all the way back to v4. Without knowing your code in detail, how can we say what, if any, of that is relevant?
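
To make the unaligned-access point concrete: if an old routine turns out to lean on whatever a misaligned load happened to do on a pre-v6 core, one way out is to do that particular load in C and let the compiler pick a legal instruction sequence for the target. A minimal sketch, where `load_u32_unaligned` is a name invented here for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value (in host byte order) from an address with no
 * alignment guarantee. memcpy leaves the choice of load instructions
 * to the compiler instead of baking in one core's behaviour. */
static inline uint32_t load_u32_unaligned(const void *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers typically reduce the `memcpy` to a single load where the target allows it, so this usually costs nothing. It doesn't help with SWP or the implementation-defined stuff, of course.
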

Is it possible to use assembly source code for ARMv4, ARMv5E, ARMv6 instruction sets on iOS devices with [...] ARMv8-A (arm64) instruction sets?

No. AArch64 is a completely new architecture, new instruction set, new assembly language. Many concepts, mnemonics and the general feel of the syntax are familiar from what is now AArch32, but the instruction set is a fundamentally different design. For starters the register names are different - the sort of thing that reading any kind of manual would have told you straight away.
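
In build terms that means the legacy assembly can only ever be a candidate for the 32-bit slice, and the arm64 slice needs the C version (or freshly written A64 code). A rough sketch of such a gate, using the `__arm__` and `__aarch64__` macros that GCC and Clang predefine. Everything else here is assumed for illustration: `checksum`, `checksum_c`, `checksum_armv5` and `HAVE_CHECKSUM_ARM_ASM` are hypothetical names, not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only a 32-bit ARM target can assemble the old A32 sources at all. */
#if defined(__arm__) && !defined(__aarch64__)
#  define CAN_TRY_LEGACY_ARM_ASM 1
#else
#  define CAN_TRY_LEGACY_ARM_ASM 0
#endif

/* Portable C fallback, used on arm64 and on every non-ARM host. */
static uint32_t checksum_c(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += p[i];
    }
    return s;
}

#if CAN_TRY_LEGACY_ARM_ASM && defined(HAVE_CHECKSUM_ARM_ASM)
/* Hypothetical entry point implemented in a separate .S file. */
uint32_t checksum_armv5(const uint8_t *p, size_t n);
#endif

/* Single public entry point; the choice is made at compile time. */
uint32_t checksum(const uint8_t *p, size_t n) {
    assert(p != NULL || n == 0);
#if CAN_TRY_LEGACY_ARM_ASM && defined(HAVE_CHECKSUM_ARM_ASM)
    return checksum_armv5(p, n);
#else
    return checksum_c(p, n);
#endif
}
```

With a fat binary the file is compiled once per architecture, so the armv7 slice and the arm64 slice can end up on different paths without any runtime check.
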

If yes, does it give a performance boost compared to similar code written in C and compiled for ARMv7 and ARMv8-A (arm64)?

Are we talking the careful selection of the algorithm that best suits the architecture, tuned for a particular microarchitecture implementation by an expert with detailed knowledge of the pipeline model, cycle timings, etc., or the kind of naïve "assembly is faster, innit?" code which ends up being 4 times slower than what the compiler spits out on -O1? (Nothing ARM-specific about this one, either) In any case, see question 1.

does one need ARMv8-A (arm64) optimized NEON assembly code?

You don't need it, you could always just have slow code. Of course, if you're doing SIMD-type operations it'd be a bit silly not to use NEON but you don't necessarily need to go straight to assembly - if you're doing straightforward loop-based stuff an auto-vectorising compiler may take care of it. For more complex things, the figures I've seen (from someone who definitely knows what they're doing) suggest intrinsics can get you about 70-95% of the speed of hand-tuned assembly, for a lot less effort. For absolute maximum performance then yeah, fire up the assembler and spend weeks microbenchmarking your cache misses and register stalls as you tweak it to perfection.
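
As an illustration of the intrinsics route, here is a small sketch (`add_f32` is a made-up function, not from any library). The three intrinsics it uses from `arm_neon.h` are available when compiling for both ARMv7 with NEON and AArch64, so the same source serves both, which is exactly what hand-written assembly can't do. The scalar loop handles the tail and doubles as a fallback on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>

/* __ARM_NEON is the ACLE feature macro; older compilers spell it __ARM_NEON__. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>
#  define HAVE_NEON 1
#else
#  define HAVE_NEON 0
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
#if HAVE_NEON
    /* Four floats per iteration in 128-bit vector registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is unavailable. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

For a loop this simple an auto-vectorising compiler will quite possibly produce the same thing from the scalar loop alone; intrinsics start to earn their keep on the more complex stuff.
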

How does this compatibility work in general?

What compatibility? The one that doesn't exist on account of it being a different instruction set?

Licensed under: CC-BY-SA with attribution