A sequence for enabling the VFP/NEON unit (VFPU) can be found in the U-Boot source:
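.macro init_vfpu
ldr r0, =(0xF << 20)		/* CPACR value: full access to CP10 and CP11 */
mcr p15, 0, r0, c1, c0, 2	/* write CPACR */
mov r3, #0x40000000		/* FPEXC.EN (bit 30) */
.long 0xeee83a10		/* vmsr FPEXC, r3, emitted as a raw word to avoid the binutils bug */
.endm /* init_vfpu */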
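To make the macro's magic numbers less opaque, here is a small host-side Python sketch (not part of U-Boot) that rebuilds them from the ARMv7 field layouts: the CPACR value granting full access to coprocessors 10 and 11 (the VFP/NEON unit), the FPEXC.EN bit, and the ARM (A1) encoding of vmsr FPEXC, r3 that the macro emits as a raw .long. The helper names cpacr_full_access and encode_vmsr are hypothetical, chosen only for illustration.

```python
# Host-side sketch: rebuild the constants used by the init_vfpu macro.

COND_AL = 0xE        # condition field for "always"
VFP_REG_FPEXC = 0x8  # VMSR spec_reg value that selects FPEXC
VFP_REG_FPSCR = 0x1  # VMSR spec_reg value that selects FPSCR

# FPEXC.EN is bit 30; setting it enables the VFP/NEON unit.
FPEXC_EN = 1 << 30


def cpacr_full_access(cp: int) -> int:
    """CPACR has two access bits per coprocessor (cp10 at [21:20],
    cp11 at [23:22]); 0b11 grants full access at PL0 and PL1."""
    return 0b11 << (cp * 2)


def encode_vmsr(spec_reg: int, rt: int, cond: int = COND_AL) -> int:
    """ARM (A1) encoding of VMSR <spec_reg>, <Rt>:
    cond | 1110 1110 | spec_reg | Rt | 1010 | 0001 0000
    """
    return (cond << 28) | (0xEE << 20) | (spec_reg << 16) | (rt << 12) | (0xA << 8) | 0x10


cpacr = cpacr_full_access(10) | cpacr_full_access(11)
print(f"CPACR value:    {cpacr:#010x}")                           # 0x00f00000
print(f"FPEXC value:    {FPEXC_EN:#010x}")                        # 0x40000000
print(f"vmsr FPEXC, r3: {encode_vmsr(VFP_REG_FPEXC, 3):#010x}")  # 0xeee83a10
```

The same helper gives the familiar encoding of vmsr FPSCR, r0 (0xeee10a10) by changing only the spec_reg and Rt fields, which is a quick sanity check on the field layout.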
As documented on the binutils mailing list, the bug affecting vmsr FPEXC (the reason the U-Boot macro emits the instruction as a raw .long) has been fixed on the binutils 2.23 branch, on HEAD, and on the 2.24 development branch, which will be released shortly. The fix is included in the 2.23.1 and 2.23.2 releases of binutils.
Here is a sample session with binutils 2.23.1, in which vmsr FPEXC, r3 assembles directly:
$ cat t.S
init_vpu:
ldr r0, =(0xF << 20)
mcr p15, 0, r0, c1, c0, 2
mov r3, #0x40000000
vmsr FPEXC, r3
bx lr
.ltorg
$ arm-none-linux-gnueabi-as -march=armv7-a -mcpu=cortex-a15 -mfpu=neon t.S -o t.o
$ arm-none-linux-gnueabi-as --version | grep assembler
GNU assembler (crosstool-NG hg+default-86a8d1d467c8) 2.23.1
This assembler was configured for a target of `arm-none-linux-gnueabi'.
$ objdump --version | grep Binutils
GNU objdump (GNU Binutils for Ubuntu) 2.23.2
$ objdump -S t.o
t.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <init_vpu>:
0: e3a0060f mov r0, #15728640 ; 0xf00000
4: ee010f50 mcr 15, 0, r0, cr1, cr0, {2}
8: e3a03101 mov r3, #1073741824 ; 0x40000000
c: eee83a10 vmsr fpexc, r3
10: e12fff1e bx lr
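The listing also shows that the assembler turned the ldr r0, =(0xF << 20) pseudo-instruction into a plain mov. GAS does this when the constant fits ARM's modified-immediate form (an 8-bit value rotated right by an even amount), so no literal pool entry is needed at the .ltorg. The Python sketch below is a host-side illustration with hypothetical helper names (rotr32, decode_modified_imm, is_modified_imm); it decodes the immediates of the two mov words in the listing and checks which constants are encodable.

```python
# Host-side sketch: why "ldr r0, =(0xF << 20)" disassembles as a mov.

MASK32 = 0xFFFFFFFF


def rotr32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits (0 <= amount < 32)."""
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_modified_imm(insn: int) -> int:
    """Expand the 12-bit immediate field of an ARM data-processing
    instruction: imm8 rotated right by twice the 4-bit rotate field."""
    rotate = ((insn >> 8) & 0xF) * 2
    imm8 = insn & 0xFF
    return rotr32(imm8, rotate)


def is_modified_imm(value: int) -> bool:
    """True if value is an 8-bit constant rotated by an even amount,
    i.e. it can be encoded directly in a mov instruction."""
    return any(rotr32(value, r) <= 0xFF for r in range(0, 32, 2))


for word in (0xE3A0060F, 0xE3A03101):  # the two mov words from the listing
    print(f"{word:#010x}: #{decode_modified_imm(word):#x}")
# 0xe3a0060f: #0xf00000
# 0xe3a03101: #0x40000000
print(is_modified_imm(0xF << 20))   # True: no literal pool entry needed
print(is_modified_imm(0x12345678))  # False: ldr would load it from the pool
```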
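The sequence above should work on any Cortex-A core that implements the VFP/NEON unit (the unit is optional on some cores, such as the Cortex-A9). It assumes a system without virtualization or TrustZone active: in the Non-secure state the NSACR must also grant access to CP10 and CP11, and with the Virtualization Extensions the HCPTR must not trap them. Strictly, the architecture also requires an isb after the CPACR write before the FPEXC access is guaranteed to see the new setting.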