Question

I'm looking for some information about bare-metal programming. I'm working on different PowerPC platforms, and currently trying to prove that some tests are not impacted by the Linux kernel. These tests are pretty basic: loads and stores in asm volatile, and some benchmarks as well (CoreMark, Dhrystone, etc.). These tests run perfectly on Linux, but I now have to test them on bare metal, an environment I don't really have experience in. All my platforms have U-Boot installed, and I'm wondering whether there are applications that would let me run my tests cross-compiled with powerpc-eabi. For example, would a gdbserver launched by U-Boot be able to communicate via serial port or Ethernet? Is it possible to have a BusyBox called by U-Boot?


Solution

U-Boot is a bootloader, so use it. You probably have an xmodem or ymodem downloader in U-Boot; if push comes to shove, you can turn your program into a long series of write-word-to-memory commands and then branch to it.

U-Boot will have already set up RAM and the serial port (that is how you are talking to U-Boot anyway), so you don't have to do any of that. You won't need to configure the serial port, but you will want to find out how to write a character, which means polling the status register until the transmit register is empty and then writing one character to the transmit register. Repeat for every character in your string or whatever you want to print.
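The poll-then-write sequence described above can be sketched in C as follows. This is only a sketch under assumptions: the "transmitter empty" flag is taken to be bit 5 (0x20) of the status register, as on 16550-style UARTs, and the register pointers are passed in as parameters because the real addresses depend on your part. The names `UART_TX_EMPTY`, `uart_putc` and `uart_puts` are made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: bit 5 of the status register means "transmit register empty",
 * as on 16550-style UARTs. Check the datasheet for your part. */
#define UART_TX_EMPTY 0x20u

/* Wait until the transmitter is empty, then send one character. */
void uart_putc(volatile uint8_t *status, volatile uint8_t *tx, char c)
{
    while ((*status & UART_TX_EMPTY) == 0) {
        /* busy-wait: the hardware sets the flag when it is ready */
    }
    *tx = (uint8_t)c;
}

/* Send a NUL-terminated string one character at a time. */
void uart_puts(volatile uint8_t *status, volatile uint8_t *tx, const char *s)
{
    while (*s != '\0') {
        uart_putc(status, tx, *s++);
    }
}
```

On real hardware you would pass the status and transmit register addresses from the datasheet, cast to `volatile uint8_t *`. Keeping them as parameters also lets you try the logic on a host machine with two ordinary variables standing in for the registers.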

The bootstrap for your C program (assuming it is C) usually involves, at a bare minimum, setting up the stack pointer and then branching to your C entry point. Since U-Boot is running, the stack is already set up, so you can skip that step as long as you load your program where it doesn't collide with what U-Boot is using.

Depending on how you have written your high-level-language program (I am assuming C), you might have to zero out the .bss area and set up the .data area. The nice thing about using a bootloader to copy a program to RAM and just run it is that you usually don't have to do any of this: the binary that you download and run already has .bss zeroed and .data in the right place. So it comes back to set up the stack and branch, or simply branch, since you may not even have to set up the stack.
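For the case where you do have to prepare .bss and .data yourself (for example, when running from flash rather than being loaded into RAM), the two steps might look like the sketch below. The function names are made up, and the start/end pointers are assumed to come from symbols defined in your own linker script; nothing here is specific to U-Boot or PowerPC.

```c
#include <stdint.h>

/* Zero every word in [start, end). In a real bootstrap the two pointers
 * would be linker-script symbols marking the .bss section (ASSUMPTION). */
void zero_bss(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}

/* Copy the initial values of .data from its load address (src) to its
 * run address [dst, dst_end). Again, the pointers are assumed to come
 * from the linker script. */
void copy_data(const uint32_t *src, uint32_t *dst, uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

Both functions would be called from the bootstrap before branching to the C entry point, since C code expects zeroed and initialized globals from its first instruction.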

Building a bare-metal program is the real challenge, because you don't have a system to make system calls to, and that is a hard thing to give up and/or simulate. newlib, for example, makes life a bit easier, as it has a very easy-to-replace system backend, so that you can, for example, leave the printfs in Dhrystone (versus removing them and finding a different way to output the strings or results).
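As an illustration of what replacing the newlib backend can mean, here is a sketch of a `_write` stub that sends every byte to a polled UART, which is what printf ends up calling for output. The two register pointers and the 0x20 "transmitter empty" bit are assumptions for illustration (your startup code would have to set the pointers), and newlib builds can differ in the exact prototype they expect, so check the one your toolchain uses.

```c
#include <stdint.h>

/* ASSUMPTION: 16550-style "transmit register empty" flag. */
#define UART_TX_EMPTY 0x20u

/* HYPOTHETICAL globals: point these at the UART status and transmit
 * registers of your part before the first printf. */
volatile uint8_t *uart_status;
volatile uint8_t *uart_tx;

/* Backend for newlib output: ignore the file descriptor and push every
 * byte out of the UART, waiting for the transmitter each time. */
int _write(int fd, const char *buf, int len)
{
    (void)fd; /* stdout and stderr both go to the serial port */
    for (int i = 0; i < len; i++) {
        while ((*uart_status & UART_TX_EMPTY) == 0) {
            /* busy-wait */
        }
        *uart_tx = (uint8_t)buf[i];
    }
    return len; /* report that everything was written */
}
```

With a stub like this linked in, the printf calls in Dhrystone can stay as they are; other stubs (for example for heap growth or file status) may still be needed depending on which library functions get pulled in.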

Compiling the C files to objects is easy, assembling the assembly is easy, and you should be able to do both with your powerpc-eabi gcc cross compiler. The next challenge is linking: telling the linker where stuff goes. Since this is likely a flat chunk of RAM, you can probably do something like -Ttext 0x123450000, where the number is whatever the base address is of the RAM you want to use. If you have any multiplies or divides, any floats, any other GCC library functions (which replace things that your processor may or may not do, or that require a wrapper to do properly), or any libc calls, then the linker will try to link them in. Ideally the GCC library ones are easy, but depending on the cross compiler they can be a challenge. Worst case, take the GCC sources and build those functions yourself, or get or build a different gcc cross compiler with different target options (generally an easy thing to do).
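To give an idea of what one of those GCC library helpers does, here is a sketch of an unsigned 32-bit divide done with shifts and subtracts, the kind of routine the compiler calls when the target has no suitable divide instruction. The name `soft_udiv32` is made up (libgcc's own routine for this is `__udivsi3`), and returning 0 for a zero divisor is an arbitrary choice for this sketch. On a core with a hardware divide, the compiler may never emit such a call for 32-bit operands.

```c
#include <stdint.h>

/* Restoring shift-and-subtract division: build the quotient one bit at a
 * time, from the most significant bit of the dividend down. */
uint32_t soft_udiv32(uint32_t n, uint32_t d)
{
    uint32_t q = 0; /* quotient  */
    uint32_t r = 0; /* running remainder */

    if (d == 0) {
        return 0; /* ASSUMPTION: arbitrary result for divide by zero */
    }
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u); /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```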

I highly recommend you disassemble your binary and make sure, if nothing else, that the entry point of your bootstrap is at the beginning of the binary. Use objcopy to make a binary file: powerpc-...-objcopy myprog.elf -O binary myprog.bin. Then use xmodem or ymodem at the U-Boot prompt to copy over that program and run it.

Backing up: from the datasheet for the part, look up the UART and figure out its base address. You should first use the U-Boot prompt to write to the address of the UART transmit register. Write a 0x30 to that address, for example, and if you have the right address then an extra zero '0' should appear in the output before U-Boot prints its prompt again after your command. If you can't get it to do that with a single write from the U-Boot command line, you won't get it to work in a program of any kind: you have the wrong address or you are doing something else wrong.

Then write a very small program in assembly language that outputs a character to the UART by writing to that address, then counts to some big number that depends on the speed of your processor. If you are running at 100 MHz, count to 100 million or more (or count down to zero from a few hundred million), then branch to the beginning and repeat: output, wait, output, wait. Build and link this tiny program, download it with xmodem or whatever, and branch to it. If you can't get it to output a character every few seconds, then you won't be able to progress to something more complicated.
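The answer asks for this first program in assembly, but the logic is small enough to model in C, which is also roughly what the later C version of it could look like. The sketch below makes some assumptions: the function name and the `repeats` parameter are invented (0 means loop forever, as on the target), and the delay counter is declared volatile so the compiler does not optimize the wait away.

```c
#include <stdint.h>

/* Write '0' (0x30) to the transmit register, burn time, and repeat.
 * repeats == 0 means "forever"; a non-zero value bounds the loop so the
 * same code can be tried on a host machine. */
void emit_and_wait(volatile uint8_t *tx, uint32_t delay_count, unsigned repeats)
{
    do {
        *tx = 0x30; /* no status polling yet, just a blind write */

        /* Crude delay: count down to zero. The volatile qualifier keeps
         * the compiler from removing the loop. */
        volatile uint32_t n = delay_count;
        while (n != 0) {
            n--;
        }
    } while (repeats == 0 || --repeats != 0);
}
```

Because there is no polling here, the delay has to be long enough for the UART to finish sending each character, which is why the count is tied to the processor speed.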

Next small program: poll the status register, wait for the TX buffer to be empty, then write a 0x30 to the TX register. Increment the register holding the 0x30 to 0x31, and AND that register with 0x37 so the value wraps from 0x37 back to 0x30. Branch back to the wait for TX empty and output the new value, 0x31; make this an infinite loop. Once it is running you should see 01234567012345670... repeated forever, without the numbers getting mangled (they must be 0-7, then repeat). If you don't, then you won't be able to progress to something more complicated.
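The increment-and-mask trick is easy to get wrong, so here is a C model of this second program. It is a sketch under assumptions: the 0x20 "transmitter empty" bit follows 16550-style UARTs, the function names are invented, and the `count` parameter (0 meaning forever) exists only so the loop can be bounded when trying it on a host.

```c
#include <stdint.h>

/* ASSUMPTION: 16550-style "transmit register empty" flag. */
#define UART_TX_EMPTY 0x20u

/* Step '0'..'7' and wrap: 0x37 + 1 = 0x38, and 0x38 & 0x37 = 0x30. */
uint8_t next_digit(uint8_t c)
{
    return (uint8_t)((c + 1u) & 0x37u);
}

/* Poll for transmitter empty, send the current digit, advance, repeat.
 * count == 0 means "forever". */
void emit_digits(volatile uint8_t *status, volatile uint8_t *tx, unsigned count)
{
    uint8_t c = 0x30;
    do {
        while ((*status & UART_TX_EMPTY) == 0) {
            /* busy-wait */
        }
        *tx = c;
        c = next_digit(c);
    } while (count == 0 || --count != 0);
}
```

If the polling is wrong (wrong bit, wrong register), characters get dropped or overwritten and the 0-7 pattern on the terminal is visibly broken, which is the point of the test.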

Repeat the last two programs in C with a small bootstrap that branches to the C entry point. If you can't get those working, you won't be able to progress any further.

Start small with any library calls you think you can't do without (printf for example) and if you can't make a simple printf("Hello World\n"); work with all the linking and system backend and such, then you won't be able to run Dhrystone and leave in its system calls.

The compiler will likely turn some of the Dhrystone code into memcpy or memset calls, which you will have to implement. There are most likely hand-tuned assembly versions of these, and your Dhrystone performance numbers can and will be hugely affected by the implementation of functions like these, so you can't simply do this

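#include <stddef.h>
/* naive byte-at-a-time memset: correct, but slow */
void *memset(void *dst, int c, size_t len)
{
    unsigned char *d = dst;
    while (len--) *d++ = (unsigned char)c;
    return dst;
}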

and expect any performance. You can likely grab the GCC library or GNU libc versions of these, or just steal the ones from the Linux build of one of these tests (disassemble and grab the asm); that way you are comparing apples to apples.
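To show the kind of difference an implementation can make, here is a sketch of a memset that stores a word at a time once the destination is aligned. It is not the libc or hand-tuned assembly version the answer refers to, just a plain C illustration; the name `memset_words` is made up, and the `may_alias` attribute is a GCC/Clang extension used here so the word stores are allowed to alias byte data.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type that may alias any other type. */
typedef uint32_t __attribute__((may_alias)) word_t;

void *memset_words(void *dst, int c, size_t len)
{
    unsigned char *d = dst;
    unsigned char byte = (unsigned char)c;

    /* Head: single bytes until the pointer is word aligned. */
    while (len > 0 && ((uintptr_t)d & (sizeof(word_t) - 1)) != 0) {
        *d++ = byte;
        len--;
    }

    /* Body: replicate the byte into all four lanes and store whole words. */
    word_t pattern = (word_t)byte * 0x01010101u;
    word_t *w = (word_t *)d;
    while (len >= sizeof(word_t)) {
        *w++ = pattern;
        len -= sizeof(word_t);
    }

    /* Tail: whatever is left, one byte at a time. */
    d = (unsigned char *)w;
    while (len > 0) {
        *d++ = byte;
        len--;
    }
    return dst;
}
```

Tuned versions go further (wider stores, cache-line instructions, unrolling), so for a fair comparison it is still best to use the same routine on bare metal and on Linux.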

Benchmarking is often more bogus than real. It is very easy to take the same benchmark source with the same compiler in the same environment (on Linux or on bare metal, etc.) and show dramatically different results by doing various simple things: different compiler options, rearranging the functions, adding a few nops in the bootstrap, and so on. Anything that builds different code, or that takes advantage of or gets hurt by the cache, will change the numbers.

If you want to show bare metal being faster than the operating system, it is likely NOT going to happen without a bit of work. You are going to need to get the I and D caches up, and the D cache likely requires that you get the MMU up, and so on. These can all be research projects. Then you need to know how to control your compiler build: make sure optimizations are on and, as mentioned, add or remove nops in your bootstrap to change the alignment of tight loops in the code with respect to cache lines.

On an operating system there are interrupts and other things going on, and possibly you are multitasking, so with bare metal you should be able to get Dhrystone-like tests to run at the same speed as or faster than Linux. If you can't, it is not because Linux is faster; it is because you are not doing something right in your bare-metal implementation.

Yes, you can probably use gdb to talk to U-Boot and load programs; I'm not sure, as I never use gdb. I prefer to use a dumb terminal and xmodem or ymodem, or JTAG with the OpenOCD terminal (telnet into OpenOCD rather than going through gdb).

OTHER TIPS

You could try compiling the benchmarks together with U-Boot, so that after U-Boot finishes loading, it loads your program. I know that was possible for ARM platforms. I don't know whether toolchains exist for PowerPC bare-metal development.

At https://cirosantilli.com/linux-kernel-module-cheat/#dhrystone in this commit I have provided a minimal runnable Dhrystone baremetal example with Newlib on ARM that runs on QEMU and gem5. With this starting point, it should not be hard to port it to PowerPC or other ISAs and real platforms.

In that setup, Newlib implements everything except syscalls themselves as described at: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077 which makes it much easier to use larger subsets of the C standard library.

And I use newlib through a toolchain built with crosstool-NG.
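To give a feel for what "syscalls themselves" means here, the sketch below shows one such stub: `_sbrk`, which Newlib's malloc (and therefore often printf) calls to grow the heap. This is not the code from the linked repository. The fixed-size `heap` array is an assumption made to keep the sketch self-contained (real setups usually take the heap bounds from linker-script symbols), and the exact prototype expected can vary between Newlib builds.

```c
#include <errno.h>
#include <stddef.h>

/* ASSUMPTION: a fixed 4 KiB heap inside the image; real projects usually
 * use symbols from the linker script instead. */
static char heap[4096] __attribute__((aligned(8)));
static size_t heap_used;

/* Grow (or shrink) the heap by incr bytes and return the previous break,
 * or (void *)-1 with errno set to ENOMEM if the request does not fit. */
void *_sbrk(ptrdiff_t incr)
{
    if (incr < 0) {
        if ((size_t)(-incr) > heap_used) {
            errno = ENOMEM;
            return (void *)-1;
        }
    } else if ((size_t)incr > sizeof(heap) - heap_used) {
        errno = ENOMEM;
        return (void *)-1;
    }

    char *prev = heap + heap_used;
    heap_used = (size_t)((ptrdiff_t)heap_used + incr);
    return prev;
}
```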

Some key files in that setup:

  • linker script
  • syscall implementations
  • the full make command showing some of the flags used:

    make \
      -j 8 \
      -C /home/ciro/bak/git/linux-kernel-module-cheat/submodules/dhrystone \
      CC=/home/ciro/bak/git/linux-kernel-module-cheat/out/crosstool-ng/build/default/install/aarch64/bin/aarch64-unknown-elf-gcc \
      'CFLAGS_EXTRA=-nostartfiles -O0' \
      'LDFLAGS_EXTRA=-Wl,--section-start=.text=0x40000000 -T /home/ciro/bak/git/linux-kernel-module-cheat/baremetal/link.ld' \
      'EXTRA_OBJS=/home/ciro/bak/git/linux-kernel-module-cheat/out/baremetal/aarch64/qemu/virt/lib/bootloader.o /home/ciro/bak/git/linux-kernel-module-cheat/out/baremetal/aarch64/qemu/virt/lib/lkmc.o /home/ciro/bak/git/linux-kernel-module-cheat/out/baremetal/aarch64/qemu/virt/lib/syscalls_asm.o /home/ciro/bak/git/linux-kernel-module-cheat/out/baremetal/aarch64/qemu/virt/lib/syscalls.o' \
      OUT_DIR=/home/ciro/bak/git/linux-kernel-module-cheat/out/baremetal/aarch64/qemu/virt/submodules/dhrystone \
      -B \
    ;
    

Related: How to compile dhrystone benchmark for RV32I

Licensed under: CC-BY-SA with attribution