Question

I am trying to debug a somewhat strange problem in the device driver for a PCIe FPGA device. Both the device driver and the FPGA image are developed in-house.

The target system is x86, and the OS is Fedora 9. It has a PCIe card with the FPGA plugged into its only PCIe slot. The FPGA image is loaded from the EEPROM after boot.

The driver is written in such a way that it uses the /sys/bus/pci/devices/0000:02:00.0/ resource files (where 0000:02:00.0 is the PCI slot of the card containing the FPGA) to configure the FPGA.

When the system boots (or when it returns from hibernation), the FPGA link seems to be lost, and the resource files are missing. When the FPGA boots properly, everything works fine (the resource files are there). When the system enters hibernation, the FPGA is powered off. When it returns from hibernation, the FPGA is powered on before the driver initialization starts.

I suspect one of the following:

  • a bug in the firmware - something related to PCI plug in?
  • a bug in the kernel - least likely, because other PCI cards are recognized fine. Only
    this PCI card causes problems

My questions are:

  • Has anyone had similar problems?
  • What else could be wrong?
  • Any suggestions on how to debug this issue?

EDIT

I just found this bug, which is very similar to the problem I am seeing.

Solution

A PCIe card has to reply to an "Is anybody there?" message within a certain time. Is it possible that your card is not responding quickly enough after hibernation / reset?

Without more details of your design, it is hard to do anything but guess.

Can you list the differences between the system working and not working, i.e. what do you do differently to get the card to work?
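One way to capture such differences is to record which entries exist under the device's sysfs directory in the working state and in the failing state, then compare the two lists. The following is only a minimal sketch: the helper names `snapshot_device_dir` and `diff_snapshots` are made up for illustration, and the directory path is the one from the question.

```python
import os


def snapshot_device_dir(device_dir):
    """Return the sorted entry names in `device_dir`, or None if it is missing."""
    try:
        return sorted(os.listdir(device_dir))
    except FileNotFoundError:
        return None


def diff_snapshots(working, failing):
    """Return (missing, extra): entries only in `working`, and only in `failing`.

    A snapshot of None (directory absent) is treated as an empty listing.
    """
    working_set = set(working or [])
    failing_set = set(failing or [])
    return sorted(working_set - failing_set), sorted(failing_set - working_set)
```

For example, calling `snapshot_device_dir("/sys/bus/pci/devices/0000:02:00.0")` once while the card works and once right after resume, and passing both results to `diff_snapshots`, would show whether only the resource files are missing or the whole device entry is gone.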

OTHER TIPS

I finally managed to debug my problem. Just before entering hibernation, all processes that are still using the resource files are killed. For some unknown reason, one process didn't release its resources and was killed. We have a watchdog that respawns any process that is not running.

When coming back from hibernation, this process was respawned, and since it couldn't open the resource files, it died again, and a critical error was declared. A very short time later, the OS added the resource files, and the process could then continue normally.
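A respawned process could tolerate this short gap by waiting for the resource file to appear instead of failing on the first attempt. This is a minimal sketch of that idea, not the fix that was actually applied: the function name `wait_for_path`, the timeout, and the poll interval are all assumptions.

```python
import os
import time


def wait_for_path(path, timeout_s=5.0, poll_interval_s=0.05):
    """Return True once `path` exists, or False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly so the loop does not spin while the OS recreates the file.
        time.sleep(poll_interval_s)
```

A process would call something like `wait_for_path("/sys/bus/pci/devices/0000:02:00.0/resource0")` before opening the file, and report the critical error only if it returns False. The file name `resource0` is assumed here; which resource file is used depends on the BAR the driver maps.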

Licensed under: CC-BY-SA with attribution