2017-05-24: demystifying the obfuscated layers

One of the cool / frustrating aspects of low-level programming is dealing with the attributes of the OS itself. The thought occurred to me that this stuff starts to take on an indirect, mystical sort of reverence.

Perform a calloc blessing of the enigmatic object. Invoke the reference of attribute. Set the value of variable. Withdraw the memory of reference. End with the rites of free, restoring to null.
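For the curious, the rite above maps onto C fairly literally. This is only a sketch: the `relic` struct, its `attribute` field, and the `perform_rite` function are names I made up for illustration, not anything from a real codebase.

```c
#include <stdlib.h>

/* Hypothetical "enigmatic object"; the name and field are invented. */
struct relic {
    int attribute;
};

/* Runs the whole rite. Returns the value that was set,
 * or -1 if the allocation failed. */
int perform_rite(int value)
{
    /* Calloc blessing: allocate one zero-initialised object. */
    struct relic *obj = calloc(1, sizeof *obj);
    if (obj == NULL)
        return -1;

    /* Invoke the reference of attribute. */
    int *ref = &obj->attribute;

    /* Set the value of variable (through the reference). */
    *ref = value;
    int seen = obj->attribute;

    /* Withdraw the memory of reference. */
    ref = NULL;

    /* Rites of free, restoring to null so nothing dangles. */
    free(obj);
    obj = NULL;

    return seen;
}
```

Calling `perform_rite(42)` hands back 42 once the ceremony is over. Nulling the pointers after `free` is optional, but it turns a later use-after-free into an obvious crash rather than a quiet one.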

A programmer is a bit like a machine, no?

The philosophy du jour endorsed by golang et al. is probably the right one. Hide the details and let people get down to business.

Still, I feel like something is lost from the equation. The kernel is a real element functioning off of very real hardware. Another layer. That makes it tricky. Speaking of hardware tricks...

One day I was playing around with the Device Tree and OpenFirmware module, and I noticed the following error messages suddenly appear upon running certain unit tests included with the mentioned kernel module:

## dt-test ### start of unittest - you will see error messages
OF:/testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for
OF:/testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for
OF:/testcase-data/phandle-tests/consumer-a: could not find phandle
OF:/testcase-data/phandle-tests/consumer-a: could not find phandle
OF:/testcase-data/phandle-tests/consumer-a: arguments longer than property
OF:/testcase-data/phandle-tests/consumer-a: arguments longer than property
irq: no irq domain found for /testcase-data/interrupts/intc0 !
OF: overlay: overlay_is_topmost: #5 clashes #6 @/testcase-data/overlay-node/test-bus
OF: overlay: overlay #5 is not topmost
## dt-test ### FAIL of_unittest_overlay_high_level():2131 overlay_base_root not init
## dt-test ### end of unittest - 148 passed, 1 failed

Initially I had expected this might be a hardware issue, but I quickly ruled that out. Since I was using a custom kernel, a fast grep over the code found the answer. There were some newly added unit tests, which had a few bugs that were more or less immediately patched by the developers who had written them.

The warning messages seemed to be related to this, so the solution in my case was simply to disable the tests and rebuild the kernel. Done and done.

Actually kind of amazing how quickly progress occurs in a project of this size. Frankly impressed there aren't more errors, and the code was reasonably clean too.

I sort of felt motivated to keep exploring. So I glanced around at other mild error messages in hopes of gleaning a few nifty tidbits, and found this one:

AMD-Vi: Event logged [ IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0x000000f400089ac0 flags=0x0010 ]

This particular log entry, might it have some origin in hardware? Or a sad blobby firmware?

A simple `lspci -v` noted the device in question (09:00.0) pointed to an AMD graphics card using the opensource amdgpu module.
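If you would rather not eyeball these entries, the fields can be pulled out of the line with a few lines of C. This is a rough sketch that assumes the log format shown above; the `io_page_fault` struct and `parse_io_page_fault` function are hypothetical helpers of mine, not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one IO_PAGE_FAULT log entry. */
struct io_page_fault {
    unsigned bus, slot, func;      /* the device=BB:SS.F part */
    unsigned domain;
    unsigned long long address;
    unsigned flags;
};

/* Parses a log line in the format shown above.
 * Returns 0 on success, -1 if the line does not match. */
int parse_io_page_fault(const char *line, struct io_page_fault *out)
{
    /* Skip any prefix such as "AMD-Vi: Event logged [ ". */
    const char *p = strstr(line, "IO_PAGE_FAULT");
    if (p == NULL)
        return -1;

    /* %x accepts an optional 0x prefix, so the same conversion
     * handles both "09" and "0x0003". */
    int n = sscanf(p,
                   "IO_PAGE_FAULT device=%x:%x.%x domain=%x address=%llx flags=%x",
                   &out->bus, &out->slot, &out->func,
                   &out->domain, &out->address, &out->flags);

    return n == 6 ? 0 : -1;
}
```

Fed the entry above, this should fill in bus 0x09, slot 0x00, function 0, which is the same `09:00.0` that `lspci` prints at the start of each device line.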

Grepping around the driver directory did not turn up too many clues. Some of the error messages were quite helpfully verbose, or perhaps the opposite.

[AVFS] Something is broken. See log!

Regarding the IO_PAGE_FAULT messages, I noticed quite a number of them, one for every group of addresses on that PCIe bus of the GPU. A little too convenient to simply be hardware errors.

Hmmm... well the event was originally printed by a driver for the AMD IOMMU. I had hoped there wouldn't be anything physically wrong with the component, but at this point I thought why not?

Now this one was slightly trickier to pin down. There were at least two files, "amd_iommu.c" and "amd_iommu_v2.c", that looked like they might be important. Possibly more.

It ended up related to an event being polled on the IOMMU itself, printed into the logs by the following code:

case EVENT_TYPE_IO_FAULT:
    printk("IO_PAGE_FAULT device=%02x:%02x.%x "
           "domain=0x%04x address=0x%016llx flags=0x%04x]\n",
           PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
           domid, address, flags);
    break;
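The three macros in that call split a 16-bit device ID into the familiar bus:slot.function triple. Here is a small userspace sketch of the same arithmetic. The macro bodies are my own stand-ins, written to match the layout I understand the kernel to use (bus in the high byte, then 5 bits of slot and 3 bits of function), and `format_devid` is a hypothetical helper.

```c
#include <stdio.h>

/* Userspace stand-ins, assumed to mirror the kernel macros:
 * bits 15..8 = bus, bits 7..3 = slot (device), bits 2..0 = function. */
#define PCI_BUS_NUM(x)  (((x) >> 8) & 0xff)
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Formats devid the same way the log line does, e.g. "09:00.0".
 * Returns the number of characters snprintf would have written. */
int format_devid(unsigned devid, char *buf, size_t len)
{
    return snprintf(buf, len, "%02x:%02x.%x",
                    PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid));
}
```

Under that layout, a devid of 0x0900 formats as `09:00.0`, the GPU from the log entry.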

The faults seemed to occur once in a while on bootup, so maybe it was related to that?

I recalled that certain boot functionality on Linux was controllable via boot arguments. A dozen hasty searches later and I found the kernel parameters document.

As an aside, the online Linux Admin Guide from Kernel.org is a decent read, albeit with elements out-of-date or *really* out-of-date. Such is the fate of good documentation:

Linux beats them ALL! While all other OS's are TALKING about direct support of Java Binaries in the OS, Linux is doing it!

Still got your attention?

Since the IOMMU is the point of focus, I glanced at the relevant documentation section.

iommu= [x86]
    off
    force
    noforce
    biomerge
    panic
    nopanic
    merge
    nomerge
    forcesac
    soft
    pt [x86, IA-64]
    nobypass [PPC/POWERNV] - Disable IOMMU bypass, using IOMMU for PCI devices.

Oh boy, around here it's not just the memory mapping that's sparse. Some of them seem obvious, like 'off' or 'force', and I suppose the 'panic' means that if an error or fault occurs, then throw in the towel.

The 'pt' option looks like it means pass-through, and a search of the mailing list confirms this. It can be enabled by adding it to the kernel command line in your bootloader of choice and regenerating the bootloader config. In my case, it was grub2:

GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

After reboot there were no more IO_PAGE_FAULT errors in the logs. Problem solved.
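It is worth double-checking that the running kernel actually picked up the option, since a stale bootloader config is an easy mistake. Linux exposes the boot arguments in `/proc/cmdline`, so a quick check can be sketched as below. The function names are hypothetical, and the tokenizer is deliberately naive: it splits on whitespace only and ignores quoted values.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `param` appears as a whole whitespace-separated
 * token in `cmdline`, 0 otherwise. Quoting is not handled. */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return 0;

    while (*p != '\0') {
        p += strspn(p, " \t\n");            /* skip separators */
        size_t tlen = strcspn(p, " \t\n");  /* length of this token */
        if (tlen == plen && strncmp(p, param, plen) == 0)
            return 1;
        p += tlen;
    }
    return 0;
}

/* Checks the running kernel's command line.
 * Returns 1 if found, 0 if absent, -1 if /proc/cmdline is unreadable. */
int running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    return cmdline_has_param(buf, param);
}
```

After the reboot, `running_kernel_has_param("iommu=pt")` should come back as 1. Matching whole tokens keeps `iommu=pt` from being confused with a longer argument that merely contains it.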

That was kind of exciting. I think I might dig deeper in the future to better understand these things. I hope you do too.