Question

Recently got this in my Event Log:

A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor ID: 1

The question is: how can one design a logging subsystem that can log these kinds of errors?

Even if it's enough to issue one DMA request to write to the HDD, with no paging and no interrupts, that's still a lot of work to do on presumably failed hardware.

Reminds me of an old joke "CPU not found, starting software emulation", too.

P.S. I believe this does not belong to superuser or serverfault, because I'm interested in software part of the problem, not diagnosing my computer :)

Solution

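As far as I know, that particular error is flagged internally in the CPU. Whether the CPU supports machine-check exceptions is reported by the MCE feature flag, which can be queried with the CPUID instruction; the details of the error itself are recorded in the machine-check status registers (MSRs such as IA32_MCG_STATUS and IA32_MCi_STATUS). Those status registers keep their contents across a warm reset, so the OS does not have to log anything on the failing hardware at the moment of the error; it can read them and write the event after the restart. I'm not much of an assembly programmer, but the Intel architecture manuals (free on Intel's site) should have more. See the CPUID instruction and the machine-check architecture chapter.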
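To make the CPUID part concrete, here is a minimal sketch of checking the machine-check feature bits. It assumes an x86 build with GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`; the helper names `has_mce`, `has_mca` and `read_cpuid_leaf1_edx` are made up for this example. The bit positions (MCE is bit 7 and MCA is bit 14 of EDX for CPUID leaf 1) come from the Intel manuals. Note that this only tells you whether the feature exists; reading the actual error state needs the MSRs, which are only accessible from kernel mode.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1, EDX feature bits as documented in the Intel SDM. */
#define CPUID_EDX_MCE (1u << 7)   /* Machine Check Exception supported */
#define CPUID_EDX_MCA (1u << 14)  /* Machine Check Architecture supported */

/* Hypothetical helpers: decode the feature bits from a raw EDX value. */
static inline bool has_mce(uint32_t edx) { return (edx & CPUID_EDX_MCE) != 0; }
static inline bool has_mca(uint32_t edx) { return (edx & CPUID_EDX_MCA) != 0; }

/*
 * Hypothetical helper: query CPUID leaf 1 and store EDX in *edx.
 * Returns false when the leaf is unavailable or the build is not x86.
 */
static inline bool read_cpuid_leaf1_edx(uint32_t *edx)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return false;
    *edx = d;
    return true;
#else
    (void)edx;
    return false;
#endif
}
```

A caller would invoke `read_cpuid_leaf1_edx(&edx)` and, if it returns true, pass `edx` to `has_mce` and `has_mca`. Keeping the bit decoding separate from the CPUID call means the decoding can be exercised on any platform.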

Other tips

Hardware devices are managed by their drivers, and capturing device errors is part of the driver's responsibility. The driver can use the dedicated kernel API to write the error data to the Windows event log. Applications can parse the log, but interpreting the event data requires device-specific information, which is either available in the vendor's documentation or undocumented. If the driver was implemented by Microsoft, Event Viewer can present the meaning of the error without additional parsing, just as in the question.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow