
MMIO(Memory Mapped I/O)

A while ago, while setting up the second GPU machine for 星河, the driver refused to install no matter what I tried. Running dmesg showed:

[  886.661014] NVRM: The system BIOS may have misconfigured your GPU.
[  886.661018] nvidia: probe of 0000:45:00.0 failed with error -1
[  886.661058] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  886.661060] NVRM: None of the NVIDIA graphics adapters were initialized!
[  886.661298] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[  886.779812] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  886.780112] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:44:00.0)
[  886.780118] NVRM: The system BIOS may have misconfigured your GPU.
[  886.780132] nvidia: probe of 0000:44:00.0 failed with error -1
[  886.780152] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:45:00.0)
[  886.780156] NVRM: The system BIOS may have misconfigured your GPU.
[  886.780166] nvidia: probe of 0000:45:00.0 failed with error -1
[  886.780193] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  886.780195] NVRM: None of the NVIDIA graphics adapters were initialized!
[  886.780285] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[  886.902779] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  886.903055] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:44:00.0)
[  886.903059] NVRM: The system BIOS may have misconfigured your GPU.
[  886.903072] nvidia: probe of 0000:44:00.0 failed with error -1
[  886.903088] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:45:00.0)
[  886.903090] NVRM: The system BIOS may have misconfigured your GPU.
[  886.903096] nvidia: probe of 0000:45:00.0 failed with error -1
[  886.903117] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  886.903118] NVRM: None of the NVIDIA graphics adapters were initialized!
[  886.903181] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
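
The same few messages repeat on every probe attempt, so it helps to reduce the log to the devices that are actually affected. Below is a minimal sketch, assuming the log has been saved as text. The function name `find_invalid_bars` and the regular expression are my own, and they only match the message format shown above.

```python
import re

# Matches the continuation line printed by the NVIDIA driver, e.g.
#   NVRM: BAR1 is 0M @ 0x0 (PCI:0000:44:00.0)
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<dev>[0-9a-fA-F:.]+)\)"
)


def find_invalid_bars(log_text):
    """Return a sorted list of (pci_address, bar_index) reported with size 0 or address 0."""
    found = set()
    for m in BAR_LINE.finditer(log_text):
        if int(m.group("size")) == 0 or int(m.group("addr"), 16) == 0:
            found.add((m.group("dev"), int(m.group("bar"))))
    return sorted(found)


# Two entries copied from the log above.
SAMPLE = """\
[  886.780112] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:44:00.0)
[  886.780152] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:45:00.0)
"""

if __name__ == "__main__":
    for dev, bar in find_invalid_bars(SAMPLE):
        print(f"{dev}: BAR{bar} has no address assigned")
```

For a real machine, the text passed to `find_invalid_bars` would come from a saved `dmesg` dump instead of `SAMPLE`.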

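After asking 墨羲, I found the cause: MMIO (Memory Mapped I/O).
The option is named differently from one BIOS to another, but the name almost always has something to do with memory.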
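The reason is that if the MMIO space is set too large (beyond the address range the operating system supports) or too small, the system may be unable to assign addresses properly and the PCI devices cannot be brought up. This matches the log above, where BAR1 is reported as 0M @ 0x0, which indicates that no address range was assigned to it.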
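
To make the "too small" case concrete, here is a toy model of placing BARs into an MMIO window. It is only a sketch and not how any particular BIOS works. The only hard facts it relies on are that a BAR's size is a power of two and that a BAR must be aligned to its own size. The function `place_bars`, the window addresses, and the BAR sizes are all made up for illustration. The "too large" case is not modelled.

```python
GIB = 1 << 30
MIB = 1 << 20


def place_bars(window_start, window_size, bar_sizes):
    """Toy allocator: place the largest BARs first, each aligned to its own size.

    Returns {bar_index: address or None}; None means the BAR did not fit.
    """
    window_end = window_start + window_size
    cursor = window_start
    placed = {}
    for idx in sorted(range(len(bar_sizes)), key=lambda i: -bar_sizes[i]):
        size = bar_sizes[idx]
        assert size > 0 and size & (size - 1) == 0, "BAR sizes are powers of two"
        addr = (cursor + size - 1) & ~(size - 1)  # round up to natural alignment
        if addr + size <= window_end:
            placed[idx] = addr
            cursor = addr + size
        else:
            placed[idx] = None  # left without an address
    return placed


# Hypothetical card: 16 MiB BAR0, 16 GiB BAR1, 32 MiB BAR2.
bars = [16 * MIB, 16 * GIB, 32 * MIB]

# Hypothetical 1 GiB window below 4 GiB: the 16 GiB BAR cannot fit.
small = place_bars(0xC000_0000, 1 * GIB, bars)

# Hypothetical 64 GiB window above 4 GiB: everything fits.
large = place_bars(0x20_0000_0000, 64 * GIB, bars)

if __name__ == "__main__":
    for name, result in (("small window", small), ("large window", large)):
        for idx in sorted(result):
            addr = result[idx]
            where = "unassigned" if addr is None else hex(addr)
            print(f"{name}: BAR{idx} -> {where}")
```

In the small window the large BAR is the one left without an address while the small ones still fit, which is analogous to a driver finding only BAR1 empty.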
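Practically every server motherboard has this setting; a patient read through the manual is enough to find it.
For a VMWare ESXi passthrough machine, see my earlier post on the DevicePowerOn error with VMWare ESXi GPU passthrough (PCI device passthrough). Following it step by step should avoid most problems.
PVE passthrough generally does not run into this problem. Keep in mind, though, that passthrough is only possible once the host itself has addressed the compute card, so the motherboard's MMIO setting still has to be configured.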
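
One way to check whether the host has mapped the card before attempting passthrough is to look for BARs that `lspci` lists as `<unassigned>`. The sketch below assumes `lspci` from pciutils is available; the helper name `unassigned_regions` is my own, and the sample text is hand-written in the shape of `lspci -D -vv` output, not captured from the machine above.

```python
import re
import shutil
import subprocess

# Device header lines start in column 0, e.g. "0000:44:00.0 3D controller: ..."
HEADER = re.compile(r"^(?P<dev>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# lspci prints "<unassigned>" in place of an address for a BAR that has none.
REGION = re.compile(r"^\s+Region (?P<bar>\d+): .*<unassigned>")


def unassigned_regions(lspci_text):
    """Map PCI address -> list of BAR indices that lspci shows as <unassigned>."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        h = HEADER.match(line)
        if h:
            current = h.group("dev")
            continue
        r = REGION.match(line)
        if r and current is not None:
            result.setdefault(current, []).append(int(r.group("bar")))
    return result


# Hand-written sample (hypothetical addresses and sizes).
SAMPLE_LSPCI = """\
0000:44:00.0 3D controller: NVIDIA Corporation Device
\tRegion 0: Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
\tRegion 1: Memory at <unassigned> (64-bit, prefetchable)
0000:45:00.0 3D controller: NVIDIA Corporation Device
\tRegion 0: Memory at d9000000 (32-bit, non-prefetchable) [size=16M]
"""

if __name__ == "__main__":
    if shutil.which("lspci"):
        # Reading full details may require root on some systems.
        text = subprocess.run(
            ["lspci", "-D", "-vv"], capture_output=True, text=True
        ).stdout
    else:
        text = SAMPLE_LSPCI
    for dev, bars in unassigned_regions(text).items():
        print(dev, "unassigned BARs:", bars)
```

An empty result for the GPU's address suggests that the host has given every BAR an address; it does not by itself prove that passthrough will work.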

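Notes

This post expands on my earlier post about things to watch out for when installing GPUs and configuring the environment; see that post for details.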

Q.E.D
C4a15Wh_5.1
2021-12-17 13:50