-
Notifications
You must be signed in to change notification settings - Fork 337
infiniband: do not fail if board_id is not present #556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Whilst a large majority do, not _all_ InfiniBand drivers expose a board_id in sysfs. Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
func (fs FS) parseInfiniBandDevice(name string) (*InfiniBandDevice, error) { | ||
path := fs.sys.Path(infinibandClassPath, name) | ||
device := InfiniBandDevice{Name: name} | ||
|
||
for _, f := range [...]string{"board_id", "fw_ver", "hca_type"} { | ||
// fw_ver is exposed by all InfiniBand drivers since kernel version 4.10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically support older kernels than this. Maybe we should just make all of these files/values optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this PR is does not change the behaviour for fw_ver
, which has been a mandatory requirement of this collector since it first landed in June 2019. It only makes the board_id
optional, and hca_type
was already optional.
I would be very surprised if anybody is running InfiniBand on kernels older than 4.10. I don't recall a single issue being logged about this collector failing due to missing fw_ver
.
Digging through kernel source, the fw_ver
sysfs entry appears to be universally implemented from kernel 4.8 onwards (despite docs saying 4.10), and prior to that it is implemented by selected vendors (e.g. Mellanox, which is almost certainly going to be the hardware in use from that era of kernels). The earliest mention of it in the mlx4 driver is kernel 2.6.24.
I wonder if we should make the values in the struct |
Doing so means that consumers of this need to check for non-nil before using the struct members. I'm not sure if that's a good thing or not. Given that Prometheus does not store empty-string labels, my gut feeling is that it's more convenient to simply leave these struct members as empty strings if not found, meaning that node_exporter can just blindly assign them to labels. |
Related-to: prometheus/node_exporter#2769 |
Whilst a large majority do, not _all_ InfiniBand drivers expose a board_id in sysfs. Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
Whilst a large majority do, not all InfiniBand drivers expose a board_id in sysfs.
Fixes: prometheus/node_exporter#2769