configurable plugin instance in gpu_cuda plugin #3264
Conversation
The build error does not seem to be related to my changes. I tried to reproduce it on my local machine with clang, but it builds without errors. Should I push a dummy change to trigger CI once more? PR #3273 had a similar issue.
This particular patch LGTM; it seems to do what it intends to do.
However, the master branch lacks @rubenk's build fixes #3323 & #3332 from issue #3320 (not yet ported from collectd-5.9).
But even after applying them, the plugin still doesn't compile for me on Ubuntu 16.04 (/usr/bin/ld: cannot find -lnvidia-ml), so I haven't verified it in practice.
Detailed log attached: gpu_nvidia_build_fail_log.txt
I see. Question: why are there any PRs in the 5.9 branch that are not merged into master? Shouldn't we do this as a starter for 5.12?
@dago the way the community did this so far is: fix issues in released branches and merge released branches to master. Apparently, this did not happen here... @kkepka good catch, would you be able to propose patches for these issues? Previously, one could simply merge them once they went through a review, but that is not the case anymore.
@mrunge But hasn't the development process changed since then? Nowadays PRs are merged to master and releases are snapshotted (or branched for backports of security fixes), right? And are there more PRs merged to release branches that have not been merged to master?
@rpv-tomsk Thanks for reviewing the PR and approving the request!
This PR makes the plugin instance configurable via collectd.conf. Currently, the GPU name alone is used as the plugin instance. However, this does not allow us to distinguish between GPUs of the same type, e.g. in workstations or computing clusters, which often contain multiple GPUs of the same model.
Hence, I propose to configure the plugin instance of the gpu_cuda plugin via the two booleans 'InstanceByGPUIndex' and 'InstanceByGPUName'. The GPU index is unique for every NVIDIA GPU in the system. Furthermore, this is more in line with the cpu plugin, which uses the core number as plugin instance. I set the default to "GpuId-GpuName".
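As a sketch of how the two options might look in collectd.conf (the plugin block name and defaults are assumptions based on this PR description, not verified against the merged code):

```apacheconf
# Hypothetical configuration sketch: both options enabled yields
# instances like "0-Tesla V100"; disabling InstanceByGPUName would
# yield just the index, e.g. "0".
<Plugin gpu_nvidia>
  InstanceByGPUIndex true
  InstanceByGPUName true
</Plugin>
```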
I also replaced MAX_DEVNAME_LEN with NVML_DEVICE_NAME_BUFFER_SIZE from nvml.h.
ChangeLog: GPU NVML plugin: configurable plugin instance by GPU name and/or GPU index.