8000 configurable plugin instance in gpu_cuda plugin by rdietric · Pull Request #3264 · collectd/collectd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

configurable plugin instance in gpu_cuda plugin #3264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 3, 2020

Conversation

rdietric
Copy link
Contributor

The plugin instance can now be configured via collectd.conf. Currently, the GPU name is used. However, this does not allow us to distinguish between GPUs of the same type, e.g. in workstations or computing clusters, which often contain multiple GPUs of the same type.

Hence, I propose to configure the plugin instance of the gpu_cuda plugin via the two booleans 'InstanceByGPUIndex' and 'InstanceByGPUName'. The GPU index is unique for every NVIDIA GPU in the system. Furthermore, this is more in line with the cpu plugin, which uses the core as plugin instance. I set the default to "GpuId-GpuName".

I also replaced MAX_DEVNAME_LEN with NVML_DEVICE_NAME_BUFFER_SIZE from nvml.h.

ChangeLog: GPU NVML plugin: configurable plugin instance by GPU name and/or GPU index.

@rdietric
Copy link
Contributor Author

The build error does not seem to be related to my changes. I tried to reproduce it on my local machine with clang, but it builds without errors. Should I push a dummy change to trigger CI once more? PR #3273 had a similar issue.

@dago dago requested a review from rpv-tomsk February 29, 2020 17:02
@dago dago added this to the 5.11.0 milestone Feb 29, 2020
@dago dago requested a review from kwiatrox March 2, 2020 16:19
Copy link
Contributor
@kkepka kkepka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular patch LGTM, seems to do what it intends to do.

However master branch lacks @rubenk's build fixes #3323 & #3332 from issue #3320 (not yet ported from collectd-5.9).
But even after applying them plugin still doesn't compile fine for me on Ubuntu16.04 (/usr/bin/ld: cannot find -lnvidia-ml), so haven't verify it in practice.
Detailed log attached: gpu_nvidia_build_fail_log.txt

@dago
Copy link
Contributor
dago commented Mar 2, 2020

I see. Question: why are there any PRs in 5.9 branch not merged into master? Shouldn't we do this as starter for 5.12?

@mrunge
Copy link
Member
mrunge commented Mar 3, 2020

@dago the way the community did this so far is: fix issues in released branches and merge released branches to master. Apparently, this did not happen here...

@kkepka good catch, would you be able to propose patches for these issues? Previously, one could simply merge them, once they went through a review, but that is not the case anymore.

@dago
Copy link
Contributor
dago commented Mar 3, 2020

@mrunge But the development has changed since then? Nowadays PRs are merged to master and releases are snapshotted (or branched for backports of security releases), right?

And: are there more PRs merged to release-branches which have not been merged to master?

@kkepka
Copy link
Contributor
kkepka commented Mar 3, 2020

@kkepka good catch, would you be able to propose patches for these issues? Previously, one could simply merge them, once they went through a review, but that is not the case anymore.

sure, created PR with mentioned fixes #3393

@dago
Copy link
Contributor
dago commented Mar 3, 2020

@kkepka #3393 has been merged, do I get you right that the code still does not compile even with #3393 applied?

@kkepka
Copy link
Contributor
kkepka commented Mar 3, 2020

That's right @dago. However just checked and I can observe same issue with current master, so it may be something with my test build env or issue is out there in build (and seems not related to this patch from @rdietric)

@dago
Copy link
Contributor
dago commented Mar 3, 2020

@rpv-tomsk Thanks for reviewing the PR and approving the request!

@dago dago merged commit 84b6d12 into collectd:master Mar 3, 2020
@rdietric rdietric deleted the gpu-index branch March 3, 2020 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0