configurable plugin instance in gpu_cuda plugin #3264
Conversation
The build error does not seem to be related to my changes. I tried to reproduce it on my local machine with clang, but it builds without errors. Should I push a dummy change to trigger CI once more? PR #3273 had a similar issue.
This particular patch LGTM; it seems to do what it intends to do.
However, the master branch lacks @rubenk's build fixes #3323 & #3332 from issue #3320 (not yet ported from collectd-5.9).
But even after applying them, the plugin still doesn't compile for me on Ubuntu 16.04 (/usr/bin/ld: cannot find -lnvidia-ml), so I haven't verified it in practice.
Detailed log attached: gpu_nvidia_build_fail_log.txt
I see. Question: why are there any PRs in the 5.9 branch that are not merged into master? Shouldn't we do this as a starter for 5.12?
@dago the way the community did this so far is: fix issues in released branches and merge released branches to master. Apparently, this did not happen here... @kkepka good catch, would you be able to propose patches for these issues? Previously, one could simply merge them once they went through a review, but that is not the case anymore.
@mrunge But hasn't the development process changed since then? Nowadays PRs are merged to master and releases are snapshotted (or branched for backports of security fixes), right? And are there more PRs merged to release branches that have not been merged to master?
@rpv-tomsk Thanks for reviewing the PR and approving the request!
This PR makes the plugin instance configurable via collectd.conf. Currently, the GPU name alone is used as the plugin instance. However, this does not allow us to distinguish between GPUs of the same type, e.g. in workstations or computing clusters, which often contain multiple GPUs of the same model.
Hence, I propose to configure the plugin instance of the gpu_cuda plugin via the two booleans 'InstanceByGPUIndex' and 'InstanceByGPUName'. The GPU index is unique for every NVIDIA GPU in the system. Furthermore, this is more in line with the cpu plugin, which uses the core number as plugin instance. I set the default to "GpuId-GpuName".
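As a sketch of how the two options might look in collectd.conf (the plugin block name and defaults are assumptions based on this PR description, not verified against the merged code):

```apacheconf
# Hypothetical configuration sketch: both options enabled yields
# instances like "0-Tesla V100"; disabling InstanceByGPUName would
# yield just the index, e.g. "0".
<Plugin gpu_nvidia>
  InstanceByGPUIndex true
  InstanceByGPUName true
</Plugin>
```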
I also replaced MAX_DEVNAME_LEN with NVML_DEVICE_NAME_BUFFER_SIZE from nvml.h.
ChangeLog: GPU NVML plugin: configurable plugin instance by GPU name and/or GPU index.