Fix links in C++ Developer Guide. by bdice · Pull Request #9675 · rapidsai/cudf · GitHub

Fix links in C++ Developer Guide. #9675


Merged 4 commits on Nov 13, 2021
22 changes: 11 additions & 11 deletions cpp/docs/BENCHMARKING.md
@@ -8,24 +8,24 @@ other benchmarks in `cpp/benchmarks` to understand the options.

## Directory and File Naming

The naming of unit benchmark directories and source files should be consistent with the feature
being benchmarked. For example, the benchmarks for APIs in `copying.hpp` should live in
`cudf/cpp/benchmarks/copying`. Each feature (or set of related features) should have its own
benchmark source file named `<feature>_benchmark.cu/cpp`. For example,
`cudf/cpp/src/copying/scatter.cu` has benchmarks in
`cudf/cpp/benchmarks/copying/scatter_benchmark.cu`.

In the interest of improving compile time, whenever possible, test source files should be `.cpp`
files because `nvcc` is slower than `gcc` in compiling host code. Note that `thrust::device_vector`
includes device code, and so must only be used in `.cu` files. `rmm::device_uvector`,
`rmm::device_buffer` and the various `column_wrapper` types described in [Testing](TESTING.md)
can be used in `.cpp` files, and are therefore preferred in test code over `thrust::device_vector`.

## CUDA Asynchrony and benchmark accuracy

CUDA computations and operations like copies are typically asynchronous with respect to host code,
so it is important to carefully synchronize in order to ensure the benchmark timing is not stopped
before the feature you are benchmarking has completed. An RAII helper class `cuda_event_timer` is
provided in `cpp/benchmarks/synchronization/synchronization.hpp` to help with this. This class
can also optionally clear the GPU L2 cache in order to ensure cache hits do not artificially inflate
performance in repeated iterations.
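
The RAII timing pattern described above can be sketched on the host alone. This is a minimal illustration, not libcudf's `cuda_event_timer`: the class name `scoped_timer` and its `sync` callback are hypothetical, and `std::chrono` stands in for CUDA events. The point it shows is that the destructor waits for outstanding work before it stops the clock, so the measured interval always covers the whole operation.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical stand-in for illustration only; NOT libcudf's cuda_event_timer.
// It times the enclosing scope and calls a caller-supplied `sync` hook before
// stopping the clock, mimicking the need to wait for asynchronous GPU work.
class scoped_timer {
 public:
  scoped_timer(double& elapsed_seconds, std::function<void()> sync)
    : elapsed_seconds_{elapsed_seconds},
      sync_{std::move(sync)},
      start_{std::chrono::steady_clock::now()}
  {
  }

  ~scoped_timer()
  {
    sync_();  // wait for outstanding work before reading the clock
    auto const stop  = std::chrono::steady_clock::now();
    elapsed_seconds_ = std::chrono::duration<double>(stop - start_).count();
  }

  // Non-copyable: exactly one object owns each timed scope.
  scoped_timer(scoped_timer const&)            = delete;
  scoped_timer& operator=(scoped_timer const&) = delete;

 private:
  double& elapsed_seconds_;                      // where the result is written
  std::function<void()> sync_;                   // e.g. a stream synchronization
  std::chrono::steady_clock::time_point start_;  // captured at construction
};
```

In a real GPU benchmark the `sync` hook would correspond to synchronizing the CUDA stream or event; here it is just a callback so that the sketch compiles without a GPU.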
@@ -35,10 +35,10 @@ performance in repeated iterations.
In general, we should benchmark all features over a range of data sizes and types, so that we can
catch regressions across libcudf changes. However, running many benchmarks is expensive, so ideally
we should sample the parameter space in such a way to get good coverage without having to test
exhaustively.

A rule of thumb is that we should benchmark with enough data to reach the point where the algorithm
reaches its saturation bottleneck, whether that bottleneck is bandwidth or computation. Using data
sets larger than this point is generally not helpful, except in specific cases where doing so
exercises different code and can therefore uncover regressions that smaller benchmarks will not
(this should be rare).
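
One way to sample the size axis without testing exhaustively is a geometric sweep, similar in spirit to Google Benchmark's `RangeMultiplier` and `Range`. The helper below is a hypothetical sketch (the name `geometric_sizes` and its parameters are assumptions, not part of libcudf): it produces sizes that grow by a fixed factor and always includes the upper bound, which would be chosen near the point where the algorithm saturates.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration: returns min_size, min_size * multiplier,
// min_size * multiplier^2, ... and always ends with max_size.
std::vector<std::int64_t> geometric_sizes(std::int64_t min_size,
                                          std::int64_t max_size,
                                          std::int64_t multiplier)
{
  assert(min_size > 0 && max_size >= min_size && multiplier > 1);

  std::vector<std::int64_t> sizes;
  std::int64_t n = min_size;
  while (n < max_size) {
    sizes.push_back(n);
    // Stop if the next step would pass max_size (this also avoids overflow).
    if (n > max_size / multiplier) { break; }
    n *= multiplier;
  }
  sizes.push_back(max_size);  // always benchmark the upper bound
  return sizes;
}
```

For example, `geometric_sizes(1 << 10, 1 << 20, 8)` yields five sizes (1024, 8192, 65536, 524288, 1048576) instead of every size in between.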