-
Notifications
You must be signed in to change notification settings - Fork 71
Jitify2 parsing and preprocessing overhaul #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
#90 refers to Jitify1. I take it this is in response to the performance regression between CUDA 12.1 and 12.2 (in a test case I've been investigating calls to I've not looked significantly into Jitify2 since I created #90, I presume it's now worth me looking more seriously into the viability of migrating our code to use Jitify2. |
@Robadob This is actually not in response to that issue; I only just became aware of it and do not know the root cause (possibly some header hierarchy changed and became pathological for Jitify). All significant new work is expected to be on Jitify2, so I would encourage you to migrate, and would be happy to answer any questions that come up. |
Fair, the timing must just be a coincidence. I assume the same re: header-hierarchy, but I haven't looked into it too deeply beyond tracking the number of calls to nvrtc compile program.
I'm now planning to explore the migration today (having discussed it with my team), will create a new issue if I have any problems. Edit: Have managed to migrate to |
jitify2.hpp
Outdated
*full_path = include.nonlocal_full_path(kJitifyCallbackHeaderPrefix); | ||
*full_path = path_simplify(*full_path); | ||
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded; | ||
if (header_callback and header_callback(include, &source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (header_callback and header_callback(include, &source)) { | |
if (header_callback && header_callback(include, &source)) { |
I've been doing our migration with this branch, this line errors in Visual Studio 2022.
case '"': return in_include_directive_ ? quote_include() : string(); | ||
case 'u': match('8'); | ||
// fall-through | ||
[[gnu::fallthrough]]; // Not sure why gcc complains here without this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#ifdef __GNUC__ | |
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#endif |
Visual Studio 2022
jitify2.hpp(6192): warning C5030: attribute 'gnu::fallthrough' is not recognized
jitify2.hpp
Outdated
@@ -100,16 +103,19 @@ | |||
#endif | |||
|
|||
#include <algorithm> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <algorithm> | |
#include <algorithm> | |
#include <array> |
Incomplete type jitify2.hpp:6549
(std::array<value_type, Size> data_ = {};
) under visual studio 2022.
jitify2.hpp
Outdated
Token new_tokens[kMaxNewTokens]; | ||
int j = 0; | ||
Iterator before_where = where; | ||
--before_where; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion here under Visual Studio 2022 Cannot decrement begin iterator
.
The begin iterator is coming from jitify2::parser::process_cuda_source()
s first call to insert_directive();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiddling with the code (e.g. removing this line), later had a line below assert Cannot deref end iterator
when the same insert_directive()
was being called from with .end()
.
I have however got jitify2
branch working, see an edit to my earlier comment.
e1334e5
to
128e055
Compare
@Robadob Thanks for the help with this. I've pushed fixes for the issues you identified as well as some more that I found with manylinux2014. |
Visual Studio 2022 (CUDA 12.0) Log
Log
gcc (Ubuntu 10.5.0-1ubuntu1~20.04) 10.5.0 (CUDA 12.3) // get the dynamically generated header from curve rtc
headers.emplace("dynamic/curve_rtc_dynamic.h", dynamic_header);
getKnownHeaders(flamegpu_include_dir, headers); // Temp hack that prespecifies known headers
// jitify to create program (with compilation settings)
const std::string program_name = func_name + "_program"; // Does this name actually matter?
jitify2::PreprocessedProgram program = jitify2::Program(program_name, kernel_src, headers)->preprocess(options);
if (!program.ok()) { Log
ManyLinux2014 (CUDA 11.2/12.0) Now builds on CI after setting definition Visual Studio 2019 Still got the variadic template error Log
|
I think the I'm guessing the I need to find a way to reproduce the variadic template error. |
Yes, removing that line has moved Visual Studio 2022 to the same error as previously reported with gcc.
I made this change, and now it's unable to find a regular file included by #line 1 "outputdata_impl_curve_rtc_dynamic.h"
#ifndef CURVE_RTC_DYNAMIC_H_
#define CURVE_RTC_DYNAMIC_H_
#include "flamegpu/exception/FLAMEGPUDeviceException.cuh"
#include "flamegpu/detail/type_decode.h"
#include "flamegpu/runtime/detail/curve/Curve.cuh"
#include "flamegpu/util/dstring.h"
Whilst investigating this, I removed the preloading of all FLAMEGPU headers (which I presume shouldn't be required for speedup after this PR), and found it doesn't appear to be using our include directories passed via options to
It would appear I need to rewrite all of our internal includes as system includes to get any further. (This is rather undesirable, and feels like quite the breaking-change)
It's possibly a case of Visual Studio 2019 simply having poor support for If you have a windows box with Visual Studio 2022, you should be able to install the VS 2019 build tools rather than the whole of Visual Studio 2019. |
I think you need to pass The keys in the
So a key of Note that the |
I've tried this, it has no impact. It only appears able to find FLAMEGPU includes, if they are preloaded and passed via the headers map. All of our FLAMEGPU includes (and their includes) are relative to the FLAMEGPU include directory. In my local build this is
I've done this myself. jitify.hpp:7162 // Try loading from include directories.
for (const std::string& include_path : include_paths) {
*full_path = include.nonlocal_full_path(include_path);
*full_path = path_simplify(*full_path);
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded;
if (read_text_file(*full_path, &source)) {
return newly_loaded(std::move(source));
} else {
fprintf(stderr,
"Did not find #include %s at %s\n", include.name().c_str(), full_path->c_str());
}
} The first file it can't find is
Somehow the relative path is being interpreted wrong. That path should be (Note the extra So I've updated our code, so that the absolute path to the include directory is passed instead I think this is a case of relative to executable, rather than working directory. The first file that can't be found now has changed.
That file does exist on disk.
Digging into
However, if I make that change for all files loaded by TLDR: 2 issues
|
Thanks for the debugging. I've pushed fixes for preincludes, absolute filenames, and the line endings issue. The paths being relative to the executable directory is intentional, but I'm now re-thinking whether that's a good idea based on your feedback. |
I've now guarded out the offending code, and with the other changes discussed from my previous comment (not clear why jitify1/vanilla-jitify2 weren't complaining). The code I've been testing with now works under both the Windows and Linux. The Linux/CUDA12.3 RTC times are down to 961 & 1196ms from 24009 & 23543ms! (with preloaded FLAMEGPU headers disabled in both cases, which was the previous hack to accelerate compilation, previously 18793 & 17782ms with the preloading although it has negligible impact to performance with this branch) I don't have an immediate RTC model to test GLM with (the original thing that spurred #90), but I expect it will fare similarly. The main difference with GLM is that it's many headers all use relative paths to include each-other, which prevented it from benefiting from the header preload hack. I look forward to seeing this merged, thanks for all the help getting it working. The VS2022 warnings (in #132) are still upsetting CI, and the VS2019 std::function/varadic-template issue persists. But we can probably guard the VS2022 warnings if necessary (VS via CMake doesn't seem to obey ISystem), and we may drop support for VS2019 if necessary (that requires some internal discussion though). |
Realised we have something in our test suite, that reported 1349 & 1318ms for the two (different) kernels under CUDA 12.3. Significantly better than the original >60s reported in #90. To get any further I expected we'd need to be flattening the headers, so this is great. |
cc: @kmaehashi for vis (I need to find time to migrate to jitify2 so as to test this branch) |
410c320
to
0d42248
Compare
Hi, I've picked back up our FLAMEGPU branch that uses this version of Jitify as we're running an event soon and would benefit from the shorter compile times. A couple of (minor) issues that I've spotted, these are niggles, rather than anything we can't work around. compile_log() When compilation fails, if we call In the debugger, it appears these properties aren't even set within PreprocessedProgram, so I guess that explains why. werror Our GCC12 WError build raises this spurious looking warning here.
|
I've identified the other bug, that was getting me confused. When The Lines 193 to 195 in bf1c8c0
This is two statements. In both instances of Lines 1169 to 1177 in bf1c8c0
|
- These are useful for debugging and logging.
- Adds Orin Tegra arch. - Ensures that Tegra archs are not used for forward compatibility. - Allows falling back to a binary-compatible real arch when NVRTC doesn't support the requested arch exactly.
- This will ensure cache entries are not reused between different library versions (e.g., if the libraries are updated to fix code-gen bugs). - The NVRTC build version is extracted via a small hack, but there is unfortunately no obvious way to get the build version of the linkers.
@benbarsdell I've tested this branch against CUDF, we have runtime-generated sources that are included as extra headers and the source is then included during jit compilation. The header will have a function name, i.e. |
- Moves option parsing into Option instead of OptionsVec. This ensures that single options can be constructed with "-key=val". Previously, this silently set key="-key=val", which was bad. - Also adds an operator bool() for convenience.
- This is more useful than just returning bool.
- This "feature" was poorly thought out for a few reasons: - During preprocessing, the current exe dir is jitify2_preprocess, not the user's application. - It didn't apply to -I or -L flags used after preprocessing. - It is not the obvious default behavior. - If we want to re-add this feature later, it should be opt-in behind a "--paths-relative-to-exe-dir" option, and it should (only) be applied after preprocessing.
- Adds support for "a" and "f" arch suffixes, such as "sm_90a". Previously, such suffixes were silently removed. - Also fixes the architecture that is put into cache filenames by using the same logic to determine it as is used during compilation, ensuring it will exactly match the compiled binary.
@benbarsdell Any chance this will be merged or tagged? We are looking to create a new release of FLAME GPU using this branch to improve our RTC compile times (FLAMEGPU/FLAMEGPU2#1150). |
- Fixes include paths being processed in reverse order. - Fixes IncludeName path handling not respecting empty paths. - Uses '@' instead of ':' for separators so that they remain valid filenames (on Windows). - Encodes include paths that appear in quote-includes, and sanitizes injected warnings, so that full paths are not included in shipped binaries. - Adds jitify2::quote_include_name(), which can be used to override quote-includes (if really needed).
@mondus I've just pushed a few more fixes to this branch, and I think I'm happy with it now. If you're able to test it (especially on Windows) before I merge it, that would be great. |
@benbarsdell Thanks for moving this along! Initial tests by @ptheywood have flagged that there was an issue/bug in CCCL that shipped with CUDA 12.9 flagged by another jitify2 user which is also exposed in our build matrix (NVIDIA/cccl#4319). Would it be possible to ignore incorrectly formed pre-processor definitions? The specific problem line in CCCL is a comment line in a pre-processor declaration. E.g.
|
We can workaround this via our CMake fetching of CCCL, so it doesn't strictly need resolving in jitify for us (trivially by increasing our minimum to Otherwise our (FLAMEGPU2's) test suite is fine locally under linux with CUDA 12.6 and 12.8 (and python test suite run-time (~360 jitify compilations) is reduced from ~1600s with jitify1 down to ~120s). We'll post an update once we've done this on a windows machine as well. |
- Preprocessor directives containing no identifier (i.e., just a hash) are legal code, but we were treating them as an error. - This fixes an issue with a CCCL header in CUDA 12.9. - This commit also fixes an unnecessary assertion failure with old CUDA versions.
Thanks for the heads up about that issue. It turns out that empty preprocessing directives like that are actually legal code, so it was a jitify bug. I've pushed a fix. |
Between @Robadob, @mondus and myself, we've now got some windows coverage (although not exhaustive), where both FLAME GPU 2 test suites (C++ and python) all passed, which includes >400 Jitfify compilations and executions.
Our CI no longer tests compilation on CUDA 11.0 or 11.1, so unsure if the VS2019 + CUDA < 11.2 issue/compiler bug still persists. On both Windows and linux when we print the error/warning log we do see an incompatible redefinition warning, where a CCCL/libcu++ header is being pulled in at some point. On Linux (CUDA 12.9):
On Windows (CUDA 12.8):
Which appears to be the jitsafe Line 4380 in adbeb71
Conflicting with CCCL's climits: # define ULLONG_MAX 0xffffffffffffffffUL I.e. I'm not sure if this would be more appropriately fixed in Jitify or CCCL. We can suppress this via |
- This avoids "incompatible redefinition" warnings when both <limits> and <cuda/std/limits> are included (and in that order).
Amazing, thanks for all the testing! I've fixed the ULLONG_MAX issue. |
- Previously, the whole pragma once line was removed, which messed up line numbering. Now, only the "pragma ... once" tokens are removed, leaving "#\n" and maintaining line numbers. - Also adds a test that line numbers are maintained. - Note that the existing tests for line numbers of include directives were not affected by this because they are extracted before the pragma once directives are removed.
- Refactors builtin headers to properly handle the "header.h" and "cheader" versions both being included without conflicting. - Fixes incompatibility between our tuple header and libcudacxx's. - Makes other minor fixes and enhancements to some headers. - Adds tests for compatibility within and between all builtin headers.
- Adding placeholder headers like these is of low value, but it improves out-of-the-box compatibility with existing codes that aren't RTC-safe.
- This is still incomplete, but covers most cases. - It's recommended to use <cuda/std/type_traits> instead; at some point we should look into doing this automatically.
This is great. We would be happy with this being merged. |
- Replaces path_simplify() with get_real_path(), which returns the full canonical path with symlinks expanded. This requires the path to exist, so explicit header sources (which are not real files) needed some special handling. - Full path expansion is also applied to "." paths. To avoid these being embedded in the application, they are now explicitly encoded as ".". - Also adds a test for include directory order. - Also fixes some formatting.
I've pushed a final change to improve directory handling. There's a small chance that this breaks something (particularly on Windows); let me know if anything goes wrong. |
Fixes #90 and #107.
(This PR requires #128 and #129 to be merged first).