sci-libs/tensorflow has ROCm backend support for AMD video cards. It would be nice to have a rocm USE flag to enable this feature.

Reproducible: Always
Created attachment 607168 [details] Modified tensorflow-9999 ebuild
Created attachment 607170 [details, diff] Tensorflow ROCm bazel config patch
Hi, I have been trying to build TensorFlow with ROCm for some time. I was able to write a patch for the bazel configure file, so the TensorFlow build starts fine, but after some time it always fails in the same include file, gpu_device_functions.h. I tried TensorFlow 2.0 and 2.1 with different versions of ROCm since 2.8, from Gentoo portage and https://github.com/justxi/rocm, but the build always failed. The errors below come from the attached ebuild and patch.

In file included from ./tensorflow/core/util/gpu_kernel_helper.h:25:
./tensorflow/core/util/gpu_device_functions.h:110:34: error: use of undeclared identifier 'blockIdx'
  return detail::GpuGridRange<T>(blockIdx.x * blockDim.x + threadIdx.x,
./tensorflow/core/util/gpu_device_functions.h:110:47: error: use of undeclared identifier 'blockDim'
  return detail::GpuGridRange<T>(blockIdx.x * blockDim.x + threadIdx.x,
./tensorflow/core/util/gpu_device_functions.h:110:60: error: use of undeclared identifier 'threadIdx'
  return detail::GpuGridRange<T>(blockIdx.x * blockDim.x + threadIdx.x,
./tensorflow/core/util/gpu_device_functions.h:111:34: error: use of undeclared identifier 'gridDim'
  gridDim.x * blockDim.x, count);
./tensorflow/core/util/gpu_device_functions.h:111:46: error: use of undeclared identifier 'blockDim'
  gridDim.x * blockDim.x, count);
./tensorflow/core/util/gpu_device_functions.h:119:34: error: use of undeclared identifier 'blockIdx'
  return detail::GpuGridRange<T>(blockIdx.y * blockDim.y + threadIdx.y,
./tensorflow/core/util/gpu_device_functions.h:119:47: error: use of undeclared identifier 'blockDim'
  return detail::GpuGridRange<T>(blockIdx.y * blockDim.y + threadIdx.y,
./tensorflow/core/util/gpu_device_functions.h:119:60: error: use of undeclared identifier 'threadIdx'
  return detail::GpuGridRange<T>(blockIdx.y * blockDim.y + threadIdx.y,
./tensorflow/core/util/gpu_device_functions.h:120:34: error: use of undeclared identifier 'gridDim'
  gridDim.y * blockDim.y, count);
./tensorflow/core/util/gpu_device_functions.h:120:46: error: use of undeclared identifier 'blockDim'
  gridDim.y * blockDim.y, count);
./tensorflow/core/util/gpu_device_functions.h:128:34: error: use of undeclared identifier 'blockIdx'
  return detail::GpuGridRange<T>(blockIdx.z * blockDim.z + threadIdx.z,
./tensorflow/core/util/gpu_device_functions.h:128:47: error: use of undeclared identifier 'blockDim'
  return detail::GpuGridRange<T>(blockIdx.z * blockDim.z + threadIdx.z,
./tensorflow/core/util/gpu_device_functions.h:128:60: error: use of undeclared identifier 'threadIdx'
  return detail::GpuGridRange<T>(blockIdx.z * blockDim.z + threadIdx.z,
./tensorflow/core/util/gpu_device_functions.h:129:34: error: use of undeclared identifier 'gridDim'
  gridDim.z * blockDim.z, count);
./tensorflow/core/util/gpu_device_functions.h:129:46: error: use of undeclared identifier 'blockDim'
  gridDim.z * blockDim.z, count);
./tensorflow/core/util/gpu_device_functions.h:149:13: error: use of undeclared identifier '__lane_id'
  lane_id = __lane_id();
./tensorflow/core/util/gpu_device_functions.h:177:7: error: use of undeclared identifier '__shfl'
  __shfl(static_cast<int>(mask), static_cast<int>(src_lane));
./tensorflow/core/util/gpu_device_functions.h:254:10: error: use of undeclared identifier '__ballot'
  return __ballot(pred) & mask; // Apply mask to match __ballot_sync's spec.
./tensorflow/core/util/gpu_device_functions.h:266:10: error: use of undeclared identifier '__any'
  return __any(pred);
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

Portage 2.3.84 (python 3.6.9-final-0, default/linux/amd64/17.1/hardened, gcc-9.2.0, glibc-2.30-r3, 5.4.15-gentoo x86_64)
=================================================================
System uname: Linux-5.4.15-gentoo-x86_64-AMD_Ryzen_7_3700X_8-Core_Processor-with-gentoo-2.6
KiB Mem: 32868240 total, 24432948 free
KiB Swap: 33554428 total, 33554428 free
Timestamp of repository gentoo: Tue, 28 Jan 2020 17:45:01 +0000
Head commit of repository gentoo: 272453850e553e9f4eae0055d0719ac9e4f77ab3
sh bash 4.4_p23-r1
ld GNU ld (Gentoo 2.32 p2) 2.32.0
distcc 3.3.3 x86_64-pc-linux-gnu [disabled]
ccache version 3.7.6 [disabled]
app-shells/bash: 4.4_p23-r1::gentoo
dev-java/java-config: 2.2.0-r4::gentoo
dev-lang/perl: 5.30.1::gentoo
dev-lang/python: 2.7.17::gentoo, 3.6.9::misc, 3.7.5-r1::gentoo
dev-util/ccache: 3.7.6::gentoo
dev-util/cmake: 3.14.6::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.6-r1::gentoo
sys-apps/openrc: 0.42.1::gentoo
sys-apps/sandbox: 2.13::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake: 1.16.1-r1::gentoo
sys-devel/binutils: 2.32-r1::gentoo
sys-devel/gcc: 9.2.0-r3::gentoo
sys-devel/gcc-config: 2.1::gentoo
sys-devel/libtool: 2.4.6-r6::gentoo
sys-devel/make: 4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.4::gentoo (virtual/os-headers)
sys-libs/glibc: 2.30-r3::gentoo
Repositories:

crossdev
    location: /usr/local/portage/crossdev
    masters: gentoo
    priority: 10

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage
    priority: 10
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: yes
    sync-rsync-extra-opts:
    sync-rsync-verify-max-age: 24

brother-overlay
    location: /var/lib/layman/brother-overlay
    sync-type: laymansync
    sync-uri: https://github.com/stefan-langenmaier/brother-overlay.git
    masters: gentoo
    priority: 50

misc
    location: /usr/local/portage/misc
    masters: gentoo
    priority: 90

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=native -ftree-vectorize -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/themes/oxygen-gtk/gtk-2.0 /usr/share/themes/oxygen-gtk/gtk-3.0"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=native -ftree-vectorize -pipe"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://ftp.fi.muni.cz/pub/linux/gentoo/ http://gentoo.mirror.web4u.cz/ ftp://gentoo.mirror.web4u.cz/ http://tux.rainside.sk/gentoo/ ftp://tux.rainside.sk/gentoo/ http://gentoo.wheel.sk/ ftp://gentoo.wheel.sk/pub/linux/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en"
MAKEOPTS="-j17"
PKGDIR="/var/tmp/pkg/"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi activities alsa amd64 berkdb bluetooth branding bzip2 cairo cdda cdr clang cli crypt cups cxx dbus declarative dri dts dvd dvdr elogind emboss encode exif fam flac fortran gdbm gif glamor gpm gtk hardened iconv icu ipv6 jpeg kde kipi lcms libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses networkmanager nls nptl ogg opengl openmp pam pango pcre pcsc-lite pdf pgo phonon pie plasma png policykit ppds pulseaudio qml qt5 readline sdl seccomp semantic-desktop spell split-usr ssl ssp startup-notification svg tcpd tiff truetype udev udisks unicode upower usb v4l vdpau vorbis vulkan wayland widgets x264 xattr xcb xcomposite xml xtpax xv xvid zlib" ABI_X86="64" ADA_TARGET="gnat_2018" CALLIGRA_FEATURES="sheets stage words" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" CURL_SSL="openssl" ELIBC="glibc" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en" LLVM_TARGETS="AMDGPU BPF X86 AArch64 ARM" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" QEMU_SOFTMMU_TARGETS="x86_64" QEMU_USER_TARGETS="x86_64" RUBY_TARGETS="ruby25" USERLAND="GNU" VIDEO_CARDS="amdgpu radeonsi"
Unset: CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

I hope the modified ebuild and patch help someone continue this work.
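
Every name reported as undeclared in the compiler output above (blockIdx, blockDim, threadIdx, gridDim, __lane_id, __shfl, __ballot, __any) is a GPU device built-in. That pattern might mean the header was compiled without the HIP device declarations in scope, but this reading is an assumption and is not confirmed anywhere in this report. A minimal Python sketch for tallying such identifiers from a clang log follows; the function name `tally_undeclared` and the `SAMPLE` excerpt are illustrative only and are not part of any ebuild.

```python
import re
from collections import Counter

# Matches clang diagnostics of the form:
#   path/to/file.h:110:34: error: use of undeclared identifier 'blockIdx'
UNDECLARED = re.compile(
    r"[^\s:]+:\d+:\d+: error: use of undeclared identifier '([^']+)'"
)


def tally_undeclared(log_text):
    """Count how often each undeclared identifier appears in a clang log."""
    return Counter(UNDECLARED.findall(log_text))


# Hypothetical excerpt, flattened onto one line the way the log was pasted.
SAMPLE = (
    "./tensorflow/core/util/gpu_device_functions.h:110:34: error: "
    "use of undeclared identifier 'blockIdx' "
    "./tensorflow/core/util/gpu_device_functions.h:110:47: error: "
    "use of undeclared identifier 'blockDim' "
    "./tensorflow/core/util/gpu_device_functions.h:111:46: error: "
    "use of undeclared identifier 'blockDim' "
)

if __name__ == "__main__":
    for name, count in tally_undeclared(SAMPLE).most_common():
        print(f"{name}: {count}")
```

Because the pattern does not depend on line breaks, it can be run on a log pasted as a single line as well as on the raw build log.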
Created attachment 607176 [details] Modified ebuild from justxi overlay
I was also able to reproduce a second kind of build failure.

In file included from /usr/include/rocprim/intrinsics.hpp:27:
/usr/include/rocprim/intrinsics/atomic.hpp:33:16: error: no member named 'atomicAdd' in the global namespace; did you mean 'atomic_add'?
  return ::atomicAdd(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:31:18: note: 'atomic_add' declared here
  unsigned int atomic_add(unsigned int * address, unsigned int value)
/usr/include/rocprim/intrinsics/atomic.hpp:39:16: error: no member named 'atomicAdd' in the global namespace; did you mean 'atomic_add'?
  return ::atomicAdd(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:37:9: note: 'atomic_add' declared here
  int atomic_add(int * address, int value)
/usr/include/rocprim/intrinsics/atomic.hpp:45:16: error: no member named 'atomicAdd' in the global namespace; did you mean 'atomic_add'?
  return ::atomicAdd(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:43:11: note: 'atomic_add' declared here
  float atomic_add(float * address, float value)
/usr/include/rocprim/intrinsics/atomic.hpp:51:16: error: no member named 'atomicAdd' in the global namespace; did you mean 'atomic_add'?
  return ::atomicAdd(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:49:24: note: 'atomic_add' declared here
  unsigned long long atomic_add(unsigned long long * address, unsigned long long value)
/usr/include/rocprim/intrinsics/atomic.hpp:57:18: error: no member named 'atomicInc' in the global namespace
  return ::atomicInc(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:63:16: error: no member named 'atomicExch' in the global namespace; did you mean 'atomic_exch'?
  return ::atomicExch(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:61:18: note: 'atomic_exch' declared here
  unsigned int atomic_exch(unsigned int * address, unsigned int value)
/usr/include/rocprim/intrinsics/atomic.hpp:69:16: error: no member named 'atomicExch' in the global namespace; did you mean 'atomic_exch'?
  return ::atomicExch(address, value);
/usr/include/rocprim/intrinsics/atomic.hpp:67:24: note: 'atomic_exch' declared here
  unsigned long long atomic_exch(unsigned long long * address, unsigned long long value)
In file included from tensorflow/core/kernels/histogram_op_gpu.cu.cc:24:
In file included from bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub/hipcub.hpp:31:
In file included from bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub/config.hpp:50:
In file included from /usr/include/rocprim/rocprim.hpp:33:
In file included from /usr/include/rocprim/intrinsics.hpp:28:
/usr/include/rocprim/intrinsics/bit.hpp:44:12: error: use of undeclared identifier '__popc'
  return __popc(x);
/usr/include/rocprim/intrinsics/bit.hpp:53:12: error: use of undeclared identifier '__popcll'
  return __popcll(x);
In file included from tensorflow/core/kernels/histogram_op_gpu.cu.cc:24:
In file included from bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub/hipcub.hpp:31:
In file included from bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub/config.hpp:50:
In file included from /usr/include/rocprim/rocprim.hpp:33:
In file included from /usr/include/rocprim/intrinsics.hpp:29:
/usr/include/rocprim/intrinsics/thread.hpp:42:12: error: use of undeclared identifier 'warpSize'
  return warpSize;
/usr/include/rocprim/intrinsics/thread.hpp:40:24: error: no return statement in constexpr function
  constexpr unsigned int warp_size()
/usr/include/rocprim/intrinsics/thread.hpp:49:12: error: use of undeclared identifier 'hipBlockDim_z'
  return hipBlockDim_z * hipBlockDim_y * hipBlockDim_x;
/usr/include/rocprim/intrinsics/thread.hpp:49:28: error: use of undeclared identifier 'hipBlockDim_y'
  return hipBlockDim_z * hipBlockDim_y * hipBlockDim_x;
/usr/include/rocprim/intrinsics/thread.hpp:49:44: error: use of undeclared identifier 'hipBlockDim_x'
  return hipBlockDim_z * hipBlockDim_y * hipBlockDim_x;
/usr/include/rocprim/intrinsics/thread.hpp:65:12: error: no member named '__lane_id' in the global namespace; did you mean 'lane_id'?
  return ::__lane_id();
/usr/include/rocprim/intrinsics/thread.hpp:63:14: note: 'lane_id' declared here
  unsigned int lane_id()
/usr/include/rocprim/intrinsics/thread.hpp:72:13: error: use of undeclared identifier 'hipThreadIdx_z'
  return (hipThreadIdx_z * hipBlockDim_y * hipBlockDim_x)
/usr/include/rocprim/intrinsics/thread.hpp:72:30: error: use of undeclared identifier 'hipBlockDim_y'
  return (hipThreadIdx_z * hipBlockDim_y * hipBlockDim_x)
/usr/include/rocprim/intrinsics/thread.hpp:72:46: error: use of undeclared identifier 'hipBlockDim_x'
  return (hipThreadIdx_z * hipBlockDim_y * hipBlockDim_x)
/usr/include/rocprim/intrinsics/thread.hpp:73:12: error: use of undeclared identifier 'hipThreadIdx_y'
  + (hipThreadIdx_y * hipBlockDim_x)
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Created attachment 607334 [details, diff]
Tensorflow ROCm bazel config patch

For those hitting this failure:

patching file third_party/gpus/rocm_configure.bzl
Hunk #3 FAILED at 191.

This is the patch with the additional line removed:

- inc_dirs.append(rocm_config.rocm_toolkit_path + "/llvm/lib/clang/11.0.0/include")
Created attachment 641240 [details, diff]
Tensorflow ROCm bazel config patch

Tried to update the patch to fit Tensorflow-2.2.0 and ROCm-3.3.0. Some work remains.
Hi, I finally succeeded in building tensorflow-2.2.0 with ROCm 3.3. It was necessary to use an ar wrapper (see https://medium.com/analytics-vidhya/compiling-tensorflow-from-the-source-when-your-compiler-is-in-a-non-standard-location-194fecc92153) and to add a patch to the hip ebuild, hip-3.3.0-ROCM_PATH-LIB_PATH.patch. I'm not interested in TensorFlow anymore; maybe someone else will continue.
Created attachment 655374 [details, diff] Hip rocm path patch
Created attachment 655376 [details] Tensorflow 2.2.0 ebuild
Created attachment 655378 [details, diff] Workspace patch
Created attachment 655380 [details, diff] 0003-systemlibs-jsoncpp-fix-include-path patch
Created attachment 655382 [details, diff] tensorflow-2.1.0-python3.8-pywrap_tensor.patch
Created attachment 655384 [details, diff] tensorflow-2.2.0-remove-bazel_version_repository.patch
Created attachment 655386 [details, diff] tensorflow-2.2_rc4-rocm-3.3.0_cc_toolchain.patch
Created attachment 655388 [details, diff] tensorflow-2.2_rc4-rocm-3.3.0_configure_bazel.patch
Created attachment 655390 [details, diff] tensorflow-2.2_rc4-rocm-3.3.0_hipcc_toolchain.patch
TensorFlow has been updated to 2.3.0 and ROCm to 3.7.0. Some patches for ROCm exist in the AUR build, but rocm_configure.bzl still needs to be adapted to the Gentoo environment and to ROCm 3.7.0. Could someone help me with this?
I've got tensorflow-2.5.0-r3 working on ROCm-4.3 with a Radeon VII (gfx906) and an RX 6700XT (gfx1031). Dependencies such as rocFFT-4.3 and MIOpen are ready for a PR, and after some cleanup the rocm USE flag for tensorflow will be ready. Please stay tuned.
(In reply to Wu Yiyang from comment #19)
> I've got tensorflow-2.5.0-r3 worked on ROCm-4.3

Could you be so kind as to share your efforts before they're merged in? I've been working on my own tensorflow-rocm ebuild with limited success, so it would be great not to duplicate effort.

Are you using the upstream code base or AMD's fork for TensorFlow?
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=02dc3ee11c5815bec6c60fe88285d8d446f7263f

commit 02dc3ee11c5815bec6c60fe88285d8d446f7263f
Author:     YiyangWu <xgreenlandforwyy@gmail.com>
AuthorDate: 2021-08-26 07:33:50 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2021-11-30 06:32:54 +0000

    sci-libs/hipFFT: ROCm FFT marshalling library

    hipFFT is the front end of rocFFT, and is dependency of ROCm supported
    math/DL frameworks like pytorch.

    Bug: https://bugs.gentoo.org/705712
    Closes: https://github.com/gentoo/gentoo/pull/22804
    Package-Manager: Portage-3.0.20, Repoman-3.0.3
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 sci-libs/hipFFT/Manifest | 2 +
 .../files/hipFFT-4.3.0-add-complex-header.patch | 11 ++++
 .../hipFFT-4.3.0-gentoo-install-locations.patch | 42 ++++++++++++
 .../files/hipFFT-4.3.0-remove-git-dependency.patch | 33 ++++++++++
 sci-libs/hipFFT/hipFFT-4.3.0.ebuild | 75 ++++++++++++++++++++++
 sci-libs/hipFFT/metadata.xml | 22 +++++++
 6 files changed, 185 insertions(+)
I managed to install it on my system with gfx803, which I set in AMDGPU_TARGETS in my make.conf. I based this all on Wu Yiyang's hard work, merging in the hipFFT branch from their Git repo: https://github.com/littlewu2508/gentoo.git I had to tweak the TensorFlow ebuild to enable permissions to use the card during the build, and also add rocSOLVER. These are present in my personal overlay: https://github.com/awenocur/asw-custom-software-gentoo.git.
I'm beginning an effort to port ROCm 3.5.2, which includes the necessary BLAS, PRIM, SOLVER, FFT, and SPARSE libraries. It is far from complete, and has some issues with AMD's clang branch. I'll keep working on it, but the effort is currently in a branch of my personal overlay for anyone who can use it to make more reliable or compliant ebuilds: https://github.com/awenocur/asw-custom-software-gentoo/tree/ROCm
I've got tensorflow-2.5.0 working with rocm-4.3.0 on gfx906 and gfx1030. The tensorflow-2.5.0-r3.ebuild with rocm enabled is in the tfrocm branch at https://github.com/littlewu2508/gentoo/tree/tfrocm, but it's not quite mature, and I don't have much time for further cleanup and testing yet. It did work at least once, so I'm sharing it here for anyone who wants to try it.
(In reply to Adam Wenocur from comment #20)
> (In reply to Wu Yiyang from comment #19)
> > I've got tensorflow-2.5.0-r3 worked on ROCm-4.3
>
> Could you be so kind as to share your efforts before they're merged in? I've
> been working on my own tensorflow-rocm ebuild with limited success, so it
> would be great not to duplicate effort.
>
> Are you using the upstream code base or AMD's fork for TensorFlow?

Sorry for the late answer. I use the upstream code.
I tried to compile TensorFlow 2.9.1 with ROCm 5.2.1 and got this error:

[0 / 559] 3 actions, 0 running
    [Prepa] BazelWorkspaceStatusAction stable-status.txt
    [Prepa] Writing file tensorflow/libtensorflow.so.2.9.1-2.params
    [Prepa] Writing file tensorflow/libtensorflow_framework.so.2.9.1-2.params
WARNING: An illegal reflective access operation has occurred
ERROR: /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-python3_10/tensorflow/core/kernels/mlir_generated/BUILD:1430:19: Generating kernel '//tensorflow/core/kernels/mlir_generated:cast_gpu_i1_i8_kernel_generator' failed: (Exit 1): tf_to_kernel failed: error executing command
  (cd /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-python3_10-bazel-base/execroot/org_tensorflow && \
  exec env - \
    GLIBCXX_USE_CXX11_ABI=0 \
    HOME=/var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/homedir \
    KERAS_HOME=/var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/temp/.keras \
    PATH=/var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/temp/python3.10/bin:/usr/lib/portage/python3.10/ebuild-helpers/xattr:/usr/lib/portage/python3.10/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/14/bin:/usr/lib64/subversion/bin:/opt/cuda/bin \
    PYTHON_BIN_PATH=/usr/bin/python3.10 \
    PYTHON_LIB_PATH=/usr/lib/python3.10/site-packages \
    ROCBLAS_TENSILE_LIBPATH=/usr/lib/library \
    ROCM_PATH=/usr \
    TF2_BEHAVIOR=1 \
    TF_SYSTEM_LIBS=boringssl,curl,cython,gif,icu,libjpeg_turbo,lmdb,nasm,png,pybind11,zlib \
  bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--tile_sizes=256' '--max-supported-rank=5' '--arch=gfx900' '--input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/cast_gpu_i1_i8.mlir' '--output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/cast_gpu_i1_i8_kernel_generator_kernel.o' '--enable_ftz=False' '--cpu_codegen=False' '--jit=False')
# Configuration: 20e94f0da98a352d5b21b2ec04385024bfad09a8a1c63f958a9a85c13100ac17
# Execution platform: @local_execution_config_platform//:platform
2022-07-31 10:34:24.872181: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:263] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-07-31 10:34:24.880676: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:400] target triple not found in the module
error: Failure when generating HSACO
2022-07-31 10:34:24.889058: E tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel.cc:206] INTERNAL: Generating device code failed.
INFO: Elapsed time: 1147.909s, Critical Path: 103.38s
INFO: 7075 processes: 1599 internal, 5476 local.
FAILED: Build did NOT complete successfully

Full build log: https://gist.github.com/raw/99b818fccf082d91655cd8e5b74c6ffd

How can I fix this error? Any feedback would be greatly appreciated.
(In reply to perestoronin from comment #26)
> FAILED: Build did NOT complete successfully
>
> How to fix this error ?

I fixed the non-standard location of ld.lld in gpu_backend_lib.cc, and the build succeeded.
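
Before patching a hard-coded linker path, it can help to check where ld.lld actually lives on the system. The sketch below is only an illustration: the search order (a `llvm/bin` directory under the ROCm root, then slotted directories matching `/usr/lib/llvm/*/bin` as seen in the PATH of the failing command, then PATH itself) is an assumption, and `find_lld` is a hypothetical helper, not something TensorFlow or Portage provides.

```python
import glob
import os
import shutil


def _slot_key(bin_dir):
    """Sort key: numeric LLVM slot taken from a path like /usr/lib/llvm/14/bin."""
    slot = os.path.basename(os.path.dirname(bin_dir))
    return int(slot) if slot.isdigit() else -1


def find_lld(rocm_path="/usr", llvm_glob="/usr/lib/llvm/*/bin", name="ld.lld"):
    """Return the first executable candidate for ld.lld, or None if none exists."""
    # Assumed first guess: <rocm_path>/llvm/bin/ld.lld
    candidates = [os.path.join(rocm_path, "llvm", "bin", name)]
    # Assumed fallback: slotted LLVM installs, highest numeric slot first.
    for bin_dir in sorted(glob.glob(llvm_glob), key=_slot_key, reverse=True):
        candidates.append(os.path.join(bin_dir, name))
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Last resort: whatever PATH resolves to.
    return shutil.which(name)


if __name__ == "__main__":
    print(find_lld(rocm_path=os.environ.get("ROCM_PATH", "/usr")))
```

If the printed path differs from the one the build expects, that would be consistent with the failure described above, though the actual fix applied in this thread is the patch to gpu_backend_lib.cc.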
(In reply to perestoronin from comment #27)
> (In reply to perestoronin from comment #26)
> > FAILED: Build did NOT complete successfully
> >
> > How to fix this error ?
>
>
> Fixed non-stanard location ld.lld in gpu_backend_lib.cc and build succesful.

Wonderful!

Is this issue specific to ROCm with vanilla clang? Also, can you provide the patch you applied? This would be very helpful.
(In reply to Yiyang Wu from comment #28)
> (In reply to perestoronin from comment #27)
> > (In reply to perestoronin from comment #26)
> > > FAILED: Build did NOT complete successfully
> > >
> > > How to fix this error ?
> >
> >
> > Fixed non-stanard location ld.lld in gpu_backend_lib.cc and build succesful.
>
> Wonderful!
>
> Is this issue specific to ROCm with vanilla clang? Also, can you provide the
> patch you applied? This would be very helpful.

Yes.

rocm.patch: https://gist.github.com/raw/ed891528aacf0c5baf3a789e5e9aaead

I also applied other patches, in this order:

>>> Preparing source in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1 ...
 * Applying rocm.patch ... [ ok ]
 * Applying 1607.patch ... [ ok ]
 * Applying 0003-WORKSPACE-add-rules-docker-http_archive-bazel-toolch.patch ... [ ok ]
 * Applying 55779.patch ... [ ok ]
 * Applying 55961.patch ... [ ok ]
 * Applying tf-cuda-rocm.patch ... [ ok ]
 * Adjusting to prefix /
Hi!

I just started looking into this issue, as I want to get tensorflow running with rocm for an old gfx803 card, and find it a rather horrifying experience… thanks to all the effort put into it here. Is anybody still pursuing tensorflow with rocm and maybe has an updated 2.11 ebuild? By now I have updated the rocm.patch provided here to match the new header structure, but new issues keep popping up.

From what I see, a lot of the patching is about resolving issues that come from how rocm is deployed in Gentoo. The number of issues seems to increase with every new tensorflow version.

(In reply to perestoronin from comment #29)
[…]
> > Yes. rocm.patch https://gist.github.com/raw/ed891528aacf0c5baf3a789e5e9aaead
>
> Also I applied other patches in order:
>
> >>> Preparing source in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1 ...
> * Applying rocm.patch ...
> [ ok ]
> * Applying 1607.patch ...
> [ ok ]
> * Applying 0003-WORKSPACE-add-rules-docker-http_archive-bazel-toolch.patch
> ...
> [ ok ]
> * Applying 55779.patch ...
> [ ok ]
> * Applying 55961.patch ...
> [ ok ]
> * Applying tf-cuda-rocm.patch ...
> [ ok ]
> * Adjusting to prefix /

Your 2.9.1 build seems to be the most up-to-date success. Where can the other patches be found? rocm.patch and 0003-* are linked here directly.

Thanks everybody, I hope there are still people active here. :-)
(In reply to Jan-Matthias Braun from comment #30)
> Hi!
>
> I just started looking into this issue, as I want to get tensorflow running
> with rocm for an old gfx803 card, and find it a rather horrifying
> experience… thanks to all the effort put into it here. Is anybody still
> pursuing tensorflow with rocm and maybe has an updated 2.11 ebuild? I by now
> updated the rocm.patch provided here to match the new header structure, but
> new issues keep popping up.
>
> From what I see, it seems that a lot of patching is about resolving issues
> from how rocm is deployed in Gentoo. The number of issues seemingly
> increasing with every new tensorflow version.
>
> (In reply to perestoronin from comment #29)
> […]
> > >
> > Yes. rocm.patch https://gist.github.com/raw/ed891528aacf0c5baf3a789e5e9aaead
> >
> > Also I applied other patches in order:
> >
> > >>> Preparing source in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1 ...
> > * Applying rocm.patch ...
> > [ ok ]
> > * Applying 1607.patch ...
> > [ ok ]
> > * Applying 0003-WORKSPACE-add-rules-docker-http_archive-bazel-toolch.patch
> > ...
> > [ ok ]
> > * Applying 55779.patch ...
> > [ ok ]
> > * Applying 55961.patch ...
> > [ ok ]
> > * Applying tf-cuda-rocm.patch ...
> > [ ok ]
> > * Adjusting to prefix /
>
> Your 2.9.1 build seems to be the most up to date success. Where can the
> other patches be found? rocm.patch and 0003-* are found here directly.
>
> Thanks everybody, I hope there still people active here. :-)

gfx803 cards are no longer supported in rocm by the maintainers.

Currently I use rocm-5.4.3 on a gfx900 16 GB Vega Frontier with tensorflow-2.11.0 compiled by bazel-6.1.0.

I do not publish my AI work, but I can share the patch you requested:

0003-WORKSPACE-add-rules-docker-http_archive-bazel-toolch.patch
https://gist.github.com/raw/531422c35e3caab7db53c2d96e451f1e
My current list of patches to tensorflow is very long:

 * Applying tf-bazel-update.patch ... [ ok ]
 * Applying tf-a100.patch ... [ ok ]
 * Applying tf-dill-update.patch ...
patching file tensorflow/python/distribute/BUILD
Hunk #1 succeeded at 473 (offset -2 lines).
Hunk #2 succeeded at 1242 (offset -5 lines).
Hunk #3 succeeded at 2087 (offset -5 lines).
patching file tensorflow/python/distribute/cross_device_ops_test.py
patching file tensorflow/python/distribute/failure_handling/BUILD
Hunk #1 succeeded at 48 with fuzz 2 (offset -14 lines).
Hunk #2 succeeded at 78 (offset -14 lines).
patching file tensorflow/python/distribute/failure_handling/failure_handler_test.py
Hunk #2 succeeded at 453 with fuzz 1 (offset -59 lines).
patching file tensorflow/python/distribute/failure_handling/gce_failure_handler_test.py
Hunk #2 succeeded at 481 (offset -24 lines).
patching file tensorflow/python/distribute/multi_process_runner_test.py
patching file tensorflow/python/distribute/multi_worker_continuous_run_test.py
patching file tensorflow/workspace2.bzl
 [ ok ]
 * Applying tf-eigen-update.patch ... [ ok ]
 * Applying tf-onednn-update.patch ... [ ok ]
 * Applying tf-protobuf-4.21.patch ...
patching file .bazelrc
Hunk #1 succeeded at 551 (offset -6 lines).
patching file tensorflow/opensource_only.files
Hunk #1 succeeded at 137 with fuzz 2 (offset -32 lines).
patching file tensorflow/tools/ci_build/release/requirements_common.txt
Hunk #1 succeeded at 9 with fuzz 2 (offset -2 lines).
patching file tensorflow/tools/def_file_filter/def_file_filter.py.tpl
patching file tensorflow/tools/pip_package/setup.py
Hunk #1 succeeded at 99 (offset -5 lines).
patching file tensorflow/tools/toolchains/win/BUILD
patching file tensorflow/tools/toolchains/win/tf_win_01232023/BUILD
patching file tensorflow/tools/toolchains/win/tf_win_01232023/armeabi_cc_toolchain_config.bzl
patching file tensorflow/tools/toolchains/win/tf_win_01232023/builtin_include_directory_paths_msvc
patching file tensorflow/tools/toolchains/win/tf_win_01232023/toolchain_image_info
patching file tensorflow/tools/toolchains/win/tf_win_01232023/windows_cc_toolchain_config.bzl
patching file tensorflow/tsl/platform/default/build_config.bzl
Hunk #1 succeeded at 405 (offset 4 lines).
Hunk #2 succeeded at 422 (offset 4 lines).
patching file tensorflow/workspace2.bzl
Hunk #1 succeeded at 458 (offset 1 line).
Hunk #2 succeeded at 561 (offset -6 lines).
patching file third_party/pprof.BUILD
patching file third_party/protobuf/protobuf.patch
 [ ok ]
 * Applying tf-sqlite-update.patch ... [ ok ]
 * Applying b3a8fdbcb79e723f8d62f86bddcfdfb73fe76291.patch ... [ ok ]
 * Applying 9be68da3a2c58984cc6ddaac18bc2ed51e42eaf2.patch ... [ ok ]
 * Applying 3540299f4fdd3f1c09b059f66a78353b89eb40b4.patch ... [ ok ]
 * Applying rocm-2.11.0-5.4.3.patch ...
patching file tensorflow/core/util/gpu_solvers.h
Hunk #1 succeeded at 33 with fuzz 2 (offset 1 line).
patching file tensorflow/compiler/xla/stream_executor/rocm/hipsolver_wrapper.h
patching file tensorflow/compiler/xla/stream_executor/rocm/rocblas_wrapper.h
Hunk #1 succeeded at 20 with fuzz 2.
patching file tensorflow/compiler/xla/stream_executor/rocm/rocm_blas.h
patching file tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc
patching file tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc
Hunk #1 succeeded at 734 (offset -5 lines).
patching file tensorflow/tsl/platform/default/rocm_rocdl_path.cc
patching file tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc
Hunk #1 succeeded at 433 with fuzz 1 (offset 2 lines).
patching file third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_rocm.tpl
Hunk #2 succeeded at 136 with fuzz 1.
Hunk #3 succeeded at 180 (offset 2 lines).
patching file third_party/gpus/rocm_configure.bzl
Hunk #2 succeeded at 322 (offset 14 lines).
Hunk #5 succeeded at 753 (offset -50 lines).
Hunk #6 succeeded at 777 (offset -50 lines).
 [ ok ]
 * Applying 0001-WORKSPACE-add-rules-docker-http_archive-bazel-toolch.patch ... [ ok ]
 * Applying 0002-systemlib-Latest-absl-LTS-has-split-cord-libs.patch ... [ ok ]
 * Applying 0003-mkl_dnn-Must-link-against-libm-for-round-and-log2.patch ... [ ok ]
 * Applying 0004-tensorflow_cc-Add-systemlib-nsync-linkopts-2.11.0.patch ... [ ok ]
 * Applying 0005-Relax-setup.py-version-requirements.patch ... [ ok ]
 * Applying 58345.patch ... [ ok ]
 * Applying pipes.patch ... [ ok ]
 * Applying workspace2.bzl.patch ... [ ok ]
 * Applying tf-abseil-cpp-update.patch ... [ ok ]
 * Applying tf-dlpack-update.patch ... [ ok ]
 * Applying tf-flatbuffers-update.patch ... [ ok ]
 * Applying tf-fp16-update.patch ... [ ok ]
 * Applying tf-gemmlowp-update.patch ... [ ok ]
 * Applying tf-go.patch ... [ ok ]
 * Applying tf-stablehlo-update.patch ... [ ok ]
 * Applying tf-workspace2-update.patch ... [ ok ]
 * Applying 55779.patch ... [ ok ]
 * Applying 55961.patch ... [ ok ]
 * Applying tf-roctracer-2.11.0-5.4.3.patch ...
patching file tensorflow/core/profiler/backends/gpu/rocm_tracer.cc
Hunk #1 succeeded at 1008 with fuzz 1 (offset -3 lines).
Hunk #2 succeeded at 1486 (offset -3 lines).
patching file tensorflow/compiler/xla/stream_executor/rocm/roctracer_wrapper.h
Hunk #3 succeeded at 60 (offset -1 lines).
Hunk #4 succeeded at 95 (offset -1 lines).
 [ ok ]
 * Applying tf-sparse-transpose-op.patch ... [ ok ]
 * Applying tf-cuda-host-compiler.patch ...

PS. I also update and patch rocm-5.4.3, llvm/clang-15, gcc-12.2.0, and nvidia cuda 12.1.0.
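
Several hunks in the output above applied only "with fuzz", meaning the patch context no longer matched the source exactly. Those are usually the first places to check when rebasing a patch set onto a new release. The following is a minimal Python sketch for extracting them from Portage/patch output; `fuzzy_hunks` is a hypothetical helper, and the patterns assume the exact message wording shown in this comment.

```python
import re

# One alternation so the log is scanned in order, even when pasted as one line.
TOKEN = re.compile(
    r"\* Applying (?P<patch>\S+) \.\.\."
    r"|patching file (?P<target>\S+)"
    r"|Hunk #(?P<hunk>\d+) succeeded at \d+ with fuzz (?P<fuzz>\d+)"
)


def fuzzy_hunks(log_text):
    """Return (patch, target file, hunk number, fuzz) for every fuzzy hunk."""
    results = []
    patch = None
    target = None
    for match in TOKEN.finditer(log_text):
        if match.group("patch") is not None:
            patch = match.group("patch")
            target = None  # a new patch starts; forget the previous target file
        elif match.group("target") is not None:
            target = match.group("target")
        else:
            results.append(
                (patch, target, int(match.group("hunk")), int(match.group("fuzz")))
            )
    return results


# Excerpt copied from the output above.
SAMPLE = (
    " * Applying tf-dill-update.patch ... "
    "patching file tensorflow/python/distribute/failure_handling/BUILD "
    "Hunk #1 succeeded at 48 with fuzz 2 (offset -14 lines). "
    "Hunk #2 succeeded at 78 (offset -14 lines). [ ok ]"
)

if __name__ == "__main__":
    for entry in fuzzy_hunks(SAMPLE):
        print(entry)
```

Hunks that applied with only an offset are ignored on purpose, since an offset alone means the context still matched.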
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f57039ebc84c40eddf9f6a63afbbf86a8b3c42da

commit f57039ebc84c40eddf9f6a63afbbf86a8b3c42da
Author:     Jakov Smolić <jsmolic@gentoo.org>
AuthorDate: 2024-02-24 12:47:06 +0000
Commit:     Jakov Smolić <jsmolic@gentoo.org>
CommitDate: 2024-02-24 12:47:06 +0000

    sci-libs/tensorflow: treeclean

    Bug: https://bugs.gentoo.org/807625
    Closes: https://bugs.gentoo.org/906609
    Closes: https://bugs.gentoo.org/905673
    Closes: https://bugs.gentoo.org/913731
    Closes: https://bugs.gentoo.org/882617
    Closes: https://bugs.gentoo.org/881445
    Closes: https://bugs.gentoo.org/915621
    Closes: https://bugs.gentoo.org/909002
    Closes: https://bugs.gentoo.org/705712
    Closes: https://bugs.gentoo.org/873295
    Closes: https://bugs.gentoo.org/907031
    Closes: https://bugs.gentoo.org/909003
    Closes: https://bugs.gentoo.org/909767
    Closes: https://bugs.gentoo.org/913534
    Closes: https://bugs.gentoo.org/818766
    Closes: https://bugs.gentoo.org/830167
    Closes: https://bugs.gentoo.org/854354
    Closes: https://bugs.gentoo.org/851573
    Closes: https://bugs.gentoo.org/780468
    Closes: https://bugs.gentoo.org/910029
    Closes: https://bugs.gentoo.org/897228
    Closes: https://bugs.gentoo.org/844196
    Closes: https://bugs.gentoo.org/910030
    Closes: https://bugs.gentoo.org/897230
    Closes: https://bugs.gentoo.org/788064
    Signed-off-by: Jakov Smolić <jsmolic@gentoo.org>

 profiles/package.mask | 1 -
 sci-libs/tensorflow/Manifest | 57 --
 ...dd-rules-docker-http_archive-bazel-toolch.patch | 37 -
 ...emlib-Latest-absl-LTS-has-split-cord-libs.patch | 32 -
 ...Must-link-against-libm-for-round-and-log2.patch | 29 -
 ...ensorflow_cc-Add-systemlib-nsync-linkopts.patch | 35 -
 ...systemlib-Updates-for-Abseil-20220623-LTS.patch | 71 --
 ...0006-systemlib-Update-targets-for-absl_py.patch | 24 -
 ...temlib-Add-well_known_types_py_pb2-target.patch | 28 -
 ...-0008-Relax-setup.py-version-requirements.patch | 86 --
 ....0-0009-systemlib-update-targets-for-absl.patch | 365 --------
 ...010-systemlib-fix-missing-osx-in-pybind11.patch | 25 -
 ...temlib-fix-missing-LICENSE-in-flatbuffers.patch | 25 -
 ...nstallation-remove-cp_local_config_python.patch | 68 --
 ...2.15.0-0013-build-use-non-hermetic-python.patch | 990 ---------------------
 sci-libs/tensorflow/metadata.xml | 15 -
 sci-libs/tensorflow/tensorflow-2.15.0.ebuild | 464 ----------
 17 files changed, 2352 deletions(-)