Build error on libstdc++ header stl_algobase.h on riscv · Issue #99278 · pytorch/pytorch · GitHub
Open
JackTemaki opened this issue Apr 16, 2023 · 7 comments
Labels
good first issue; module: build (Build system issues); triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

JackTemaki commented Apr 16, 2023

🐛 Describe the bug

I am currently trying to build PyTorch from the latest master commit (fdbc862) on riscv. I have already hit another issue that was clearly caused by compiling for riscv, but this error might be a general one:

[ 78%] Building CXX object test_api/CMakeFiles/test_api.dir/init.cpp.o
In file included from /usr/include/c++/12/memory:63,
                 from /home/user/git/pytorch/third_party/googletest/googletest/include/gtest/gtest.h:57,
                 from /home/user/git/pytorch/test/cpp/api/dataloader.cpp:1:
In static member function ‘static _Tp* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(const _Tp*, const _Tp*, _Tp*) [with _Tp = long unsigned int; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:495:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:522:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:529:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:620:7,
    inlined from ‘std::vector<_Tp, _Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&) [with _Tp = long unsigned int; _Alloc = std::allocator<long unsigned int>]’ at /usr/include/c++/12/bits/vector.tcc:244:21:
/usr/include/c++/12/bits/stl_algobase.h:431:30: error: argument 1 null where non-null expected [-Werror=nonnull]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/stl_algobase.h:431:30: note: in a call to built-in function ‘void* __builtin_memmove(void*, const void*, long unsigned int)’
[ 78%] Building CXX object test_api/CMakeFiles/test_api.dir/jit.cpp.o
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-aligned-allocation-unavailable’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-unused-private-field’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-invalid-partial-specialization’ may have been intended to silence earlier diagnostics
cc1plus: some warnings being treated as errors
gmake[2]: *** [test_api/CMakeFiles/test_api.dir/build.make:118: test_api/CMakeFiles/test_api.dir/dataloader.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....

My first guess is that this is an unlucky combination of the libstdc++ version, the compiler version and the -Werror=nonnull flag. I will try to set up the same build on an x86 machine, but this might take some time. I will also try to compile the tagged 2.0.0 version and report whether that changes anything.

Note for riscv compilation (in case someone wants to reproduce this exactly):
The third-party library SLEEF (https://github.com/shibatch/sleef) will only compile with the small fix from shibatch/sleef#448. One option is to compile SLEEF separately with the fix included and set USE_SYSTEM_SLEEF=ON when compiling PyTorch.

Addition: Kineto also does not build for now, and can be disabled with USE_KINETO=0.

Versions

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux bookworm/sid (riscv64)
GCC version: (Debian 12.2.0-10) 12.2.0
Clang version: 14.0.6
CMake version: version 3.25.1
Libc version: glibc-2.36

Python version: 3.10.9 (main, Dec 7 2022, 13:47:07) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-starfive-riscv64-with-glibc2.36
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: riscv64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3

Versions of relevant libraries:
[pip3] numpy==1.24.2
[conda] Could not collect

cc @malfet @seemethere

malfet added the needs reproduction, module: build, triaged, and good first issue labels and removed the needs reproduction label on Apr 17, 2023
Contributor
malfet commented Apr 17, 2023

Thank you for reporting. If you have a PR that fixes the issue, please do not hesitate to submit it.

Author
JackTemaki commented Apr 17, 2023

Okay, this seems to be some mysterious compiler optimization issue. To me it looks like __result is expected to be passed as null because it should store the return value. According to https://rkoucha.fr/tech_corner/nonnull_gcc_attribute.html, compiler optimization can influence whether this warning triggers.

This might mean the code is optimized differently for riscv, which is why the warning/error triggers there.

The build sets -Werror=format, and -Wformat implies -Wnonnull. So one solution is to just set -Wno-error=nonnull for this part of the compilation. I guess I will make a PR with that.

JackTemaki added a commit to JackTemaki/pytorch that referenced this issue Apr 18, 2023
On some platforms the build might fail due to the nonnull error
being triggered by different compiler behavior.

Fix for pytorch#99278.
@JackTemaki
Author

It might be worth noting that, with the above-mentioned changes for SLEEF and Kineto plus the proposed fix, I was able to successfully compile and run PyTorch on a VisionFive2 riscv board with the default Debian image.

@yusharth

@JackTemaki Can I help you with this bug by any means?

@JackTemaki
Author

@yusharth thanks for responding here! I already opened a PR for the fix: #99468, but I am not sure how and when it will be reviewed or corrected.

Collaborator
isuruf commented Oct 31, 2023

I can reproduce on x86_64 and #112089 is a ppc64le reproducer.

stevef1uk commented Jan 20, 2024

I am trying to build PyTorch on an RPi 5 running Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux.

I am trying to use Vulkan to see if I can get the GPU to be useful for nanoGPT.

USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MSVC=0 MAX_JOBS=4 python setup.py install

I am getting the same error:
/usr/include/c++/12/bits/stl_algobase.h:431:30: error: argument 1 null where non-null expected [-Werror=nonnull]
431 | __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/stl_algobase.h:431:30: note: in a call to built-in function ‘void* __builtin_memmove(void*, const void*, long unsigned int)’
/mnt/llm/pytorch/test/cpp/api/dataloader.cpp: At global scope:
/mnt/llm/pytorch/test/cpp/api/dataloader.cpp:2322:1: fatal error: opening dependency file test_api/CMakeFiles/test_api.dir/dataloader.cpp.o.d: No such file or directory

when the following command is run.

/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -I/mnt/llm/pytorch/build/aten/src -I/mnt/llm/pytorch/aten/src -I/mnt/llm/pytorch/build -I/mnt/llm/pytorch -I/mnt/llm/pytorch/cmake/../third_party/benchmark/include -I/mnt/llm/pytorch/third_party/onnx -I/mnt/llm/pytorch/build/third_party/onnx -I/mnt/llm/pytorch/third_party/foxi -I/mnt/llm/pytorch/build/third_party/foxi -I/mnt/llm/pytorch/build/caffe2/../aten/src -I/mnt/llm/pytorch/torch/csrc/api -I/mnt/llm/pytorch/torch/csrc/api/include -I/mnt/llm/pytorch/c10/.. -isystem /mnt/llm/pytorch/build/third_party/gloo -isystem /mnt/llm/pytorch/cmake/../third_party/gloo -isystem /mnt/llm/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /mnt/llm/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /mnt/llm/pytorch/cmake/../third_party/googletest/googletest/include -isystem /mnt/llm/pytorch/third_party/protobuf/src -isystem /mnt/llm/pytorch/third_party/gemmlowp -isystem /mnt/llm/pytorch/third_party/neon2sse -isystem /mnt/llm/pytorch/third_party/XNNPACK/include -isystem /mnt/llm/pytorch/cmake/../third_party/eigen -isystem /mnt/llm/pytorch/third_party/googletest/googletest/include -isystem /mnt/llm/pytorch/third_party/googletest/googletest -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN -DUSE_VULKAN_API -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter 
-Wno-error=nonnull -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIE -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -D__NEON__ -Wno-unused-variable -Wno-missing-braces -Wno-maybe-uninitialized -Wno-unused-but-set-parameter -std=gnu++17 -MD -MT test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -MF test_api/CMakeFiles/test_api.dir/dataloader.cpp.o.d -o test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -c /mnt/llm/pytorch/test/cpp/api/dataloader.cpp

Update: I managed to build following this guide: https://qengineering.eu/install-pytorch-on-raspberry-pi-4.html

krafczyk pushed a commit to ncsa/pytorch that referenced this issue Nov 24, 2024
On some platforms the build might fail due to the nonnull error
being triggered by different compiler behavior.

Fix for pytorch#99278.
krafczyk pushed a commit to ncsa/pytorch that referenced this issue Dec 3, 2024
On some platforms the build might fail due to the nonnull error
being triggered by different compiler behavior.

Fix for pytorch#99278.