[caffe2] EigenTranspose problem in math_cpu.cc · Issue #6847 · pytorch/pytorch · GitHub

[caffe2] EigenTranspose problem in math_cpu.cc #6847


Closed
dimilar opened this issue Apr 22, 2018 · 3 comments

dimilar commented Apr 22, 2018

When Caffe2 is built in Debug mode, the following code snippet in math_cpu.cc (line 2084) fails with an assertion:

EigenTensorMap<T, D>(Y, Y_dims) =
      EigenTensorMap<T, D>(const_cast<T*>(X), X_dims).shuffle(axes_array);

runtime error message:

Assertion failed: (dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())), function evalSubExprsIfNeeded, file /usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h, line 123.
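For context, the assertion compares the dimensions of the destination map with those of the shuffled source expression. The thread does not establish which side was wrong, so the following is only a minimal sketch of the rule, assuming that a shuffle yields output dimension i equal to input dimension axes[i]. The helpers ShuffledDims and DimsMatch are hypothetical names for illustration and are not part of Caffe2 or Eigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from Caffe2 or Eigen): dimensions of the result of
// permuting x_dims by axes, assuming y_dims[i] = x_dims[axes[i]].
template <std::size_t D>
std::array<long, D> ShuffledDims(const std::array<long, D>& x_dims,
                                 const std::array<long, D>& axes) {
  std::array<long, D> y_dims{};
  for (std::size_t i = 0; i < D; ++i) {
    y_dims[i] = x_dims[axes[i]];
  }
  return y_dims;
}

// Hypothetical stand-in for the idea behind the failing check: the
// destination dims must equal the dims of the expression assigned to it.
template <std::size_t D>
bool DimsMatch(const std::array<long, D>& lhs, const std::array<long, D>& rhs) {
  return lhs == rhs;
}
```

Under that assumption, a destination built with the unpermuted dims (or with dims permuted in the inverse order) would fail such a check whenever the permutation actually changes the shape.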

Contributor
pjh5 commented Apr 25, 2018

Can you give the full error message? Can you also give your eigen version, operating system, and how you built Caffe2?

pjh5 self-assigned this Apr 25, 2018
pjh5 added the caffe2 label Apr 25, 2018
Author
dimilar commented Apr 29, 2018

@pjh5, #6668 ok, the following is the information:
Eigen version: e9e95489a (third_party/eigen)
operating system: macOS 10.13.4
caffe2 build command:
cmake ../ -DCMAKE_VERBOSE_MAKEFILE=1 -DBUILD_TEST=OFF -DProtobuf_PROTOC_EXECUTABLE=/usr/local/Cellar/protobuf/3.5.2/bin/protoc -DProtobuf_DIR=/usr/local/Cellar/protobuf/3.5.2 -DProtobuf_LIBRARY_DEBUG=/usr/local/Cellar/protobuf/3.5.2/lib/libprotobuf.a -DProtobuf_LIBRARY_RELEASE=/usr/local/Cellar/protobuf/3.5.2/lib/libprotobuf.a -DProtobuf_INCLUDE_DIR=/usr/local/Cellar/protobuf/3.5.2/include -DUSE_CUDA=ON -DCUDNN_ROOT_DIR=/usr/local/cuda-9.1/cudnn7 -DBLAS=MKL -DMKLML_USE_SINGLE_DYNAMIC_LIBRARY=OFF -DMKLML_USE_STATIC_LIBS=ON -DMKLML_MULTI_THREADED=OFF -DUSE_FFMPEG=OFF -DUSE_REDIS=OFF -DUSE_MPI=OFF -DUSE_NCCL=OFF -DUSE_GLOO=OFF -DBUILD_CUSTOM_PROTOBUF=OFF -DCMAKE_BUILD_TYPE=Debug -DCUDA_NVCC_FLAGS=-g -DCUDA_CUDA_LIB=/usr/local/cuda/lib/libcuda.dylib -DCMAKE_MODULE_LINKER_FLAGS_DEBUG="-lopencv_core -lopencv_highgui -lopencv_imgproc" -DCUDA_CUDA_LIB=/usr/local/cuda/lib/libcuda.dylib -DMKLINC=/opt/intel/mkl/include -DMKL_INCLUDE_DIR=/opt/intel/mkl/include -DCAFFE2_HAS_MKL_DNN=ON -DCAFFE2_USE_MKL=ON -DCUDA_NVCC_FLAGS="-g"

full error message:

Assertion failed: (dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())), function evalSubExprsIfNeeded, file /usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h, line 123.
Process 19385 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff74ea2b6e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff74ea2b6e <+10>: jae 0x7fff74ea2b78 ; <+20>
    0x7fff74ea2b70 <+12>: movq %rax, %rdi
    0x7fff74ea2b73 <+15>: jmp 0x7fff74e99b00 ; cerror_nocancel
    0x7fff74ea2b78 <+20>: retq
Target 0: (iat_fd_image.bin) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff74ea2b6e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff7506d080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff74dfe1ae libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff74dc61ac libsystem_c.dylib`__assert_rtn + 320
    frame #4: 0x0000000103a3ddee libdnnie_cpu.1.dylib`Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>, Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > const> const, Eigen::DefaultDevice>::evalSubExprsIfNeeded(this=0x00007ffeefbf9210, (null)=0x0000000000000000) at TensorAssign.h:123
    frame #5: 0x0000000103a3db5a libdnnie_cpu.1.dylib`Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>, Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > const> const, Eigen::DefaultDevice, true>::run(expr=0x00007ffeefbf9330, device=0x00007ffeefbf9328) at TensorExecutor.h:57
    frame #6: 0x0000000103a3da55 libdnnie_cpu.1.dylib`Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>& Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>::operator=<Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > >(this=0x00007ffeefbf9370, other=0x00007ffeefbf93c0) at TensorMap.h:310
    frame #7: 0x0000000103a3a807 libdnnie_cpu.1.dylib`void caffe2::math::(anonymous namespace)::EigenTransposeImpl<float, 4>(dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000) at math_cpu.cc:2079
    frame #8: 0x00000001039e39d5 libdnnie_cpu.1.dylib`bool caffe2::math::(anonymous namespace)::EigenTranspose<float>(ndim=4, dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000) at math_cpu.cc:2105
    frame #9: 0x00000001039e38e7 libdnnie_cpu.1.dylib`void caffe2::math::Transpose<float, caffe2::CPUContext>(ndim=4, dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000, (null)=0x00000001182a0638) at math_cpu.cc:2145
    frame #10: 0x00000001037fb1d4 libdnnie_cpu.1.dylib`bool caffe2::TransposeOp<caffe2::CPUContext>::DoRunWithType<float>(this=0x00000001182a0540) at transpose_op.h:60
    frame #11: 0x00000001037fadaf libdnnie_cpu.1.dylib`bool caffe2::DispatchHelper<caffe2::TensorTypes<float, double, int, long long> >::call<caffe2::TransposeOp<caffe2::CPUContext> >(op=0x00000001182a0540, meta=0x000000010fd228b8) at operator.h:671
    frame #12: 0x00000001037fad70 libdnnie_cpu.1.dylib`bool caffe2::DispatchHelper<caffe2::TensorTypes<float, double, int, long long> >::call<caffe2::TransposeOp<caffe2::CPUContext>, caffe2::CPUContext>(op=0x00000001182a0540, tensor=0x000000010fd22890) at operator.h:671
    frame #13: 0x00000001037faccf libdnnie_cpu.1.dylib`caffe2::TransposeOp<caffe2::CPUContext>::RunOnDevice(this=0x00000001182a0540) at transpose_op.h:44
    frame #14: 0x0000000102d7de37 libdnnie_cpu.1.dylib`caffe2::Operator<caffe2::CPUContext>::Run(this=0x00000001182a0540, stream_id=0) at operator.h:408
    frame #15: 0x0000000102c6ffb7 libdnnie_cpu.1.dylib`caffe2::SimpleNet::Run(this=0x0000000118262d20) at net_simple.cc:58
    frame #16: 0x0000000102d5b5ad libdnnie_cpu.1.dylib`caffe2::Workspace::RunNet(this=0x000000010fbf9cc8, name="") at workspace.cc:274

Two further observations:

  • The same error occurs on Linux (CentOS 7), with a similar message and the same Eigen version.
  • The same code runs in Release mode because eigen_assert is disabled there.
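The Release-mode behavior can be illustrated with a small sketch. It assumes the usual arrangement in which a debug-only assertion macro expands to nothing once NDEBUG is defined, as it typically is in Release builds. SKETCH_ASSERT and AssignIfSizesMatch are hypothetical names and are not Eigen's or Caffe2's actual definitions.

```cpp
#include <cassert>

// Hypothetical macro for illustration only: active in Debug builds,
// compiled out when NDEBUG is defined (as is typical for Release builds).
#ifdef NDEBUG
#define SKETCH_ASSERT(cond) ((void)0)
#else
#define SKETCH_ASSERT(cond) assert(cond)
#endif

// Hypothetical function: aborts on a size mismatch in Debug builds; in a
// Release build the check disappears and execution simply continues.
bool AssignIfSizesMatch(long lhs_size, long rhs_size) {
  SKETCH_ASSERT(lhs_size == rhs_size);
  (void)lhs_size;  // silence unused-parameter warnings when the check is compiled out
  (void)rhs_size;
  return true;
}
```

A mismatch that goes unchecked in Release mode is not thereby harmless: the assignment still runs against a destination whose shape the library did not verify.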

Author
dimilar commented May 2, 2018

@pjh5 I think we can close the issue now, because the Eigen transpose is disabled in the latest commit, #7112.

pjh5 closed this as completed May 3, 2018