[caffe2] EigenTranspose problem in math_cpu.cc #6847

dimilar · 2018-04-22T14:06:59Z

In Debug mode, there is a bug in the following code snippet of math_cpu.cc (line 2084).

EigenTensorMap<T, D>(Y, Y_dims) =
      EigenTensorMap<T, D>(const_cast<T*>(X), X_dims).shuffle(axes_array);

runtime error message:

Assertion failed: (dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())), function evalSubExprsIfNeeded, file /usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h, line 123.
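
For context, the failing assertion compares the dimensions of the destination tensor map with those of the shuffle expression. As documented for Eigen's Tensor module, dimension i of `shuffle(axes)` has the size of input dimension `axes[i]`, so `Y_dims` has to be that permutation of `X_dims`. The sketch below restates the rule in plain C++ without Eigen. The helpers `shuffled_dims` and `dims_match_for_shuffle` are hypothetical names introduced for illustration; they are not part of Eigen or Caffe2, and this does not claim to identify the actual cause of the mismatch reported here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not an Eigen or Caffe2 API): the dimensions that
// X.shuffle(axes) is expected to have, i.e. out[i] = in[axes[i]].
template <std::size_t D>
std::array<long, D> shuffled_dims(const std::array<long, D>& x_dims,
                                  const std::array<long, D>& axes) {
  std::array<long, D> y_dims{};
  for (std::size_t i = 0; i < D; ++i) {
    y_dims[i] = x_dims[static_cast<std::size_t>(axes[i])];
  }
  return y_dims;
}

// Hypothetical stand-in for the check behind the assertion: the destination
// dimensions must equal the dimensions of the shuffled source.
template <std::size_t D>
bool dims_match_for_shuffle(const std::array<long, D>& y_dims,
                            const std::array<long, D>& x_dims,
                            const std::array<long, D>& axes) {
  return y_dims == shuffled_dims(x_dims, axes);
}
```

Printing `Y_dims` next to the permuted `X_dims` just before the assignment would show which side of the assignment disagrees.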

pjh5 · 2018-04-25T17:52:36Z

Can you give the full error message? Can you also give your eigen version, operating system, and how you built Caffe2?

dimilar · 2018-04-29T13:40:43Z

@pjh5, #6668 ok, here is the information:
Eigen version: e9e95489a (third_party/eigen)
Operating system: macOS 10.13.4
Caffe2 build command:
cmake ../ -DCMAKE_VERBOSE_MAKEFILE=1 -DBUILD_TEST=OFF -DProtobuf_PROTOC_EXECUTABLE=/usr/local/Cellar/protobuf/3.5.2/bin/protoc -DProtobuf_DIR=/usr/local/Cellar/protobuf/3.5.2 -DProtobuf_LIBRARY_DEBUG=/usr/local/Cellar/protobuf/3.5.2/lib/libprotobuf.a -DProtobuf_LIBRARY_RELEASE=/usr/local/Cellar/protobuf/3.5.2/lib/libprotobuf.a -DProtobuf_INCLUDE_DIR=/usr/local/Cellar/protobuf/3.5.2/include -DUSE_CUDA=ON -DCUDNN_ROOT_DIR=/usr/local/cuda-9.1/cudnn7 -DBLAS=MKL -DMKLML_USE_SINGLE_DYNAMIC_LIBRARY=OFF -DMKLML_USE_STATIC_LIBS=ON -DMKLML_MULTI_THREADED=OFF -DUSE_FFMPEG=OFF -DUSE_REDIS=OFF -DUSE_MPI=OFF -DUSE_NCCL=OFF -DUSE_GLOO=OFF -DBUILD_CUSTOM_PROTOBUF=OFF -DCMAKE_BUILD_TYPE=Debug -DCUDA_NVCC_FLAGS=-g -DCUDA_CUDA_LIB=/usr/local/cuda/lib/libcuda.dylib -DCMAKE_MODULE_LINKER_FLAGS_DEBUG="-lopencv_core -lopencv_highgui -lopencv_imgproc" -DCUDA_CUDA_LIB=/usr/local/cuda/lib/libcuda.dylib -DMKLINC=/opt/intel/mkl/include -DMKL_INCLUDE_DIR=/opt/intel/mkl/include -DCAFFE2_HAS_MKL_DNN=ON -DCAFFE2_USE_MKL=ON -DCUDA_NVCC_FLAGS="-g"

full error message:

Assertion failed: (dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions())), function evalSubExprsIfNeeded, file /usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h, line 123.
Process 19385 stopped

thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff74ea2b6e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff74ea2b6e <+10>: jae 0x7fff74ea2b78 ; <+20>
0x7fff74ea2b70 <+12>: movq %rax, %rdi
0x7fff74ea2b73 <+15>: jmp 0x7fff74e99b00 ; cerror_nocancel
0x7fff74ea2b78 <+20>: retq
Target 0: (iat_fd_image.bin) stopped.
(lldb) bt

thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff74ea2b6e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff7506d080 libsystem_pthread.dylib`pthread_kill + 333
frame #2: 0x00007fff74dfe1ae libsystem_c.dylib`abort + 127
frame #3: 0x00007fff74dc61ac libsystem_c.dylib`__assert_rtn + 320
frame #4: 0x0000000103a3ddee libdnnie_cpu.1.dylib`Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>, Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > const> const, Eigen::DefaultDevice>::evalSubExprsIfNeeded(this=0x00007ffeefbf9210, (null)=0x0000000000000000) at TensorAssign.h:123
frame #5: 0x0000000103a3db5a libdnnie_cpu.1.dylib`Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>, Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > const> const, Eigen::DefaultDevice, true>::run(expr=0x00007ffeefbf9330, device=0x00007ffeefbf9328) at TensorExecutor.h:57
frame #6: 0x0000000103a3da55 libdnnie_cpu.1.dylib`Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>& Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer>::operator=<Eigen::TensorShufflingOp<std::__1::array<long, 4ul> const, Eigen::TensorMap<Eigen::Tensor<float, 4, 0, long>, 0, Eigen::MakePointer> > >(this=0x00007ffeefbf9370, other=0x00007ffeefbf93c0) at TensorMap.h:310
frame #7: 0x0000000103a3a807 libdnnie_cpu.1.dylib`void caffe2::math::(anonymous namespace)::EigenTransposeImpl<float, 4>(dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000) at math_cpu.cc:2079
frame #8: 0x00000001039e39d5 libdnnie_cpu.1.dylib`bool caffe2::math::(anonymous namespace)::EigenTranspose<float>(ndim=4, dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000) at math_cpu.cc:2105
frame #9: 0x00000001039e38e7 libdnnie_cpu.1.dylib`void caffe2::math::Transpose<float, caffe2::CPUContext>(ndim=4, dims=0x000000010fd0f3e0, axes=0x00000001182a0a60, X=0x0000000114f97000, Y=0x00000001180b9000, (null)=0x00000001182a0638) at math_cpu.cc:2145
frame #10: 0x00000001037fb1d4 libdnnie_cpu.1.dylib`bool caffe2::TransposeOp<caffe2::CPUContext>::DoRunWithType<float>(this=0x00000001182a0540) at transpose_op.h:60
frame #11: 0x00000001037fadaf libdnnie_cpu.1.dylib`bool caffe2::DispatchHelper<caffe2::TensorTypes<float, double, int, long long> >::call<caffe2::TransposeOp<caffe2::CPUContext> >(op=0x00000001182a0540, meta=0x000000010fd228b8) at operator.h:671
frame #12: 0x00000001037fad70 libdnnie_cpu.1.dylib`bool caffe2::DispatchHelper<caffe2::TensorTypes<float, double, int, long long> >::call<caffe2::TransposeOp<caffe2::CPUContext>, caffe2::CPUContext>(op=0x00000001182a0540, tensor=0x000000010fd22890) at operator.h:671
frame #13: 0x00000001037faccf libdnnie_cpu.1.dylib`caffe2::TransposeOp<caffe2::CPUContext>::RunOnDevice(this=0x00000001182a0540) at transpose_op.h:44
frame #14: 0x0000000102d7de37 libdnnie_cpu.1.dylib`caffe2::Operator<caffe2::CPUContext>::Run(this=0x00000001182a0540, stream_id=0) at operator.h:408
frame #15: 0x0000000102c6ffb7 libdnnie_cpu.1.dylib`caffe2::SimpleNet::Run(this=0x0000000118262d20) at net_simple.cc:58
frame #16: 0x0000000102d5b5ad libdnnie_cpu.1.dylib`caffe2::Workspace::RunNet(this=0x000000010fbf9cc8, name="") at workspace.cc:274

Two more observations:

The same error occurs on Linux (CentOS 7), with a similar message and the same Eigen version.
The same code runs in Release mode, because eigen_assert is compiled out there and the dimension check is never performed.

dimilar · 2018-05-02T15:24:44Z

@pjh5 I think we can close this issue now, because the Eigen transpose is disabled in the latest commit (#7112).

pjh5 self-assigned this Apr 25, 2018

pjh5 added the caffe2 label Apr 25, 2018

pjh5 closed this as completed May 3, 2018
