8000 Remove warnings from Thyra and Stratimikos impacting CUDA · Issue #1140 · trilinos/Trilinos · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Remove warnings from Thyra and Stratimikos impacting CUDA #1140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bartlettroscoe opened this issue Mar 14, 2017 · 3 comments
Closed

Remove warnings from Thyra and Stratimikos impacting CUDA #1140

bartlettroscoe opened this issue Mar 14, 2017 · 3 comments
Assignees
Labels
pkg: Stratimikos pkg: Thyra Issues primarily dealing with the Thyra Package

Comments

@bartlettroscoe
Copy link
Member

This will take care of Thyra and Stratimikos warnings reported in #1133.

@bartlettroscoe bartlettroscoe self-assigned this Mar 14, 2017
@bartlettroscoe bartlettroscoe added pkg: Stratimikos pkg: Thyra Issues primarily dealing with the Thyra Package labels Mar 14, 2017
@bartlettroscoe
Copy link
Member Author

CC: @trilinos/thyra, @trilinos/stratimikos

I am getting to work on this now.

@bartlettroscoe bartlettroscoe added the stage: in progress Work on the issue has started label Mar 14, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 14, 2017
This is just a few of the warnings reported in trilinos#1133, just the warnings
reported by GCC 4.7.2.  The rest will be take care of next.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 15, 2017
…#1140)

This breaks up the single configure script that Matt B. provided into its
three logical parts.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 16, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 16, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 16, 2017
I get link failures otherwise.  Something seems to not be complete in the env
otherwise.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 16, 2017
This varible is used it is just that this version of nvcc does not realize
that.  This refactored version is actually less code and is just as clear.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
…#1140)

This breaks up the single configure script that Matt B. provided into its
three logical parts.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
I get link failures otherwise.  Something seems to not be complete in the env
otherwise.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
This varible is used it is just that this version of nvcc does not realize
that.  This refactored version is actually less code and is just as clear.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
Enabling tests with Thyra this gets rid of all of the Thyra warnings.  Most of
these warnings were 'statement is unreachable'.  I used the new macro
TEUCHOS_UNREACHABLE_RETURN().

There was one unused var warning in a test that was a real used var.  It was
good to see that and fix it.
bartlettroscoe added a commit that referenced this issue Mar 17, 2017
NVCC thinks these vars are not needed but GCC disagrees.  The build fails
unless you put these back.

Build/Test Cases Summary
Enabled Packages: TeuchosCore, ThyraEpetraAdapters, ThyraEpetraExtAdapters, ThyraTpetraAdapters, ThyraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=2233,notpassed=0 (92.29 min)
Other local commits for this build/test group: 4cc32a2, 2e6e050, c5bbe62, 1ce3bd0, 790f463, 39e315d
@bartlettroscoe
Copy link
Member Author
bartlettroscoe commented Mar 17, 2017

I was able to reproduce warnings for Thyra for the NVCC/CUDA build on shiller using the env provded in #1133. I extracted this into the scripts under Trilinos:

cmake/ctest/drivers/ATTB/attb_cuda_config_base.sh
cmake/ctest/drivers/ATTB/attb_cuda_config_for_drekar.sh
cmake/ctest/drivers/ATTB/load_attb_cuda_env.sh

and used these to reproduce warnings in Thyra and fix them. See details below.

I created the new macro TEUCHOS_NONREACHABLE_RETURN(RETURN_VAL) and used it to remove all of the "statement is unreachable" warnings.

Next I will remove all of the warnings for Stratimikos for NVCC/CUDA.


DETAILED NOTES:

(2017/03/16)

@bathmath provided the pretty complete configure script above. It even has the module loads so it is self-contained. The only piece of info missing is what machine these builds are done on. My guess are the ATTB machines hansen and shiller.

First, I need to separate this configure script into its three logical parts:

  1. The setup of the env and modules

  2. The specification of the env passed to cmake

  3. The list of Trilinos packages to enable/disable

I broke up this script into the thre files:

  • load_attb_cuda_env.sh
  • attb_cuda_config_base.sh
  • attb_cuda_config_for_drekar.sh

I pushed this to the branch:

as of commit 1b2f8ce.

I then tried to reproduce the configure and build with:

$ time ./do-configure -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Thyra=ON -DTrilinos_ENABLE_Stratimikos=ON \
  &> configure.out

real    1m56.765s
user    1m5.000s
sys     0m11.434s

$ time make -j10 &> make.out

real    12m31.788s
user    74m36.931s
sys     19m48.565s

That build failed with link failures like:

[100%] Linking CXX executable ThyraCore_DefaultBlockedLinearOpUnitTests.exe
/home/projects/x86-64-haswell/blas/20150602/gcc/4.8.4/lib/libblas.a(xerbla.o): In function `xerbla_':
xerbla.f:(.text+0x52): undefined reference to `_gfortran_st_write'
xerbla.f:(.text+0x5d): undefined reference to `_gfortran_string_len_trim'
xerbla.f:(.text+0x6f): undefined reference to `_gfortran_transfer_character_write'
xerbla.f:(.text+0x7f): undefined reference to `_gfortran_transfer_integer_write'
xerbla.f:(.text+0x87): undefined reference to `_gfortran_st_write_done'
xerbla.f:(.text+0x90): undefined reference to `_gfortran_stop_string'
collect2: error: ld returned 1 exit status
make[3]: *** [packages/thyra/core/test/operator_vector/ThyraCore_DefaultBlockedLinearOpUnitTests.exe] Error 1

Seems like the build configuration is missing -lgfortran. I will try adding that and see what happens. That seemed to fix it (in commit c9c4ef7). I pushed that in the commit c9c4ef7.

Now configuring and building from scratch:

$ time ./do-configure -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Thyra=ON -DTrilinos_ENABLE_Stratimikos=ON &> configure.out && time make -j16 &> make.out ; ~/mailmsg.py "Finished CUDA build for Thyra and Startimikos on shiller"

real    1m46.340s
user    1m2.969s
sys     0m11.209s

real    6m18.293s
user    52m6.997s
sys     22m53.191s

Now that actually built and linked everything.

Now running the tests:

$ ctest -j10

[ ... ]


97% tests passed, 3 tests failed out of 119

Label Time Summary:
Stratimikos    =  43.38 sec (39 tests)
Thyra          =  69.01 sec (80 tests)

Total Test time (real) =  24.22 sec

The following tests FAILED:
         78 - ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4 (Failed)
         79 - ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_serial_MPI_1 (Failed)
         80 - ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1 (Failed)
Errors while running CTest

real    0m24.276s
user    0m41.490s
sys     0m50.463s

Looks like you can't actually run tests that need CUDA on the build node. You get, for example:

1. TpetraThyraWrappers_double_createVectorSpace_UnitTest ...

 p=0: *** Caught standard std::exception of type 'std::runtime_error' :

  cudaGetDeviceCount( & m_cudaDevCount ) error( cudaErrorInsufficientDriver): CUDA driver version is insufficient for CUDA runtime version
/home/rabartl/Trilinos.base/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:207
  Traceback functionality not available

 [FAILED]  (0.000584 sec) TpetraThyraWrappers_double_createVectorSpace_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/thyra/adapters/tpetra/test/TpetraThyraWrappers_UnitTests.cpp:220

So how to run test on shiller? Where do I find this documentation? I did a search on the snl-wiki for "shiller" and I found out where to find documentation on the machine itself. After grabing interactive nodes, I was able to run:

$ ctest -j2 -R ^ThyraTpetraAdapters.*
Test project /home/rabartl/Trilinos.base/BUILDS/CUDA/MPI_RELEASE_CUDA
    Start 80: ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1
    Start 79: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_serial_MPI_1
1/3 Test #80: ThyraTpetraAdapters_Simple2DTpetraModelEvaluatorUnitTests_MPI_1 ...   Passed   15.06 sec
2/3 Test #79: ThyraTpetraAdapters_
8000
TpetraThyraWrappersUnitTests_serial_MPI_1 .....   Passed   15.76 sec
    Start 78: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4
3/3 Test #78: ThyraTpetraAdapters_TpetraThyraWrappersUnitTests_MPI_4 ............   Passed   17.52 sec

100% tests passed, 0 tests failed out of 3

Label Time Summary:
Thyra    =  48.34 sec (3 tests)

Total Test time (real) =  33.44 sec

Wow, shiller is super slow!

Anyhow, I can finally see the warnings that were produced when building Thyra and Stratimikos tests.

Looking for unique warnings from Teuchos first:

$ grep warning make.out | grep packages/teuchos | sort | uniq | less
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:455:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:659:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialDenseMatrix.hpp:679:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialSymDenseMatrix.hpp:488:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:413:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:431:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:456:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:468:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:492:11: warning: ISO C++ does not support variable-length array types [-Wvla]
/home/rabartl/Trilinos.base/Trilinos/packages/teuchos/numerics/src/Teuchos_SerialTriDiMatrix.hpp:571:51: warning: ISO C++ does not support variable-length array types [-Wvla]

I need to create a separate issue for that.

Now to look at Thyra warnings:

$ grep warning make.out | grep packages/thyra | sort | uniq  | wc -l
36

Of these, almost all of them are "statement is unreachable" as shown by:

$ grep warning make.out | grep packages/thyra | grep -v "statement is unreachable" | sort | uniq
/home/rabartl/Trilinos.base/Trilinos/packages/thyra/adapters/epetra/src/Thyra_EpetraOperatorViewExtractorStd.cpp(61): warning: variable "eFwdOp" was set but never used

That leaves 35 "statement is unreachable" warnings.

Many of these "statement is unreachable" warnings come from code like:

Thyra::ModelEvaluatorBase::EDerivativeMultiVectorOrientation
Thyra::convert(
  const EpetraExt::ModelEvaluator::EDerivativeMultiVectorOrientation &mvOrientation
  )
{
  switch(mvOrientation) {
    case EpetraExt::ModelEvaluator::DERIV_MV_BY_COL :
      return ModelEvaluatorBase::DERIV_MV_BY_COL;
    case EpetraExt::ModelEvaluator::DERIV_TRANS_MV_BY_ROW :
      return ModelEvaluatorBase::DERIV_TRANS_MV_BY_ROW;
    default:
      TEUCHOS_TEST_FOR_EXCEPT(true);
  }
  return ModelEvaluatorBase::DERIV_MV_BY_COL; // Should never be called!
}

Grepping for that comment shows there 15 of these:

$ find . -name "*pp" -exec grep -nHi "Should never be called" {} \; | wc -l
15

So that is where I will start.

What I want to do here is to create a macro TEUCHOS_NONREACHABLE_RETURN(RETURN_VAL) and then use this in all such returns.

I removed all of the warnings for Thyra and pushed to the 'develop' branch.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
… avoid warnings (trilinos#1140)

With NVCC, it issues a warning about the 'break' statement being unreachable
(which it is).
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
These vars were only used in one unit test so I just moved them there to avoid
the "unused var" warning with NVCC.  But NVCC is just wrong that these vars
are not used.  If you comment them out, the code does not build with GCC
4.8.3.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 17, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 21, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 21, 2017
… avoid warnings (trilinos#1140)

With NVCC, it issues a warning about the 'break' statement being unreachable
(which it is).
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 21, 2017
These vars were only used in one unit test so I just moved them there to avoid
the "unused var" warning with NVCC.  But NVCC is just wrong that these vars
are not used.  If you comment them out, the code does not build with GCC
4.8.3.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 21, 2017
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 21, 2017
After talking with Roger P. I have changed this to only remove the return
statement for __NVCC__.  This is because only __NVCC__ warns about the
unreachable return but having that return statement leads to safer code by
avoiding undefined behavior in case a function is modified in such a way that
this return is no longer unreachable.  In that case, we want it to return
something to avoid undefined behavior.

I also improved the documentation some.
@bartlettroscoe
Copy link
Member Author

I finished cleaning up the warnings from Thyra and Stratimikos for NVCC and pushed to the Trilinos 'develop' branch (see below).

@bathmatt, I am closing this as complete as I am pretty sure that you will not see any more warnings on NVCC on shiller coming from building Thyra or Stratimikos. There are also not any warnings for GCC 4.8.3 as well.


DETAILED NOTES:

(2017/03/17)

I cleaned up all of the warnings for Stratimikos for NVCC and pushed to the branch 'thyra-stratimikos-cuda-warnings-1140'.

Now I want to verify that all warnings for Thyra and Stratimikos are removed for the Trilinos build for Drekar. For that, on shiller, I do:

$ cd /home/rabartl/Trilinos.base/BUILDS/CUDA/MPI_RELEASE_CUDA
$ ln -s ../../../Trilinos/cmake/ctest/drivers/ATTB/attb_cuda_config_base.sh .
$ ln -s ../../../Trilinos/cmake/ctest/drivers/ATTB/attb_cuda_config_for_drekar.sh .

$ rm -r CMake*

$ time ./attb_cuda_config_for_drekar.sh &> configure.out

real    3m32.358s
user    1m40.306s
sys     0m15.161s

$ time make -j10 &> make.out

real    685m28.870s
user    2382m5.324s
sys     62m22.490s

Wow, almost 7.5 hours to build these packages on 10 cores! That is fantastic!

Looking for any remaining Thyra or Stratimikos warnings:

$ grep "warning: " make.out | grep "\(packages/stratimikos\|thyra\|Thyra\)" | sort | uniq  | less
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-loca/src-thyra/LOCA_Thyra_GroupWrapper.H(66): warning: overloaded virtual function "LOCA::Thyra::Group::operator=" is only par
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(486): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(695): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/nox/src-thyra/NOX_Thyra_Group.C(859): warning: statement is unreachable
/home/rabartl/Trilinos.base/Trilinos/packages/xpetra/sup/Utils/Xpetra_ThyraUtils.hpp(942): warning: dynamic initialization in unreachable code
/home/rabartl/Trilinos.base/Trilinos/packages/xpetra/sup/Utils/Xpetra_ThyraUtils.hpp(943): warning: statement is unreachable

I could fix the ones in NOX but I think Roger is already taking care of those (see #1139). As for the warnings in Xpetra, those should be fixed by an Xpetra developer.

Looking at the remaining warnings on this branch and the size of the make output:

[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ grep "warning: " make.out | wc -l
148033

[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ grep "warning: " make.out | sort | uniq | wc -l
21409

[rabartl@shiller01 MPI_RELEASE_CUDA (master)]$ du -sh make.out 
562M    make.out

Wow, still over 20K unique warnings!

(2017/03/20)

I then went back on crf450 and built and tested Teuchos, Thyra, and Stratimikos with GCC 4.8.3 with:

$ time ./do-configure -DTrilinos_ENABLE_Teuchos=ON \
  -DTrilinos_ENABLE_Thyra=ON  -DTrilinos_ENABLE_Stratimikos=ON \
  &> configure.out

real    0m13.419s
user    0m8.696s
sys     0m2.616s

$ time make -j32 &> make.out

real    9m50.045s
user    94m45.685s
sys     8m1.349s

$ ctest -j32

...

100% tests passed, 0 tests failed out of 243

Label Time Summary:
Stratimikos    =  29.59 sec (39 tests)
Teuchos        =  17.14 sec (124 tests)
Thyra          =  10.07 sec (80 tests)

Total Test time (real) =   5.49 sec

And this gave no warnings:

grep "warning" make.out | wc -l
0

(2017/03/21)

I pushed these commits to the Trilinos 'develop' branch with the top commit being:

commit 416fc129b8fed5f8c5083dad272feffc7e059f95
Merge: 9104e93 7d90906
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date:   Tue Mar 21 05:42:29 2017 -0600

    Merge branch 'more-warnings-1133-1140' of github.com:bartlettroscoe/Trilinos into develop
    
    Build/Test Cases Summary
    Enabled Packages: EpetraExt, Stratimikos, TeuchosComm, TeuchosCore, ThyraCore
    Disabled Packages: PyTrilinos,Claps,TriKota
    Enabled all Forward Packages
    0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=2234,notpassed=0 (100.48 min)
    Other local commits for this build/test group: 7d90906, 8e3a5f45, 48a0a35, 8b6b253, aa8fafa, 148e6b4, 01122bd

@bartlettroscoe bartlettroscoe removed the stage: in progress Work on the issue has started label Mar 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: Stratimikos pkg: Thyra Issues primarily dealing with the Thyra Package
Projects
None yet
Development

No branches or pull requests

1 participant
0