Initial utils implementation + bug fixes #12
Merged
target_buffer = None
for idx in self.sampler:
    sample, target = self.source[idx]
    sample_buffer = sample_buffer or type(sample)(self.batch_size, *sample.size())
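The `sample_buffer or ...` line above allocates the batch buffer lazily, on the first sample. A minimal sketch of the same pattern follows, assuming plain Python lists instead of tensors; `iter_batches` and its parameters are hypothetical names, not part of this PR. It uses an explicit `is None` check because `x or default` depends on the truthiness of `x`, which can be surprising for container-like objects (an empty list, for example, is falsy).

```python
def iter_batches(source, indices, batch_size):
    """Yield (samples, targets) batches, allocating the buffers lazily.

    Hypothetical helper for illustration only: `source` is any indexable
    object returning (sample, target) pairs, `indices` plays the role of
    the sampler.
    """
    sample_buffer = None
    target_buffer = None
    filled = 0
    for idx in indices:
        sample, target = source[idx]
        # Allocate on the first sample of each batch; `is None` avoids
        # relying on the truthiness of the buffer object itself.
        if sample_buffer is None:
            sample_buffer = [None] * batch_size
            target_buffer = [None] * batch_size
        sample_buffer[filled] = sample
        target_buffer[filled] = target
        filled += 1
        if filled == batch_size:
            yield sample_buffer, target_buffer
            sample_buffer, target_buffer, filled = None, None, 0
    # Emit the final, possibly shorter, batch.
    if filled:
        yield sample_buffer[:filled], target_buffer[:filled]


source = [([i, i], i * 10) for i in range(5)]
batches = list(iter_batches(source, range(5), batch_size=2))
```

With five items and a batch size of two, this yields two full batches and one partial batch.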
Fixed:
* tensor and storage printing
* legacy.nn module printing
* SpatialCrossMapLRN tests

Also, all fixed bugs have regression tests now.
yogi81 referenced this pull request on Aug 16, 2017
cpuhrsch pushed a commit to cpuhrsch/pytorch that referenced this pull request on May 3, 2018
…or_log,_exp_and_pow This patch improves the performance of xlog_u1 and other log, exp an…
pytorchmergebot pushed a commit that referenced this pull request on Dec 5, 2023
… to hang (#115124)

Let's see if it helps #114913. The issues on llvm are at llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

and the core dump matches the description found in llvm/llvm-project#69369, showing the process stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0 0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1 0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2 0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3 0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4 0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5 0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6 0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7 0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8 0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9 0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Another note is that clang-tidy is CPU-bound, so we could consider running the lintrunner job on 4xlarge if needed.

Pull Request resolved: #115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
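As a general guard against a hung lint process like the one described above, a CI wrapper could bound each invocation with a timeout. The sketch below is an assumption for illustration and is not what this commit does; `run_with_timeout` is a hypothetical helper, and a stand-in Python child process is used in place of clang-tidy.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` and return (finished, returncode).

    `finished` is False when the process exceeded `timeout_s` seconds;
    subprocess.run kills the child in that case before raising.
    """
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, completed.returncode


# Stand-in for a lint process that never finishes.
hung = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1.0
)
# Stand-in for a lint process that finishes normally.
ok = run_with_timeout(
    [sys.executable, "-c", "print('lint ok')"], timeout_s=30.0
)
```

A timeout only turns a silent hang into a visible failure; it does not address the underlying solver issue tracked in the llvm reports.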
dmenig pushed a commit to dmenig/pytorch that referenced this pull request on Dec 21, 2023
… to hang (pytorch#115124) (commit message identical to the Dec 5, 2023 commit above)
pytorchmergebot pushed a commit that referenced this pull request on Feb 2, 2024
Users may not know which line of code called collectives in a big code base. When debugging, we can print the python-cpp stacktrace in case the user calls ``ProcessGroup.reduce`` instead of ``torch.distributed.reduce``:

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: " << get_python_cpp_trace();
```

Output (using _allgather_base as an example); one example python-part trace is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``:

```
ProcessGroupNCCL::_allgather_base stacktrace:
#0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```

Pull Request resolved: #118924
Approved by: https://github.com/kwen2501
pytorchmergebot pushed a commit that referenced this pull request May 21, 2024

#126677) …destruction of tensors cached by autocast

## Root Cause

An out-of-tree device extension is loaded after torch (it lives in a different .so), so the global variable `cached_casts` may be constructed before the extension's caching allocator. At exit, globals are destroyed in reverse order of construction, so the allocator is destroyed first and the tensors still held in `cached_casts` are freed afterwards.

## Fix

Lazily initialize `cached_casts` to correct the order.

## How to Reproduce && Test

Modify the test case `TestAutocastGPU.test_cast_cache_is_global` in test/test_autocast.py to run on your out-of-tree device. You will see the following failure at the end of the test.

```bash
----------------------------------------------------------------------
Ran 1 test in 4.812s

OK
free: 0x30080ff44000400
terminate called after throwing an instance of 'c10::Error'
  what(): invalid device pointer: 0x30080ff44000400
Exception raised from free at /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/framework/core/caching_allocator.cpp:1609 (most recent call first):
frame #0: <unknown function> + 0x118fe1 (0x7ffaef4d3fe1 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11b1c4 (0x7ffaef4d61c4 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #2: <unknown function> + 0x117677 (0x7ffaef4d2677 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #3: <unknown function> + 0x11a2bf (0x7ffaef4d52bf in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #4: <unknown function> + 0x11a186 (0x7ffaef4d5186 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #5: <unknown function> + 0x119fde (0x7ffaef4d4fde in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #6: <unknown function> + 0x119d2e (0x7ffaef4d4d2e in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #7: <unknown function> + 0x119be0 (0x7ffaef4d4be0 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #8: <unknown function> + 0x119977 (0x7ffaef4d4977 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #9: <unknown function> + 0x119313 (0x7ffaef4d4313 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #10: <unknown function> + 0x118b4c (0x7ffaef4d3b4c in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #11: c10::Error::Error(c10::SourceLocation, std::string) + 0x34 (0x7ffaef4d27c4 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #12: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x7f (0x7ffaef4d04ed in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #13: torch_mlu::MLUCachingAllocator::Native::NativeCachingAllocator::free(void*) + 0xe6 (0x7ff9a8eeb112 in /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/lib/libtorch_mlu.so)
frame #14: torch_mlu::MLUCachingAllocator::Native::local_raw_delete(void*) + 0x3b (0x7ff9a8ed9480 in /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/lib/libtorch_mlu.so)
frame #15: std::unique_ptr<void, void (*)(void*)>::~unique_ptr() + 0x50 (0x7ffb0a5ea322 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #16: <unknown function> + 0x1269890 (0x7ffb0a5e4890 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #17: <unknown function> + 0x1269928 (0x7ffb0a5e4928 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0x127572c (0x7ffb0a5f072c in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0x1275758 (0x7ffb0a5f0758 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #20: <unknown function> + 0xb9bc7 (0x7ffaef474bc7 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #21: <unknown function> + 0xb97bc (0x7ffaef4747bc in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #22: <unknown function> + 0xdbc50 (0x7ffaef496c50 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #23: c10::TensorImpl::~TensorImpl() + 0x82 (0x7ffaef49157e in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #24: c10::TensorImpl::~TensorImpl() + 0x1c (0x7ffaef4915aa in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #25: <unknown function> + 0x2f596d9 (0x7ffaf24fc6d9 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #26: <unknown function> + 0x2f589c2 (0x7ffaf24fb9c2 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #27: <unknown function> + 0x2f57b92 (0x7ffaf24fab92 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #28: <unknown function> + 0x2f5c228 (0x7ffaf24ff228 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #29: <unknown function> + 0x30f3f70 (0x7ffaf2696f70 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #30: <unknown function> + 0x30f3f90 (0x7ffaf2696f90 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #31: <unknown function> + 0x30f5004 (0x7ffaf2698004 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #32: <unknown function> + 0x30f5024 (0x7ffaf2698024 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #33: <unknown function> + 0x31207f0 (0x7ffaf26c37f0 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #34: <unknown function> + 0x3120814 (0x7ffaf26c3814 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #35: <unknown function> + 0x30f51e8 (0x7ffaf26981e8 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #36: <unknown function> + 0x30f5148 (0x7ffaf2698148 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #37: <unknown function> + 0x316ecea (0x7ffaf2711cea in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #38: <unknown function> + 0x468a7 (0x7ffb0c9ed8a7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: on_exit + 0 (0x7ffb0c9eda60 in /lib/x86_64-linux-gnu/libc.so.6)
<omitting python frames>
frame #47: __libc_start_main + 0xf3 (0x7ffb0c9cb083 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)
```

Pull Request resolved: #126677
Approved by: https://github.com/ezyang
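The ordering problem described in the commit message above can be illustrated with a small, self-contained sketch. It is not the PyTorch change itself: `Allocator`, `CastCache`, `get_allocator`, `get_cast_cache`, `use_cache` and `event_log` are hypothetical names chosen for illustration. It relies only on the C++ rule that a function-local static is constructed on first use and that static objects are destroyed in reverse order of construction, so a lazily constructed cache that is first touched after the allocator exists is torn down before it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log used to observe ordering. It is heap-allocated and deliberately
// never freed so that destructors running at exit can still append to it.
inline std::vector<std::string>& event_log() {
  static auto* log = new std::vector<std::string>();
  return *log;
}

// Hypothetical stand-in for a device caching allocator.
struct Allocator {
  Allocator() { event_log().push_back("allocator ctor"); }
  ~Allocator() { event_log().push_back("allocator dtor"); }
};

// Hypothetical stand-in for a cache whose entries must be released
// while the allocator is still alive.
struct CastCache {
  CastCache() { event_log().push_back("cache ctor"); }
  ~CastCache() { event_log().push_back("cache dtor"); }
};

// Function-local statics: each object is constructed on the first call.
Allocator& get_allocator() {
  static Allocator instance;
  return instance;
}

CastCache& get_cast_cache() {
  static CastCache instance;
  return instance;
}

// Models the runtime sequence: the allocator comes up first, and the cache
// is only constructed when it is first needed.
void use_cache() {
  get_allocator();
  get_cast_cache();
}
```

Because the cache finishes construction after the allocator, the language guarantees its destructor runs first at exit. A namespace-scope global in a library that is loaded earlier would instead be constructed first and destroyed last, which is the situation the trace above shows.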
pytorchmergebot pushed a commit that referenced this pull request Nov 22, 2024

See #140725 (comment)

Running `torch.mps.synchronize()` after a Metal kernel resulted in an infinite wait inside `[_MTLCommandBuffer waitUntilCompleted]`

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001aa919084 Metal`pthread_cond_wait + 12
    frame #1: 0x00000001aa78b1b4 Metal`-[_MTLCommandBuffer waitUntilCompleted] + 84
    frame #2: 0x00000001032bf358 libtorch_python.dylib`torch::mps::MPSModule_deviceSynchronize(_object*, _object*) + 40
    frame #3: 0x0000000100e94c20 Python`cfunction_vectorcall_NOARGS + 100
    frame #4: 0x0000000100e389b8 Python`PyObject_Vectorcall + 92
    frame #5: 0x0000000100f61e38 Python`_PyEval_EvalFrameDefault + 19040
    frame #6: 0x0000000100f5d180 Python`PyEval_EvalCode + 200
    frame #7: 0x0000000100fcd1a4 Python`run_eval_code_obj + 104
    frame #8: 0x0000000100fccbe4 Python`run_mod + 168
    frame #9: 0x0000000100fcb518 Python`pyrun_file + 164
    frame #10: 0x0000000100fca854 Python`_PyRun_SimpleFileObject + 256
    frame #11: 0x0000000100fca4e8 Python`_PyRun_AnyFileObject + 80
    frame #12: 0x0000000100ff2028 Python`pymain_run_file_obj + 164
    frame #13: 0x0000000100ff1ce4 Python`pymain_run_file + 72
    frame #14: 0x0000000100ff0f74 Python`Py_RunMain + 988
    frame #15: 0x0000000100ff1564 Python`pymain_main + 304
    frame #16: 0x0000000100ff1604 Python`Py_BytesMain + 40
    frame #17: 0x000000019f630274 dyld`start + 2840
```

Pull Request resolved: #141296
Approved by: https://github.com/huydhn
chunyuan-w pushed a commit to chunyuan-w/pytorch that referenced this pull request Nov 29, 2024
…ions fix the case where SKIP_MASK_SCORE is True
gglin001 added a commit to torch-nupu/pytorch that referenced this pull request Jan 23, 2025
Dev merge main
c-p-i-o added a commit to c-p-i-o/pytorch that referenced this pull request Jan 23, 2025

Summary: Fix memory leak on shutdown when socket is closed. We still need to free the buffer to make valgrind happy.

Test Plan: Use `mtiavm`. Repro steps provided by cristianlume.

1. Build

```
buck2 run //mtia/vm:athena-amodel-usd-owl-rank-
```

2. Run 2 VMs

on window 1:

```
mtiavm ssh --vm=0 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2
```

on window 2:

```
mtiavm ssh --vm=1 -- $(buck run @//neteng/ai/rdma_gen/mode/owl //neteng/ai/rdma_gen:rdma_gen --emit-shell) --rdma_mode=mtiav1 --num_ranks=2 --rank=1 --store_host=172.16.1.1
```

without the fix:

```
==8766==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8000 byte(s) in 2 object(s) allocated from:
    #0 0x5696fe in malloc (/data/users/cpio/fbsource/buck-out/v2/gen/fbcode/d4f2c81239ceac96/neteng/ai/rdma_gen/__rdma_gen__/rdma_gen+0x5696fe)
    #1 0x7faa8d40c47b in c10d::detail::UvTcpSocket::alloc_buffer(uv_handle_s*, unsigned long, uv_buf_t*) fbcode/caffe2/torch/csrc/distributed/c10d/TCPStoreLibUvBackend.cpp:121
    #2 0x7faa6f62316d in uv__read /home/engshare/third-party2/libuv/1.34.2/src/libuv-v1.34.2/src/unix/stream.c:1143:5
    #3 0x7faa6f6239ef in uv__stream_io /home/engshare/third-party2/libuv/1.34.2/src/libuv-v1.34.2/src/unix/stream.c:1306:5
    #4 0x7faa6f62941f in uv__io_poll /home/engshare/third-party2/libuv/1.34.2/src/libuv-v1.34.2/src/unix/linux-core.c:431:11
    #5 0x7faa6f618629 in uv_run /home/engshare/third-party2/libuv/1.34.2/src/libuv-v1.34.2/src/unix/core.c:375:5
    #6 0x7faa8d3e7320 in c10d::detail::LibUVStoreDaemon::run() fbcode/caffe2/torch/csrc/distributed/c10d/TCPStoreLibUvBackend.cpp:1216
    #7 0x7faa8d3bc933 in void std::__invoke_impl<void, void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>(std::__invoke_memfun_deref, void (c10d::detail::BackgroundThread::*&&)(), c10d::detail::BackgroundThread*&&) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:74
    #8 0x7faa8d3bc80c in std::__invoke_result<void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>::type std::__invoke<void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>(void (c10d::detail::BackgroundThread::*&&)(), c10d::detail::BackgroundThread*&&) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:96
    #9 0x7faa8d3bc7e1 in void std::thread::_Invoker<std::tuple<void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>>::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_thread.h:253
    #10 0x7faa8d3bc7a4 in std::thread::_Invoker<std::tuple<void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>>::operator()() fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_thread.h:260
    #11 0x7faa8d3bc608 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (c10d::detail::BackgroundThread::*)(), c10d::detail::BackgroundThread*>>>::_M_run() fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_thread.h:211
    #12 0x7faa436df5b4 in execute_native_thread_routine (/usr/local/fbcode/platform010/lib/libstdc++.so.6+0xdf5b4) (BuildId: 14a4eafe0cdc86af9a949a6c0c27bf21a033e047)
    #13 0x56744a in asan_thread_start(void*) ubsan.c
    #14 0x7faa43b2cf5b in __GI___clone3 (/usr/local/fbcode/platform010/lib/libc.so.6+0x12cf5b) (BuildId: 93cdceeb8322234c38e1f2c93ad0ff10c7632fa6)
```

With fix, no leak

Differential Revision: D68566104
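The diff behind the commit message above is not shown on this page, so the following is only a sketch of the general pattern it describes: a buffer handed out by an allocation callback has to be released on every path of the read callback, including the path taken when the socket has been closed. All names here (`Buffer`, `alloc_buffer`, `release_buffer`, `on_read`, `g_live_buffers`) are hypothetical and do not come from libuv or from `TCPStoreLibUvBackend.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical (base, len) read buffer; not the libuv type.
struct Buffer {
  char* base = nullptr;
  std::size_t len = 0;
};

// Number of buffers allocated but not yet released (for illustration only).
static int g_live_buffers = 0;

// Allocation callback: hands out a fresh heap buffer for the next read.
Buffer alloc_buffer(std::size_t suggested_size) {
  Buffer buf;
  buf.base = static_cast<char*>(std::malloc(suggested_size));
  if (buf.base != nullptr) {
    buf.len = suggested_size;
    ++g_live_buffers;
  }
  return buf;
}

// Releases a buffer obtained from alloc_buffer; safe to call on an empty one.
void release_buffer(Buffer& buf) {
  if (buf.base != nullptr) {
    std::free(buf.base);
    buf.base = nullptr;
    buf.len = 0;
    --g_live_buffers;
  }
}

// Read callback. nread > 0 means data arrived; nread <= 0 models "nothing
// read", a closed socket, or an error. Returns the number of bytes consumed.
std::size_t on_read(long nread, Buffer buf) {
  std::size_t consumed = 0;
  if (nread > 0) {
    // A real handler would parse buf.base[0 .. nread) here.
    consumed = static_cast<std::size_t>(nread);
  }
  // Release on every path, not only after a successful read.
  release_buffer(buf);
  return consumed;
}
```

Returning early from the `nread <= 0` branch without calling `release_buffer` would leave the buffer allocated, which is the kind of leak a tool such as LeakSanitizer reports at shutdown.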
Don't merge, this PR is only for test purposes