refactor: acquireAsync will dispatch a job, not be dispatched by ximinez · Pull Request #5327 · XRPLF/rippled · GitHub

refactor: acquireAsync will dispatch a job, not be dispatched #5327


Open
ximinez wants to merge 11 commits into develop from ximinez/acquireAsyncDispatch
Conversation

ximinez (Collaborator) commented Feb 28, 2025

High Level Overview of Change

Does a few different things:

  1. Adds a new RAII class, CanProcess, to be used where needed for job deduplication (see the sketch after this list).
  2. Adds and improves logging across a variety of classes. Uses the new to_short_string(base_uint) function to help disambiguate some messages about peers and ledgers.
  3. Exits InboundLedger::onTimer without sending new requests if the complete ledger is already available. (Duh.)
  4. Changes the interface and usage of InboundLedgers::acquireAsync so that it dispatches a job rather than being called from a dispatched job. This reduces the number of jobs that are spun up only to exit immediately.
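To make item 1 concrete, here is a minimal, self-contained sketch of the idea. The real class lives in include/xrpl/basics/CanProcess.h; the name ProcessGuard, the std::set bookkeeping, and the demo below are illustrative stand-ins, not the PR's actual code.

#include <iostream>
#include <mutex>
#include <set>

// Illustrative stand-in for the CanProcess idea: the constructor tries to
// claim a key in a shared "in progress" set; if the key is already there,
// the guard reports false and the caller skips the duplicate work. The
// destructor releases the claim.
template <class Key>
class ProcessGuard
{
    std::mutex& mtx_;
    std::set<Key>& inProgress_;
    Key key_;
    bool owned_;

public:
    ProcessGuard(std::mutex& mtx, std::set<Key>& inProgress, Key const& key)
        : mtx_(mtx), inProgress_(inProgress), key_(key)
    {
        std::lock_guard lock(mtx_);
        owned_ = inProgress_.insert(key_).second;
    }

    ~ProcessGuard()
    {
        if (owned_)
        {
            std::lock_guard lock(mtx_);
            inProgress_.erase(key_);
        }
    }

    // Non-copyable: a claim should not be duplicated or transferred by accident.
    ProcessGuard(ProcessGuard const&) = delete;
    ProcessGuard& operator=(ProcessGuard const&) = delete;

    explicit operator bool() const
    {
        return owned_;
    }
};

int main()
{
    std::mutex mtx;
    std::set<int> pending;

    ProcessGuard<int> first(mtx, pending, 42);
    ProcessGuard<int> second(mtx, pending, 42);  // duplicate: not owned
    std::cout << bool(first) << ' ' << bool(second) << '\n';  // prints "1 0"
}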

Context of Change

This is the safe subset of changes from #5301, which has problems, plus the item to dispatch from acquireAsync (sketched below).
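To make item 4 above concrete, here is a rough, self-contained sketch of the inversion. The real change is in src/xrpld/app/ledger/detail/InboundLedgers.cpp and uses rippled's JobQueue and CanProcess; everything below (the names, the std::vector job queue, the string hash) is simplified and hypothetical.

#include <cstdint>
#include <functional>
#include <iostream>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Stand-in for the job queue: callers append work, a worker runs it later.
std::vector<std::function<void()>> jobQueue;

std::mutex pendingMutex;
std::set<std::string> pendingAcquires;  // ledgers currently being fetched

// Before this PR (conceptually), callers dispatched a job whose body called
// acquireAsync; the job would often find the ledger already pending and
// return immediately, so it existed only to exit. After the change,
// acquireAsync itself checks for duplicates first and only dispatches a job
// when there is real work to do.
void acquireAsync(std::string const& ledgerHash, std::uint32_t seq)
{
    {
        std::lock_guard lock(pendingMutex);
        if (!pendingAcquires.insert(ledgerHash).second)
            return;  // already being acquired; no job is created at all
    }
    jobQueue.push_back([ledgerHash, seq] {
        std::cout << "acquiring ledger " << ledgerHash << " (seq " << seq
                  << ")\n";
        // ... fetch the ledger, then remove ledgerHash from pendingAcquires
    });
}

int main()
{
    acquireAsync("ABCD", 100);
    acquireAsync("ABCD", 100);  // duplicate: dropped before any job exists
    for (auto const& job : jobQueue)
        job();
    std::cout << "jobs dispatched: " << jobQueue.size() << '\n';  // 1
}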

Type of Change

  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)

API Impact

None

ximinez force-pushed the ximinez/fix-getledger branch from 8a1331c to 021c10d on February 28, 2025 18:55
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from fd9bff2 to a241e2f on February 28, 2025 18:56
codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 60.20408% with 39 lines in your changes missing coverage. Please review.

Project coverage is 79.1%. Comparing base (9874d47) to head (e38847e).

Files with missing lines Patch % Lines
src/xrpld/app/misc/NetworkOPs.cpp 21.4% 22 Missing ⚠️
src/xrpld/app/ledger/detail/InboundLedgers.cpp 59.1% 9 Missing ⚠️
src/xrpld/app/ledger/detail/InboundLedger.cpp 25.0% 3 Missing ⚠️
src/xrpld/app/ledger/detail/LedgerMaster.cpp 0.0% 3 Missing ⚠️
src/xrpld/app/consensus/RCLConsensus.cpp 0.0% 2 Missing ⚠️
Additional details and impacted files


@@           Coverage Diff           @@
##           develop   #5327   +/-   ##
=======================================
  Coverage     79.1%   79.1%           
=======================================
  Files          816     817    +1     
  Lines        71622   71652   +30     
  Branches      8237    8234    -3     
=======================================
+ Hits         56644   56679   +35     
+ Misses       14978   14973    -5     
Files with missing lines Coverage Δ
include/xrpl/basics/CanProcess.h 100.0% <100.0%> (ø)
include/xrpl/basics/base_uint.h 96.9% <100.0%> (+<0.1%) ⬆️
include/xrpl/protocol/LedgerHeader.h 100.0% <ø> (ø)
src/xrpld/app/consensus/RCLValidations.cpp 72.3% <100.0%> (-1.9%) ⬇️
src/xrpld/app/ledger/InboundLedgers.h 100.0% <ø> (ø)
src/xrpld/app/ledger/detail/TimeoutCounter.cpp 88.4% <100.0%> (+1.3%) ⬆️
src/xrpld/app/ledger/detail/TimeoutCounter.h 100.0% <ø> (ø)
src/xrpld/app/misc/HashRouter.h 100.0% <ø> (ø)
src/xrpld/app/misc/NetworkOPs.h 100.0% <ø> (ø)
src/xrpld/app/consensus/RCLConsensus.cpp 63.0% <0.0%> (+0.8%) ⬆️
... and 4 more

... and 6 files with indirect coverage changes



ximinez force-pushed the ximinez/fix-getledger branch from 021c10d to 7dc9efd on March 11, 2025 15:41
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 011e2c8 to 53f1f02 on March 11, 2025 15:48
ximinez force-pushed the ximinez/fix-getledger branch from 7dc9efd to 068b71f on March 12, 2025 16:35
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 53f1f02 to 16a3282 on March 12, 2025 20:42
ximinez force-pushed the ximinez/fix-getledger branch from 068b71f to d7790ce on March 13, 2025 00:16
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 16a3282 to ccb0099 on March 13, 2025 16:37
ximinez force-pushed the ximinez/fix-getledger branch from d7790ce to 46f8af5 on March 13, 2025 16:45
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from ccb0099 to a53f0c0 on March 13, 2025 16:50
ximinez force-pushed the ximinez/fix-getledger branch from 46f8af5 to 0e4f4d2 on March 17, 2025 23:01
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from a53f0c0 to 78f4bae on March 17, 2025 23:55
ximinez force-pushed the ximinez/fix-getledger branch from 0e4f4d2 to 06e0540 on March 19, 2025 00:46
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 78f4bae to c1382ab on March 19, 2025 00:46
ximinez force-pushed the ximinez/fix-getledger branch from 06e0540 to 4b464d1 on March 20, 2025 01:05
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from c1382ab to 363cd17 on March 20, 2025 01:05
ximinez force-pushed the ximinez/fix-getledger branch from 4b464d1 to a6799ec on March 20, 2025 18:07
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 363cd17 to 0954203 on March 20, 2025 18:08
ximinez force-pushed the ximinez/fix-getledger branch from a6799ec to 0b8c2b3 on March 24, 2025 23:40
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 0954203 to adb9df3 on March 24, 2025 23:41
ximinez force-pushed the ximinez/fix-getledger branch from 0b8c2b3 to f2b1a67 on March 25, 2025 16:12
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from adb9df3 to 262940a on March 25, 2025 16:12
ximinez force-pushed the ximinez/fix-getledger branch from f2b1a67 to c1215d0 on March 27, 2025 18:37
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 262940a to 9ecb783 on March 27, 2025 18:38
ximinez force-pushed the ximinez/fix-getledger branch from c1215d0 to 966204c on March 31, 2025 21:55
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 9ecb783 to e60c51a on March 31, 2025 21:55
ximinez force-pushed the ximinez/fix-getledger branch from 966204c to c49d81c on April 2, 2025 14:43
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from e60c51a to da31e1c on April 2, 2025 14:43
ximinez force-pushed the ximinez/fix-getledger branch from c49d81c to 07fa3a6 on April 4, 2025 17:23
ximinez force-pushed the ximinez/fix-getledger branch from 99fb808 to 706a4c9 on April 24, 2025 14:44
ximinez force-pushed the ximinez/acquireAsyncDispatch branch 2 times, most recently from 275a045 to 0a02530 on April 25, 2025 15:45
ximinez force-pushed the ximinez/fix-getledger branch from 706a4c9 to 3f71758 on April 25, 2025 15:46
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 0a02530 to a7c1d72 on April 25, 2025 16:41
ximinez force-pushed the ximinez/fix-getledger branch from 3f71758 to b92c9f9 on April 25, 2025 16:42
ximinez changed the base branch from ximinez/fix-getledger to develop on April 25, 2025 16:52
ximinez force-pushed the ximinez/acquireAsyncDispatch branch 2 times, most recently from 5863223 to 04213be on April 25, 2025 17:16
ximinez changed the title from "DRAFT: refactor: acquireAsync will dispatch the job, not the other way around" to "refactor: acquireAsync will dispatch a job, not be dispatched" on April 25, 2025
ximinez marked this pull request as ready for review on April 25, 2025 17:22
ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 04213be to 21ad0cc on April 25, 2025 20:37
ximinez (Collaborator, Author) commented Apr 28, 2025

Unfortunately, this is not ready for prime time. I ran a simple test over the weekend, running this branch for some time, and running develop for some time.

What the data shows is:

  • This branch ran for about 14 hours. In that time, it went out of sync 239 times. (239 NetworkOPs:WRN STATE->connected - updateOperatingMode: no positions), which is about 17 de-syncs per hour.
  • develop ran for about 48 hours. In that time, it never dropped all the way down to "connected", and it went to "syncing" 43 times. At least one of those had to be during the initial sync, so call it 42. Either way, that's < 1 de-sync per hour.

Clearly, something is misbehaving, so I introduced a problem somewhere. I may be able to take some time to find out where, but it's not my priority right now, so it may be a while.

$ for f in acquireAsync.txt upstream.txt ; do echo -e "\n$f" ; head -1 $f ; tail -1 $f ; loganalysis $f; done

acquireAsync.txt
2025-Apr-25 22:41:05.433965087 UTC Application:NFO Process starting: rippled-2.4.0+7d8411985cc62926834ac06198422cb101feca0e.DEBUG, Instance Cookie: 17903171231208261479
2025-Apr-26 12:32:38.881102671 UTC NetworkOPs:NFO STATE->full - endConsensus: check full
      4 appeared;  following new file
      1 Application:NFO Process starting: rippled-#+7d8411985cc62926834ac06198422cb101feca0e.DEBUG, Instance Cookie: #
      4 become inaccessible: No such file or directory
      1 NetworkOPs:NFO STATE->connected - Heartbeat: sufficient peers
    291 NetworkOPs:NFO STATE->full - endConsensus: check full
      1 NetworkOPs:NFO STATE->syncing - Heartbeat: check connected
    296 NetworkOPs:NFO STATE->tracking - endConsensus: check tracking
      1 NetworkOPs:WRN STATE->connected - Heartbeat: check syncing
    239 NetworkOPs:WRN STATE->connected - updateOperatingMode: no positions
      2 NetworkOPs:WRN STATE->syncing - check LCL: not on consensus ledger
     25 NetworkOPs:WRN STATE->syncing - consensusViewChange
     29 NetworkOPs:WRN STATE->syncing - updateOperatingMode: no positions

upstream.txt
2025-Apr-26 14:11:56.660217534 UTC Application:NFO Process starting: rippled-2.4.0+fa1e25abef7bf12bde87ba23528346853cef88b0.DEBUG, Instance Cookie: 7137182022843094303
2025-Apr-28 14:21:47.931018040 UTC NetworkOPs:NFO STATE->full
     16 appeared;  following new file
      1 Application:NFO Process starting: rippled-#+fa1e25abef7bf12bde87ba23528346853cef88b0.DEBUG, Instance Cookie: #
     16 become inaccessible: No such file or directory
      1 NetworkOPs:NFO STATE->connected
     40 NetworkOPs:NFO STATE->full
     43 NetworkOPs:NFO STATE->syncing
     43 NetworkOPs:NFO STATE->tracking

ximinez added 4 commits April 28, 2025 14:34
- Improve logging related to ledger acquisition and operating mode changes
- Class "CanProcess" to keep track of processing of distinct items
- Delete the copy ctor & operator
@ximinez ximinez force-pushed the ximinez/acquireAsyncDispatch branch from 7d84119 to 87c61e1 Compare April 28, 2025 18:34
ximinez (Collaborator, Author) commented May 1, 2025

This is ready for review. It turns out I had accidentally run the original code in the test mentioned in my previous comment. When I ran it again over the last few days, it had more de-syncs, but not unreasonably so: about 40 in the last 24 hours. Maybe still enough to be a problem, but not the disaster I observed earlier.

     38 NetworkOPs:NFO STATE->full - endConsensus: check full
     39 NetworkOPs:NFO STATE->tracking - endConsensus: check tracking
      4 NetworkOPs:WRN STATE->connected - updateOperatingMode: no positions
      4 NetworkOPs:WRN STATE->syncing - check LCL: not on consensus ledger
     27 NetworkOPs:WRN STATE->syncing - consensusViewChange
      4 NetworkOPs:WRN STATE->syncing - updateOperatingMode: no positions

@ximinez ximinez requested a review from a team as a code owner May 14, 2025 14:07
@bthomee bthomee requested review from vvysokikh1 and Bronek June 6, 2025 15:19
vvysokikh1 (Collaborator) left a comment


I'm a bit conflicted about this change. I think it should work, but I don't like the design. Would it be better to implement a wrapper around the container type that provides these properties, instead of a RAII wrapper around an iterator?
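For illustration, one shape the suggested alternative could take: a small wrapper that owns both the mutex and the container and exposes claim/release itself, so callers never see iterators. This is purely a sketch of the reviewer's idea, not code from this PR.

#include <mutex>
#include <set>

// A deduplicating container wrapper: the "claim this item or tell me it is
// already being processed" property lives in the type itself.
template <class Key>
class DedupSet
{
    std::mutex mtx_;
    std::set<Key> items_;

public:
    // Returns true if the key was newly claimed, false if already present.
    bool tryClaim(Key const& key)
    {
        std::lock_guard lock(mtx_);
        return items_.insert(key).second;
    }

    void release(Key const& key)
    {
        std::lock_guard lock(mtx_);
        items_.erase(key);
    }
};

Callers would pair tryClaim and release (or layer their own scope guard on top), and iterator validity never leaks out of the wrapper.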

Comment on lines +98 to +100
// TODO: Use structured binding once LLVM 16 is the minimum supported
// version. See also: https://github.com/llvm/llvm-project/issues/48582
// https://github.com/llvm/llvm-project/commit/127bf44385424891eb04cff8e52d3f157fc2cb7c

I think we already agreed to support clang16 as a minimum version. Maybe worth switching to that?
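For reference, a sketch of the kind of rewrite that TODO points at, assuming the blocker is capturing a structured binding in a lambda (supported starting with Clang 16). The function below is a generic illustration, not the PR's actual doInsert.

#include <functional>
#include <mutex>
#include <set>

template <class Mutex, class Item>
std::function<void()>
insertSketch(Mutex& mtx, std::set<Item>& collection, Item const& item)
{
    std::lock_guard lock(mtx);
    // With Clang 16+ the insert result can be unpacked directly...
    auto const [it, inserted] = collection.insert(item);
    if (!inserted)
        return {};  // the item is already being processed elsewhere
    // ...and the structured binding 'it' captured by the cleanup lambda,
    // which older Clang versions rejected.
    return [&mtx, &collection, it] {
        std::lock_guard lock(mtx);
        collection.erase(it);
    };
}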

Comment on lines +128 to +133
template <class Mutex, class Item>
std::function<void()>
insert(Mutex& mtx, std::set<Item>& collection, Item const& item)
{
    return doInsert<true>(mtx, collection, item);
}

Should there be a specialization for other container types where invalidation does not happen on insert/erase?

bool isNew = true;
std::shared_ptr<InboundLedger> inbound;
{
    ScopedLockType sl(mLock);
    if (stopping_)
    {
        JLOG(j_.debug()) << "Abort(stopping): " << ss.str();

ss.str() is always empty here

uint256 const& hash,
std::uint32_t seq,
InboundLedger::Reason reason) override
{
std::unique_lock lock(acquiresMutex_);
try
if (auto check = std::make_shared<CanProcess const>(

nit: could be auto const check = ...
