Improve transaction relay logic #4985

ximinez · 2024-04-10T23:40:35Z

High Level Overview of Change

This PR, if merged, will improve transaction relay logic around a few edge cases.

(I'll write a single commit message later, but before this PR is squashed and merged.)

Context of Change

A few months ago, while examining some of the issues around the 2.0.0 release, and auditing transaction relay code, I identified a few areas with potential for improvement.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Tests (you added tests for code that already exists, or your new feature included in this PR)

Before / After

This PR is divided into four mostly independent changes.

1. "Decrease `shouldRelay` limit to 30s." Pretty self-explanatory. Currently, the limit is 5 minutes, by which point the `HashRouter` entry could have expired, making this transaction look brand new (and thus causing it to be relayed back to peers which have sent it to us recently).
2. "Give a transaction more chances to be retried." Will put a transaction into `LedgerMaster`'s held transactions if the transaction gets a `ter`, `tel`, or `tef` result. Old behavior was just `ter`.
   - Additionally, to prevent a transaction from being repeatedly held indefinitely, it must meet some extra conditions. (Documented in a comment in the code.)
3. "Pop all transactions with sequential sequences, or tickets." When a transaction is processed successfully, currently, one held transaction for the same account (if any) will be popped out of the held transactions list, and queued up for the next transaction batch. This change pops all transactions for the account, but only if they have sequential sequences (for non-ticket transactions) or use a ticket. This issue was identified from interactions with @mtrippled's #4504 ("Apply transaction batches in periodic intervals"), which was merged, but unfortunately reverted later by #4852. When the batches were spaced out, it could potentially take a very long time for a large number of held transactions for an account to get processed through. However, whether batched or not, this change will help get held transactions cleared out, particularly if a missing earlier transaction is what held them up.
4. "Process held transactions through existing NetworkOPs batching." In the current processing, at the end of each consensus round, all held transactions are directly applied to the open ledger, then the held list is reset. This bypasses all of the logic in `NetworkOPs::apply` which, among other things, broadcasts successful transactions to peers. This means that the transaction may not get broadcast to peers for a really long time (5 minutes in the current implementation, or 30 seconds with this first commit). If the node is a bottleneck (either due to network configuration, or because the transaction was submitted locally), the transaction may not be seen by any other nodes or validators before it expires or causes other problems.

Decrease `shouldRelay` limit to 30s:

* Allows transactions, validator lists, proposals, and validations to be relayed more often, but only when triggered by another event, such as receiving it from a peer.
* Decrease from 5min.
* Expected to help transaction throughput on poorly connected networks.
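To make the relay window concrete, here is a minimal C++ sketch of a time-gated relay check. It assumes each tracked item remembers when it was last relayed, and that the check is only consulted when some event (such as receiving the item from a peer) would trigger a relay. `RelayEntry` and `relayHoldTime` are hypothetical names, not rippled's `HashRouter` API.

```cpp
#include <chrono>
#include <optional>

// Hypothetical per-item relay record: remembers the last relay time and
// refuses another relay until `holdTime` has elapsed.
class RelayEntry
{
public:
    using clock = std::chrono::steady_clock;

    // Called when an event would trigger a relay. Returns true, and records
    // `now`, only if the previous relay is at least `holdTime` old.
    bool
    shouldRelay(clock::time_point now, std::chrono::seconds holdTime)
    {
        if (lastRelayed_ && now < *lastRelayed_ + holdTime)
            return false;  // relayed too recently; suppress
        lastRelayed_ = now;
        return true;
    }

private:
    std::optional<clock::time_point> lastRelayed_;
};

// The value proposed in this PR (previously 5 minutes).
inline constexpr std::chrono::seconds relayHoldTime{30};
```

With a 30s window, an item that keeps arriving from peers can be relayed again every 30 seconds rather than once every 5 minutes.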

Give a transaction more chances to be retried:

* Hold if the transaction gets a ter, tel, or tef result.
* Use the new SF_HELD flag to ultimately prevent the transaction from being held and retried too many times.
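A rough sketch of the hold decision follows. The numeric ranges are assumed to mirror the ordering of rippled's result-code families (`tel` < `tem` < `tef` < `ter` < `tes`), and the "hold `tel`/`tef` at most once" rule is a hypothetical simplification standing in for the extra conditions documented in the PR's code comment. `Family`, `classify`, and `shouldHold` are illustrative names only.

```cpp
// Hypothetical result-code families; the ranges are an assumption modeled
// on rippled's TER ordering, not copied from it.
enum class Family { tel, tem, tef, ter, tes, tec };

Family
classify(int code)
{
    if (code < -299)
        return Family::tel;  // -399..-300: local error
    if (code < -199)
        return Family::tem;  // -299..-200: malformed
    if (code < -99)
        return Family::tef;  // -199..-100: failure
    if (code < 0)
        return Family::ter;  // -99..-1: retry
    if (code == 0)
        return Family::tes;  // success
    return Family::tec;      // positive codes treated as tec in this sketch
}

// Illustrative rule: ter is always held (the old behavior); tel and tef are
// held at most once, tracked by a flag playing the role of SF_HELD.
bool
shouldHold(int code, bool& alreadyHeld)
{
    switch (classify(code))
    {
        case Family::ter:
            return true;
        case Family::tel:
        case Family::tef:
            if (alreadyHeld)
                return false;    // already had its extra chance
            alreadyHeld = true;  // analogous to setting SF_HELD
            return true;
        default:
            return false;        // tem, tes, tec: never held
    }
}
```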

Process held transactions through existing NetworkOPs batching:

* Ensures that successful transactions are broadcast to peers, appropriate failed transactions are held for later attempts, fee changes are sent to subscribers, etc.
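The effect of the fourth change can be modeled with a toy node in which held transactions are fed back through the same batch path as new ones, so a transaction that succeeds on retry gets broadcast instead of being applied to the open ledger without any relay. `Tx`, `Node`, and the "always hold on failure" rule are simplifications for illustration only.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical transaction: `succeeds` stands in for the real apply result.
struct Tx
{
    std::string id;
    bool succeeds;
};

class Node
{
public:
    std::vector<std::string> broadcast;  // ids that would be sent to peers
    std::vector<Tx> held;                // retried after the next round

    // Shared batch path, loosely analogous to NetworkOPs::apply.
    void
    apply(std::vector<Tx> batch)
    {
        for (auto& tx : batch)
        {
            if (tx.succeeds)
                broadcast.push_back(tx.id);
            else
                held.push_back(std::move(tx));
        }
    }

    // End of a consensus round: hand the held set to the batch path instead
    // of applying it directly.
    void
    endOfRound()
    {
        std::vector<Tx> retry;
        std::swap(held, retry);
        apply(std::move(retry));
    }
};
```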

codecov-commenter · 2024-04-11T23:33:05Z

Codecov Report

Attention: Patch coverage is 87.60331% with 15 lines in your changes missing coverage. Please review.

Project coverage is 78.1%. Comparing base (3502df2) to head (0c0362f).
Report is 1 commit behind head on develop.

Files with missing lines	Patch %	Lines
src/xrpld/app/misc/NetworkOPs.cpp	82.6%	15 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           develop   #4985   +/-   ##
=======================================
  Coverage     78.1%   78.1%           
=======================================
  Files          795     795           
  Lines        68582   68663   +81     
  Branches      8278    8283    +5     
=======================================
+ Hits         53574   53649   +75     
- Misses       15008   15014    +6

Files with missing lines	Coverage Δ
src/xrpld/app/ledger/LocalTxs.h	`100.0% <ø> (ø)`
src/xrpld/app/ledger/detail/LedgerMaster.cpp	`43.9% <100.0%> (-0.1%)`	⬇️
src/xrpld/app/ledger/detail/LocalTxs.cpp	`100.0% <100.0%> (ø)`
src/xrpld/app/main/Application.cpp	`69.3% <100.0%> (ø)`
src/xrpld/app/misc/CanonicalTXSet.cpp	`100.0% <100.0%> (ø)`
src/xrpld/app/misc/HashRouter.cpp	`100.0% <100.0%> (ø)`
src/xrpld/app/misc/HashRouter.h	`100.0% <100.0%> (ø)`
src/xrpld/app/misc/NetworkOPs.h	`100.0% <ø> (ø)`
src/xrpld/app/misc/Transaction.h	`100.0% <ø> (ø)`
src/xrpld/app/misc/NetworkOPs.cpp	`69.6% <82.6%> (+0.4%)`	⬆️

... and 5 files with indirect coverage changes


scottschurr

Very nice! I really appreciate the way the commits were divided up. It made the code review much easier.

I mostly left complimentary comments. But there's one bool that I suspect needs to be changed to a std::atomic<bool>. See what you think...

scottschurr · 2024-04-18T02:02:18Z

src/ripple/app/misc/CanonicalTXSet.cpp

+        (!itrNext->second->getSeqProxy().isSeq() ||
+         itrNext->second->getSeqProxy().value() == seqProxy.value() + 1))


Nice! This takes full advantage of the "unusual" sort order of SeqProxy, that all sequence numbers sort in front of all tickets.
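A self-contained sketch of the condition in that diff, assuming held transactions live in an ordered map keyed by (account, sequence proxy). `SeqProxy` here is a hypothetical stand-in that reproduces only the ordering described above, and `popNext` approximates `popAcctTransaction` rather than reproducing it.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for SeqProxy: every sequence sorts before every
// ticket; within one kind, values sort numerically.
struct SeqProxy
{
    enum class Kind { seq = 0, ticket = 1 };
    Kind kind;
    std::uint32_t value;

    bool
    isSeq() const
    {
        return kind == Kind::seq;
    }

    friend bool
    operator<(SeqProxy const& a, SeqProxy const& b)
    {
        return std::tie(a.kind, a.value) < std::tie(b.kind, b.value);
    }
};

using HeldKey = std::pair<std::string, SeqProxy>;  // (account, seqProxy)
using HeldSet = std::map<HeldKey, std::string>;    // value: a tx id

// Pop the next held transaction for `account` after `current`, but only if
// it uses a ticket or carries exactly the next sequence number.
std::optional<std::string>
popNext(HeldSet& held, std::string const& account, SeqProxy current)
{
    auto it = held.upper_bound(HeldKey{account, current});
    if (it == held.end() || it->first.first != account)
        return std::nullopt;  // nothing else held for this account
    SeqProxy const next = it->first.second;
    if (next.isSeq() && next.value != current.value + 1)
        return std::nullopt;  // sequence gap: leave it held
    std::string tx = std::move(it->second);
    held.erase(it);
    return tx;
}
```

Because sequences sort first, starting from a ticket can only ever reach other tickets (or the next account), which is why the check only needs to inspect the sequence case.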

scottschurr · 2024-04-18T02:18:30Z

src/ripple/app/misc/NetworkOPs.cpp

-                auto const txNext = m_ledgerMaster.popAcctTransaction(txCur);
-                if (txNext)
+                auto txNext = m_ledgerMaster.popAcctTransaction(txCur);
+                while (txNext)


I was initially worried that this while loop might submit_held() a boatload of transactions. But the TxQ defaults maximumTxnPerAccount to 10. So the largest number of times this loop could run (ordinarily) would be 10. That seems reasonable, and a good way to clear out an account that has a lot of transactions queued.

This new characteristic may be worth pointing out to the performance folks, in case they want to stress it.

To clarify, this loop doesn't do anything with the TxQ. This is reading the list of held transactions in LedgerMaster. So, AFAIK, there is no limit. However, the only way for a tx to get into that list is in this same function (see calls to addHeldTransaction), and only transactions received directly from RPC or from peers get to this function. That said, a malicious client or peer could probably get a bunch of transactions held (this is true without these changes). Thus, it might make sense to add a limit to mitigate some kinds of attacks.

Also, submit_held is a std::vector. Anything added to it will be processed in the next batch of transactions.
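If a cap were added as suggested, the drain loop could look like the hypothetical sketch below. `drainHeld` and `maxPops` are illustrative; as described above, the current loop has no such limit. `popNext` is any callable returning `std::optional<Tx>`.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical: move eligible held transactions into the next batch until
// `popNext` returns nothing or `maxPops` transactions have been taken.
template <class Tx, class PopFn>
std::size_t
drainHeld(PopFn&& popNext, std::vector<Tx>& submitHeld, std::size_t maxPops)
{
    std::size_t popped = 0;
    while (popped < maxPops)
    {
        std::optional<Tx> tx = popNext();
        if (!tx)
            break;  // nothing eligible left for this account
        submitHeld.push_back(std::move(*tx));  // processed in the next batch
        ++popped;
    }
    return popped;
}
```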

scottschurr · 2024-04-18T02:26:57Z

src/ripple/app/ledger/impl/LedgerMaster.cpp

+        // VFALCO NOTE The hash for an open ledger is undefined so we use
+        // something that is a reasonable substitute.
+        CanonicalTXSet set(app_.openLedger().current()->info().parentHash);
+        std::swap(mHeldTransactions, set);


This is a great way to minimize processing while the lock is held. Good spotting!
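The pattern being praised is: take the lock, swap the shared container with a fresh one, release the lock, then do the expensive work. A minimal sketch with a hypothetical `HeldList` wrapping a `std::vector` (the code above swaps in a newly constructed `CanonicalTXSet` instead):

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical held-transaction store guarded by a mutex.
template <class Tx>
class HeldList
{
public:
    void
    add(Tx tx)
    {
        std::lock_guard lock(mutex_);
        held_.push_back(std::move(tx));
    }

    // Returns everything currently held and leaves the list empty. Only the
    // constant-time swap happens while the mutex is held.
    std::vector<Tx>
    takeAll()
    {
        std::vector<Tx> out;
        {
            std::lock_guard lock(mutex_);
            std::swap(held_, out);
        }
        return out;  // caller processes these without holding the lock
    }

    std::size_t
    size() const
    {
        std::lock_guard lock(mutex_);
        return held_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<Tx> held_;
};
```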

scottschurr · 2024-04-18T02:33:36Z

src/ripple/app/misc/NetworkOPs.cpp

-    bool bLocal,
-    FailHard failType)
+bool
+NetworkOPsImp::preProcessTransaction(std::shared_ptr<Transaction>& transaction)


Splitting the validation and canonicalization part of processTransaction into this other method was a good idea.

scottschurr · 2024-04-18T17:45:13Z

src/ripple/app/misc/NetworkOPs.cpp

@@ -1276,6 +1300,17 @@ NetworkOPsImp::doTransactionSync(
        transaction->setApplying();
    }

+    doTransactionSyncBatch(
+        lock, [&transaction](std::unique_lock<std::mutex>& lock) {
+            return transaction->getApplying();


I looked into this call to Transaction::getApplying(). The bool being read is neither protected by a lock nor atomic. I think we need to change the Transaction::mApplying member variable into a std::atomic<bool>, since the bool is being accessed across threads.

This problem was present before your change. But please fix it while we're thinking about it.

I think this was done intentionally because it avoids the overhead of taking a lock, and the consequences of a race condition are nearly-zero.

Most significantly, almost all accesses of the *Applying functions are under NetworkOPsImp's lock. The exceptions are:

The new processTransactionSet function.

The popAcctTransaction loop we were just talking about.

If there is a race condition, and getApplying returns false when it should be true, the transaction will be processed again. Not that big a deal if it's a rare one-off. Most of the time, it'll get tefALREADY or tefPAST_SEQ.

On the flip side, if it returns true when it should be false, then the transaction must have been attempted recently, so it's no big deal if it doesn't get tried again right away.

If there's a race between setApplying and clearApplying, and the flag ends up set, then a batch is about to try to process the transaction and will call clearApplying later. If it ends up cleared, then it might get attempted again later as is the case with item 2.

Alllll that said, I don't think it would hurt to rewrite those two lock exceptions I pointed out to ensure that the lock is taken for the function calls.
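For reference, a minimal sketch of what an atomic flag could look like if the suggestion were adopted. `ApplyingFlag` and `trySetApplying` are hypothetical names. Using `exchange` folds the check and the set into one atomic step; races between clearing and setting, as in the last case above, can still happen but are no longer undefined behavior.

```cpp
#include <atomic>

// Hypothetical thread-safe version of the "applying" flag.
class ApplyingFlag
{
public:
    // Marks the transaction as being applied. Returns false if it was
    // already marked, so a caller can skip queuing it a second time.
    bool
    trySetApplying()
    {
        return !applying_.exchange(true, std::memory_order_acq_rel);
    }

    void
    clearApplying()
    {
        applying_.store(false, std::memory_order_release);
    }

    bool
    getApplying() const
    {
        return applying_.load(std::memory_order_acquire);
    }

private:
    std::atomic<bool> applying_{false};
};
```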

scottschurr · 2024-04-18T23:24:47Z

src/ripple/app/misc/NetworkOPs.cpp

@@ -1224,12 +1232,28 @@ NetworkOPsImp::processTransaction(
        transaction->setStatus(INVALID);
        transaction->setResult(temBAD_SIGNATURE);
        app_.getHashRouter().setFlags(transaction->getID(), SF_BAD);
-        return;
+        return false;


Similar to above, these 5 lines are not hit by the unit tests. 🤷

It should be impossible to get to this line. Note the block above explains that I don't think the check is necessary, and has an assert on the same validity at the end.

// NOTE eahennis - I think this check is redundant,
// but I'm not 100% sure yet.
// If so, only cost is looking up HashRouter flags.
auto const view = m_ledgerMaster.getCurrentLedger();
auto const [validity, reason] = checkValidity(
    app_.getHashRouter(),
    *transaction->getSTransaction(),
    view->rules(),
    app_.config());
assert(validity == Validity::Valid);

scottschurr · 2024-04-18T23:24:55Z

src/ripple/app/misc/NetworkOPs.cpp

        JLOG(m_journal.warn()) << transaction->getID() << ": cached bad!\n";
        transaction->setStatus(INVALID);
        transaction->setResult(temBAD_SIGNATURE);
-        return;
+        return false;


Interesting. According to my local code coverage these four lines are never hit. I know there are a few places in the unit tests that produce corrupted signatures. They must be handled elsewhere. Just noticing, no need to address this in this pull request.

Yeah, this one isn't documented as explicitly, but the idea is that it should also be impossible to get to this point with a bad signature.

scottschurr · 2024-04-19T00:02:25Z

src/ripple/app/misc/NetworkOPs.cpp

+        mTransactions.swap(transactions);
+    else
+    {
+        for (auto& t : transactions)


Does it make sense to reserve space in mTransactions? Consider

mTransactions.reserve(mTransactions.size() + transactions.size());

Yes it does!
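A sketch of the merge with the suggested `reserve`, assuming (as the snippet suggests) that the swap branch is taken when the destination is empty. `appendBatch` is a hypothetical free function, not the member code in `NetworkOPs.cpp`.

```cpp
#include <utility>
#include <vector>

// Hypothetical: merge a newly collected batch into the pending list.
template <class T>
void
appendBatch(std::vector<T>& pending, std::vector<T>& incoming)
{
    if (pending.empty())
    {
        pending.swap(incoming);  // nothing to merge: just take the buffer
    }
    else
    {
        // Grow once up front instead of letting push_back reallocate
        // repeatedly as elements are appended.
        pending.reserve(pending.size() + incoming.size());
        for (auto& t : incoming)
            pending.push_back(std::move(t));
    }
    incoming.clear();  // either way, the incoming batch has been consumed
}
```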

* upstream/develop: fix: Remove redundant STAmount conversion in test (4996) fix: resolve database deadlock: (4989) test: verify the rounding behavior of equal-asset AMM deposits (4982) test: Add tests to raise coverage of AMM (4971) chore: Improve codecov coverage reporting (4977) test: Unit test for AMM offer overflow (4986) fix amendment to add `PreviousTxnID`/`PreviousTxnLgrSequence` (4751)

* upstream/develop: Set version to 2.2.0-b3

* upstream/develop: Ignore more commits Address compiler warnings Add markers around source lists Fix source lists Rewrite includes Format formerly .hpp files Rename .hpp to .h Simplify protobuf generation Consolidate external libraries Remove packaging scripts Remove unused files

Process held transactions through existing NetworkOPs batching: * Ensures that successful transactions are broadcast to peers, appropriate failed transactions are held for later attempts, fee changes are sent to subscribers, etc. Pop all transactions with sequential sequences, or tickets Give a transaction more chances to be retried: * Hold if the transaction gets a ter, tel, or tef result. * Use the new SF_HELD flag to ultimately prevent the transaction from being held and retried too many times. Decrease `shouldRelay` limit to 30s: * Allows transactions, validator lists, proposals, and validations to be relayed more often, but only when triggered by another event, such as receiving it from a peer * Decrease from 5min. * Expected to help transaction throughput on poorly connected networks.

* upstream/develop: Set version to 2.2.0-rc1

* upstream/develop: Remove flow assert: (5009) Update list of maintainers: (4984)

* upstream/develop: Add external directory to Conan recipe's exports (5006) Add missing includes (5011)

* commit 'c706926': (23 commits) Change order of checks in amm_info: (4924) Add the fixEnforceNFTokenTrustline amendment: (4946) Replaces the usage of boost::string_view with std::string_view (4509) docs: explain how to find a clang-format patch generated by CI (4521) XLS-52d: NFTokenMintOffer (4845) chore: remove repeat words (5041) Expose all amendments known by libxrpl (5026) fixReducedOffersV2: prevent offers from blocking order books: (5032) Additional unit tests for testing deletion of trust lines (4886) Fix conan typo: (5044) Add new command line option to make replaying transactions easier: (5027) Fix compatibility with Conan 2.x: (5001) Set version to 2.2.0 Set version to 2.2.0-rc3 Add xrpl.libpp as an exported lib in conan (5022) Fix Oracle's token pair deterministic order: (5021) Set version to 2.2.0-rc2 Fix last Liquidity Provider withdrawal: Fix offer crossing via single path AMM with transfer fee: Fix adjustAmountsByLPTokens(): ...

* commit 'f6879da': Add bin/physical.sh (4997) Prepare to rearrange sources: (4997)

* upstream/develop: fixInnerObjTemplate2 amendment (5047) Set version to 2.3.0-b1 Ignore restructuring commits (4997) Recompute loops (4997) Rewrite includes (4997) Rearrange sources (4997) Move CMake directory (4997)

* upstream/develop: fix CTID in tx command returns invalidParams on lowercase hex (5049) Invariant: prevent a deleted account from leaving (most) artifacts on the ledger. (4663) Bump codecov plugin version to version 4.5.0 (5055) fix "account_nfts" with unassociated marker returning issue (5045)

Bronek · 2025-04-28T10:51:49Z

src/xrpld/app/misc/CanonicalTXSet.cpp

-        itrNext->first.getAccount() == effectiveAccount)
+        itrNext->first.getAccount() == effectiveAccount &&
+        (!itrNext->second->getSeqProxy().isSeq() ||
+         itrNext->second->getSeqProxy().value() == seqProxy.value() + 1))


could this produce a different transaction ~~order~~ set compared to the old code? If so, shouldn't this be amendment-gated?


No, because this function is only relevant in the context of NetworkOPsImp::apply, which only handles the open ledger. Ordering rules are enforced, but don't matter the way they do in building a consensus ledger. Also, the one use is in LedgerMaster::popAcctTransaction, and is done on mHeldTransactions, which are transactions that are held for retry into the open ledger. Again, nothing to do with consensus.

src/xrpld/app/misc/NetworkOPs.cpp

Bronek

All the changes here are an improvement. There's one open question, "does this need an amendment?", and it can most likely be addressed by a PR comment here (assuming the answer is "no", which I feel is likely).

ximinez · 2025-04-28T16:33:55Z


No. 😀 An amendment is not needed for these changes. I am running some "due diligence" on my local node right now to make sure there are no obvious negative side effects like de-syncs.

@mtrippled

* refactor: Remove unused and add missing includes (#5293) The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms. * refactor: Calculate numFeatures automatically (#5324) Requiring manual updates of numFeatures is an annoying manual process that is easily forgotten, and leads to frequent merge conflicts. This change takes advantage of the `XRPL_FEATURE` and `XRPL_FIX` macros, and adds a new `XRPL_RETIRE` macro to automatically set `numFeatures`. * refactor: Improve ordering of headers with clang-format (#5343) Removes all manual header groupings from source and header files by leveraging clang-format options. * Rename "deadlock" to "stall" in `LoadManager` (#5341) What the LoadManager class does is stall detection, which is not the same as deadlock detection. In the condition of severe CPU starvation, LoadManager will currently intentionally crash rippled reporting `LogicError: Deadlock detected`. This error message is misleading as the condition being detected is not a deadlock. This change fixes and refactors the code in response. * Adds hub.xrpl-commons.org as a new Bootstrap Cluster (#5263) * fix: Error message for ledger_entry rpc (#5344) Changes the error to `malformedAddress` for `permissioned_domain` in the `ledger_entry` rpc, when the account is not a string. This change makes it more clear to a user what is wrong with their request. * fix: Handle invalid marker parameter in grpc call (#5317) The `end_marker` is used to limit the range of ledger entries to fetch. If `end_marker` is less than `marker`, a crash can occur. This change adds an additional check. 
* fix: trust line RPC no ripple flag (#5345) The Trustline RPC `no_ripple` flag gets set depending on `lsfDefaultRipple` flag, which is not a flag of a trustline but of the account root. The `lsfDefaultRipple` flag does not provide any insight if this particular trust line has `lsfLowNoRipple` or `lsfHighNoRipple` flag set, so it should not be used here at all. This change simplifies the logic. * refactor: Updates Conan dependencies: RocksDB (#5335) Updates RocksDB to version 9.7.3, the latest version supported in Conan 1.x. A patch for 9.7.4 that fixes a memory leak is included. * fix: Remove null pointer deref, just do abort (#5338) This change removes the existing undefined behavior from `LogicError`, so we can be certain that there will be always a stacktrace. De-referencing a null pointer is an old trick to generate `SIGSEGV`, which would typically also create a stacktrace. However it is also an undefined behaviour and compilers can do something else. A more robust way to create a stacktrace while crashing the program is to use `std::abort`, which we have also used in this location for a long time. If we combine the two, we might not get the expected behaviour - namely, the nullpointer deref followed by `std::abort`, as handled in certain compiler versions may not immediately cause a crash. We have observed stacktrace being wiped instead, and thread put in indeterminate state, then stacktrace created without any useful information. * chore: Add PR number to payload (#5310) This PR adds one more payload field to the libXRPL compatibility check workflow - the PR number itself. * chore: Update link to ripple-binary-codec (#5355) The link to ripple-binary-codec's definitions.json appears to be outdated. The updated link is also documented here: https://xrpl.org/docs/references/protocol/binary-format#definitions-file * Prevent consensus from getting stuck in the establish phase (#5277) - Detects if the consensus process is "stalled". 
If it is, then we can declare a consensus and end successfully even if we do not have 80% agreement on our proposal. - "Stalled" is defined as: - We have a close time consensus - Each disputed transaction is individually stalled: - It has been in the final "stuck" 95% requirement for at least 2 (avMIN_ROUNDS) "inner rounds" of phaseEstablish, - and either all of the other trusted proposers or this validator, if proposing, have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner rounds", and at least 80% of the validators (including this one, if appropriate) agree about the vote (whether yes or no). - If we have been in the establish phase for more than 10x the previous consensus establish phase's time, then consensus is considered "expired", and we will leave the round, which sends a partial validation (indicating that the node is moving on without validating). Two restrictions avoid prematurely exiting, or having an extended exit in extreme situations. - The 10x time is clamped to be within a range of 15s (ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS). - If consensus has not had an opportunity to walk through all avalanche states (defined as not going through 8 "inner rounds" of phaseEstablish), then ConsensusState::Expired is treated as ConsensusState::No. - When enough nodes leave the round, any remaining nodes will see they've fallen behind, and move on, too, generally before hitting the timeout. Any validations or partial validations sent during this time will help the consensus process bring the nodes back together. * test: enable TxQ unit tests work with variable reference fee (#5118) In preparation for a potential reference fee change we would like to verify that fee change works as expected. The first step is to fix all unit tests to be able to work with different reference fee values. * test: enable unit tests to work with variable reference fee (#5145) Fix remaining unit tests to be able to process reference fee values other than 10. 
* Intrusive SHAMap smart pointers for efficient memory use and lock-free synchronization (#5152) The main goal of this optimisation is memory reduction in SHAMapTreeNodes by introducing intrusive pointers instead of standard std::shared_ptr and std::weak_ptr. * refactor: Move integration tests from 'examples/' into 'tests/' (#5367) This change moves `examples/example` into `tests/conan` to make it clear it is an integration test, and adjusts the `conan` CI job accordingly * test: enable compile time param to change reference fee value (#5159) Adds an extra CI pipeline to perform unit tests using different values for fees. * Fix undefined uint128_t type on Windows non-unity builds (#5377) As part of import optimization, a transitive include had been removed that defined `BOOST_COMP_MSVC` on Windows. In unity builds, this definition was pulled in, but in non-unity builds it was not - causing a compilation error. An inspection of the Boost code revealed that we can just gate the statements by `_MS_VER` instead. A `#pragma message` is added to verify that the statement is only printed on Windows builds. * fix: uint128 ambiguousness breaking macos unity build (#5386) * Fix to correct memory ordering for compare_exchange_weak and wait in the intrusive reference counting logic (#5381) This change addresses a memory ordering assertion failure observed on one of the Windows test machines during the IntrusiveShared_test suite. * fix: disable `channel_authorize` when `signing_support` is disabled (#5385) * fix: Use the build image from ghcr.io (#5390) The ci pipelines are constantly hitting Docker Hub's public rate limiting since increasing the number of jobs we're running. This change switches over to images hosted in GitHub's registry. 
* Remove UNREACHABLE from `NetworkOPsImp::processTrustedProposal` (#5387) It’s possible for this to happen legitimately if a set of peers, including a validator, are connected in a cycle, and the latency and message processing time between those peers is significantly less than the latency between the validator and the last peer. It’s unlikely in the real world, but obviously easy to simulate with Antithesis. * Instrument proposal, validation and transaction messages (#5348) Adds metric counters for the following P2P message types: * Untrusted proposal and validation messages * Duplicate proposal, validation and transaction messages * refactor(trivial): reorganize ledger entry tests and helper functions (#5376) This PR splits out `ledger_entry` tests into its own file (`LedgerEntry_test.cpp`) and alphabetizes the helper functions in `LedgerEntry.cpp`. These commits were split out of #5237 to make that PR a little more manageable, since these basic trivial changes are most of the diff. There is no code change, just moving code around. * fix: `fixPayChanV1` (#4717) This change introduces a new fix amendment (`fixPayChanV1`) that prevents the creation of new `PaymentChannelCreate` transaction with a `CancelAfter` time less than the current ledger time. It piggy backs off of fix1571. Once the amendment is activated, creating a new `PaymentChannel` will require that if you specify the `CancelAfter` time/value, that value must be greater than or equal to the current ledger time. Currently users can create a payment channel where the `CancelAfter` time is before the current ledger time. This results in the payment channel being immediately closed on the next PaymentChannel transaction. 
* Fix: admin RPC webhook queue limit removal and timeout reduction (#5163) When using subscribe at admin RPC port to send webhooks for the transaction stream to a backend, on large(r) ledgers the endpoint receives fewer HTTP POSTs with TX information than the amount of transactions in a ledger. This change removes the hardcoded queue length to avoid dropping TX notifications for the admin-only command. In addition, the per-request TTL for outgoing RPC HTTP calls has been reduced from 10 minutes to 30 seconds. * fix: Adds CTID to RPC tx and updates error (#4738) This change fixes a number of issues involved with CTID: * CTID is not present on all RPC tx transactions. * rpcWRONG_NETWORK is missing in the ErrorCodes.cpp * Temporary disable automatic triggering macOS pipeline (#5397) We temporarily disable running unit tests on macOS on the CI pipeline while we are investigating the delays. * refactor: Clean up test logging to make it easier to search (#5396) This PR replaces the word `failed` with `failure` in any test names and renames some test files to fix MSVC warnings, so that it is easier to search through the test output to find tests that failed. * chore: Run CI on PRs that are Ready or have the "DraftRunCI" label (#5400) - Avoids costly overhead for idle PRs where the CI results don't add any value. * fix: CTID to use correct ledger_index (#5408) * chore: Small clarification to lsfDefaultRipple comment (#5410) * fix: Replaces random endpoint resolution with sequential (#5365) This change addresses an issue where `rippled` attempts to connect to an IPv6 address, even when the local network lacks IPv6 support, resulting in a "Network is unreachable" error. The fix replaces the custom endpoint selection logic with `boost::async_connect`, which sequentially attempts to connect to available endpoints until one succeeds or all fail. * Improve transaction relay logic (#4985) Combines four related changes: 1. "Decrease `shouldRelay` limit to 30s."
Pretty self-explanatory. Currently, the limit is 5 minutes, by which point the `HashRouter` entry could have expired, making this transaction look brand new (and thus causing it to be relayed back to peers which have sent it to us recently). 2. "Give a transaction more chances to be retried." Will put a transaction into `LedgerMaster`'s held transactions if the transaction gets a `ter`, `tel`, or `tef` result. Old behavior was just `ter`. * Additionally, to prevent a transaction from being repeatedly held indefinitely, it must meet some extra conditions. (Documented in a comment in the code.) 3. "Pop all transactions with sequential sequences, or tickets." When a transaction is processed successfully, currently, one held transaction for the same account (if any) will be popped out of the held transactions list, and queued up for the next transaction batch. This change pops all transactions for the account, but only if they have sequential sequences (for non-ticket transactions) or use a ticket. This issue was identified from interactions with @mtrippled's #4504, which was merged, but unfortunately reverted later by #4852. When the batches were spaced out, it could potentially take a very long time for a large number of held transactions for an account to get processed through. However, whether batched or not, this change will help get held transactions cleared out, particularly if a missing earlier transaction is what held them up. 4. "Process held transactions through existing NetworkOPs batching." In the current processing, at the end of each consensus round, all held transactions are directly applied to the open ledger, then the held list is reset. This bypasses all of the logic in `NetworkOPs::apply` which, among other things, broadcasts successful transactions to peers. This means that the transaction may not get broadcast to peers for a really long time (5 minutes in the current implementation, or 30 seconds with this first commit). 
   If the node is a bottleneck (either due to network configuration, or because the transaction was submitted locally), the transaction may not be seen by any other nodes or validators before it expires or causes other problems.

* Enable passive squelching (#5358)
  This change updates the squelching logic to accept squelch messages for untrusted validators. As a result, servers will also squelch untrusted validator messages, reducing the duplicate traffic they generate. In particular:
  * Updates squelch message handling logic to squelch messages for all validators, not only trusted ones.
  * Updates the logic to send squelch messages to peers that don't squelch themselves.
  * Increases the threshold for the number of messages that a peer has to deliver to be considered a candidate for validator messages.
* Add PermissionDelegation feature (#5354)
  This change implements the account permission delegation described in XLS-75d, see XRPLF/XRPL-Standards#257.
  * Introduces transaction-level and granular permissions that can be delegated to other accounts.
  * Adds `DelegateSet` transaction to grant specified permissions to another account.
  * Adds `ltDelegate` ledger object to maintain the permission list for each delegating/delegated account pair.
  * Adds an optional `Delegate` field in common fields, allowing a delegated account to send transactions on behalf of the delegating account within the granted permission scope. The `Account` field remains the delegating account; the `Delegate` field specifies the delegated account. The transaction is signed by the delegated account.
* refactor: use east const convention (#5409)
  This change refactors the codebase to use the "east const convention", and adds a clang-format rule to follow this convention.
* fix: enable LedgerStateFix for delegation (#5427)
* Configure CODEOWNERS for changes to RPC code (#5266)
  To ensure changes to any RPC-related code are compatible with other services, such as Clio, the RPC team will be required to review them.
* fix: Ensure that coverage file generation is atomic. (#5426)
  Running unit tests in parallel, with multiple threads writing into one file, can corrupt output files, and then gcovr won't be able to parse the corrupted file. This change adds -fprofile-update=atomic as instructed by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68080.
* fix: Update validators-example.txt fix xrplf example URL (#5384)
* Fix: Resolve slow test on macOS pipeline (#5392)
  Using std::barrier performs extremely poorly (~1 hour vs ~1 minute to run the test suite) in certain macOS environments. To unblock our macOS CI pipeline, std::barrier has been replaced with a custom mutex-based barrier (Barrier) that significantly improves performance without compromising correctness.
* Set version to 2.5.0-b1

---------

Co-authored-by: Bart <bthomee@users.noreply.github.com>
Co-authored-by: Ed Hennis <ed@ripple.com>
Co-authored-by: Bronek Kozicki <brok@incorrekt.com>
Co-authored-by: Darius Tumas <Tokeiito@users.noreply.github.com>
Co-authored-by: Sergey Kuznetsov <skuznetsov@ripple.com>
Co-authored-by: cyan317 <120398799+cindyyan317@users.noreply.github.com>
Co-authored-by: Vlad <129996061+vvysokikh1@users.noreply.github.com>
Co-authored-by: Alex Kremer <akremer@ripple.com>
Co-authored-by: Valentin Balaschenko <13349202+vlntb@users.noreply.github.com>
Co-authored-by: Mayukha Vadari <mvadari@ripple.com>
Co-authored-by: Vito Tumas <5780819+Tapanito@users.noreply.github.com>
Co-authored-by: Denis Angell <dangell@transia.co>
Co-authored-by: Wietse Wind <w.wind@ipublications.net>
Co-authored-by: yinyiqian1 <yqian@ripple.com>
Co-authored-by: Jingchen <a1q123456@users.noreply.github.com>
Co-authored-by: brettmollin <brettmollin@users.noreply.github.com>
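Change 1 ("Decrease shouldRelay limit to 30s") rests on the interaction between the relay window and the lifetime of a router entry. The C++ sketch below is a hypothetical, simplified model of that interaction, not rippled's `HashRouter`: `RelayRouterSketch`, `addPeer`, `shouldRelay`, and `touch` are illustrative names, and the 300-second entry lifetime used in the usage example is an assumption. When the relay window is at least as long as the entry lifetime, the entry can expire before a re-relay is allowed, and with it the record of which peers already have the item.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;
using PeerId = std::uint32_t;

// Hypothetical model of relay suppression; not rippled's HashRouter API.
class RelayRouterSketch
{
public:
    RelayRouterSketch(std::chrono::seconds relayLimit, std::chrono::seconds entryLifetime)
        : relayLimit_(relayLimit), entryLifetime_(entryLifetime)
    {
    }

    // Record that `peer` has sent us the item identified by `key`.
    void
    addPeer(std::string const& key, PeerId peer, Clock::time_point now)
    {
        touch(key, now).peers.insert(peer);
    }

    // If the item may be relayed now, returns the peers known to already
    // have it (so they can be skipped). Returns std::nullopt if the item
    // was relayed less than `relayLimit_` ago.
    std::optional<std::set<PeerId>>
    shouldRelay(std::string const& key, Clock::time_point now)
    {
        Entry& e = touch(key, now);
        if (e.lastRelay && now - *e.lastRelay < relayLimit_)
            return std::nullopt;
        e.lastRelay = now;
        return e.peers;
    }

private:
    struct Entry
    {
        Clock::time_point lastSeen{};
        std::optional<Clock::time_point> lastRelay;
        std::set<PeerId> peers;
    };

    // Returns the entry for `key`, first discarding it if it has not been
    // touched within `entryLifetime_`. A discarded entry forgets which
    // peers already have the item, so the item looks brand new.
    Entry&
    touch(std::string const& key, Clock::time_point now)
    {
        auto it = entries_.find(key);
        if (it != entries_.end() && now - it->second.lastSeen > entryLifetime_)
            entries_.erase(it);
        Entry& e = entries_[key];
        e.lastSeen = now;
        return e;
    }

    std::chrono::seconds relayLimit_;
    std::chrono::seconds entryLifetime_;
    std::unordered_map<std::string, Entry> entries_;
};
```

With a 30-second window and a 300-second lifetime, a re-relay after 31 seconds still knows that peer 7 sent the item. With a 300-second window, a re-relay after 301 seconds finds a fresh, empty entry, so peer 7 would be sent the item again.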

ximinez added 4 commits April 10, 2024 17:58

Give a transaction more chances to be retried:

36dddfa

* Hold if the transaction gets a ter, tel, or tef result.
* Use the new SF_HELD flag to ultimately prevent the transaction from being held and retried too many times.
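The hold decision described in this commit can be sketched in C++ as follows. This is a hypothetical model: `HeldFlags` and `shouldHold` are illustrative names, `HeldFlags` stands in for the `SF_HELD` flag, and the numeric ranges are modeled on rippled's TER numbering (`tel` from -399, `tem` from -299, `tef` from -199, `ter` from -99, `tes` at 0). For simplicity it assumes a transaction is held only the first time its flag is set; the PR's actual extra conditions are documented in its code and are not reproduced here.

```cpp
#include <cstdint>
#include <unordered_set>

// Range boundaries modeled on rippled's TER numbering.
constexpr int telLOCAL_ERROR = -399;
constexpr int temMALFORMED = -299;
constexpr int tefFAILURE = -199;
constexpr int terRETRY = -99;
constexpr int tesSUCCESS = 0;

constexpr bool isTelLocal(int r) { return r >= telLOCAL_ERROR && r < temMALFORMED; }
constexpr bool isTefFailure(int r) { return r >= tefFAILURE && r < terRETRY; }
constexpr bool isTerRetry(int r) { return r >= terRETRY && r < tesSUCCESS; }

using TxId = std::uint64_t;

// Hypothetical stand-in for a per-transaction SF_HELD flag.
class HeldFlags
{
public:
    // Returns true only if the flag was not already set for `id`.
    bool setHeld(TxId id) { return held_.insert(id).second; }

private:
    std::unordered_set<TxId> held_;
};

// Decide whether a transaction that produced `result` should be held.
bool shouldHold(int result, TxId id, HeldFlags& flags)
{
    // Old behavior held only ter; the change also considers tel and tef.
    if (!(isTerRetry(result) || isTelLocal(result) || isTefFailure(result)))
        return false;  // tes, tem, tec: nothing to retry
    // Assumption: hold only the first time, so it cannot cycle forever.
    return flags.setHeld(id);
}
```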

Pop all transactions with sequential sequences, or tickets

99a63a0

Process held transactions through existing NetworkOPs batching:

e048a41

* Ensures that successful transactions are broadcast to peers, appropriate failed transactions are held for later attempts, fee changes are sent to subscribers, etc.
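The routing change in this commit can be modeled with a small C++ sketch. Everything in it is hypothetical (`BatchSketch`, `Tx`, `Outcome`, and the fixed per-transaction outcome standing in for an open-ledger apply); it only illustrates the design choice of feeding held transactions back into the same batch as new submissions, so that a single apply path decides whether to broadcast, hold, or drop. Fee notifications to subscribers are not modeled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pretend result of applying a transaction to the open ledger.
enum class Outcome { success, retry, fail };

struct Tx
{
    std::string id;
    Outcome outcome;
};

class BatchSketch
{
public:
    // Transactions submitted by clients or peers join the pending batch.
    void submit(Tx tx) { batch_.push_back(std::move(tx)); }

    // Transactions set aside for a later attempt.
    void hold(Tx tx) { held_.push_back(std::move(tx)); }

    // End of a consensus round: move held transactions into the pending
    // batch instead of applying them directly to the open ledger.
    void onRoundEnd()
    {
        for (Tx& tx : held_)
            batch_.push_back(std::move(tx));
        held_.clear();
    }

    // The single apply path shared by new and previously held transactions.
    void apply()
    {
        std::vector<Tx> work;
        work.swap(batch_);
        for (Tx& tx : work)
        {
            switch (tx.outcome)
            {
                case Outcome::success:
                    broadcast_.push_back(tx.id);  // relay to peers
                    break;
                case Outcome::retry:
                    held_.push_back(std::move(tx));  // try again next round
                    break;
                case Outcome::fail:
                    break;  // dropped
            }
        }
    }

    std::vector<std::string> const& broadcast() const { return broadcast_; }
    std::size_t heldCount() const { return held_.size(); }

private:
    std::vector<Tx> batch_;
    std::vector<Tx> held_;
    std::vector<std::string> broadcast_;
};
```

Because previously held transaction "A" goes through `apply()`, it is broadcast just like newly submitted "C", while "B" is held again.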

ximinez added the Perf Attn Needed label (Attention needed from RippleX Performance Team) Apr 10, 2024

ximinez requested review from mtrippled and scottschurr April 10, 2024 23:40

ximinez assigned mtrippled and scottschurr Apr 10, 2024

ximinez force-pushed the relay branch from fed03f7 to e048a41 April 10, 2024 23:59

scottschurr reviewed Apr 19, 2024


ximinez added 4 commits April 18, 2024 22:25

Merge remote-tracking branch 'upstream/develop' into relay

88c0074

* upstream/develop: Set version to 2.2.0-b3

Merge branch 'develop' into relay

a1218af

ximinez added 3 commits April 26, 2024 19:28

Merge remote-tracking branch 'upstream/develop' into relay

f67f346

* upstream/develop: Set version to 2.2.0-rc1

Merge remote-tracking branch 'upstream/develop' into relay

3863b01

* upstream/develop:
  Remove flow assert: (5009)
  Update list of maintainers: (4984)

Merge remote-tracking branch 'upstream/develop' into relay

8b43ea8

* upstream/develop:
  Add external directory to Conan recipe's exports (5006)
  Add missing includes (5011)

ximinez force-pushed the relay branch from b9762e8 to 8b43ea8 July 1, 2024 22:07

ximinez and others added 9 commits July 2, 2024 17:06

Merge commit 'f6879da' into relay

7096d0a

* commit 'f6879da':
  Add bin/physical.sh (4997)
  Prepare to rearrange sources: (4997)

Move CMake directory

643ca1a

Rearrange sources

f0170d3

Rewrite includes

8742e3b

Recompute loops

e6f2597

Merge remote-tracking branch 'upstream/develop' into relay

0f71e1c

* upstream/develop:
  fixInnerObjTemplate2 amendment (5047)
  Set version to 2.3.0-b1
  Ignore restructuring commits (4997)
  Recompute loops (4997)
  Rewrite includes (4997)
  Rearrange sources (4997)
  Move CMake directory (4997)

Fix formatting

31a18f8

ximinez added 16 commits March 18, 2025 20:44

Merge branch 'develop' into relay

461a7cd

Merge branch 'develop' into relay

24ec8b6

Merge branch 'develop' into relay

3dd76a9

Merge branch 'develop' into relay

7b0402b

Merge branch 'develop' into relay

de851fd

Merge branch 'develop' into relay

00490df

Merge branch 'develop' into relay

57edd8a

Merge branch 'develop' into relay

3951f87

Merge branch 'develop' into relay

c1a4f07

Merge branch 'develop' into relay

3d0597a

Merge branch 'develop' into relay

16362dc

Merge branch 'develop' into relay

a958bed

Merge branch 'develop' into relay

765ccfe

Merge branch 'develop' into relay

08134a0

Merge branch 'develop' into relay

684016a

Merge branch 'develop' into relay

16ae8e1

ximinez added the Perf Test Desired (Optional) label (RippleX Perf Team should look at this PR. The PR will not necessarily wait for testing to finish) and removed the Perf Attn Needed label (Attention needed from RippleX Performance Team) Apr 17, 2025

Merge branch 'develop' into relay

d360a6c

Bronek reviewed Apr 28, 2025


src/xrpld/app/misc/NetworkOPs.cpp (resolved)

Bronek reviewed Apr 28, 2025


src/xrpld/app/misc/NetworkOPs.cpp (resolved)

Bronek approved these changes Apr 28, 2025


Merge branch 'develop' into relay

2eb90c0

Merge branch 'develop' into relay

0c0362f

ximinez added the Ready to merge label (*PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.) Apr 29, 2025

ximinez merged commit 4a084ce into XRPLF:develop May 1, 2025
24 checks passed

ximinez deleted the relay branch May 1, 2025 17:59

		(!itrNext->second->getSeqProxy().isSeq() ||
		itrNext->second->getSeqProxy().value() == seqProxy.value() + 1))
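The condition above accepts the next held transaction for an account only if it uses a ticket or carries the next consecutive sequence number. Change 3 applies that test repeatedly instead of once. The C++ sketch below models that loop under stated assumptions: `SeqProxySketch`, `HeldMap`, and `popHeldFor` are hypothetical names; held transactions are assumed to be kept per account in an ordered map in which sequences sort before tickets; and `upper_bound` is used to search strictly after the previously applied position. It is not the code merged in this PR.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a sequence-or-ticket identifier.
struct SeqProxySketch
{
    bool isSeq;           // true: account sequence, false: ticket
    std::uint32_t value;  // sequence number or ticket ID

    // Sequences order before tickets; within a kind, order by value.
    bool
    operator<(SeqProxySketch const& other) const
    {
        if (isSeq != other.isSeq)
            return isSeq;
        return value < other.value;
    }
};

// Held transaction IDs, grouped by account and ordered by SeqProxySketch.
using HeldMap = std::map<std::string, std::map<SeqProxySketch, std::string>>;

// After a transaction for `account` using `applied` succeeds, pop every
// held transaction that can now follow it: consecutive sequences first,
// then tickets. Stops at the first sequence gap.
std::vector<std::string>
popHeldFor(HeldMap& held, std::string const& account, SeqProxySketch applied)
{
    std::vector<std::string> popped;
    auto acct = held.find(account);
    if (acct == held.end())
        return popped;

    auto& txs = acct->second;
    SeqProxySketch last = applied;
    for (auto it = txs.upper_bound(last); it != txs.end(); it = txs.upper_bound(last))
    {
        // A ticket, or the next consecutive sequence number.
        bool const follows = !it->first.isSeq || it->first.value == last.value + 1;
        if (!follows)
            break;
        last = it->first;
        popped.push_back(std::move(it->second));
        txs.erase(it);
    }
    return popped;
}
```

With sequences 6, 7, and 9 plus ticket 3 held, applying sequence 5 releases 6 and 7 and stops at the gap; once 8 is applied, 9 and then the ticket follow.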

Improve transaction relay logic #4985

Improve transaction relay logic #4985

Uh oh!

Conversation

High Level Overview of Change

Context of Change

Type of Change

Before / After

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!