8000 Analyze, prototype and experiment easily applicable solutions to reduce bandwidth usage · Issue #1058 · cometbft/cometbft · GitHub

Analyze, prototype and experiment easily applicable solutions to reduce bandwidth usage #1058


Closed
1 task done
lasarojc opened this issue Jun 30, 2023 · 6 comments · Fixed by #1584
Labels
mempool P:bandwidth-optimization Priority: Optimize bandwidth usage tracking A complex issue broken down into sub-problems
Milestone

Comments

@lasarojc
Contributor
lasarojc commented Jun 30, 2023

Problem

The bandwidth consumption in CometBFT is high. In particular, this comes from the fact that the mempool reactor implements a basic flooding gossip protocol (à la Bitcoin). Because the network topology is dense, a node is very likely to receive the same transaction multiple times from different neighbors. Note that such redundancy is useful to prevent Byzantine or selfish nodes from stopping the dissemination of a transaction in the network.

Some evidence of this redundancy is seen in #613
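The redundancy described above can be sketched with a toy model of flooding gossip: each node forwards the first copy of a transaction to every peer except the sender, and every later copy is pure bandwidth waste. The `Node`/`Receive` names and synchronous recursion are illustrative assumptions, not CometBFT's actual reactor code:

```go
package main

import "fmt"

// TxID identifies a transaction (in CometBFT this would be a hash; a string here).
type TxID string

// Node is a toy model of a mempool node in a flooding gossip protocol.
type Node struct {
	ID    string
	Peers []*Node
	Seen  map[TxID]bool // transactions already received
	Dups  int           // count of redundant receptions
}

// Receive models receiving a transaction from a neighbor: the first copy is
// stored and re-flooded to every peer except the sender; later copies are
// counted as wasted bandwidth.
func (n *Node) Receive(tx TxID, from *Node) {
	if n.Seen[tx] {
		n.Dups++ // redundant delivery: bandwidth spent for nothing
		return
	}
	n.Seen[tx] = true
	for _, p := range n.Peers {
		if p != from {
			p.Receive(tx, n)
		}
	}
}

func main() {
	// Fully connected 4-node topology: dense, like a typical CometBFT network.
	nodes := make([]*Node, 4)
	for i := range nodes {
		nodes[i] = &Node{ID: fmt.Sprintf("n%d", i), Seen: map[TxID]bool{}}
	}
	for _, a := range nodes {
		for _, b := range nodes {
			if a != b {
				a.Peers = append(a.Peers, b)
			}
		}
	}
	nodes[0].Receive("tx1", nil) // a client submits tx1 to n0
	total := 0
	for _, n := range nodes {
		total += n.Dups
	}
	// Only 3 deliveries were useful; the rest were duplicates.
	fmt.Println("redundant deliveries:", total) // prints 6
}
```

Even in this tiny 4-node network, 6 of the 9 deliveries are redundant, which is the kind of waste #613 measured.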

Target audience

Consensus developers

Upside

Small improvements could cut bandwidth usage in CometBFT by (at least) 25%, based on other results.

Downside

There are many possible improvements to test. To mitigate the risk of spending too much time discussing improvements, we will divide them into two batches: one with quick wins and one with ideas that require further thought and refinement.

For each of the approaches chosen in the brainstorm, we will:

  • prototype the approach
  • experiment with improvements using e2e and fast prototyping
  • experiment with the approaches that turned out promising in the QA infra
  • document the solutions as ADRs

The remaining ideas will become part of #11, and if there is room to try these and other ideas in that issue, we will iterate on brainstorming and prototyping.

Consider whether #1048 is a prerequisite for this task or whether it belongs in the backlog.

Definition of done

  • Provide, in the form of an ADR, a new gossip protocol, or an improvement to the current one, that reduces the bandwidth used by mempool transactions, as demonstrated by experiments on a testnet that resembles a real-world mainnet as closely as possible.
  • Implement, test, and merge into main the solution proposed in the ADR.
  • Protocols or ideas discussed but not implemented or prototyped should also be documented.
@lasarojc lasarojc added the P:bandwidth-optimization Priority: Optimize bandwidth usage label Jun 30, 2023
@lasarojc lasarojc added this to the 2023-Q3 milestone Jul 4, 2023
@lasarojc lasarojc added the tracking A complex issue broken down into sub-problems label Jul 5, 2023
@hvanz
Member
hvanz commented Jul 5, 2023

Tentative list of improvements to the propagation protocol:

  1. For each transaction, keep track of the nodes that are known to have the transaction (currently it is only the senders)
  2. Reply with TxAck instead of Tx itself if Tx is large.
  3. Random sleep during gossip to allow receiving Tx before having to send it.
  4. Send HaveTxId instead of Tx itself, depending on Tx size, to warn that Tx is not required.
  5. Send HaveTxIdSet (a Bloom filter) instead of the Txs themselves, depending on the Tx set size, to let neighbors know which Txs are required.
  6. Propose TxId instead of Tx and accept a block only when the Tx with TxId is known.
    • aka Content Addressable Transactions (CAT), in Celestia
    • also mentioned in the Narwhal and Tusk paper (page 4, right column)
  7. Batch transactions (was removed, see tendermint#5796)
  8. Attach to Txs message the list of peers that are known to have the txs
    • draft implementation
    • Note that this approach should use a Bloom filter on the list of peers to reduce the message size, and it needs to check for Byzantine behavior.
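The size-based trade-off behind item 4 (push small transactions whole, announce large ones by ID so a peer that already has the body never downloads it again) can be sketched as follows. The message names mirror the list above, but the threshold, helper names, and byte accounting are illustrative assumptions, not CometBFT code:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hypothetical message kinds for the push-vs-announce trade-off:
// small transactions are pushed whole; large ones are announced by ID.
const (
	MsgTx       = "Tx"       // full transaction body
	MsgHaveTxID = "HaveTxId" // only the transaction hash
)

// txID keys a transaction by its hash, as the mempool does.
func txID(tx []byte) [32]byte { return sha256.Sum256(tx) }

// chooseMessage picks which message to send for tx, given a size threshold.
// The threshold value is illustrative, not taken from the issue.
func chooseMessage(tx []byte, threshold int) string {
	if len(tx) > threshold {
		return MsgHaveTxID
	}
	return MsgTx
}

// bytesOnWire estimates the cost of each choice when the peer already has tx.
func bytesOnWire(tx []byte, threshold int) int {
	if chooseMessage(tx, threshold) == MsgHaveTxID {
		return len(txID(tx)) // a 32-byte hash instead of the full body
	}
	return len(tx)
}

func main() {
	small := []byte("transfer 1 atom")
	large := make([]byte, 64*1024) // a 64 KiB transaction
	const threshold = 1024         // illustrative cut-off

	fmt.Println(chooseMessage(small, threshold), bytesOnWire(small, threshold))
	fmt.Println(chooseMessage(large, threshold), bytesOnWire(large, threshold))
}
```

The upside grows with transaction size: for the 64 KiB transaction a redundant delivery costs 32 bytes instead of 65,536.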

@lasarojc lasarojc changed the title mempool: Analyze, prototype and experiment easily applicable solutions to reduce bandwidth usage Analyze, prototype and experiment easily applicable solutions to reduce bandwidth usage Jul 5, 2023
@lasarojc
Contributor Author
lasarojc commented Jul 19, 2023

Output of discussing the tentative list of optimizations.

For each transaction, keep track of the nodes that are known to have the transaction (currently it is only the senders)

  • Open an issue to discuss the situation in which the same transaction can be sent several times to the same peer because the iterator used to navigate the list of pending transactions (the mempool) can be reset to the beginning of the list. There are two solutions for this problem:
    • Prevent the mempool implementation from resetting iterators to the beginning of the list
    • Store in the senders map the peers to which a node sends a transaction, which prevents sending the same transaction to a peer multiple times even when the mempool iterator is reset
  • Why is the code the way it is right now? Is it a deficiency in the iterator code or was it added to solve a bug?
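The second solution above (recording in the senders map the peers a transaction was sent to, not just received from) can be sketched as follows; `peerSet`, `add`, and `shouldSend` are hypothetical names, not the actual mempool types:

```go
package main

import "fmt"

// peerSet tracks, per transaction, every peer known to have it: both the
// peers we received it from and the peers we already sent it to.
type peerSet map[string]map[uint16]struct{} // tx key -> set of peer IDs

func (s peerSet) add(txKey string, peer uint16) {
	if s[txKey] == nil {
		s[txKey] = map[uint16]struct{}{}
	}
	s[txKey][peer] = struct{}{}
}

// shouldSend reports whether txKey still needs to go to peer. Because sends
// are recorded too, a reset of the mempool iterator cannot cause the same
// transaction to be sent to the same peer twice.
func (s peerSet) shouldSend(txKey string, peer uint16) bool {
	_, known := s[txKey][peer]
	return !known
}

func main() {
	s := peerSet{}
	s.add("tx1", 7) // tx1 was received from peer 7

	for _, peer := range []uint16{7, 8} {
		if s.shouldSend("tx1", peer) {
			fmt.Println("send tx1 to peer", peer) // only peer 8
			s.add("tx1", peer)                    // record the send
		}
	}
	// Simulated iterator reset: iterate the pending transactions again.
	for _, peer := range []uint16{7, 8} {
		if s.shouldSend("tx1", peer) {
			fmt.Println("resend tx1 to peer", peer) // never reached
		}
	}
}
```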

Reply with TxAck instead of Tx itself if Tx is large

  • The idea comes from a misunderstanding of the mempool behavior, which is now better defined here
  • Another way of looking at it is as a duplicate of the 4th approach in the list.
  • In any case, we are crossing this approach out of the list.

Random sleep during gossip to allow receiving Tx before having to send it.

  • This is now better understood as a form of throttling, which may be implemented as a random sleep inside the broadcastTx routine or as quota on the amount of bytes sent within a time period.
  • Yet another approach would be to skip some transactions during gossiping, which could lead to the FIFO ordering of transactions not being respected.

@lasarojc
Contributor Author
lasarojc commented Jul 19, 2023

@lasarojc lasarojc moved this from Todo to In Progress in CometBFT 2023 Jul 19, 2023
@cometbft cometbft deleted a comment from cason Jul 20, 2023
@p0mvn
p0mvn commented Aug 31, 2023

Hi! Are there any quick wins I could contribute to?

@github-project-automation github-project-automation bot moved this from In Progress to Done in CometBFT 2023 Nov 9, 2023
@lasarojc
Contributor Author
lasarojc commented Nov 9, 2023

Reopening to handle the backport into experimental branches.

@lasarojc lasarojc reopened this Nov 9, 2023
@adizere adizere modified the milestones: 2023-Q3, 2024-Q1 Jan 10, 2024
@adizere adizere added this to CometBFT Jan 10, 2024
@github-project-automation github-project-automation bot moved this to Todo in CometBFT Jan 10, 2024
@adizere adizere moved this from Todo to Done in CometBFT Jan 11, 2024
@adizere
Member
adizere commented Jan 11, 2024

Fixed with the flurry of PRs mentioned above.

@adizere adizere closed this as completed Jan 11, 2024
Projects
No open projects
Status: Done
4 participants