Potential DoS vector #517
Comments
Seems related to tendermint/tendermint#9840
Is this the new home of the official Tendermint repo?
CometBFT is a fork of and successor to Tendermint Core, and implements the Tendermint algorithm. The team that used to maintain Tendermint Core moved to CometBFT for administrative and technical reasons. The Tendermint Core repo is being archived. For full details: blog post and tweet. CometBFT is meant for use in Cosmos networks and beyond.
Hey guys, I want to let you know that today I spoke with Reece Williams, the author of the caching tool that looked like it might be able to handle this issue. I think the conclusion is that it is at best a band-aid; we're going to need to solve this in the Comet layer. I'm available by Twitter DM for more information, and we have a Slack channel with the Comet team.
Thanks for the explanation. I missed the notice. Isn't this enough as a hotfix? faneaatiku#1 That function
The question that pops into my mind ATM: is this issue replicated on other endpoints? Asking because
I confirm the validity of faddat's comment. Multiple DoS bugs definitely need a solution that is baked into the software - not 3rd party "solutions"
A quick reminder here regarding the current form of the RPC: https://docs.cometbft.com/v0.34/core/running-in-production#rpc

There are a number of endpoints whose results cannot be cached. See https://github.com/cometbft/cometbft/blob/95e17851cdf359ee6c09178af3b301d63616181d/rpc/core/routes.go as well as https://docs.cometbft.com/v0.34/rpc/ for details as to which endpoints' results can and cannot be cached. For endpoints whose responses cannot be cached (e.g.

Our longer-term plan here is to allow the RPC to be run in a separate process to the node itself, while being independently scalable (via, for example, #82). Another interim solution is to allow operators to control certain defaults, like the maximum number of items per page for paginated endpoints' responses, as the quantity of data returned here (in terms of number of bytes) depends heavily on the application being run, volume of transactions, etc. (i.e. it's highly app chain-specific).

Can anyone actually confirm that these requests are returning more than 100 items per page? Because from our own tests we cannot cause any paginated endpoints to return more than 100 items per page, due to the use of: Line 128 in 95e1785
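The referenced line is the per-page clamp applied by the paginated RPC handlers. Roughly, it behaves like the sketch below (reconstructed from memory of the v0.34 code, so the constant values and function name may not match the source at that commit exactly):

```go
// Sketch of the per-page clamping applied to paginated RPC endpoints.
// defaultPerPage and maxPerPage are assumed values, not verified against
// the referenced commit.
const (
	defaultPerPage = 30
	maxPerPage     = 100
)

func validatePerPage(perPagePtr *int) int {
	if perPagePtr == nil { // no per_page query parameter supplied
		return defaultPerPage
	}
	perPage := *perPagePtr
	if perPage < 1 {
		return defaultPerPage
	}
	if perPage > maxPerPage {
		return maxPerPage // e.g. per_page=10000 is clamped down to 100
	}
	return perPage
}
```

Note that even with such a clamp, the later comments in this thread point out that the indexer may still scan and buffer far more data than one page while answering the query.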
Finally, I want to remind everyone here of our code of conduct and security guidelines.
I can confirm calling the query in the first post with
One of the issues with using these endpoints, and why this can be considered a DoS attack, is how the storage layer queries for the data when calling them.

Query of data: Line 203 in 95e1785
Retrieval of data: cometbft/state/indexer/block/kv/kv.go Lines 166 to 203 in 95e1785
The above is not in charge of minimizing the results, just retrieving them, and due to the lexicographical nature of the indexes it may iterate through a large amount of data and return it: cometbft/state/indexer/block/kv/kv.go Line 291 in 95e1785
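To illustrate the lexicographic-ordering point: if heights appear in keys as decimal strings, byte-ordered iteration does not follow numeric order, so a height-bounded query cannot be served by one tight key range and ends up visiting keys outside the requested heights. A minimal, self-contained sketch (the key layout here is illustrative, not the exact format used in kv.go):

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical index keys with heights written as decimal strings.
	// This is NOT the exact key layout in kv.go, only an illustration of
	// how a byte-ordered KV store iterates.
	keys := []string{
		"block.height/2/2",
		"block.height/10/10",
		"block.height/100/100",
		"block.height/3/3",
	}
	// A byte-ordered KV store iterates keys in lexicographic order.
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k)
	}
	// Prints heights in the order 10, 100, 2, 3 -- not numeric order --
	// so a scan for "block.height > 1 AND block.height < 4" cannot simply
	// stop once it walks past height 3.
}
```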
Some minor pruning of data can be done, but only as specified in the example above.

Sorting data: Lines 208 to 218 in 95e1785
Last but not least, pagination: Lines 220 to 246 in 95e1785
A couple of points to be taken from here
Thank you @edjroz, this matches/confirms our investigations!
As a long-term solution, we are leaning towards encouraging operators to offload RPC to an external data node, such as the tentative design in #82. Your feedback there would be very welcome!
Following up on this point:
As a node operator, how do I actually block these? I tried rejecting requests to |
Have you tried something like this? I assume you're fronting your RPC with a web server like Nginx, right? (As per https://docs.cometbft.com/v0.37/core/running-in-production#rpc)
Yes, I'm using a similar block. It doesn't matter, because the block search function can still be accessed with a JSONRPC request to the root, as described here: https://docs.cometbft.com/v0.34/rpc/
The same technique works to access status information, including moniker and listen address, on RPCs that 404 the
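To make the bypass concrete: the RPC also accepts JSON-RPC requests POSTed to the root path, so a reverse-proxy rule that only matches a URI like /block_search does not stop the method from being called. A rough sketch (the endpoint address and the exact parameter encoding are assumptions):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Call block_search via a JSON-RPC envelope POSTed to the RPC root
	// instead of GET /block_search. The address is a placeholder for a
	// local node; page/per_page are shown as strings, which is how the
	// documented examples pass them.
	body := []byte(`{
		"jsonrpc": "2.0",
		"id": 1,
		"method": "block_search",
		"params": {"query": "block.height > 1", "page": "1", "per_page": "100"}
	}`)
	resp, err := http.Post("http://localhost:26657/", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```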
Cosmos nodes should never expose any port or endpoint to the internet directly. This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.
So for public RPC endpoints (which are necessary in a multitude of cases), not exposing nodes directly to the Internet is common sense, but we have to intercept all incoming requests and validate them, which requires replicating the logic to decode and parse every incoming request type? It definitely feels like this should squarely fall within the remit of the handler to sanitise input. It already has context of the requests and what the params mean. If the handler does not handle this, then we need to duplicate a lot of logic...
I'm 👍 on improving the safety of any/all RPC endpoints. My only point is that the running in production docs say that exposing those endpoints to the public internet is not a supported use case, and if it's not supported it shouldn't be done 😅
Thanks for that feedback. I reported it because it probably hits public APIs (wallets, dapps, explorers, etc.), which are critically important to the ecosystem. It's surely not the DoS of the century, but I suspect that it's a consequential one.
I understand the docs recommend against it, but the reality is that the current Cosmos ecosystem relies heavily on public RPC endpoints. Most wallets and apps have been built around this, and the popular client library CosmJS even calls the
The docs don't "recommend against" this use case; they explicitly state that the use case is unsupported.
I mean, if you want to be pedantic, these are literally the first five words of the RPC section in the docs:
But that's not the point. The majority of the Cosmos ecosystem uses these endpoints in an apparently unsupported way; it makes more sense to me to figure out how to support them than to expect every app to change how it works.
Maybe we can just get to the place where we don't blow out p2p and we don't overflow the node's logs?
I've got to agree. Willing to put time and effort into this, but not sure where to direct that.
I have a vibe that this might have been used to destabilize the hub's p2p. There was a disturbance in the force yesterday, reported here: Upon looking into it:
... then it is likely that someone was shooting at nodes like this: but I figured that out in 2021, and first identified that the hub is choked in 2020... so, if we are able to have the node keep p2ping despite queries, that'd be really cool.
I think the problem is due to
The method should be like this:

func keyForHeight(result *abci.TxResult) []byte {
	keyBytes := joinBytes([]byte(types.TxHeightKey), []byte(tagKeySeparator),
		[]byte{byte(result.Height >> 24), byte(result.Height >> 16), byte(result.Height >> 8), byte(result.Height)},
		[]byte(fmt.Sprintf("/%d%s",
			result.Index,
			// Added to facilitate having the eventSeq in event keys
			// Otherwise queries break expecting 5 entries
			eventSeqSeparator+"0",
		)),
	)
	return keyBytes
}

and with natural indexing like this, we can use a range index in

fromKey := startKeyWithHeight(startKeyBz, lowerHeight)
toKey := startKeyWithHeight(startKeyBz, upperHeight)
// already have correct range
it, err := txi.store.Iterator(fromKey, toKey)
if err != nil {
	panic(err)
}
defer it.Close()
LOOP:
for ; it.Valid(); it.Next() {

and it could return a thousand blocks in milliseconds.

Alternatively, we can use txId as an 8-byte array, assuming one block cannot have more than 5000 transactions:

txId := result.Height * 5000 + result.Index

then txId is also in natural ordering; we can search a range directly by height * 5000, and get back height = txId / 5000.
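For intuition on why the proposal above helps (a standalone sketch, not the actual patch; the names and the 5000-transactions-per-block assumption are taken from the comment):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	// If the height component of an index key is written as fixed-width
	// big-endian bytes instead of a decimal string, byte-wise (lexicographic)
	// order equals numeric order, so a height range can be answered with a
	// single bounded iterator.
	heights := []int64{2, 10, 100, 3}
	keys := make([][]byte, 0, len(heights))
	for _, h := range heights {
		k := make([]byte, 8)
		binary.BigEndian.PutUint64(k, uint64(h)) // fixed-width big-endian height
		keys = append(keys, k)
	}
	// Byte-wise comparison now matches numeric comparison.
	fmt.Println(bytes.Compare(keys[0], keys[1]) < 0) // 2 < 10  -> true
	fmt.Println(bytes.Compare(keys[1], keys[2]) < 0) // 10 < 100 -> true

	// The alternative txId scheme from the comment, assuming fewer than
	// 5000 transactions per block.
	height, index := int64(1234), int64(7)
	txID := height*5000 + index
	fmt.Println(txID/5000 == height, txID%5000 == index) // true true
}
```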
Ported from cosmos/gaia#2285 & modified to keep the essential part about the potential vulnerability itself.
There is a DoS vulnerability in the block_search?query= method. Requesting the value "block.height > 1"&page=1&per_page=10000 in a loop triggers an OOM crash. Pagination needs to be implemented here. Every public Cosmos API I've looked at is vulnerable to the attack.

More info: https://twitter.com/123456/status/1634714205902323712