Potential DoS vector #517

Open
adizere opened this issue Mar 13, 2023 · 23 comments
Labels
bug (Something isn't working), rpc

Comments

@adizere
Member
adizere commented Mar 13, 2023

Ported from cosmos/gaia#2285 & modified to keep the essential part about the potential vulnerability itself.


There is a DoS vulnerability in the block_search?query= method. Requesting the value "block.height > 1"&page=1&per_page=10000 in a loop triggers an OOM crash. Pagination needs to be implemented here. Every public Cosmos API I've looked at is vulnerable to the attack. More info: https://twitter.com/123456/status/1634714205902323712
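For reference, a minimal reproduction sketch in Go (an editorial addition, assuming a local node exposing the default RPC port 26657); repeatedly issuing this request is what was reported to trigger the OOM:

package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Build the same request as above: an open-ended height range with a huge page size.
	q := url.Values{}
	q.Set("query", `"block.height > 1"`)
	q.Set("page", "1")
	q.Set("per_page", "10000")

	resp, err := http.Get("http://localhost:26657/block_search?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}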


@adizere added the bug (Something isn't working) and needs-triage (This issue/PR has not yet been triaged by the team) labels on Mar 13, 2023
@adizere
Member Author
adizere commented Mar 13, 2023

Seems related to tendermint/tendermint#9840

@faneaatiku

Is this the new home of the official Tendermint repo?

@adizere
Member Author
adizere commented Mar 13, 2023

CometBFT is a fork and successor to Tendermint Core, and implements the Tendermint algorithm. The team that used to maintain Tendermint Core moved to CometBFT for administrative and technical reasons. Tendermint Core repo is being archived. For full details: blog post and tweet.

CometBFT is meant for use in Cosmos networks and beyond.

@faddat
Contributor
faddat commented Mar 13, 2023

Hey guys, I want to let you know that today I spoke with Reece Williams, the author of the caching tool that looked like it might be able to handle this issue. The conclusion is that it is at best a band-aid: we're going to need to solve this in the Comet layer.

I'm available by Twitter DM for more information, and we have a slack channel with the comet team.

@faneaatiku
faneaatiku commented Mar 13, 2023

CometBFT is a fork and successor to Tendermint Core, and implements the Tendermint algorithm. The team that used to maintain Tendermint Core moved to CometBFT for administrative and technical reasons. Tendermint Core repo is being archived. For full details: blog post and tweet.

CometBFT is meant for use in Cosmos networks and beyond.

Thanks for the explanation. I missed the notice.

Isn't this enough as a hotfix? faneaatiku#1

That validatePerPage function, which in fact does sanitization, is more than enough IMHO.

@faneaatiku

The question that pops into my mind ATM: is this issue replicated on other endpoints? Asking because perPage is not checked in other routes either, like TxSearch.

@visualbasic6
visualbasic6 commented Mar 13, 2023

I confirm the validity of faddat's comment. Multiple DoS bugs definitely need a solution that is baked into the software - not 3rd party "solutions"


@thanethomson
Contributor

A quick reminder here regarding the current form of the RPC: https://docs.cometbft.com/v0.34/core/running-in-production#rpc

There are a number of endpoints whose results cannot be cached. See https://github.com/cometbft/cometbft/blob/95e17851cdf359ee6c09178af3b301d63616181d/rpc/core/routes.go as well as https://docs.cometbft.com/v0.34/rpc/ for details as to which endpoints' results can and cannot be cached. For endpoints whose responses cannot be cached (e.g. /block_search and /tx_search), extreme caution needs to be exercised in terms of making those endpoints publicly accessible.

Our longer-term plan here is to allow the RPC to be run in a separate process to the node itself, while being independently scalable (via, for example, #82).

Another interim solution is to allow operators to control certain defaults, like the maximum number of items per page for paginated endpoints' responses, as the quantity of data returned here (in terms of number of bytes) depends heavily on the application being run, volume of transactions, etc. (i.e. it's highly app chain-specific).

Can anyone actually confirm that these requests are returning more than 100 items per page? Because from our own tests we cannot cause any paginated endpoints to return more than 100 items per page due to the use of

func validatePerPage(perPagePtr *int) int {
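For context, that validation clamps the requested page size. A rough editorial sketch of its shape (the constants and exact wording are assumed from memory, not copied from the source):

const (
	defaultPerPage = 30
	maxPerPage     = 100
)

// Approximate shape of the per_page validation: out-of-range values are
// clamped rather than rejected.
func validatePerPage(perPagePtr *int) int {
	if perPagePtr == nil { // no per_page parameter was supplied
		return defaultPerPage
	}
	perPage := *perPagePtr
	if perPage < 1 {
		return defaultPerPage
	} else if perPage > maxPerPage {
		return maxPerPage
	}
	return perPage
}

As later comments point out, this only caps how many items are returned per page; the query itself is still evaluated over the full result set before pagination is applied.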

Finally, I want to remind everyone here of our code of conduct and security guidelines.

@thanethomson added the rpc label and removed the needs-triage (This issue/PR has not yet been triaged by the team) label on Mar 13, 2023
@mintthemoon
mintthemoon commented Mar 14, 2023

Can anyone actually confirm that these requests are returning more than 100 items per page? Because from our own tests we cannot cause any paginated endpoints to return more than 100 items per page due to the use of

func validatePerPage(perPagePtr *int) int {

I can confirm calling the query in the first post with per_page=100 locks up a node the same way as 10000 does. Tested with kujirad v0.7.1 (tendermint v0.34.23).

@edjroz
edjroz commented Mar 30, 2023

One of the issues with using these endpoints, and why it can be considered a DoS attack, is how the storage layer queries for the data. TxSearch and BlockSearch both use the indexer, which uses ranges to determine where to query, but because the storage layer uses lexicographical ordering it has to load all the bytes into memory and only then proceeds to filter them out.

Query of data

results, err := env.BlockIndexer.Search(ctx.Context(), q)

Retrieval of data:

if len(ranges) > 0 {
	skipIndexes = append(skipIndexes, rangeIndexes...)
	for _, qr := range ranges {
		if qr.Key == types.BlockHeightKey && matchEvents && !heightInfo.onlyHeightRange {
			// If the query contains ranges other than the height then we need to treat the height
			// range when querying the conditions of the other range.
			// Otherwise we can just return all the blocks within the height range (as there is no
			// additional constraint on events)
			continue
		}
		prefix, err := orderedcode.Append(nil, qr.Key)
		if err != nil {
			return nil, fmt.Errorf("failed to create prefix key: %w", err)
		}
		if !heightsInitialized {
			filteredHeights, err = idx.matchRange(ctx, qr, prefix, filteredHeights, true, matchEvents, heightInfo)
			if err != nil {
				return nil, err
			}
			heightsInitialized = true
			// Ignore any remaining conditions if the first condition resulted in no
			// matches (assuming implicit AND operand).
			if len(filteredHeights) == 0 {
				break
			}
		} else {
			filteredHeights, err = idx.matchRange(ctx, qr, prefix, filteredHeights, false, matchEvents, heightInfo)
			if err != nil {
				return nil, err
			}
		}
	}
}

The above is not in charge of minimizing the results, just retrieving them, and due to the lexicographical nature of the indexes it may iterate through a large amount of data and return it:

it, err := dbm.IteratePrefix(idx.store, startKey)

Some minor pruning of data can be done, but as shown in the example, block.height > 1 selects a large data sample; only after all results have been retrieved from storage does the pagination take place.

Sorting data:

cometbft/rpc/core/blocks.go

Lines 208 to 218 in 95e1785

// sort results (must be done before pagination)
switch orderBy {
case "desc", "":
	sort.Slice(results, func(i, j int) bool { return results[i] > results[j] })
case "asc":
	sort.Slice(results, func(i, j int) bool { return results[i] < results[j] })
default:
	return nil, errors.New("expected order_by to be either `asc` or `desc` or empty")
}

Last but not least, pagination:

cometbft/rpc/core/blocks.go

Lines 220 to 246 in 95e1785

// paginate results
totalCount := len(results)
perPage := validatePerPage(perPagePtr)
page, err := validatePage(pagePtr, perPage, totalCount)
if err != nil {
	return nil, err
}
skipCount := validateSkipCount(page, perPage)
pageSize := cmtmath.MinInt(perPage, totalCount-skipCount)
apiResults := make([]*ctypes.ResultBlock, 0, pageSize)
for i := skipCount; i < skipCount+pageSize; i++ {
	block := env.BlockStore.LoadBlock(results[i])
	if block != nil {
		blockMeta := env.BlockStore.LoadBlockMeta(block.Height)
		if blockMeta != nil {
			apiResults = append(apiResults, &ctypes.ResultBlock{
				Block:   block,
				BlockID: blockMeta.BlockID,
			})
		}
	}
}
return &ctypes.ResultBlockSearch{Blocks: apiResults, TotalCount: totalCount}, nil

A couple of points to be taken from here

  • You should not expose TxSearch or BlockSearch, or pretty much anything that would expose your storage layer directly, since anyone can send a large query and blow up your node.
  • Queries should be constrained as much as possible; good examples are within the kv_test. This query language should be more widely covered in the documentation (see the sketch after this list).
  • The storage layer indexing might require a look into how it can be improved, to avoid having to load everything into memory before paginating.
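As an editorial sketch of the second point (the import path and BlockSearch signature are assumed from cometbft v0.37, and the height bounds are arbitrary), a client can keep its queries tightly bounded rather than open-ended:

package main

import (
	"context"
	"fmt"

	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
)

func main() {
	c, err := rpchttp.New("http://localhost:26657", "/websocket")
	if err != nil {
		panic(err)
	}

	// Bound both ends of the height range so the indexer walks a small key
	// range instead of scanning (and sorting) results from the whole store.
	page, perPage := 1, 30
	res, err := c.BlockSearch(context.Background(),
		"block.height > 100 AND block.height < 200", &page, &perPage, "asc")
	if err != nil {
		panic(err)
	}
	fmt.Println("matches:", res.TotalCount)
}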

@adizere
Member Author
adizere commented Apr 3, 2023

Thank you @edjroz, this matches/confirms our investigations!

The storage layer indexing might require some look into how it can be improved to avoid having to load everything in memory before paginating.

As a long-term solution, we are leaning towards encouraging operators to offload RPC to an external data node, such as the tentative design in #82. Your feedback there would be very welcome!

@mintthemoon

Following up on this point:

  • You should not expose TxSearch or BlockSearch pretty much anything that would expose your storage layer directly since anyone can send a large query and blow up your node.

As a node operator, how do I actually block these? I tried rejecting requests to /tx_search and /block_search, but I've found the feature is still available via an RPC call like the one the blockSearch function in CosmJS performs.

@thanethomson
Contributor
thanethomson commented Apr 3, 2023

As a node operator, how do I actually block these?

Have you tried something like this? I assume you're fronting your RPC with a web server like Nginx, right? (As per https://docs.cometbft.com/v0.37/core/running-in-production#rpc)

@mintthemoon

As a node operator, how do I actually block these?

Have you tried something like this? I assume you're fronting your RPC with a web server like Nginx, right? (As per https://docs.cometbft.com/v0.37/core/running-in-production#rpc)

Yes, I'm using a similar block. It doesn't matter because the block search function can still be accessed with a JSON-RPC request to the root, as described here: https://docs.cometbft.com/v0.34/rpc/

JSONRPC requests can be POST'd to the root RPC endpoint via HTTP.
curl --header "Content-Type: application/json" --request POST --data '{"method": "block", "params": ["5"], "id": 1}' localhost:26657

The same technique works to access status information including moniker and listen address on RPCs that 404 the /status endpoint.

@peterbourgon
  • You should not expose TxSearch or BlockSearch pretty much anything that would expose your storage layer directly since anyone can send a large query and blow up your node.

As a node operator, how do I actually block these? I tried rejecting requests to /tx_search and /block_search but I've found the feature is still available with an RPC call like the blockSearch function in cosmjs performs.

Cosmos nodes should never expose any port or endpoint to the internet directly.

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.
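As an illustration of such an intermediating layer (an editorial sketch, not an official tool; the blocked method names and upstream address are assumptions), a small Go reverse proxy can reject tx_search/block_search whether they arrive as URI paths or as JSON-RPC bodies POSTed to the root:

package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

var blocked = map[string]bool{"tx_search": true, "block_search": true}

func main() {
	// Upstream node RPC; it should listen on localhost only, never directly on the internet.
	upstream, err := url.Parse("http://127.0.0.1:26657")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// URI-style calls, e.g. /tx_search?query=...
		if blocked[strings.TrimPrefix(r.URL.Path, "/")] {
			http.Error(w, "method not allowed", http.StatusForbidden)
			return
		}
		// JSON-RPC calls POSTed to the root, e.g. {"method": "block_search", ...}.
		if r.Method == http.MethodPost {
			body, _ := io.ReadAll(r.Body)
			r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the proxy
			var req struct {
				Method string `json:"method"`
			}
			if json.Unmarshal(body, &req) == nil && blocked[req.Method] {
				http.Error(w, "method not allowed", http.StatusForbidden)
				return
			}
			// A production version would also need to inspect JSON-RPC batch
			// (array) bodies, cap request sizes, and cover the websocket endpoint.
		}
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}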

@joe-bowman

Cosmos nodes should never expose any port or endpoint to the internet directly.

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.

So for public RPC endpoints (which are necessary in a multitude of cases), not exposing nodes directly to the Internet is common sense, but we have to intercept all incoming requests and validate them, which requires replicating the logic to decode and parse every incoming request type?

It definitely feels like this should squarely fall within the remit of the handler to sanitise input. It already has context of the requests and what the params mean. If the handler does not handle this, then we need to duplicate a lot of logic...

@peterbourgon
peterbourgon commented Apr 9, 2023

I'm 👍 on improving the safety of any/all RPC endpoints. My only point is that the running in production docs say that exposing those endpoints to the public internet is not a supported use case, and if it's not supported it shouldn't be done 😅

@visualbasic6

Thanks for that feedback. I reported it because it probably hits public APIs (wallets, dapps, explorers, etc.) which are critically important to the ecosystem. It's surely not the DoS of the century but I suspect that it's a consequential one.

@mintthemoon
mintthemoon commented Apr 10, 2023

I understand the docs recommend against it, but the reality is that the current Cosmos ecosystem relies heavily on public RPC endpoints. Most wallets and apps have been built around this, and the popular client library CosmJS even calls the TxSearch function at issue here every time it submits a transaction to check that it went through, so securing that without breaking apps is nontrivial.

@peterbourgon

The docs don't "recommend against" this use case, they explicitly state that the use case is unsupported.

@mintthemoon

I mean if you want to be pedantic these are literally the first five words of the RPC section in the docs:

It is generally not recommended

But that's not the point. The majority of the Cosmos ecosystem uses these endpoints in an apparently unsupported way; it makes more sense to me to figure out how to support them than to expect every app to change how it works.

dib542 added a commit to duality-labs/hapi-indexer that referenced this issue Apr 17, 2023
    - the smaller block height scope allows for less of the db to be
      read into memory to answer this query; for large chains this
      can be a very significant saving (>30x difference)
    - see issue: cometbft/cometbft#517
    - the drawback is that the total blocks paginated is no longer
      known; we now estimate it using the chain height at start
      (import progress may go over 100% before finishing)
dib542 added a commit to duality-labs/hapi-indexer that referenced this issue May 24, 2023
@faddat
Contributor
faddat commented Jul 3, 2023

@peterbourgon

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.

Maybe we can just get to the place where we don't blow out p2p and we don't overflow the node's logs?

@mintthemoon

But that's not the point. The majority of the Cosmos ecosystem uses these endpoints in an apparently unsupported way, it makes more sense to me to figure out how to support them than to expect every app to change how it works.

I've got to agree. Willing to put time and effort into this, but not sure where to direct that.

@peterbourgon

I'm 👍 on improving the safety of any/all RPC endpoints. My only point is that the running in production docs say that exposing those endpoints to the public internet is not a supported use case, and if it's not supported it shouldn't be done 😅

I have a vibe that this might have been used to destabilize the hub's p2p. There was a disturbance in the force yesterday, reported here:

Upon looking into it:

... then it is likely that someone was shooting at nodes like this:

but I figured that out in 2021, and first identified that the hub is choked in 2020... so, if we are able to have the node keep p2ping despite queries, that'd be really cool.

@tubackkhoa

I think the problem is due to keyForHeight indexing: it uses a string index instead of a byte index, which would preserve natural ordering.

The method should be like this:

func keyForHeight(result *abci.TxResult) []byte {	
	keyBytes := joinBytes([]byte(types.TxHeightKey), []byte(tagKeySeparator),
		[]byte{byte(result.Height >> 24), byte(result.Height >> 16), byte(result.Height >> 8), byte(result.Height)},
		[]byte(fmt.Sprintf("/%d%s",			
			result.Index,
			// Added to facilitate having the eventSeq in event keys
			// Otherwise queries break expecting 5 entries
			eventSeqSeparator+"0",
		)),
	)
	return keyBytes
}

With natural indexing like this, we can use a range index in the matchRangeHeight method without getting all the data and then sorting it in memory:

	fromKey := startKeyWithHeight(startKeyBz, lowerHeight)
	toKey := startKeyWithHeight(startKeyBz, upperHeight)

	// already have correct range
	it, err := txi.store.Iterator(fromKey, toKey)
	if err != nil {
		panic(err)
	}
	defer it.Close()

LOOP:
	for ; it.Valid(); it.Next() {

and it could return a thousand blocks in milliseconds.

Alternatively, we can use txId as an 8-byte value, assuming one block cannot have more than 5000 transactions:

txId := result.Height * 5000 + result.Index

Then txId is also in natural ordering; we can search a range directly by height * 5000, and get back height = txId / 5000.
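For illustration, the composite-ID arithmetic above would look roughly like this (the helper names are hypothetical, not from the codebase):

// Hypothetical helpers for the suggested composite txId scheme, assuming a
// block never contains more than 5000 transactions. Because the ID grows
// monotonically with (height, index), keys encoded from it big-endian keep
// their natural ordering and can be range-scanned directly.
const maxTxPerBlock = 5000

func makeTxID(height int64, index uint32) int64 {
	return height*maxTxPerBlock + int64(index)
}

func heightFromTxID(id int64) int64 { return id / maxTxPerBlock }
func indexFromTxID(id int64) uint32 { return uint32(id % maxTxPerBlock) }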
