Potential DoS vector #517

Open
adizere opened this issue Mar 13, 2023 · 23 comments
Labels
bug (Something isn't working), rpc

Comments

@adizere
Member
adizere commented Mar 13, 2023

Ported from cosmos/gaia#2285 & modified to keep the essential part about the potential vulnerability itself.


There is a DoS vulnerability in the block_search?query= method. Requesting the value "block.height > 1"&page=1&per_page=10000 in a loop triggers an OOM crash. Pagination needs to be implemented here. Every public Cosmos API I've looked at is vulnerable to the attack. More info: https://twitter.com/123456/status/1634714205902323712
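For reference, a minimal reproduction sketch in Go (an editorial addition, assuming a local node exposing the default RPC port 26657); repeatedly issuing this request is what was reported to trigger the OOM:

package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Build the same request as above: an open-ended height range with a huge page size.
	q := url.Values{}
	q.Set("query", `"block.height > 1"`)
	q.Set("page", "1")
	q.Set("per_page", "10000")

	resp, err := http.Get("http://localhost:26657/block_search?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}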


@adizere added the bug (Something isn't working) and needs-triage (This issue/PR has not yet been triaged by the team) labels on Mar 13, 2023
@adizere
Member Author
adizere commented Mar 13, 2023

Seems related to tendermint/tendermint#9840

@faneaatiku

Is this the new home of the official Tendermint repo?

@adizere
Member Author
adizere commented Mar 13, 2023

CometBFT is a fork and successor to Tendermint Core, and implements the Tendermint algorithm. The team that used to maintain Tendermint Core moved to CometBFT for administrative and technical reasons. Tendermint Core repo is being archived. For full details: blog post and tweet.

CometBFT is meant for use in Cosmos networks and beyond.

@faddat
Contributor
faddat commented Mar 13, 2023

Hey guys, I want to let you know that today I spoke with Reece Williams, the author of the caching tool that looked like it might be able to handle this issue. The conclusion is that it is at best a band-aid: we're going to need to solve this in the Comet layer.

I'm available by Twitter DM for more information, and we have a slack channel with the comet team.

@faneaatiku
faneaatiku commented Mar 13, 2023

CometBFT is a fork and successor to Tendermint Core, and implements the Tendermint algorithm. The team that used to maintain Tendermint Core moved to CometBFT for administrative and technical reasons. Tendermint Core repo is being archived. For full details: blog post and tweet.

CometBFT is meant for use in Cosmos networks and beyond.

Thanks for the explanation. I missed the notice.

Isn't this enough as a hotfix? faneaatiku#1

That validatePerPage function, which in fact does sanitization, is more than enough IMHO.

@faneaatiku

The question that pops into my mind ATM: is this issue replicated on other endpoints? Asking because perPage is not checked in other routes either, like TxSearch.

@visualbasic6
visualbasic6 commented Mar 13, 2023

I confirm the validity of faddat's comment. Multiple DoS bugs definitely need a solution that is baked into the software - not 3rd party "solutions"


@thanethomson
Contributor

A quick reminder here regarding the current form of the RPC: https://docs.cometbft.com/v0.34/core/running-in-production#rpc

There are a number of endpoints whose results cannot be cached. See https://github.com/cometbft/cometbft/blob/95e17851cdf359ee6c09178af3b301d63616181d/rpc/core/routes.go as well as https://docs.cometbft.com/v0.34/rpc/ for details as to which endpoints' results can and cannot be cached. For endpoints whose responses cannot be cached (e.g. /block_search and /tx_search), extreme caution needs to be exercised in terms of making those endpoints publicly accessible.

Our longer-term plan here is to allow the RPC to be run in a separate process to the node itself, while being independently scalable (via, for example, #82).

Another interim solution is to allow operators to control certain defaults, like the maximum number of items per page for paginated endpoints' responses, as the quantity of data returned here (in terms of number of bytes) depends heavily on the application being run, volume of transactions, etc. (i.e. it's highly app chain-specific).

Can anyone actually confirm that these requests are returning more than 100 items per page? Because from our own tests we cannot cause any paginated endpoints to return more than 100 items per page due to the use of

func validatePerPage(perPagePtr *int) int {
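For context, that validation clamps the requested page size. A rough editorial sketch of its shape (the constants and exact wording are assumed from memory, not copied from the source):

const (
	defaultPerPage = 30
	maxPerPage     = 100
)

// Approximate shape of the per_page validation: out-of-range values are
// clamped rather than rejected.
func validatePerPage(perPagePtr *int) int {
	if perPagePtr == nil { // no per_page parameter was supplied
		return defaultPerPage
	}
	perPage := *perPagePtr
	if perPage < 1 {
		return defaultPerPage
	} else if perPage > maxPerPage {
		return maxPerPage
	}
	return perPage
}

As later comments point out, this only caps how many items are returned per page; the query itself is still evaluated over the full result set before pagination is applied.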

Finally, I want to remind everyone here of our code of conduct and security guidelines.

@thanethomson added the rpc label and removed the needs-triage (This issue/PR has not yet been triaged by the team) label on Mar 13, 2023
@mintthemoon
mintthemoon commented Mar 14, 2023

Can anyone actually confirm that these requests are returning more than 100 items per page? Because from our own tests we cannot cause any paginated endpoints to return more than 100 items per page due to the use of

func validatePerPage(perPagePtr *int) int {

I can confirm calling the query in the first post with per_page=100 locks up a node the same way as 10000 does. Tested with kujirad v0.7.1 (tendermint v0.34.23).

@edjroz
edjroz commented Mar 30, 2023

One of the issues with using these endpoints, and why it can be considered a DoS attack, is how the storage layer queries for the data. TxSearch and BlockSearch both use the indexer, which uses ranges to determine where to query, but because the storage layer uses lexicographical ordering it has to load all the bytes into memory and only then proceeds to filter them out.

Query of data

results, err := env.BlockIndexer.Search(ctx.Context(), q)

Retrieval of data:

if len(ranges) > 0 {
	skipIndexes = append(skipIndexes, rangeIndexes...)
	for _, qr := range ranges {
		if qr.Key == types.BlockHeightKey && matchEvents && !heightInfo.onlyHeightRange {
			// If the query contains ranges other than the height then we need to treat the height
			// range when querying the conditions of the other range.
			// Otherwise we can just return all the blocks within the height range (as there is no
			// additional constraint on events)
			continue
		}
		prefix, err := orderedcode.Append(nil, qr.Key)
		if err != nil {
			return nil, fmt.Errorf("failed to create prefix key: %w", err)
		}
		if !heightsInitialized {
			filteredHeights, err = idx.matchRange(ctx, qr, prefix, filteredHeights, true, matchEvents, heightInfo)
			if err != nil {
				return nil, err
			}
			heightsInitialized = true
			// Ignore any remaining conditions if the first condition resulted in no
			// matches (assuming implicit AND operand).
			if len(filteredHeights) == 0 {
				break
			}
		} else {
			filteredHeights, err = idx.matchRange(ctx, qr, prefix, filteredHeights, false, matchEvents, heightInfo)
			if err != nil {
				return nil, err
			}
		}
	}
}

The above is not in charge of minimizing the results, just retrieving them, and due to the lexicographical nature of the indexes it may iterate through a large amount of data and return it:

it, err := dbm.IteratePrefix(idx.store, startKey)

Some minor pruning of data can be done, but as shown in the example, block.height > 1 selects a large data sample; only after all results have been retrieved from storage does the pagination take place.

Sorting data:

cometbft/rpc/core/blocks.go

Lines 208 to 218 in 95e1785

// sort results (must be done before pagination)
switch orderBy {
case "desc", "":
	sort.Slice(results, func(i, j int) bool { return results[i] > results[j] })
case "asc":
	sort.Slice(results, func(i, j int) bool { return results[i] < results[j] })
default:
	return nil, errors.New("expected order_by to be either `asc` or `desc` or empty")
}

Last but not least, pagination:

cometbft/rpc/core/blocks.go

Lines 220 to 246 in 95e1785

// paginate results
totalCount := len(results)
perPage := validatePerPage(perPagePtr)
page, err := validatePage(pagePtr, perPage, totalCount)
if err != nil {
	return nil, err
}
skipCount := validateSkipCount(page, perPage)
pageSize := cmtmath.MinInt(perPage, totalCount-skipCount)
apiResults := make([]*ctypes.ResultBlock, 0, pageSize)
for i := skipCount; i < skipCount+pageSize; i++ {
	block := env.BlockStore.LoadBlock(results[i])
	if block != nil {
		blockMeta := env.BlockStore.LoadBlockMeta(block.Height)
		if blockMeta != nil {
			apiResults = append(apiResults, &ctypes.ResultBlock{
				Block:   block,
				BlockID: blockMeta.BlockID,
			})
		}
	}
}
return &ctypes.ResultBlockSearch{Blocks: apiResults, TotalCount: totalCount}, nil

A couple of points to be taken from here

  • You should not expose TxSearch or BlockSearch, or pretty much anything that would expose your storage layer directly, since anyone can send a large query and blow up your node.
  • Queries should be constrained as much as possible; good examples are within the kv_test. This query language should be more widely covered in the documentation (see the sketch after this list).
  • The storage layer indexing might require a look into how it can be improved, to avoid having to load everything into memory before paginating.
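As an editorial sketch of the second point (the import path and BlockSearch signature are assumed from cometbft v0.37, and the height bounds are arbitrary), a client can keep its queries tightly bounded rather than open-ended:

package main

import (
	"context"
	"fmt"

	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
)

func main() {
	c, err := rpchttp.New("http://localhost:26657", "/websocket")
	if err != nil {
		panic(err)
	}

	// Bound both ends of the height range so the indexer walks a small key
	// range instead of scanning (and sorting) results from the whole store.
	page, perPage := 1, 30
	res, err := c.BlockSearch(context.Background(),
		"block.height > 100 AND block.height < 200", &page, &perPage, "asc")
	if err != nil {
		panic(err)
	}
	fmt.Println("matches:", res.TotalCount)
}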

@adizere
Member Author
adizere commented Apr 3, 2023

Thank you @edjroz, this matches/confirms our investigations!

The storage layer indexing might require some look into how it can be improved to avoid having to load everything in memory before paginating.

As a long-term solution, we are leaning towards encouraging operators to offload RPC to an external data node, such as the tentative design in #82. Your feedback there would be very welcome!

@mintthemoon

Following up on this point:

  • You should not expose TxSearch or BlockSearch pretty much anything that would expose your storage layer directly since anyone can send a large query and blow up your node.

As a node operator, how do I actually block these? I tried rejecting requests to /tx_search and /block_search, but I've found the feature is still available via an RPC call like the one the blockSearch function in CosmJS performs.

@thanethomson
Contributor
thanethomson commented Apr 3, 2023

As a node operator, how do I actually block these?

Have you tried something like this? I assume you're fronting your RPC with a web server like Nginx, right? (As per https://docs.cometbft.com/v0.37/core/running-in-production#rpc)

@mintthemoon

As a node operator, how do I actually block these?

Have you tried something like this? I assume you're fronting your RPC with a web server like Nginx, right? (As per https://docs.cometbft.com/v0.37/core/running-in-production#rpc)

Yes, I'm using a similar block. It doesn't matter because the block search function can still be accessed with a JSON-RPC request to the root, as described here: https://docs.cometbft.com/v0.34/rpc/

JSONRPC requests can be POST'd to the root RPC endpoint via HTTP.
curl --header "Content-Type: application/json" --request POST --data '{"method": "block", "params": ["5"], "id": 1}' localhost:26657

The same technique works to access status information including moniker and listen address on RPCs that 404 the /status endpoint.

@peterbourgon
  • You should not expose TxSearch or BlockSearch pretty much anything that would expose your storage layer directly since anyone can send a large query and blow up your node.

As a node operator, how do I actually block these? I tried rejecting requests to /tx_search and /block_search but I've found the feature is still available with an RPC call like the blockSearch function in cosmjs performs.

Cosmos nodes should never expose any port or endpoint to the internet directly.

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.
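As an illustration of such an intermediating layer (an editorial sketch, not an official tool; the blocked method names and upstream address are assumptions), a small Go reverse proxy can reject tx_search/block_search whether they arrive as URI paths or as JSON-RPC bodies POSTed to the root:

package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

var blocked = map[string]bool{"tx_search": true, "block_search": true}

func main() {
	// Upstream node RPC; it should listen on localhost only, never directly on the internet.
	upstream, err := url.Parse("http://127.0.0.1:26657")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// URI-style calls, e.g. /tx_search?query=...
		if blocked[strings.TrimPrefix(r.URL.Path, "/")] {
			http.Error(w, "method not allowed", http.StatusForbidden)
			return
		}
		// JSON-RPC calls POSTed to the root, e.g. {"method": "block_search", ...}.
		if r.Method == http.MethodPost {
			body, _ := io.ReadAll(r.Body)
			r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the proxy
			var req struct {
				Method string `json:"method"`
			}
			if json.Unmarshal(body, &req) == nil && blocked[req.Method] {
				http.Error(w, "method not allowed", http.StatusForbidden)
				return
			}
			// A production version would also need to inspect JSON-RPC batch
			// (array) bodies, cap request sizes, and cover the websocket endpoint.
		}
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}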

@joe-bowman

Cosmos nodes should never expose any port or endpoint to the internet directly.

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.

So for public RPC endpoints (which are necessary in a multitude of cases), not exposing nodes directly to the Internet is common sense, but we have to intercept all incoming requests and validate them, which requires replicating the logic to decode and parse every incoming request type?

It definitely feels like this should squarely fall within the remit of the handler to sanitise input. It already has context of the requests and what the params mean. If the handler does not handle this, then we need to duplicate a lot of logic...

@peterbourgon
peterbourgon commented Apr 9, 2023

I'm 👍 on improving the safety of any/all RPC endpoints. My only point is that the running in production docs say that exposing those endpoints to the public internet is not a supported use case, and if it's not supported it shouldn't be done 😅

@visualbasic6

Thanks for that feedback. I reported it because it probably hits public APIs (wallets, dapps, explorers, etc.) which are critically important to the ecosystem. It's surely not the DoS of the century but I suspect that it's a consequential one.

@mintthemoon
mintthemoon commented Apr 10, 2023

I understand the docs recommend against it, but the reality is that the current Cosmos ecosystem relies heavily on public RPC endpoints. Most wallets and apps have been built around this, and the popular client library CosmJS even calls the TxSearch function at issue here every time it submits a transaction to check that it went through, so securing that without breaking apps is nontrivial.

@peterbourgon

The docs don't "recommend against" this use case, they explicitly state that the use case is unsupported.

@mintthemoon

I mean if you want to be pedantic these are literally the first five words of the RPC section in the docs:

It is generally not recommended

But that's not the point. The majority of the Cosmos ecosystem uses these endpoints in an apparently unsupported way; it makes more sense to me to figure out how to support them than to expect every app to change how it works.

dib542 added a commit to duality-labs/hapi-indexer that referenced this issue Apr 17, 2023
    - the smaller block height scope allows for less of the db to be
      read into memory to answer this query; for large chains this
      can be a very significant saving (>30x difference)
    - see issue: cometbft/cometbft#517
    - the drawback is that the total blocks paginated is no longer
      known; we now estimate it using the chain height at start
      (import progress may go over 100% before finishing)
dib542 added a commit to duality-labs/hapi-indexer that referenced this issue May 24, 2023
@faddat
Contributor
faddat commented Jul 3, 2023

@peterbourgon

This issue identifies a specific problem with specific endpoint(s), but no handler in CometBFT or Cosmos is written in a way that is robust enough to serve general internet traffic. If you want to offer specific endpoints to arbitrary third-party clients, you need to guard that access with some kind of intermediating layer, which (ideally) does authn/authz, (definitely) does request validation, (probably) does rate limiting, etc.

Maybe we can just get to the place where we don't blow out p2p and we don't overflow the node's logs?

@mintthemoon

But that's not the point. The majority of the Cosmos ecosystem uses these endpoints in an apparently unsupported way, it makes more sense to me to figure out how to support them than to expect every app to change how it works.

I've got to agree. Willing to put time and effort into this, but not sure where to direct that.

@peterbourgon

I'm 👍 on improving the safety of any/all RPC endpoints. My only point is that the running in production docs say that exposing those endpoints to the public internet is not a supported use case, and if it's not supported it shouldn't be done 😅

I have a vibe that this might have been used to destabilize the hub's p2p. There was a disturbance in the force yesterday, reported here:

Upon looking into it:

... then it is likely that someone was shooting at nodes like this:

but I figured that out in 2021, and first identified that the hub is choked in 2020... so, if we are able to have the node keep p2ping despite queries, that'd be really cool.

@tubackkhoa

I think the problem is due to keyForHeight indexing: it uses a string index instead of a byte index, which would preserve natural ordering.

The method should be like this:

func keyForHeight(result *abci.TxResult) []byte {	
	keyBytes := joinBytes([]byte(types.TxHeightKey), []byte(tagKeySeparator),
		[]byte{byte(result.Height >> 24), byte(result.Height >> 16), byte(result.Height >> 8), byte(result.Height)},
		[]byte(fmt.Sprintf("/%d%s",			
			result.Index,
			// Added to facilitate having the eventSeq in event keys
			// Otherwise queries break expecting 5 entries
			eventSeqSeparator+"0",
		)),
	)
	return keyBytes
}

With natural indexing like this, we can use a range index in the matchRangeHeight method without getting all the data and then sorting it in memory:

	fromKey := startKeyWithHeight(startKeyBz, lowerHeight)
	toKey := startKeyWithHeight(startKeyBz, upperHeight)

	// already have correct range
	it, err := txi.store.Iterator(fromKey, toKey)
	if err != nil {
		panic(err)
	}
	defer it.Close()

LOOP:
	for ; it.Valid(); it.Next() {

and it could return a thousand blocks in milliseconds.

Alternatively, we can use txId as an 8-byte value, assuming one block cannot have more than 5000 transactions:

txId := result.Height * 5000 + result.Index

Then txId is also in natural ordering; we can search a range directly by height * 5000, and get back height = txId / 5000.
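For illustration, the composite-ID arithmetic above would look roughly like this (the helper names are hypothetical, not from the codebase):

// Hypothetical helpers for the suggested composite txId scheme, assuming a
// block never contains more than 5000 transactions. Because the ID grows
// monotonically with (height, index), keys encoded from it big-endian keep
// their natural ordering and can be range-scanned directly.
const maxTxPerBlock = 5000

func makeTxID(height int64, index uint32) int64 {
	return height*maxTxPerBlock + int64(index)
}

func heightFromTxID(id int64) int64 { return id / maxTxPerBlock }
func indexFromTxID(id int64) uint32 { return uint32(id % maxTxPerBlock) }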
