8000 pubsub/kvindexer:support for big numbers - v2 (backport #797) by mergify[bot] · Pull Request #840 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

pubsub/kvindexer:support for big numbers - v2 (backport #797) #840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
- `[kvindexer]` Added support for big integers and big floats in the kvindexer.
Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
([\#797](https://github.com/cometbft/cometbft/pull/797))
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
- `[pubsub]` Added support for big integers and big floats in the pubsub event query system.
Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
([\#797](https://github.com/cometbft/cometbft/pull/797))
3 changes: 3 additions & 0 deletions .changelog/unreleased/improvements/797-pubsub-float.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
- `[pubsub/kvindexer]` Numeric query conditions and event values are represented as big floats with default precision of 125.
Integers are read as "big ints" and represented with as many bits as they need when converting to floats.
([\#797](https://github.com/cometbft/cometbft/pull/797))
9 changes: 9 additions & 0 deletions docs/app-dev/indexing-transactions.md
Original file line number Diff line number Diff line change
Expand Up @@ -276,3 +276,12 @@ This behavior was fixed with CometBFT 0.34.26+. However, if the data was indexed
Tendermint Core and not re-indexed, that data will be queried as if all the attributes within a height
occurred within the same event.

## Event attribute value types

Users can use anything as an event value. However, if the event attribute value is a number, the following needs to be taken into account:

- Negative numbers will not be properly retrieved when querying the indexer.
- Event values are converted to big floats (from the `big/math` package). The precision of the floating point number is set to the bit length
of the integer it is supposed to represent, so that there is no loss of information due to insufficient precision. This was not present before CometBFT v0.38.x and all float values were ignored.
- As of CometBFT v0.38.x, queries can contain floating point numbers as well.
- Note that comparing to floats can be imprecise with a high number of decimals.
19 changes: 19 additions & 0 deletions docs/core/subscription.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,25 @@ You can also use tags, given you had included them into DeliverTx
response, to query transaction results. See [Indexing
transactions](../app-dev/indexing-transactions.md) for details.

## Query parameter and event type restrictions

While CometBFT imposes no restrictions on the application with regards to the type of
the event output, there are several considerations that need to be taken into account
when querying events with numeric values.

- Queries convert all numeric event values to `big.Float` , provided by `math/big`. Integers
are converted into a float with a precision equal to the number of bits needed
to represent this integer. This is done to avoid precision loss for big integers when they
are converted with the default precision (`64`).
- When comparing two values, if either one of them is a float, the other one will be represented
as a big float. Integers are again parsed as big floats with a precision equal to the number
of bits required to represent them.
- As with all floating point comparisons, comparing floats with decimal values can lead to imprecise
results.
- Queries cannot include negative numbers

Prior to version `v0.38.x`, floats were not supported as query parameters.

## ValidatorSetUpdates

When validator set changes, ValidatorSetUpdates event is published. The
Expand Down
119 changes: 119 additions & 0 deletions internal/indexer/indexer_utils.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
package indexer

import (
"fmt"
"math/big"

"github.com/cometbft/cometbft/state/indexer"
)

// If the actual event value is a float, we get the condition and parse it as a float
// to compare against
func compareFloat(op1 *big.Float, op2 interface{}) (int, bool, error) {
switch opVal := op2.(type) {
case *big.Int:
vF := new(big.Float)
vF.SetInt(opVal)
cmp := op1.Cmp(vF)
return cmp, false, nil

case *big.Float:
return op1.Cmp(opVal), true, nil
default:
return -1, false, fmt.Errorf("unable to parse arguments, bad type: %T", op2)
}
}

// If the event value we compare against the condition (op2) is an integer
// we convert the int to float with a precision equal to the number of bits
// needed to represent the integer to avoid rounding issues with floats
// where 100 would equal to 100.2 because 100.2 is rounded to 100, while 100.7
// would be rounded to 101.
func compareInt(op1 *big.Int, op2 interface{}) (int, bool, error) {

switch opVal := op2.(type) {
case *big.Int:
return op1.Cmp(opVal), false, nil
case *big.Float:
vF := new(big.Float)
vF.SetInt(op1)
return vF.Cmp(opVal), true, nil
default:
return -1, false, fmt.Errorf("unable to parse arguments, unexpected type: %T", op2)
}
}

func CheckBounds(ranges indexer.QueryRange, v interface{}) (bool, error) {
// These functions fetch the lower and upper bounds of the query
// It is expected that for x > 5, the value of lowerBound is 6.
// This is achieved by adding one to the actual lower bound.
// For a query of x < 5, the value of upper bound is 4.
// This is achieved by subtracting one from the actual upper bound.

// For integers this behavior will work. However, for floats, we cannot simply add/sub 1.
// Query :x < 5.5 ; x = 5 should match the query. If we subtracted one as for integers,
// the upperBound would be 4.5 and x would not match. Thus we do not subtract anything for
// floating point bounds.

// We can rewrite these functions to not add/sub 1 but the function handles also time arguments.
// To be sure we are not breaking existing queries that compare time, and as we are planning to replace
// the indexer in the future, we adapt the code here to handle floats as a special case.
lowerBound := ranges.LowerBoundValue()
upperBound := ranges.UpperBoundValue()

// *Explanation for the isFloat condition below.*
// In LowerBoundValue(), for floating points, we cannot simply add 1 due to the reasons explained in
// in the comment at the beginning. The same is true for subtracting one for UpperBoundValue().
// That means that for integers, if the condition is >=, cmp will be either 0 or 1
// ( cmp == -1 should always be false).
// But if the lowerBound is a float, we have not subtracted one, so returning a 0
// is correct only if ranges.IncludeLowerBound is true.
// example int: x < 100; upperBound = 99; if x.Cmp(99) == 0 the condition holds
// example float: x < 100.0; upperBound = 100.0; if x.Cmp(100) ==0 then returning x
// would be wrong.
switch vVal := v.(type) {
case *big.Int:
if lowerBound != nil {
cmp, isFloat, err := compareInt(vVal, lowerBound)
if err != nil {
return false, err
}
if cmp == -1 || (isFloat && cmp == 0 && !ranges.IncludeLowerBound) {
return false, err
}
}
if upperBound != nil {
cmp, isFloat, err := compareInt(vVal, upperBound)
if err != nil {
return false, err
}
if cmp == 1 || (isFloat && cmp == 0 && !ranges.IncludeUpperBound) {
return false, err
}
}

case *big.Float:
if lowerBound != nil {
cmp, isFloat, err := compareFloat(vVal, lowerBound)
if err != nil {
return false, err
}
if cmp == -1 || (cmp == 0 && isFloat && !ranges.IncludeLowerBound) {
return false, err
}
}
if upperBound != nil {
cmp, isFloat, err := compareFloat(vVal, upperBound)
if err != nil {
return false, err
}
if cmp == 1 || (cmp == 0 && isFloat && !ranges.IncludeUpperBound) {
return false, err
}
}

default:
return false, fmt.Errorf("invalid argument type in query: %T", v)
}
return true, nil
}
32 changes: 21 additions & 11 deletions libs/pubsub/query/query.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,8 @@ package query

import (
"fmt"
"math/big"
"regexp"
"strconv"
"strings"
"time"

Expand Down Expand Up @@ -218,13 +218,23 @@ func compileCondition(cond syntax.Condition) (condition, error) {
return out, nil
}

// TODO(creachadair): The existing implementation allows anything number shaped
// to be treated as a number. This preserves the parts of that behavior we had
// tests for, but we should probably get rid of that.
// We use this regex to support queries of the form "8atom", "6.5stake",
// which are actively used in production.
// The regex takes care of removing the non-number suffix.
var extractNum = regexp.MustCompile(`^\d+(\.\d+)?`)

func parseNumber(s string) (float64, error) {
return strconv.ParseFloat(extractNum.FindString(s), 64)
func parseNumber(s string) (*big.Float, error) {
intVal := new(big.Int)
if _, ok := intVal.SetString(s, 10); !ok {
f, _, err := big.ParseFloat(extractNum.FindString(s), 10, 125, big.ToNearestEven)
if err != nil {
return nil, err
}
return f, err
}
f, _, err := big.ParseFloat(extractNum.FindString(s), 10, uint(intVal.BitLen()), big.ToNearestEven)
return f, err

}

// A map of operator ⇒ argtype ⇒ match-constructor.
Expand All @@ -248,7 +258,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
syntax.TNumber: func(v interface{}) func(string) bool {
return func(s string) bool {
w, err := parseNumber(s)
return err == nil && w == v.(float64)
return err == nil && w.Cmp(v.(*big.Float)) == 0
}
},
syntax.TDate: func(v interface{}) func(string) bool {
Expand All @@ -268,7 +278,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
syntax.TNumber: func(v interface{}) func(string) bool {
return func(s string) bool {
w, err := parseNumber(s)
return err == nil && w < v.(float64)
return err == nil && w.Cmp(v.(*big.Float)) < 0
}
},
syntax.TDate: func(v interface{}) func(string) bool {
Expand All @@ -288,7 +298,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
syntax.TNumber: func(v interface{}) func(string) bool {
return func(s string) bool {
w, err := parseNumber(s)
return err == nil && w <= v.(float64)
return err == nil && w.Cmp(v.(*big.Float)) <= 0
}
},
syntax.TDate: func(v interface{}) func(string) bool {
Expand All @@ -308,7 +318,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
syntax.TNumber: func(v interface{}) func(string) bool {
return func(s string) bool {
w, err := parseNumber(s)
return err == nil && w > v.(float64)
return err == nil && w.Cmp(v.(*big.Float)) > 0
}
},
syntax.TDate: func(v interface{}) func(string) bool {
Expand All @@ -328,7 +338,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
syntax.TNumber: func(v interface{}) func(string) bool {
return func(s string) bool {
w, err := parseNumber(s)
return err == nil && w >= v.(float64)
return err == nil && w.Cmp(v.(*big.Float)) >= 0
}
},
syntax.TDate: func(v interface{}) func(string) bool {
Expand Down
78 changes: 78 additions & 0 deletions libs/pubsub/query/query_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,84 @@ var apiTypeEvents = []types.Event{
},
}

func TestBigNumbers(t *testing.T) {

apiBigNumTest := map[string][]string{
"big.value": {
"99999999999999999999",
},
"big2.value": {
"18446744073709551615", // max(uint64) == 18446744073709551615
},
"big.floatvalue": {
"99999999999999999999.10",
},
"big2.floatvalue": {
"18446744073709551615.6", // max(uint64) == 18446744073709551615
},
}

testCases := []struct {
s string
events map[string][]string
matches bool
}{

// Test cases for values that exceed the capacity if int64/float64.
{`big.value >= 99999999999999999999`,
apiBigNumTest,
true},
{`big.value > 99999999999999999998`,
apiBigNumTest,
true},
{`big2.value <= 18446744073709551615`,
apiBigNumTest, true},
{`big.floatvalue >= 99999999999999999999`,
apiBigNumTest,
true},
{`big.floatvalue > 99999999999999999998.10`,
apiBigNumTest,
true},
{`big.floatvalue > 99999999999999999998`,
apiBigNumTest,
true},
{`big2.floatvalue <= 18446744073709551615.6`,
apiBigNumTest,
true},
{`big2.floatvalue <= 18446744073709551615.6`,
apiBigNumTest,
true},
{`big2.floatvalue >= 18446744073709551615`,
apiBigNumTest,
true},
{`big2.floatvalue >= 12.5`,
apiBigNumTest,
true},
{`big.value >= 10`,
apiBigNumTest,
true},
}

for i, tc := range testCases {
t.Run(fmt.Sprintf("%02d", i+1), func(t *testing.T) {
c, err := query.New(tc.s)
if err != nil {
t.Fatalf("NewCompiled %#q: unexpected error: %v", tc.s, err)
}

got, err := c.Matches(tc.events)
if err != nil {
t.Errorf("Query: %#q\nInput: %+v\nMatches: got error %v",
tc.s, tc.events, err)
}
if got != tc.matches {
t.Errorf("Query: %#q\nInput: %+v\nMatches: got %v, want %v",
tc.s, tc.events, got, tc.matches)
}
})
}
}

func TestCompiledMatches(t *testing.T) {
var (
txDate = "2017-01-01"
Expand Down
35 changes: 26 additions & 9 deletions libs/pubsub/query/syntax/parser.go
< 7D4E td id="diff-7920dbb4cef13b985df274e6a628608a40b8a5af106a242bff6120fe4a6d15b6R80" data-line-number="80" class="blob-num blob-num-addition js-linkable-line-number js-blob-rnum">
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,7 @@ package syntax
import (
"fmt"
"io"
"math"
"strconv"
"math/big"
"strings"
"time"
)
Expand Down Expand Up @@ -68,17 +67,35 @@ func (a *Arg) String() string {
}
}

// Number returns the value of the argument text as a number, or a NaN if the
// Number returns the value of the argument text as a number, or nil if the
// text does not encode a valid number value.
func (a *Arg) Number() float64 {
func (a *Arg) Number() *big.Float {
if a == nil {
return -1
return nil
}
v, err := strconv.ParseFloat(a.text, 64)
if err == nil && v >= 0 {
return v
intVal := new(big.Int)
if _, ok := intVal.SetString(a.text, 10); !ok {
f, _, err := big.ParseFloat(a.text, 10, 125, big.ToNearestEven)
if err != nil {
return nil
}
return f
}
// If it is indeed a big integer, we make sure to convert it to a float with enough precision
// to represent all the bits
bitLen := uint(intVal.BitLen())
var f *big.Float
var err error
if bitLen <= 64 {
f, _, err = big.ParseFloat(a.text, 10, 0, big.ToNearestEven)
} else {
f, _, err = big.ParseFloat(a.text, 10, bitLen, big.ToNearestEven)
}
if err != nil {
return nil
}
return math.NaN()
return f

}

// Time returns the value of the argument text as a time, or the zero value if
Expand Down
Loading
0