Abci refactor #163

brennanjl · 2023-08-07T03:54:10Z

Contained in this branch is an example of how we can implement abci. I tried to design this in line with what @jchappelow discussed with us on Friday. There are a few different considerations here:

Logical separation of functional responsibilities (Usecases / modules)

I have designed a basic separation of concerns for different parts of our application functionality. Namely, our blockchain has two distinctly separate "applications" that do not logically involve each other:

Databases (creating, dropping, executing)
Validator joining

This example separates these into two different modules.

Atomic commits across data stores

Our application needs to ensure atomic commits across the data stores used by all applications. We have already designed a way of implementing this using a two-phase commit process. There is a virtually unlimited number of different data stores that need to be accounted for:

The account store
The validator store
The master engine database
All deployed databases

These data stores can be used in any number of modules (for example, the account store is needed in both the validator module and the database module). This design can handle atomic commits across data stores without coupling the modules that they are used in.
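To make the two-phase idea concrete, here is a minimal sketch, assuming a hypothetical `Committable` interface with `Prepare`, `Apply`, and `Cancel` methods (these names are illustrative and are not taken from pkg/sessions). Every store is prepared first, and nothing is applied unless all of them prepared successfully. The coordinator only knows about stores, not about the modules that use them.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Committable is a hypothetical interface for a data store that can take part
// in a two-phase commit. The method names are assumptions for illustration.
type Committable interface {
	Prepare() error // phase 1: stage the changes without applying them
	Apply() error   // phase 2: make the staged changes visible
	Cancel()        // discard anything staged in phase 1
}

// memStore is a toy in-memory store standing in for a real data store.
type memStore struct {
	failPrepare bool
	staged      bool
	applied     bool
}

func (m *memStore) Prepare() error {
	if m.failPrepare {
		return errors.New("prepare failed")
	}
	m.staged = true
	return nil
}

func (m *memStore) Apply() error {
	m.staged, m.applied = false, true
	return nil
}

func (m *memStore) Cancel() { m.staged = false }

// CommitAll prepares every store and applies only if all of them prepared.
// Stores are visited in sorted name order so the work is done in the same
// order on every run.
func CommitAll(stores map[string]Committable) error {
	names := make([]string, 0, len(stores))
	for name := range stores {
		names = append(names, name)
	}
	sort.Strings(names)

	for i, name := range names {
		if err := stores[name].Prepare(); err != nil {
			// Roll back the stores that had already prepared.
			for _, done := range names[:i] {
				stores[done].Cancel()
			}
			return fmt.Errorf("prepare %s: %w", name, err)
		}
	}
	for _, name := range names {
		// A real implementation would need a durable log to recover from a
		// failure here; this sketch simply reports the error.
		if err := stores[name].Apply(); err != nil {
			return fmt.Errorf("apply %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	stores := map[string]Committable{
		"accounts":   &memStore{},
		"validators": &memStore{},
	}
	fmt.Println("commit error:", CommitAll(stores))
}
```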

Overview

I have broken down the app into 3 different layers:

ABCI
Modules
Everything Else

The ABCI has a couple jobs (more will certainly need to be added, e.g. snapshotting):

Taking in transactions, verifying them, and forwarding them to their respective module.
Signaling when a "Session" should begin tracking state changes, and when it should commit those state changes.
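As a rough illustration of the forwarding job, the sketch below routes a decoded transaction to a module by payload type. `PayloadType`, `Tx`, `Module`, and `Router` are hypothetical names invented for this example; they are not the types in pkg/abci.

```go
package main

import "fmt"

// PayloadType, Tx, and Module are hypothetical types for illustration only.
type PayloadType string

const (
	PayloadDeployDatabase PayloadType = "deploy_database"
	PayloadValidatorJoin  PayloadType = "validator_join"
)

// Tx is a transaction that has already been decoded and verified.
type Tx struct {
	Type    PayloadType
	Payload []byte
}

// Module is the narrow view the ABCI layer has of a module.
type Module interface {
	Handle(payload []byte) (fee int64, err error)
}

// flatFeeModule is a stand-in module that charges a fixed fee.
type flatFeeModule struct{ fee int64 }

func (m flatFeeModule) Handle(payload []byte) (int64, error) { return m.fee, nil }

// Router forwards each transaction to the module registered for its type.
type Router struct {
	routes map[PayloadType]Module
}

func (r *Router) Deliver(tx Tx) (int64, error) {
	mod, ok := r.routes[tx.Type]
	if !ok {
		return 0, fmt.Errorf("no module registered for payload type %q", tx.Type)
	}
	return mod.Handle(tx.Payload)
}

func main() {
	r := &Router{routes: map[PayloadType]Module{
		PayloadDeployDatabase: flatFeeModule{fee: 10},
		PayloadValidatorJoin:  flatFeeModule{fee: 1},
	}}
	fee, err := r.Deliver(Tx{Type: PayloadValidatorJoin})
	fmt.Println(fee, err)
}
```

The point of the shape is that the ABCI layer depends only on the small `Module` interface, so modules can be added without the router knowing their internals.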

The modules implement specific business logic, and can be considered their own "stand-alone apps". For example, the database module handles all deployments/executions, as well as pricing and writing to the account store. The validator module would also be writing to the same account store, as well as to its own validator store.

Sessions run alongside these, and track + persist changesets when appropriate. A lot of the sessions package still needs to be implemented, but an example is implemented in pkg/engine/session.go (this example will need to be taken out of the engine and applied to the more generalized session package, but it should convey what needs to occur + how it can be encapsulated for any sort of data store).

Where to Look

There are a few new packages that implement this functionality:

pkg/abci: this implements a basic (not yet functional) ABCI app. We can certainly move this back into the other ABCI app, but I made this separate for now to make it easy to read.
pkg/modules: this contains a fully functional database module. This is the logical equivalent of the Dataset Usecase, but it takes into account what was discussed by Jon on Friday. Unlike internal/usecases, it does not use internal/entity; we sort of have a weird thing going on with entity where virtually everything in internal/entity is duplicated somewhere in pkg. This has been replaced with using the direct package type.
pkg/sessions: this contains the logic for atomic sessions. At the time of writing this, I'm not actually done with this, but I am going to continue working on it tonight. A logical equivalent can be found in pkg/engine/session.go.

What needs to happen

I want us to have a discussion tomorrow once everyone has had a chance to look at this, so that we can decide whether or not it is a structure we want to further pursue. On top of that, there are a few things that we either still need to figure out / implement to get this "working".

Transactions: We have a really weird thing going on with transactions where they contain an arbitrary payload, and both the client and Kwild have to sort of guess what the structure of the payload should look like. This might be unavoidable, but we should have a discussion on the best way to handle transaction payloads (I took an initial swing in pkg/abci/payloads.go).
Validator Module: We still need a Validator module that writes to a SQLite database. I don't think the lift on this would be too much; we essentially just need a ValidatorStore package and a Validators module.
Dependency Injection: None of the necessary dependency injection has been implemented. This should be pretty straightforward, so I am not too worried about this.
Serialization: As discussed last week, we almost certainly need our own serialization techniques. I looked at Ethereum's RLP, and at a high level it seemed fine, but I haven't done enough analysis to determine whether or not it works for us. This serialization will be used for both transactions as well as their payloads.
Changeset Ordering: This contains a first pass on deterministic changeset ordering, but it is not tested, and does not work properly (it is not deterministic).
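On the changeset ordering item, one common source of non-determinism in Go is map iteration order. The sketch below is not the code in this branch; it assumes changesets are keyed by a hypothetical store identifier and shows one way to get the same digest on every node: sort the identifiers, then length-prefix each field before hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sort"
)

// CombinedID returns a hex digest over changesets keyed by a hypothetical
// store identifier. Iterating the map directly would give a different order
// on each run, so the identifiers are sorted first.
func CombinedID(changesets map[string][]byte) string {
	ids := make([]string, 0, len(changesets))
	for id := range changesets {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	h := sha256.New()
	var n [8]byte
	// write length-prefixes each field so that ("ab", "c") and ("a", "bc")
	// cannot produce the same byte stream.
	write := func(b []byte) {
		binary.BigEndian.PutUint64(n[:], uint64(len(b)))
		h.Write(n[:])
		h.Write(b)
	}
	for _, id := range ids {
		write([]byte(id))
		write(changesets[id])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(CombinedID(map[string][]byte{
		"accounts":   []byte("credit alice 5"),
		"validators": []byte("join node2"),
	}))
}
```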

I'm sure I'm missing a few things here, but this should be enough to get us started down the right path. Please be very honest with any feedback you might have.

jchappelow · 2023-08-07T04:25:02Z

Really looking forward to digging into this first thing in the a.m.! What you've described in the PR description makes sense, from a quick scan, but will give it a thorough review.

jchappelow

The abci app structure looks pretty good. Just some initial comments so far. Please see my last comment about the ABCI concurrency model and the imminent changes to the consensus connection methods. I'm curious how we are going to adapt.

jchappelow · 2023-08-07T15:05:31Z

pkg/modules/databases/execution.go

+// ExecutionResponse is the response from any interaction that modifies state.
+type ExecutionResponse struct {
+	// Fee is the amount of tokens spent on the execution
+	Fee *big.Int
+}


Looks like you're using the equivalent type in pkg/tx.

Yeah, a bit of code duplication throughout here. This gets to the larger question of transactions/payloads (both their structure, encoding, and where they should go).

jchappelow · 2023-08-07T15:16:21Z

pkg/abci/abci.go

+	// payloadEncoder is the encoder that encodes and decodes payloads
+	payloadEncoder PayloadEncoder


The payloadEncoder is not set in the constructor, but I also don't see any impls of PayloadEncoder yet.
Since you now have DeliverTx instantiating the typed payload structs, I imagine that the decoder would simply be json.(Un)Marshal. Is that your thinking?

Also, I'm curious when encoding might be done by the abci app.

I know we discussed this in person but just for thoroughness:

I put that there to show that we might not want to use straight-up JSON encoding/decoding (can be hard to make backwards compatible).

As for when encoding needs to be performed: mostly in DeliverTx. The transaction itself is decoded from the ABCI type, and the payload then decoded from the transaction.
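For reference, here is a minimal sketch of what keeping the encoding behind an interface could look like. `PayloadEncoder` is named in the diff, but its methods are not shown there, so the `Encode`/`Decode` signatures below are assumptions, and the JSON implementation is only a placeholder that could later be swapped for another format without touching DeliverTx.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PayloadEncoder mirrors the name used in pkg/abci, but this method set is an
// assumption made for illustration.
type PayloadEncoder interface {
	Encode(payload any) ([]byte, error)
	Decode(data []byte, payload any) error
}

// jsonPayloadEncoder is a placeholder implementation backed by encoding/json.
type jsonPayloadEncoder struct{}

func (jsonPayloadEncoder) Encode(payload any) ([]byte, error) {
	return json.Marshal(payload)
}

func (jsonPayloadEncoder) Decode(data []byte, payload any) error {
	return json.Unmarshal(data, payload)
}

// PayloadValidatorApprove uses the two fields shown in pkg/abci/payloads.go.
type PayloadValidatorApprove struct {
	ValidatorToApprove string
	ApprovedBy         string
}

func main() {
	var enc PayloadEncoder = jsonPayloadEncoder{}

	raw, err := enc.Encode(PayloadValidatorApprove{
		ValidatorToApprove: "node2",
		ApprovedBy:         "node1",
	})
	if err != nil {
		panic(err)
	}

	var decoded PayloadValidatorApprove
	if err := enc.Decode(raw, &decoded); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", decoded)
}
```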

jchappelow · 2023-08-07T15:18:33Z

pkg/abci/payloads.go

+type PayloadValidatorApprove struct {
+	ValidatorToApprove string
+	ApprovedBy         string


Is the intent to give all these payload fields json: tags and use json.Unmarshal in the encoder?

Maybe; depending on how we decide to implement encoding/decoding. Ethereum's RLP looks pretty good, with two caveats:

It can't encode signed integers. Not a huge issue, but a little annoying.

It can't encode maps. We use maps in payloads pretty often (a lot of this should be changed, but there are some times where it is good, like in Extensions).
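One possible workaround for the map limitation, sketched here under the assumption that the map keys and values are strings, is to convert each map into a slice of key/value pairs sorted by key before encoding. A list of pairs is something a list-based format can carry, and the sort keeps the encoded bytes stable. `KV`, `MapToPairs`, and `PairsToMap` are hypothetical helpers, not part of this branch.

```go
package main

import (
	"fmt"
	"sort"
)

// KV is a hypothetical key/value pair used to carry a map through an encoding
// that only supports lists.
type KV struct {
	Key   string
	Value string
}

// MapToPairs flattens a map into pairs sorted by key.
func MapToPairs(m map[string]string) []KV {
	pairs := make([]KV, 0, len(m))
	for k, v := range m {
		pairs = append(pairs, KV{Key: k, Value: v})
	}
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].Key < pairs[j].Key })
	return pairs
}

// PairsToMap rebuilds the map after decoding.
func PairsToMap(pairs []KV) map[string]string {
	m := make(map[string]string, len(pairs))
	for _, p := range pairs {
		m[p.Key] = p.Value
	}
	return m
}

func main() {
	config := map[string]string{"url": "localhost:50051", "name": "math"}
	fmt.Println(MapToPairs(config))
}
```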

jchappelow · 2023-08-07T15:30:46Z

pkg/abci/abci.go

+	return abciTypes.ResponseDeliverTx{
+		Code:    abciTypes.CodeTypeOK,
+		GasUsed: res.Fee.Int64(),
+	}


Does cometBFT not need the other fields set, like Data and Events? I don't know what it does with this return, but @charithabandi had quite a bit of information set in the other version of the app.

Optional, but good to have. CometBFT has an indexer built on these events' information. For example, if you emit deploy, drop, execute, node join, and node leave events, you can list all node joins, or list all txs where node1 approved a joinee, etc. So it's good to have for lookups.

Also, on the client side you can subscribe to events based on certain search params, e.g. subscribe to all node join events.

jchappelow · 2023-08-07T15:35:04Z

pkg/abci/abci.go

+	ctx := context.Background()
+	err := a.committer.Commit(context.Background())


jchappelow · 2023-08-07T15:39:28Z

pkg/abci/abci.go

+	a.commitWaiter.Wait()
+	a.commitWaiter.Add(1)


We might want to look into another approach. Multiple waiters may both unblock at the same time and both hit Add(1) at the same time. (EDIT: Oh, apply in your atomic waiter is async. Hmm) However, I've started to look into how Application methods are called and what guarantees are provided about their concurrent use. Here are some points about the ABCI concurrency model:

it's a work in progress: Introduce relaxed ABCI concurrency models cometbft/cometbft#88

there are 4 "connections" from comet->abci

the method calls within one of those connections are synchronous

depending on the local client used (there's an "unsync" version), we get global sync or sync just within the scope of the individual connections

All four of BeginBlock, DeliverTx, EndBlock, and Commit are from the "consensus connection", so synchronous.

From https://docs.cometbft.com/v0.37/spec/abci/abci++_app_requirements#managing-the-application-state-and-related-topics:

Connection State

CometBFT maintains four concurrent ABCI++ connections, namely Consensus Connection, Mempool Connection, Info/Query Connection, and Snapshot Connection. It is common for an application to maintain a distinct copy of the state for each connection, which are synchronized upon Commit calls.

Concurrency

In principle, each of the four ABCI++ connections operates concurrently with one another. This means applications need to ensure access to state is thread safe. Both the default in-process ABCI client and the default Go ABCI server use a global lock to guard the handling of events across all connections, so they are not concurrent at all. This means whether your app is compiled in-process with CometBFT using the NewLocalClient, or run out-of-process using the SocketServer, ABCI messages from all connections are received in sequence, one at a time.

Closely related is that v0.38 introduces a FinalizeBlock app method that "replaces the functionality provided previously by the combination of ABCI methods BeginBlock, DeliverTx, and EndBlock. FinalizeBlock's parameters are an aggregation of those in BeginBlock, DeliverTx, and EndBlock."

https://github.com/cometbft/cometbft/blob/main/docs/guides/go.md#133-finalizeblock

abci: implement finalize block tendermint/tendermint#9468

Maybe the three of us can chat about this in case you already have an understanding about all the above.

charithabandi · 2023-08-07T21:36:48Z

Modularization LGTM!

What happens if a crash occurs after writing the changesets to the WAL and before checkpointing them to the DB? Would you truncate the current WAL, or append the tx changesets at the end with the replay block (keep in mind the begin and commit writes)?

brennanjl · 2023-08-08T03:45:50Z

Modularization LGTM!

What happens if a crash occurs after writing the changesets to the WAL and before checkpointing them to the DB? Would you truncate the current WAL, or append the tx changesets at the end with the replay block (keep in mind the begin and commit writes)?

If a crash occurs after changesets are written to the WAL, the committer will assume none of them were applied and reapply them when it starts up again (before any new blocks are mined). This is safe because changesets are idempotent (more or less; we can make them idempotent pretty easily).
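A small sketch of why replay is safe when changesets are idempotent, assuming each change records the final value of a key rather than a delta. `Change`, `Store`, and `Recover` are hypothetical names for this example. Replaying the whole WAL over a partially applied store ends in the same state as replaying it over an untouched one.

```go
package main

import "fmt"

// Change is a hypothetical changeset entry. It records the final state of a
// key (set to Value, or deleted), which is what makes re-applying it harmless.
type Change struct {
	Key    string
	Value  string
	Delete bool
}

// Store is a toy key/value store standing in for a database.
type Store map[string]string

// Apply writes a changeset to the store. Applying the same changeset twice
// leaves the store unchanged the second time.
func (s Store) Apply(changes []Change) {
	for _, c := range changes {
		if c.Delete {
			delete(s, c.Key)
			continue
		}
		s[c.Key] = c.Value
	}
}

// Recover replays every changeset still in the WAL, in order, without needing
// to know how far the previous run got before it crashed.
func Recover(s Store, wal [][]Change) {
	for _, changes := range wal {
		s.Apply(changes)
	}
}

func main() {
	wal := [][]Change{
		{{Key: "alice", Value: "10"}, {Key: "bob", Value: "5"}},
		{{Key: "alice", Value: "7"}, {Key: "bob", Delete: true}},
	}

	// Pretend the process crashed halfway through the second changeset.
	crashed := Store{"alice": "7", "bob": "5"}
	Recover(crashed, wal)
	fmt.Println(crashed)
}
```

A delta-style change such as "add 3 to alice" would not have this property, which is presumably what "more or less" refers to.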

brennanjl · 2023-08-08T03:50:19Z

I've pushed up a basic implementation of Committable for our SQL Client in pkg/sessions/sql-session/session.go (could probably use some renaming there, as well as potentially a new location).

I hope this provides a bit more clarity in the design decisions for Committable.

I still need to think a bit more on the AtomicCommitter API; @jchappelow is it common to use channels in APIs?

I'm going to spend a bit more time tonight thinking these things through, we can discuss more tomorrow.

jchappelow · 2023-08-08T04:29:33Z

I still need to think a bit more on the AtomicCommitter API; @jchappelow is it common to use channels in APIs?

In Go, yes, if the method relates to an operation happening in another goroutine. The standard library sets an example in the context and time packages. A typical pattern is to provide a channel that is either closed (usually for broadcast) or sent a result or error. The latter case tends to correspond to promises used in other languages, whereby a promise type that has a blocking receive method is returned.

What's best all depends on what the caller needs to achieve from the result of the async function. When it's just a gate, or some flow control, a channel generally fits the bill and is most expressive.

It's really fine with the extra Apply method, but in the current application logic, it's not clear why it needs it. (Does the abci app need to know that there are two steps to end a block, or even that there's a WAL? Maybe. Just a discussion point for me.)

brennanjl · 2023-08-08T04:50:42Z

It's really fine with the extra Apply method, but in the current application logic, it's not clear why it needs it. (Does the abci app need to know that there are two steps to end a block, or even that there's a WAL? Maybe. Just a discussion point for me.)

Gotcha. There isn't any reason why the caller explicitly needs access, but here's the breakdown (I'm pretty sure we are on the same page, but just to be clear):

Commit generates the app hash, which is necessary to continue in the consensus process for the next block. CometBFT is free to delete its current wal holding the previous block, since it is now committed to the db.
Apply actually applies the state changes to the underlying databases, which needs to finish before the next BeginBlock. The reason this was done async is because it can take a decent amount of time (in particular, if there is some long-running read occurring on the db, which it then has to wait for).

I don't see a reason why it couldn't be handled with a channel.

brennanjl · 2023-08-08T05:11:20Z

Just added an implementation for the engine at pkg/engine/session.go.

None of this is tested yet, so I'm sure there are quite a few bugs. I'm mostly just leaving it there for anyone who is curious.

There's still quite a bit in atomic committer that needs to be cleaned up, so I wouldn't take what is there as Gospel.

jchappelow · 2023-08-08T05:25:52Z

Apply actually applies the state changes to the underlying databases, which needs to finish before the next BeginBlock.

Mmm, thanks for restating that. One thing we want is the app hash on commit completion, and the other is to permit/unblock the next block on apply completion.

So perhaps a more natural approach than either an Apply method that the app has to run in its own goroutine, or a channel return, would be an onApplied callback provided to Commit. That is,

Commit(ctx context.Context, onApplied func(error)) (appHash []byte, err error)

Really microscopic design nit in the grand scheme, but helpful for me to understand. Thanks for talking through.

brennanjl · 2023-08-08T19:13:48Z

So perhaps a more natural approach than either an Apply method that the app has to run in its own goroutine, or a channel return, would be an onApplied callback provided to Commit.

Implemented on the sessions module; I'm now beginning the full refactor, where I will include it in the abci

brennanjl · 2023-08-09T01:26:58Z

I've gone through and implemented most of what needs to be covered in the refactor. There are a few outstanding areas that are not building, as well as a few areas that are building but that we need to rethink, but overall it should be enough to unblock work on CometBFT and the validator store.

What's changed

This isn't a comprehensive list, but this does cover the highlights of what has changed:

No More Usecases

Usecases is getting removed in favor of modules. They are conceptually similar, except modules have a more limited scope than the use cases. The previous datasets usecase was used as a catch-all for everything; now, the datasets module's API only handles deploying / dropping databases, executing actions, querying data / reading actions, and pricing.

No More Entity

We previously had a package internal/entity which essentially acted as our public API. There were two issues with this:

internal is only for things that are internal to our Kwil implementation, and our public API is the exact opposite of that.
there was a lot of code duplication between our entity and other parts of our system. This was because we were working around the fact that you are not supposed to import into pkg/ from internal/ (which we did a lot).

This issue hasn't actually been solved yet, but only sidestepped. I will cover this a bit more later.

Prioritizing `pkg` over `internal`

One thing I realized while going through this is that when things were used in internal, we did not care much about coupling. This, for the most part, is probably the right way to think about internal. The issue was that we had a lot of stuff in internal (usecases, abci, public api, etc.). These have all been moved to pkg. We should really prioritize pkg over internal, and also make a concerted effort to not share across packages unless absolutely necessary.

internal should pretty much only be used for dependency injection and starting processes. Implementations should be done in pkg.

ABCI in `pkg`

ABCI has been moved into pkg. This is related to the last point, but I want to especially draw attention to it here since this is particularly in-flight. We should not be coupled to our consensus engine any more than we have to. Nowhere else in the system should packages be aware of ABCI, CometBFT, or anything related to that. Just last week, we were discussing potentially moving to Cosmos. This would've been pretty damn hard, since so much of our logic was / is coupled to ABCI.

No More ChainClient

We have gotten rid of pretty much everything related to the chain client. This removes the need to support interacting with other chains. In particular, this significantly reduces the amount of configuration required for Kwild, as well as reduces the public API.

What's Not Building / Won't Work

There are a few things that are not done / building, or that won't work until changes are added, so I will briefly outline them here:

Not Building

internal/app: this is where we do all of our dependency injection, as well as launch processes. This is still up in the air, but it is my top priority.
kwil-cli: the cli is pretty broken right now, however it is quite easy to fix. The only thing that really needs to be done is deleting all of the fund commands, as well as fixing up a few imports / configurations.
test: Both the acceptance tests and the integration tests are not working. This is for several reasons, including breaking changes to the Kwild Driver, ABCI not being fully implemented, and the removal of ChainSyncer.

Not Working

The two main things that are "not working" (but technically building) are:
1. ABCI
2. Kwil Driver

The issues with the ABCI have been discussed, so I won't cover them here, except for the fact that a lot of things don't transfer 1-1. I haven't yet spent time making the distinction of what can and cannot be transferred, and plan on doing so later this week.
The Kwil Driver has also changed quite a bit, and (probably) doesn't work. Functionalities like approve/deposit have been removed, and there are a few things that we really need to think through:

Synchronous vs Asynchronous
The old Kwild driver was totally synchronous (since we didn't have a blockchain). This driver handles asynchronous calls by imposing 15 second waits after each one (making the API appear synchronous). This might be the best way to do it, but I have a feeling there must be a better way; in particular, in the acceptance tests, we deploy several different schemas, and execute several different actions. Waiting 15 seconds for each of them will be a real pain (we probably do upwards of 10 "execution" calls from the acceptance tests, so it would be quite long).
Node join / approve and validator keys
Right now, validators have separate key types. I don't actually think this is necessary (not sure, but it seems CometBFT provides an easily fulfillable interface PrivKey in github.com/cometbft/cometbft/crypto). If it is necessary, we should probably talk about how we differentiate between keypair types (if we need to at all). If there is a strict delineation between what is a validator key and not, we should talk a bit more about what each one can / cannot do.

What works but should be addressed

There are a number of things that now "work", but are probably not ideal. The biggest one is where we handle payloads for transactions, how we decode them, and how that relates to a less public implementation. I think the best example of this I can give is pkg/engine/types/schema.go. We had a separate instance of this in entity that was much more stable, and got converted to this before it called the package. The two were slightly different, and this was actually quite helpful; it allowed me to make changes freely on the engine in ways that made more sense, and I simply had to convert those back to a more stable structure.

There are more examples of this, and since we no longer have internal/entity, we have to rethink where we specify our public API. I have some thoughts on how to do this, but will think on them a bit more tonight and update you more tomorrow.

Fin

This sums up the changes that occurred in the refactor. I'm very open to feedback on this, and just wanted to get this out so we can see it and make a decision as to if we want to adopt it.

jchappelow · 2023-08-09T03:45:55Z

From a technical perspective, there is a lot to like in this refactor.

txsvc as a consumer rather than creator of a host of other subsystems
narrowed and minimal interfaces on consumers
logical move of the conceptually public entity package into a public pkg
reining in usecases/dataset => pkg/modules/datasets, as well as breaking off session_mgr into a logically distinct sessions module
the high level ABCI app not being muddled by the details of what it means to apply the transactions, etc. It just needs to produce responses to cometBFT (a client of our app in their model) and apply state only when it's expected, atomically across all ~~modules~~ state stores.

It demands redoing certain things, most notably to me the ABCI application type, whose job is just to implement the abci/types.Application interface, and thus validator set + vote progress tracking (which I'm doing anyway) as that key information returned to comet. Together with some other separation of concerns issues, this is probably what really made this refactor a priority for CometBFT release.

The scope of the refactor gives me some anxiety, but it feels right and I'm already stuttering on a task because I don't want to do it in what I'm already thinking of as the "old framework".

The driver + integ test issue is one that I don't really have a handle on, primarily because I have yet to become familiar with the integ tests, but also because I'm just not sure if the delays that we are creating in these integ test are things that an application using one of the Kwil SDKs would really hide from the user, or at least make less unpleasant.

The line between public (top pkg) and internal is still fuzzy sometimes to me, but moving focus to the public pkg feels right at this point.

In terms of source control and eliminating some uncertainty for us developers, my current thinking is this:

we're decided about cometBFT, at least for the next major release, so my feeling is we should get kwil-cometbft-main onto main asap. The test breakage on main at this stage in the project (particularly in between releases with weeks to go), seems acceptable. We start moving forward, allowing integration of other features or fixes that don't need to build on this refactor.
@charithabandi and myself would probably build on this branch once it's rebased on an updated main, although I felt differently this morning. I'm already thinking about the consumer of the validator store being a well-encapsulated validator module rather than those tightly coupled types and logic in the application.
after the next release we can talk about approaches to major features that might both allow frequent integration and minimize breakage

…ogic regarding usecases, abci, and our public api into pkg
created an example of how abci can be implemented with various data stores / modules
implemented sessions, still need to add tests
implemented sql client for atomic comitter
added in closing changesets
added implementation for engine
changed up sessions a bit more, added more unit tests
cleaned up kwild config, deleted old sql packages, deleted chain syncer
cleaned up pkg/client
refactored tx service
refactored driver. this is certainly broken now, and should be re-thought. Our old driver was predicated on synchronous calls; the attempt here was pretty dang good, but I dont think we should be including time.Sleep in our driver just to make it mock the old synchronous version
started on dependency injection
minor changes to database module
minor config fix
deleted addrebook

sonarqubecloud · 2023-08-09T16:16:27Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
21 Code Smells

No Coverage information
2.2% Duplication

jchappelow reviewed Aug 7, 2023

brennanjl force-pushed the kwil-cometbft-main branch from d455dcf to d71ad02 Compare August 9, 2023 15:29

Base automatically changed from kwil-cometbft-main to main August 9, 2023 15:33

brennanjl force-pushed the main branch 2 times, most recently from 9ea07a6 to b7d0cd7 Compare August 9, 2023 15:54

brennanjl force-pushed the abci-refactor branch from d5a3782 to 7135da1 Compare August 9, 2023 16:11

brennanjl force-pushed the abci-refactor branch from 7135da1 to b24cd4a Compare August 9, 2023 16:15

brennanjl merged commit fe1c007 into main Aug 9, 2023

brennanjl deleted the abci-refactor branch August 9, 2023 16:16

jchappelow mentioned this pull request Aug 10, 2023

Comments/Concerns for concurrency (abci.go and sessions.go) #176

Closed

jchappelow added this to the v0.6.0 milestone Sep 28, 2023
