Use Merkle trees in ACE by pct960 · Pull Request #253 · pgEdge/cli · GitHub

Use Merkle trees in ACE #253

Merged
merged 69 commits into from
Jun 25, 2025

@pct960 pct960 commented Jan 24, 2025

This PR introduces Merkle trees to table-diff through mtree-diff. It also adds several optimisations to regular table-diff so that it can run on medium-sized tables. To use this feature:

  1. Initialise the Merkle tree objects using:

    ./pgedge ace mtree build <cluster> <schema>.<table> --max-cpu-ratio=1 --analyse=true --recreate-objects=true
    

    --analyse=true can be very time-consuming depending on the size of the table. It's recommended to manually perform an ANALYZE <table> in Postgres and avoid using this flag.

  2. Diff can be performed using:

    ./pgedge ace mtree diff <cluster> <schema>.<table> --max-cpu-ratio=1
    

    Performing an mtree-diff automatically updates the Merkle tree before performing the diff. It's also possible to separately update the trees using:

    ./pgedge ace mtree update <cluster> <schema>.<table> --rebalance=true
    

    --rebalance=true will perform splits and merges of blocks based on changes in the underlying keyspace. Rebalancing is optional; run it only when it is essential. The default update operation in mtree-diff takes care of block splits and updates but defers merges. This is to preserve parent-child relationships and avoid costly recursions due to merged blocks.
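
For readers new to the approach, the following is a minimal, self-contained sketch of how a Merkle tree over primary-key blocks can narrow a diff down to the blocks that actually differ. It is not ACE's implementation: the function names (`make_blocks`, `leaf_hash`, `build_tree`, `mismatched_blocks`), the `(pkey, payload)` row shape, the fixed block size, and the use of SHA-256 are all assumptions made for illustration. It also assumes both nodes use the same block ranges.

```python
import hashlib
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (primary key, payload)


def make_blocks(rows: List[Row], block_size: int) -> List[List[Row]]:
    """Split rows, ordered by primary key, into fixed-size blocks."""
    ordered = sorted(rows)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def leaf_hash(block: List[Row]) -> bytes:
    """Hash all rows of one block in primary-key order."""
    h = hashlib.sha256()
    for pkey, payload in block:
        h.update(f"{pkey}:{payload};".encode())
    return h.digest()


def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent hashes the concatenation of its one or two children.
        parents = [
            hashlib.sha256(b"".join(prev[i:i + 2])).digest()
            for i in range(0, len(prev), 2)
        ]
        levels.append(parents)
    return levels


def mismatched_blocks(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Walk both trees from the root, descending only into differing subtrees."""
    # Assumption: both trees were built over identical block ranges.
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("trees must cover the same block ranges")
    if not a[0]:
        return []
    suspects = [0]  # start at the root
    for level in range(len(a) - 1, 0, -1):
        next_suspects = []
        for idx in suspects:
            if a[level][idx] != b[level][idx]:
                for child in (2 * idx, 2 * idx + 1):
                    if child < len(a[level - 1]):
                        next_suspects.append(child)
        suspects = next_suspects
    return [i for i in suspects if a[0][i] != b[0][i]]


if __name__ == "__main__":
    node1 = [(i, f"v{i}") for i in range(1, 9)]
    node2 = [(i, "changed" if i == 5 else f"v{i}") for i in range(1, 9)]
    tree1 = build_tree([leaf_hash(b) for b in make_blocks(node1, 2)])
    tree2 = build_tree([leaf_hash(b) for b in make_blocks(node2, 2)])
    print(mismatched_blocks(tree1, tree2))
```

When the root hashes match, the walk stops immediately, so only subtrees containing a changed block are visited. Only the rows inside the reported blocks then need a row-by-row comparison.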

pct960 added 30 commits January 24, 2025 18:15
* Also minimise critical section in compare_checksums
* Merges happen only when --rebalance=true is passed
@pct960 pct960 marked this pull request as ready for review June 10, 2025 20:44
@pct960 pct960 requested a review from mmols June 10, 2025 20:45
@mmols mmols left a comment

Functionality worked well for me, although I exercised it on a very small table. Great work.

My feedback is mostly around documentation / presentation.

@mmols mmols merged commit 040fce3 into main Jun 25, 2025
9 checks passed
@mmols mmols deleted the ace/merkle-trees branch June 25, 2025 20:17
mmols added a commit that referenced this pull request Jun 26, 2025
* Added build-mtree to build a merkle tree for a table

* Fix node_list population

* Also minimise critical section in compare_checksums

* Use same block ranges on all nodes

* Breakthrough: extending a merkle tree on new inserts works

* Sweeping optimisations in table-diff

* Use an optimised sql query instead of get_pkey_offsets

* Fix pkey offsets query

* Added feature to merge blocks on large deletes

* Handle blocks splits and merges inside ACE

* Fix some rebalancing bugs

* Integrate merkle trees into ACE cli

* Update mtree only splits blocks

* Merges happen only when --rebalance=true is passed

* Adding mtree diff

* First version of mtree diff ready

* Add progress bar for splits and merges

* Use always trigger for tracking dirty blocks

* Update cryptography to address CVE

* No longer using OrderedSet for comparisons

* Handle mismatching tree levels

* Use generic trigger functions

* Initialise mtree objects once per DB node

* More cleanup

* Tweaks to get all base tests to pass

* Use SQL composables; fix rebalance issues

* Unify pkey offset computations

* Add support for specifying a ranges file

* Remove node len check for mtree build

* Update leaf hash during mtree update

* Temp fix for range boundary issue

* Address boundary issues using a lookup table

* Use prepared statements

* Sunset batches option and async rerun

* Add block boundary and repset-diff tests

* Remove explicit stmt.close()

* Add support for composite keys; use stmt triggers

* SQL cleanup

* Add mtree init, teardown; fix block size usage

* Added merkle tree tests

* Use mogrify during repairs

* Prior use of executemany would internally call execute multiple times, thereby making repairs slow because of statement triggers from mtrees

* Fix write-ranges to use str by default

* Add support for non-numeric datatypes in tracking triggers

* Fix conn establishment in cleanup

* Address codacy issues

* Addressed more codacy issues

* String literal fix

* Codacy fix #4

* Codacy fix #5

* Use nosemgrep

* More nosemgreps

* Add mtree cli helptext

* Fix metadata task type

* Add metadata tracking for mtree modules

* Move error codes out of config file

* Use a separate consts file

* Revamp ACE CLI invocation

* Group mtree cmds into a sub-cmd
* Treat each of table-diff, -repair, etc. as a top-level sub-cmd

* Update tests

* Fix table names in mtree test

* Minor fixes

* Help texts mostly fixed

* fix fire.py helptext generation for mtree submodule

* generate help for ace mtree submodule

* Backward compatibility fixes

* Rename 'override-block-size' to 'skip-block-size-check' to free up -o
* Add back 'block_rows' as an alias of 'block_size'
* Add back 'behavior' in rerun and mark as deprecated
* Improve merkle tree help text

* Fix tests

* update generated helptext

* Ensure pgcrypto is present

* Fix spock diff

---------

Co-authored-by: Matthew Mols <matt@pgedge.com>
mmols added a commit that referenced this pull request Jun 27, 2025