fix: memento format #927
Conversation
Walkthrough

A new PostgreSQL function, `public.json_compact`, is introduced via a migration, and a batched update script rewrites the `memento` column of the `logs` table according to each log's type; the `txs_view` table is dropped if it exists.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant MigrationRunner
participant PostgreSQL
MigrationRunner->>PostgreSQL: Register and execute "Add json_compact function" migration
PostgreSQL->>PostgreSQL: Create or replace function public.json_compact(JSON, INTEGER)
Note right of PostgreSQL: Enables recursive, whitespace-free JSON compaction
MigrationRunner->>PostgreSQL: Execute batch update script on logs table
loop For each 1000-row batch
PostgreSQL->>PostgreSQL: Update memento column based on log type
PostgreSQL->>PostgreSQL: Notify progress via pg_notify
end
PostgreSQL->>PostgreSQL: Drop txs_view table if exists
```
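The function body itself isn't quoted in this thread. Purely as an illustration of what recursive, whitespace-free JSON compaction can look like in PL/pgSQL, here is a minimal sketch; the name `json_compact_sketch`, the single-argument signature, and the text return type are assumptions, and the migration's actual `public.json_compact(JSON, INTEGER)` may differ:

```sql
-- Sketch only: the migration's real function takes an extra INTEGER argument
-- and may handle NULLs and key ordering differently.
create or replace function json_compact_sketch(input json)
returns text
language plpgsql
immutable
as $$
begin
	return case json_typeof(input)
		when 'object' then
			'{' || coalesce((
				select string_agg(to_json(e.key)::text || ':' || json_compact_sketch(e.value), ',')
				from json_each(input) e
			), '') || '}'
		when 'array' then
			'[' || coalesce((
				select string_agg(json_compact_sketch(a.value), ',' order by a.idx)
				from json_array_elements(input) with ordinality a(value, idx)
			), '') || ']'
		else input::text -- scalars carry no internal whitespace
	end;
end;
$$;

-- select json_compact_sketch('{"a": 1, "b": [1, 2]}'::json);  --> {"a":1,"b":[1,2]}
```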
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #927      +/-   ##
==========================================
- Coverage   82.27%   82.27%   -0.01%
==========================================
  Files         141      141
  Lines        7781     7786       +5
==========================================
+ Hits         6402     6406       +4
  Misses       1055     1055
- Partials      324      325       +1
```

☔ View full report in Codecov by Sentry.
Force-pushed from 154f988 to 53e9152 (Compare)
Actionable comments posted: 3
🧹 Nitpick comments (3)
internal/storage/bucket/migrations/37-fix-memento-format/up.sql (3)
`1-7`: Add explicit PL/pgSQL language declaration and limit search_path scope

It's clearer to specify `LANGUAGE plpgsql` on the `$$` block and use `SET LOCAL` so the `search_path` change only applies within this block. Proposed diff:

```diff
 do $$
 	declare
 		_offset integer := 0;
 		_batch_size integer := 1000;
 	begin
-		set search_path = '{{ .Schema }}';
+		set local search_path = '{{ .Schema }}';
 		...
 	end
-$$;
+$$ LANGUAGE plpgsql;
```
`8-10`: Prefer EXISTS over COUNT(*) for empty-table check

For very large tables, `SELECT COUNT(*)` scans all rows. You can short-circuit with:

```sql
IF NOT EXISTS (SELECT 1 FROM logs LIMIT 1) THEN
    RETURN;
END IF;
```

This avoids a full table scan.
`76-81`: Nested `REPLACE` calls can be optimized

Chaining three `REPLACE` functions is less efficient and harder to maintain. Consider a single `regexp_replace` or leverage JSONB encoding options to escape `<`, `>`, and `&`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `internal/storage/bucket/migrations/37-fix-memento-format/notes.yaml` is excluded by `!**/*.yaml`
📒 Files selected for processing (2)
- `internal/storage/bucket/migrations/37-fix-memento-format/up.sql` (1 hunks)
- `internal/storage/system/migrations.go` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/system/migrations.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Dirty
- GitHub Check: Tests
🔇 Additional comments (5)
internal/storage/bucket/migrations/37-fix-memento-format/up.sql (5)
`22-50`: NEW_TRANSACTION JSON compact logic looks solid

The CTE and `UPDATE ... SET memento = convert_to(public.json_compact(...))` correctly builds and compacts the JSON for `NEW_TRANSACTION` entries.
`51-58`: SET_METADATA transformation is correct

The metadata is stripped of nulls and keys are ordered, then passed through `json_compact`. Implementation matches expectations.
`82-83`: Batch join condition is correct

`UPDATE ... FROM data WHERE logs.seq = data.seq` ensures only the CTE rows are updated each iteration.
`94-94`: Confirm the object type before dropping

If `txs_view` is actually a view, `DROP TABLE IF EXISTS txs_view;` will error. Verify its type and use `DROP VIEW IF EXISTS txs_view;` if appropriate.
`95-96`: PL/pgSQL block closure is correct

The `END` and `$$;` delimiters correctly terminate the `DO` block.
```sql
		return;
	end if;

	perform pg_notify('migrations-{{ .Schema }}', 'init: ' || (select count(*) from logs));
```
Transaction control & notifications will not work inside DO

PL/pgSQL `DO` blocks disallow `COMMIT` when they run inside an outer transaction (as migration runners typically do), and `pg_notify` only emits on transaction commit. As written, neither in-flight commits nor per-batch notifications will succeed.

You must either:

- Move batch updates out of a single `DO` block into separate statements so you can `COMMIT` between batches, or
- Remove the in-loop `COMMIT`/`pg_notify` and accept a single transaction (and single notification) at the end.

Would you like assistance refactoring this migration into multiple external statements?
Also applies to: 85-92
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql at line 12
and lines 85-92, the use of pg_notify inside a DO block prevents notifications
from being emitted because DO blocks disallow COMMIT and pg_notify only triggers
on transaction commit. To fix this, refactor the migration by moving batch
updates and pg_notify calls out of the DO block into separate SQL statements so
that COMMIT can occur between batches, enabling notifications to be sent
properly. Alternatively, remove the in-loop COMMIT and pg_notify calls and
perform a single COMMIT and notification after all batches complete.
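If the second option is chosen, a minimal sketch of the resulting shape; the `done:` payload and the placeholder update are assumptions, not the migration's actual code:

```sql
do $$
	declare
		_offset integer := 0;
		_batch_size integer := 1000;
	begin
		set local search_path = '{{ .Schema }}';
		loop
			with data as (
				select seq
				from logs
				order by seq
				offset _offset
				limit _batch_size
			)
			update logs
			set memento = logs.memento -- placeholder for the real per-type rewrite
			from data
			where logs.seq = data.seq;

			exit when not found;
			_offset = _offset + _batch_size;
		end loop;

		-- single notification, delivered when the surrounding transaction commits
		perform pg_notify('migrations-{{ .Schema }}', 'done: ' || _offset);
	end
$$;
```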
```sql
when logs.type = 'REVERTED_TRANSACTION' then public.json_compact(json_build_object(
	'revertedTransactionID', (logs.data->>'revertedTransactionID')::numeric,
	'transaction', json_strip_nulls(json_build_object(
		'postings', (
			select to_json(array_agg(public.json_compact(json_build_object('source', v->>'source', 'destination', v->>'destination', 'amount', (v->>'amount')::numeric, 'asset', v->>'asset'))))
			from json_array_elements((logs.data->'transaction'->'postings')::json) v
		),
		'metadata', (
			select json_object_agg(key, value order by key)
			from json_each_text((logs.data->'transaction'->'metadata')::json)
		),
		'timestamp', logs.data->'transaction'->'timestamp',
		'reference', logs.data->'transaction'->'reference',
		'id', logs.data->'transaction'->'id',
		'reverted', false
	))
))::varchar
```
`reverted` flag hard-coded to false in REVERTED_TRANSACTION

Likely a logic error: for reversed transactions, the JSON should indicate `"reverted": true`.

Suggested change:

```diff
-		'reverted', false
+		'reverted', true
```
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql around lines
59 to 75, the 'reverted' flag in the JSON object for 'REVERTED_TRANSACTION' logs
is hard-coded to false, which is incorrect. Update the 'reverted' field to true
to accurately reflect that these transactions are reversed.
```sql
	loop
		with data as (
			select *
			from logs
			order by seq
			offset _offset
			limit _batch_size
		)
```
🛠️ Refactor suggestion

Avoid OFFSET/LIMIT pagination on large tables

Offsetting rows becomes increasingly expensive as `_offset` grows. Switch to keyset pagination using the last processed `seq`, e.g.:

```diff
-			order by seq
-			offset _offset
-			limit _batch_size
+			where seq > _last_seq
+			order by seq
+			limit _batch_size
```

This yields consistent performance.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql around lines
14 to 21, the current query uses OFFSET/LIMIT for pagination, which is
inefficient for large tables. Replace this with keyset pagination by using a
WHERE clause to filter rows with seq greater than the last processed seq value,
and order by seq with a LIMIT to fetch batches. This approach avoids scanning
skipped rows and improves performance.
Force-pushed from 53e9152 to b1cc5c4 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (6)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (6)
`1-6`: Sanitize schema name in search_path

Using `{{ .Schema }}` directly in `SET search_path` can lead to SQL injection if the schema name isn't strictly validated. Consider safely quoting the identifier:

```diff
-	set search_path = '{{ .Schema }}';
+	execute format('SET search_path = %I', '{{ .Schema }}');
```

Please ensure that `.Schema` contains only valid identifier characters.
`8-11`: Use EXISTS instead of COUNT for emptiness check

`SELECT count(*)` scans the entire `logs` table; for a simple emptiness check, `EXISTS` is more efficient:

```diff
-	IF (SELECT count(*) FROM logs) = 0 THEN
-		RETURN;
-	END IF;
+	IF NOT EXISTS (SELECT 1 FROM logs) THEN
+		RETURN;
+	END IF;
```
`14-21`: Avoid OFFSET-based pagination for large tables

Using `OFFSET _offset` repeatedly can become very slow as `_offset` grows. Consider keyset pagination or a cursor, as in the diff and sketch below:

```diff
-	WITH data AS (
-		SELECT *
-		FROM logs
-		ORDER BY seq
-		OFFSET _offset
-		LIMIT _batch_size
-	)
+	FOR rec IN
+		SELECT *
+		FROM logs
+		WHERE seq > _last_seq
+		ORDER BY seq
+		LIMIT _batch_size
+	LOOP
+		-- handle rec.seq here
+	END LOOP;
```

This scales much better for high-volume tables.
`23-27`: Encapsulate nested REPLACE logic for readability

The triple-nested `REPLACE` calls can be hard to maintain. Consider extracting the character-escaping into a helper function (e.g., `escape_json_unicode(text)`) or using a single `REGEXP_REPLACE` with a mapping.
`27-50`: Review JSON construction for NEW_TRANSACTION

The `json_build_object` with `json_strip_nulls` and ordered `json_object_agg` looks correct and yields deterministic, compact JSON.

Consider switching to `jsonb_build_object` and `jsonb_strip_nulls` if `logs.data` is `JSONB`; this would avoid repeated casts to `::json` and may improve performance.
`59-75`: Validate REVERTED_TRANSACTION case logic

This branch mirrors the `NEW_TRANSACTION` logic and includes `revertedTransactionID`.

Note: Only the inner `transaction` object is passed through `json_strip_nulls`. If you need to remove outer nulls as well, consider applying `json_strip_nulls` to the top-level `json_build_object`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `internal/storage/bucket/migrations/34-fix-memento-format/notes.yaml` is excluded by `!**/*.yaml`
- `internal/storage/bucket/migrations/35-accounts-recreate-unique-index/notes.yaml` is excluded by `!**/*.yaml`
- `internal/storage/bucket/migrations/36-clean-database/notes.yaml` is excluded by `!**/*.yaml`
- `internal/storage/bucket/migrations/37-logs-async-hash-procedure/notes.yaml` is excluded by `!**/*.yaml`
📒 Files selected for processing (2)
- `internal/storage/bucket/migrations/34-fix-memento-format/up.sql` (1 hunks)
- `internal/storage/system/migrations.go` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/system/migrations.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Dirty
- GitHub Check: Tests
🔇 Additional comments (4)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (4)
`12-12`: Notification of migration start

Good use of `pg_notify` to emit the initial row count for monitoring progress.
`51-58`: Verify SET_METADATA transformation

Reconstructing `SET_METADATA` with ordered key aggregation is correct.

However, `(logs.data->>'targetId')::json` may yield a JSON string rather than a number or object. Please confirm the expected `targetId` format.
`76-83`: Handle non-targeted log types correctly

Falling back to `convert_from(logs.memento, 'utf-8')` preserves existing entries, then re-encodes them.

Ensure that `logs.memento` is of type `bytea`; otherwise, `convert_from`/`convert_to` will fail.
`94-94`: Confirm object type for `txs_view`

You're using `DROP TABLE IF EXISTS txs_view;`, which only affects tables. If `txs_view` is actually a view or materialized view, use:

```sql
DROP VIEW IF EXISTS txs_view;
-- or
DROP MATERIALIZED VIEW IF EXISTS txs_view;
```
```sql
		exit when not found;

		_offset = _offset + _batch_size;

		perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);

		commit;
	end loop;
```
Critical: COMMIT inside DO block is not allowed

PL/pgSQL anonymous blocks (`DO $$ … $$`) cannot include `COMMIT` when they run inside an outer transaction (and migration runners typically wrap each migration in one); attempting to do so raises:

```
ERROR: COMMIT cannot be executed from a PL/pgSQL function or within a DO block
```

To perform batched commits, you must run the loop in client-side SQL (psql) or break the script into separate transactions outside the `DO` block.

Please refactor to remove the in-block `COMMIT`.
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/34-fix-memento-format/up.sql around lines
85 to 92, the COMMIT statement inside the DO block is invalid and causes an
error. Remove the COMMIT from the DO block and refactor the script to handle
batching and commits outside the DO block, either by running the loop in
client-side SQL or splitting the script into separate transactions to ensure
commits happen outside the PL/pgSQL anonymous block.
No description provided.