fix: memento format by gfyrag · Pull Request #927 · formancehq/ledger · GitHub

fix: memento format #927


Merged
merged 1 commit into main from fix/fix-memento-format on May 19, 2025

Conversation

@gfyrag (Contributor) commented May 19, 2025

No description provided.

@gfyrag requested a review from a team as a code owner May 19, 2025 10:19
coderabbitai bot commented May 19, 2025

Walkthrough

A new PostgreSQL function, json_compact, was introduced via a migration in the system's migrator. Additionally, a separate migration script was added to batch-update and sanitize JSON data in the logs table, reconstructing the memento field for specific log types and ensuring safe encoding. No changes were made to other logic or exported entities.
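
The summary names json_compact but not its body. As a rough illustration only — reconstructed from this description and the signature shown in the sequence diagram below, not copied from the migration — such a recursive compaction function could look like this:

create or replace function public.json_compact(p_json json, p_level integer default 0)
returns json
language plpgsql
immutable
as $$
declare
    v_type text := json_typeof(p_json);
    v_text text;
begin
    if v_type = 'object' then
        -- rebuild the object without whitespace, ordering keys for
        -- deterministic output (the real migration orders keys elsewhere)
        select '{' || coalesce(string_agg(
                   to_json(t.key)::text || ':' || public.json_compact(t.value, p_level + 1)::text,
                   ',' order by t.key), '') || '}'
        into v_text
        from json_each(p_json) as t(key, value);
    elsif v_type = 'array' then
        -- rebuild the array, preserving element order via ordinality
        select '[' || coalesce(string_agg(
                   public.json_compact(t.value, p_level + 1)::text,
                   ',' order by t.idx), '') || ']'
        into v_text
        from json_array_elements(p_json) with ordinality as t(value, idx);
    else
        -- scalars carry no ignorable whitespace; the integer parameter only
        -- tracks recursion depth in this sketch
        return p_json;
    end if;
    return v_text::json;
end
$$;

Usage: select public.json_compact('{ "a" : [1, 2] }'::json)::text yields {"a":[1,2]}.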

Changes

  • internal/storage/system/migrations.go — Added a migration that registers and creates the json_compact PostgreSQL function for compacting JSON values.
  • internal/storage/bucket/migrations/34-fix-memento-format/up.sql — Added a migration script to batch-update the logs table, reconstructing and sanitizing the memento column.

Sequence Diagram(s)

sequenceDiagram
    participant MigrationRunner
    participant PostgreSQL

    MigrationRunner->>PostgreSQL: Register and execute "Add json_compact function" migration
    PostgreSQL->>PostgreSQL: Create or replace function public.json_compact(JSON, INTEGER)
    Note right of PostgreSQL: Enables recursive, whitespace-free JSON compaction

    MigrationRunner->>PostgreSQL: Execute batch update script on logs table
    loop For each 1000-row batch
        PostgreSQL->>PostgreSQL: Update memento column based on log type
        PostgreSQL->>PostgreSQL: Notify progress via pg_notify
    end
    PostgreSQL->>PostgreSQL: Drop txs_view table if exists

Suggested reviewers

  • paul-nicolas
  • fguery

Poem

In the warren of data, where JSON once sprawled,
A compacting function hops in, whitespace forestalled.
Batches of logs, now neat and precise,
Mementos reformed, like carrots sliced nice.
With a twitch of the nose and a database cheer,
The code is now cleaner—hip hop, engineers! 🥕


codecov bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.27%. Comparing base (cd0a182) to head (b1cc5c4).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #927      +/-   ##
==========================================
- Coverage   82.27%   82.27%   -0.01%     
==========================================
  Files         141      141              
  Lines        7781     7786       +5     
==========================================
+ Hits         6402     6406       +4     
  Misses       1055     1055              
- Partials      324      325       +1     

☔ View full report in Codecov by Sentry.

@gfyrag gfyrag force-pushed the fix/fix-memento-format branch from 154f988 to 53e9152 Compare May 19, 2025 10:31
@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
internal/storage/bucket/migrations/37-fix-memento-format/up.sql (3)

1-7: Add explicit PL/pgSQL language declaration and limit search_path scope
It’s clearer to specify LANGUAGE plpgsql on the $$ block and use SET LOCAL so the search_path change only applies within this block.

Proposed diff:

 do $$
  declare
  	_offset integer := 0;
  	_batch_size integer := 1000;
  begin
-	set search_path = '{{ .Schema }}';
+	set local search_path = '{{ .Schema }}';
  	...
 end
-$$;
+$$ LANGUAGE plpgsql;

8-10: Prefer EXISTS over COUNT(*) for empty-table check
For very large tables, SELECT COUNT(*) scans all rows. You can short-circuit with:

IF NOT EXISTS (SELECT 1 FROM logs LIMIT 1) THEN
  RETURN;
END IF;

This avoids a full table scan.


76-81: Nested REPLACE calls can be optimized
Chaining three REPLACE functions is less efficient and harder to maintain. Consider a single regexp_replace or leverage JSONB encoding options to escape <, >, and &.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 154f988 and 53e9152.

⛔ Files ignored due to path filters (1)
  • internal/storage/bucket/migrations/37-fix-memento-format/notes.yaml is excluded by !**/*.yaml
📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/37-fix-memento-format/up.sql (1 hunks)
  • internal/storage/system/migrations.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/storage/system/migrations.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Dirty
  • GitHub Check: Tests
🔇 Additional comments (5)
internal/storage/bucket/migrations/37-fix-memento-format/up.sql (5)

22-50: NEW_TRANSACTION JSON compact logic looks solid
The CTE and UPDATE ... SET memento = convert_to(public.json_compact(...)) correctly builds and compacts the JSON for NEW_TRANSACTION entries.


51-58: SET_METADATA transformation is correct
The metadata is stripped of nulls and keys are ordered, then passed through json_compact. Implementation matches expectations.


82-83: Batch join condition is correct
UPDATE ... FROM data WHERE logs.seq = data.seq ensures only the CTE rows are updated each iteration.


94-94: Confirm the object type before dropping
If txs_view is actually a view, DROP TABLE IF EXISTS txs_view; will error. Verify its type and use DROP VIEW IF EXISTS txs_view; if appropriate.


95-96: PL/pgSQL block closure is correct
The END and $$; delimiters correctly terminate the DO block.

Comment on line 12

return;
end if;

perform pg_notify('migrations-{{ .Schema }}', 'init: ' || (select count(*) from logs));

⚠️ Potential issue

Transaction control & notifications will not work inside DO
PL/pgSQL DO blocks cannot COMMIT while they run inside an outer transaction block (as migration runners typically arrange), and pg_notify only delivers its notifications on transaction commit. As written, neither the in-flight commits nor the per-batch notifications will succeed.

You must either:

  • Move batch updates out of a single DO block into separate statements so you can COMMIT between batches, or
  • Remove the in-loop COMMIT/pg_notify and accept a single transaction (and single notification) at the end.
    Would you like assistance refactoring this migration into multiple external statements?

Also applies to: 85-92

🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql at line 12
and lines 85-92, the use of pg_notify inside a DO block prevents notifications
from being emitted because DO blocks disallow COMMIT and pg_notify only triggers
on transaction commit. To fix this, refactor the migration by moving batch
updates and pg_notify calls out of the DO block into separate SQL statements so
that COMMIT can occur between batches, enabling notifications to be sent
properly. Alternatively, remove the in-loop COMMIT and pg_notify calls and
perform a single COMMIT and notification after all batches complete.

Comment on lines 59 to 75
when logs.type = 'REVERTED_TRANSACTION' then public.json_compact(json_build_object(
    'revertedTransactionID', (logs.data->>'revertedTransactionID')::numeric,
    'transaction', json_strip_nulls(json_build_object(
        'postings', (
            select to_json(array_agg(public.json_compact(json_build_object('source', v->>'source', 'destination', v->>'destination', 'amount', (v->>'amount')::numeric, 'asset', v->>'asset'))))
            from json_array_elements((logs.data->'transaction'->'postings')::json) v
        ),
        'metadata', (
            select json_object_agg(key, value order by key)
            from json_each_text((logs.data->'transaction'->'metadata')::json)
        ),
        'timestamp', logs.data->'transaction'->'timestamp',
        'reference', logs.data->'transaction'->'reference',
        'id', logs.data->'transaction'->'id',
        'reverted', false
    ))
))::varchar

⚠️ Potential issue

reverted flag hard-coded to false in REVERTED_TRANSACTION
Likely a logic error: for reversed transactions, the JSON should indicate "reverted": true.

Suggested change:

- 'reverted', false
+ 'reverted', true
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql around lines
59 to 75, the 'reverted' flag in the JSON object for 'REVERTED_TRANSACTION' logs
is hard-coded to false, which is incorrect. Update the 'reverted' field to true
to accurately reflect that these transactions are reversed.

Comment on lines 14 to 21
loop
    with data as (
        select *
        from logs
        order by seq
        offset _offset
        limit _batch_size
    )

🛠️ Refactor suggestion

Avoid OFFSET/LIMIT pagination on large tables
Offsetting rows becomes increasingly expensive as _offset grows. Switch to keyset pagination using the last processed seq, e.g.:

- OFFSET _offset
+ WHERE seq > _last_seq
  ORDER BY seq
  LIMIT _batch_size

This yields consistent performance.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/37-fix-memento-format/up.sql around lines
14 to 21, the current query uses OFFSET/LIMIT for pagination, which is
inefficient for large tables. Replace this with keyset pagination by using a
WHERE clause to filter rows with seq greater than the last processed seq value,
and order by seq with a LIMIT to fetch batches. This approach avoids scanning
skipped rows and improves performance.
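
To make the suggestion concrete, here is one hedged sketch of that keyset loop as it might sit inside the existing DO block — _last_seq is a hypothetical variable (bigint, initialized to 0) and the self-assigning UPDATE is a placeholder for the real per-type memento rewrite:

loop
    with data as (
        select seq
        from logs
        where seq > _last_seq
        order by seq
        limit _batch_size
    ),
    updated as (
        update logs
        set memento = logs.memento  -- placeholder for the per-type rewrite
        from data
        where logs.seq = data.seq
        returning logs.seq
    )
    select max(seq) into _last_seq from updated;

    exit when _last_seq is null;  -- no rows left to process

    perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
end loop;

Each pass touches only the next _batch_size rows by index range, so the cost per batch stays flat instead of growing with the offset.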

@gfyrag gfyrag force-pushed the fix/fix-memento-format branch from 53e9152 to b1cc5c4 Compare May 19, 2025 10:37
@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (6)

1-6: Sanitize schema name in search_path
Using {{ .Schema }} directly in SET search_path can lead to SQL injection if the schema name isn’t strictly validated. Consider safely quoting the identifier:

- set search_path = '{{ .Schema }}';
+ execute format('SET search_path = %I', '{{ .Schema }}');

Please ensure that .Schema contains only valid identifier characters.


8-11: Use EXISTS instead of COUNT for emptiness check
SELECT count(*) scans the entire logs table; for a simple emptiness check, EXISTS is more efficient:

- IF (SELECT count(*) FROM logs) = 0 THEN
-     RETURN;
- END IF;
+ IF NOT EXISTS (SELECT 1 FROM logs) THEN
+     RETURN;
+ END IF;

14-21: Avoid OFFSET-based pagination for large tables
Using OFFSET _offset repeatedly can become very slow as _offset grows. Consider keyset pagination or a cursor:

- WITH data AS (
-   SELECT * 
-   FROM logs
-   ORDER BY seq
-   OFFSET _offset
-   LIMIT _batch_size
- )
+ FOR rec IN
+   SELECT * 
+   FROM logs
+   WHERE seq > _last_seq
+   ORDER BY seq
+   LIMIT _batch_size
+ LOOP
+   -- handle rec.seq here
+ END LOOP;

This scales much better for high-volume tables.


Encapsulate nested REPLACE logic for readability
The triple-nested REPLACE calls can be hard to maintain. Consider extracting the character-escaping into a helper function (e.g., escape_json_unicode(text)), as sketched below.
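
A minimal sketch of such a helper, assuming the intent is the \u003c/\u003e/\u0026 escaping that Go's encoding/json applies by default (the function name is illustrative, not from the PR):

create or replace function escape_json_unicode(payload text)
returns text
language sql
immutable
as $$
    -- the three replacements do not interact: none of the inserted
    -- \u00XX sequences contains another character being escaped
    select replace(replace(replace(payload,
               '&', '\u0026'),
               '<', '\u003c'),
               '>', '\u003e');
$$;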


27-50: Review JSON construction for NEW_TRANSACTION
The json_build_object with json_strip_nulls and ordered json_object_agg looks correct and yields deterministic, compact JSON.
Consider switching to jsonb_build_object and jsonb_strip_nulls if logs.data is JSONB—this would avoid repeated casts to ::json and may improve performance.


59-75: Validate REVERTED_TRANSACTION case logic
This branch mirrors the NEW_TRANSACTION logic and includes revertedTransactionID.
Note: Only the inner transaction object is passed through json_strip_nulls. If you need to remove outer nulls as well, consider applying json_strip_nulls to the top-level json_build_object.
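
For reference, json_strip_nulls recurses through whatever value it is given, so a single top-level call would cover both layers (the value below is hypothetical):

select json_strip_nulls(
    '{"revertedTransactionID": null, "transaction": {"reference": null, "id": 1}}'::json
)::text;
-- → {"transaction":{"id":1}}  (modulo insignificant whitespace)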

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53e9152 and b1cc5c4.

⛔ Files ignored due to path filters (4)
  • internal/storage/bucket/migrations/34-fix-memento-format/notes.yaml is excluded by !**/*.yaml
  • internal/storage/bucket/migrations/35-accounts-recreate-unique-index/notes.yaml is excluded by !**/*.yaml
  • internal/storage/bucket/migrations/36-clean-database/notes.yaml is excluded by !**/*.yaml
  • internal/storage/bucket/migrations/37-logs-async-hash-procedure/notes.yaml is excluded by !**/*.yaml
📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1 hunks)
  • internal/storage/system/migrations.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/storage/system/migrations.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Dirty
  • GitHub Check: Tests
🔇 Additional comments (4)
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (4)

12-12: Notification of migration start
Good use of pg_notify to emit the initial row count for monitoring progress.


51-58: Verify SET_METADATA transformation
Reconstructing SET_METADATA with ordered key aggregation is correct.
However, (logs.data->>'targetId')::json may yield a JSON string rather than a number or object. Please confirm the expected targetId format.
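
To illustrate the concern with hypothetical values: ->> extracts text and the ::json cast re-parses it, which fails for bare strings, while -> keeps the original JSON value:

select ('{"targetId": 1234}'::json ->> 'targetId')::json;   -- ok: re-parsed as the number 1234
select ('{"targetId": "acc1"}'::json ->> 'targetId')::json; -- ERROR: invalid input syntax for type json
select '{"targetId": "acc1"}'::json -> 'targetId';          -- ok: keeps the JSON string "acc1"

So if targetId can be a bare string, logs.data->'targetId' avoids the re-parse entirely.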


76-83: Handle non-targeted log types correctly
Falling back to convert_from(logs.memento, 'utf-8') preserves existing entries, then re-encodes them.
Ensure that logs.memento is of type bytea; otherwise, convert_from/convert_to will fail.


94-94: Confirm object type for txs_view
You’re using DROP TABLE IF EXISTS txs_view;, which only affects tables. If txs_view is actually a view or materialized view, use:

DROP VIEW IF EXISTS txs_view;
-- or
DROP MATERIALIZED VIEW IF EXISTS txs_view;

Comment on lines +85 to +92
exit when not found;

_offset = _offset + _batch_size;

perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);

commit;
end loop;

⚠️ Potential issue

Critical: COMMIT inside this DO block will fail
PL/pgSQL anonymous blocks (DO $$ … $$) may only COMMIT when invoked outside an explicit transaction block; run inside the transaction a migration runner typically opens, the statement raises:

ERROR: invalid transaction termination

To perform batched commits, you must run the loop in client-side SQL (psql) or break the script into separate transactions outside the DO block.

Please refactor to remove the in-block COMMIT.

🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/34-fix-memento-format/up.sql around lines
85 to 92, the COMMIT statement inside the DO block is invalid and causes an
error. Remove the COMMIT from the DO block and refactor the script to handle
batching and commits outside the DO block, either by running the loop in
client-side SQL or splitting the script into separate transactions to ensure
commits happen outside the PL/pgSQL anonymous block.

@gfyrag gfyrag added this pull request to the merge queue May 19, 2025
Merged via the queue into main with commit c752c1b May 19, 2025
10 checks passed
@gfyrag gfyrag deleted the fix/fix-memento-format branch May 19, 2025 14:01