8000 Escape non-printable characters in JSON output by riemass · Pull Request #1949 · actor-framework/actor-framework · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Escape non-printable characters in JSON output #1949

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 29, 2024
Merged

Escape non-printable characters in JSON output #1949

merged 5 commits into from
Oct 29, 2024

Conversation

riemass
Copy link
Member
@riemass riemass commented Sep 15, 2024

Fixes #1937.

@riemass riemass requested a review from Neverlord September 15, 2024 20:29
Copy link
codecov bot commented Sep 15, 2024

Codecov Report

Attention: Patch coverage is 75.82418% with 22 lines in your changes missing coverage. Please review.

Project coverage is 71.12%. Comparing base (f7effcf) to head (a59eaab).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
libcaf_core/caf/detail/json.cpp 81.25% 13 Missing and 2 partials ⚠️
libcaf_core/caf/json_reader.cpp 25.00% 6 Missing ⚠️
libcaf_core/caf/detail/print.hpp 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
+ Coverage   70.94%   71.12%   +0.18%     
==========================================
  Files         658      658              
  Lines       29031    29072      +41     
  Branches     3154     3158       +4     
==========================================
+ Hits        20595    20677      +82     
+ Misses       6649     6602      -47     
- Partials     1787     1793       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Neverlord
Copy link
Member

@riemass what about the JOSN reader? Do we already support the \xxxx syntax there? I would like to see a roundtrip test. If that would require touching the parser and you don't want to touch it, that's OK too. I would put it onto my todo list in that case.

@riemass
Copy link
Member Author
riemass commented Sep 18, 2024

I didn't think of it, good catch. I'll update the PR once ready.

@riemass
Copy link
Member Author
riemass commented Sep 21, 2024

@Neverlord question regarding the \u escapes. According to the specs, every code point might be escaped and written as either
\uxxxx or \uxxxx\uxxxx, which corresponds to the way UTF16 code points are encoded.
How do we threat string encodings in CAF, should I translate the escaped code point to a UTF-8 variable length encoding every time, or do it somehow test what encoding the system uses? Do some systems still use UTF-16 by default?

@Neverlord
Copy link
Member

Using local encoding would go against the spec (see 8.1):

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

If a user wants to render a received JSON string in local encoding, they should to the (most likely lossy) conversion themselves.

(...) might be escaped and written as either \uxxxx or \uxxxx\uxxxx

The way I read it, the latter syntax is only used if a character exceeds a single 16 bit character in UTF-16. Either way, I think the output the API should produce needs to be UTF-8 according to the spec.

Comment on lines 121 to 122
10000
if (std::distance(i, e) > 6 && i[1] == '\\' && i[2] == 'u') {
i += 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the iterator be over the \ at this point? Then it would be i[0] holding \ and i[1] holding u.

If i[2] would hold u, then it should be i += 3, so something is off. Also wonder why no unit test catches this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator is in fact pointing to the last character of the last sequence (before \). This is done so that the ++i below the switch statement can be executed properly (even if there is no second surrogate).
The read_ascii function uses ++c instead of c++ so that i+=2 is in fact correct. I see the confusion here, I'll refactor.

@riemass riemass requested a review from Neverlord October 27, 2024 18:13
@riemass riemass requested a review from Neverlord October 29, 2024 09:46
@Neverlord Neverlord merged commit 7a079ec into main Oct 29, 2024
10 checks passed
@Neverlord Neverlord deleted the issue/1937 branch October 29, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

print.hpp: Control characters not escaped correctly
2 participants
0