8000 Fix unicode text width calculation · Issue #1271 · ratatui/ratatui · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix unicode text width calculation #1271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
SUPERCILEX opened this issue Aug 3, 2024 · 16 comments
Open

Fix unicode text width calculation #1271

SUPERCILEX opened this issue Aug 3, 2024 · 16 comments

Comments

@SUPERCILEX
Copy link
Contributor

See #1249 (comment) and #1270 (comment)

UnicodeWidthStr::width counts newlines as one unit which is not what we want for terminal rendering. We need to figure out how we want to fix this. Some options:

  1. Slap some one-off fixes in Line/Span. Clearly a bad idea as there are 45 usages of width so things wouldn't be consistent.
  2. Replace width with our own extension (maybe called renderable_width) which does the right thing for ratatui.
  3. Match rendering to whatever width says, e.g. render a space for a newline.

I don't really have a preference between 2 and 3, but I think most people prefer 2?

cc: @joshka @EdJoPaTo

@EdJoPaTo
Copy link
Contributor
EdJoPaTo commented Aug 4, 2024

Not sure where the problem is. Rendering ignores graphemes of width 0 and control characters. Only after that the width is relevant for rendering. At that point \n is gone and doesn't influence anything.

The only thing I see here are multiple places of truth, but that should be fixed by reducing the amount of render functions to ideally 1. Currently, set_stringn feels like the most basic one which should be used (indirectly) in all the other places. (For example #1259 starts to get rid of some of them.)

@joshka
Copy link
Member
joshka commented Aug 4, 2024

Option 2, but it we don't need an extension. Text/Line/Span::width are just methods. They should return the width that will be rendered, not the width including control characters.

@SUPERCILEX
Copy link
Contributor Author

Not sure where the problem is.

I guess my concern is that there a number of usages of UnicodeWidthStr that may or may not be correct. Some examples:

If we only touch Span, I'm worried it'll be a game of wack-a-mole with the other uses.

@joshka
Copy link
Member
joshka commented Aug 4, 2024

I guess my concern is that there a number of usages of UnicodeWidthStr that may or may not be correct. Some examples:

  1. Scrollbar - there's a possibility that it would make sense to make this actually be a Span instead of a string. That would make it possible to do colored scrollbars. Noone has asked for this yet though, and I think it might be difficult to change this easily.
  2. That table code is in the example - which doesn't really need to change for this (though we should make some docs about how unicode width calculates things differently from how they're displayed and add that to the website perhaps)
  3. The buffer change is likely correct, it's acting on symbols that have already been filtered to not contain newlines at that point. It's possible we should do the symbol counting earlier however, and we need to fix this up for better filtering (so we can do things like let ansi codes pass through correctly which is a larger issue than just calculating width).

The other three meaningful places in a simple search are:

  1. Paragraph / Reflow - this probably needs to be thought about a bit regardless
  2. Bar value labels - should probably be Span/Line instead of &str
  3. List highlight symbols - should probably also be Span / Line instead of &str

@EdJoPaTo
Copy link
Contributor
EdJoPaTo commented Aug 5, 2024

Off-topic:

  1. there's a possibility that it would make sense to make this actually be a Span instead of a string. That would make it possible to do colored scrollbars.

Scrollbar has a bunch of style options. If it should use Span, then remove the style options at the same time.

@joshka
Copy link
Member
joshka commented Aug 6, 2024

Off-topic:

  1. there's a possibility that it would make sense to make this actually be a Span instead of a string. That would make it possible to do colored scrollbars.

Scrollbar has a bunch of style options. If it should use Span, then remove the style options at the same time.

I think I'd lean to maybe not there, as these seem still useful as the scrollbar sets will likely still be unstyled, and it's easier to call 4 methods to style the track/thumb/begin/end than to create a whole new object for this. Especially if you just want one of them styled. But that's a weak opinion that would be contradicted by showing that the interface is easier with spans.

@SUPERCILEX
Copy link
Contributor Author

Ok, so to summarize everybody would be ok with just fixing Span::width?

@orhun
Copy link
Member
orhun commented Sep 25, 2024

Coming back to this after #1385 - related discussion;

To summarize, newlines are treated as 0 length in 0.1 and 1 in 0.2: unicode-rs/unicode-width@9eaafa5#diff-2ad10836ccce5ac2056d5679cc92449d9ff9094d4ff5c5803f65b5dd1d52ef19R426

However, that's not why our CI is currently failing. It's due to unicode-rs/unicode-width@afab363

Simple repro:

assert_eq!("👩".width(), 2); // Woman
assert_eq!("🔬".width(), 2); // Microscope
assert_eq!("👩‍🔬".width(), 4); // Woman scientist -> should be 4 but it expect 2

Zero width joiner (U+200D) is no longer taken into account it seems.


So, how should we proceed with this? I see two options:

  • Upgrade to 0.2 and go with the 2nd option originally suggested above for patching things.
  • Don't upgrade.

Thoughts?

@orhun orhun changed the title Discussion: how to fix text width calculation Fix unicode text width calculation Sep 25, 2024
@kdheepak
Copy link
Collaborator
- assert_eq!("👩‍🔬".width(), 4);
+ assert_eq!("👩‍🔬".width(), 2); 

This new behaviour is what we want, right?

I would vote for upgrading to v0.2 but pinning the dependency, and manually upgrading every now and then. Is it possible to pin a version like that? It would be nice to not have a minor version upgrade break unexpectedly because we can put the time and effort into solving the problem when we get a chance to plan for it.

@orhun
Copy link
Member
orhun commented Sep 25, 2024

This new behaviour is what we want, right?

I don't know if we want it, there is #925 and #75 which are related but I might still be lacking some context.

I would vote for upgrading to v0.2 but pinning the dependency, and manually upgrading every now and then.

Sounds reasonable, but note that we need to update our tests accordingly and this will probably be a breaking change.

Is it possible to pin a version like that? It would be nice to not have a minor version upgrade break unexpectedly because we can put the time and effort into solving the problem when we get a chance to plan for it.

Yup, I think it's possible.

@joshka
Copy link
Member
joshka commented Sep 25, 2024

The problem with a pin is that it means that any code that also uses unicode-width will need to also be pinned (I htink). This might lead to a situation where an app calls our code and some other code that calls unicode width expecting certain behavior. I might be wrong about that though. It's worth testing to see what happens there.

@EdJoPaTo
Copy link
Contributor

Unicode defines Emojis as width == 2. Joined Emojis are width 2. Most Terminals display them by printing the participating emojis rather than the joined one, so it's not a bug of ratatui. (Unicode allows for this case, but its not the one someone should aim for)

Also, unicode-width states:

Relying on any character producing a stable width in this crate is likely the sign of a bug.

Basically: When the implementation details of unicode-width are relevant, then the code is likely wrong.

@orhun
Copy link
Member
8000 orhun commented Sep 25, 2024

So does that give us to freedom to just follow unicode-width (thus the unicode standard) and if something happens we lay the blame on unicode 🤔 👀

@joshka
Copy link
Member
joshka commented Sep 27, 2024

This might lead to a situation where an app calls our code and some other code that calls unicode width expecting certain behavior. I might be wrong about that though. It's worth testing to see what happens there.

I was not wrong - pinning to a specific point revisions negatively affects our dependents who might make different choices about which point version to pin: zhiburt/tabled#423 (comment)

However this means that you can't use tabled if you have dependencies that also use unicode-width, like ratatui:

error: failed to select a version for `unicode-width`.                                                                                                                                                                                                                                                             
    ... required by package `papergrid v0.12.0`
    ... which satisfies dependency `papergrid = "^0.12"` of package `tabled v0.16.0`
    ... which satisfies dependency `tabled = "^0.16.0"`
versions that meet the requirements `=0.1.11` are: 0.1.11

all possible versions conflict with previously selected packages.

  previously selected package `unicode-width v0.1.13`
    ... which satisfies dependency `unicode-width = "^0.1.13"` of package `ratatui v0.28.0`
    ... which satisfies dependency `ratatui = "^0.28.0"`

Moving to 0.2.0 would make it possible to have two versions, but would still run into the same problem if we then pinned the patch version. I think this is probably reasonable though, and we should just make it clear why we're doing this. That perspective is that (for the docs):

It's reasonable for the unicode-width crate to change so that it more closely adheres to the unicode standard, but doing that in patch versions makes our usages of this crate subject to changes that will happen without notice. This can often affect our dependents and any apps they build using Ratatui, and often this means end users of those apps are the first impacted parties, not the app or library developers. So to avoid that we choose to pin unicode-width at a specific patch version.

So I also agree that we should move to 0.2.0 and pin at the latest patch version, we should manually check the changes in each patch version to understand their impact on rendering widgets.

Separately we should look at the usages of unicode-width in ratatui and make sure that we understand when this usage should be avoided because it will cause incorrect calculations.

@kdheepak
Copy link
Collaborator

If we vendor a version of unicode-width, that would solve the problem, right? I'm not saying we should do that though because it seems extreme. It would have been nice if there was an alternative to pinning that that basically did the equivalent of "vendor".

@joshka
Copy link
Member

I don't like the vendoring approach in general as it has its own set of problems. Not sure that we need to go too deep on the problems of pinning until they rear their head again. Likely it would be better worth our time to spend it on understanding the changes that impact how unicode-width affects rendering.

joshka added a commit that referenced this issue Oct 15, 2024
We pin unicode-width to avoid breaking applications when there are breaking changes in the library.

Discussion in #1271

Fixes: #1385

Co-authored-by: Josh McKinney <joshka@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
0