Encoding conversion in the Term field to standard UTF-8 · Issue #591 · LuteOrg/lute-v3 · GitHub

Encoding conversion in the Term field to standard UTF-8 #591


Open
voothi opened this issue Mar 9, 2025 · 4 comments

Comments

@voothi
voothi commented Mar 9, 2025

Description

I select a substring from the text ("per​ ​E​-​Mail​ ​gefragt").
It appears in the Term field.
I copy the text from this field ("per​ ​E​-​Mail​ ​gefragt").
I paste it as-is into the search bar of GoldenDict-ng.
GoldenDict-ng finds no matches in its database, and the Google Translate result is incorrect.
I type the same text manually into GoldenDict-ng.
Now it finds the match and the translation is correct.

It feels like something goes wrong when copying this field.

To Reproduce

Steps to reproduce the behavior, e.g.:

Described above.

Screenshots

Copied a substring from Lute / Term.

Copied a substring from Yandex Translate.

Extra software info, if not already included in the Description:

  • OS (e.g., iOS, windows): Windows 11. Locale: UTF-8 (system-wide).
  • Browser (e.g., chrome, safari): Version 133.0.6943.142 (Official Build) (64-bit)
  • How you've installed Lute (Docker, python, source) Python 3.12. Installed with venv.
  • Version: Lute 3.10.0 (startup message: "Starting Lute version 3.10.0").
@voothi
Author
voothi commented Mar 9, 2025

It would be nice to be able to copy the text in the Term field to the clipboard with a keyboard shortcut, in UTF-8 encoding.

@jzohrab
Collaborator
jzohrab commented Mar 9, 2025

Hi @voothi, the issue here is that the Term field actually contains zero-width spaces (U+200B) that separate the parsed tokens, so "per​ ​E​-​Mail​ ​gefragt" is actually "per/ /E/-/Mail/ /gefragt", where each "/" stands for a zero-width space.
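A minimal Python sketch of what the copied text contains, assuming the token boundaries described above (the token list is illustrative, not taken from Lute's parser). It also shows that the copied text is already valid UTF-8: the mismatch with typed text comes from the invisible U+200B characters, not from the encoding.

```python
import unicodedata

ZWS = "\u200b"  # the zero-width space character (U+200B)


def strip_zws(text: str) -> str:
    """Return text with every zero-width space removed."""
    return text.replace(ZWS, "")


# Term-field text as described above: tokens joined by zero-width spaces.
# The exact token list is an assumption for illustration.
copied = ZWS.join(["per", " ", "E", "-", "Mail", " ", "gefragt"])
typed = "per E-Mail gefragt"

print(copied == typed)              # False: the strings differ invisibly
print(len(copied), len(typed))      # 24 18 (six extra invisible characters)
print(unicodedata.name(copied[3]))  # ZERO WIDTH SPACE
print(copied.encode("utf-8")[3:6])  # b'\xe2\x80\x8b', U+200B encoded as UTF-8
print(strip_zws(copied) == typed)   # True
```

For a space-delimited language such as German, stripping U+200B like this recovers the typed text exactly.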

Removing those zero-width spaces from the form is extremely tough! I spent a fair amount of time trying to handle this but couldn't get anything working easily. You wouldn't think it would be hard, but it can create ambiguities for users of languages where token parsing is context-dependent.
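A toy Python illustration of that ambiguity, assuming a language written without spaces between words. The two tokenizations below are hypothetical and are not output from Lute's actual parsers. While the zero-width spaces are present, the token boundaries survive and can be split back out; once they are stripped, both terms collapse to the same string, and the boundaries can only be recovered by reparsing, which may depend on the surrounding sentence.

```python
ZWS = "\u200b"  # zero-width space used as a token separator

# Two hypothetical tokenizations of the same Japanese text; which one a
# parser chooses could depend on context.
as_one_token = ["日本語"]
as_two_tokens = ["日本", "語"]

joined_a = ZWS.join(as_one_token)   # no separator needed for a single token
joined_b = ZWS.join(as_two_tokens)  # "日本" + U+200B + "語"

# With zero-width spaces, the two terms stay distinct and are reversible.
print(joined_a == joined_b)         # False
print(joined_b.split(ZWS))          # ['日本', '語']

# Without them, the token boundaries are lost: both become the same string.
stripped_a = joined_a.replace(ZWS, "")
stripped_b = joined_b.replace(ZWS, "")
print(stripped_a == stripped_b)     # True
```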

If you hold down Shift and then click and drag across multiple words, you can copy the selected text, e.g.:


This copies the text without the zero-width spaces, so if you paste it immediately, you get what you expect ("has multiple spaces", not "has/ /multiple/ /spaces").

@jzohrab
Collaborator
jzohrab commented Mar 9, 2025

Related to #371, "Removing zero-width chars from term for forms"

@voothi
Author
voothi commented Mar 9, 2025

Hello!
First of all, thank you for this wonderful program, and thank you for your answers.
Until now, I did not know about copying with the Shift key held down. It turned out to be very useful for me, but it did not completely solve the problem.
I have a suggestion for improving this mechanism, which I have outlined in #593.
