Make exe fully Unicode · Issue #11214 · oracle/graal · GitHub

Make exe fully Unicode #11214


Open
sergeevabc opened this issue May 16, 2025 · 5 comments

@sergeevabc
sergeevabc commented May 16, 2025

Windows 7 SP1 x64, GraalVM 24, MSVC 14.36.17.6, SDK 26100.

Source: crip.jar

$ chcp
Active code page: 866

$ java -jar crip.jar export pem -u=https://yahoo.com
Exported 3 certificates to C:\Проверка

$ chcp 1251

$ java -jar crip.jar export pem -u=https://yahoo.com
Exported 3 certificates to C:\Проверка

$ chcp 65001

$ java -jar crip.jar export pem -u=https://yahoo.com
Exported 3 certificates to C:\Проверка

Great! Now let's compile that jar and see what happens.

$ native-image --no-fallback -march=compatibility -Ob --enable-http --enable-https -jar crip.jar
(success)

$ chcp
Active code page: 866

$ crip.exe export pem -u=https://yahoo.com
Exported 3 certificates to C:\╨Я╤А╨╛╨▓╨╡╤А╨║╨░

$ chcp 1251

$ crip.exe export pem -u=https://yahoo.com
Exported 3 certificates to C:\Проверка
 
$ chcp 65001

$ crip.exe export pem -u=https://yahoo.com
Exported 3 certificates to C:\Проверка

The output is correct only when chcp 65001 is specified. What can be done to ensure that the output of the compiled exe is the same with any chcp, as is the case with Java? In other words, how can I force native-image to add support for all code pages to exe?
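
For what it's worth, the garbled string in the chcp 866 run is exactly what the UTF-8 bytes of Проверка look like when decoded as code page 866, which suggests the exe writes UTF-8 regardless of the active code page. A minimal Java sketch of that effect follows. The class and method names are made up for illustration, the source is kept ASCII-only via \u escapes, and it assumes a JDK that ships the IBM866 charset (the jdk.charsets module).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Mojibake {
    // Encode text with one charset, then decode the same bytes with another.
    // This mimics a console whose code page differs from the writer's encoding.
    static String misdecode(String text, Charset writtenAs, Charset shownAs) {
        return new String(text.getBytes(writtenAs), shownAs);
    }

    public static void main(String[] args) {
        // The escapes spell the Cyrillic folder name from the report.
        String folder = "\u041F\u0440\u043E\u0432\u0435\u0440\u043A\u0430";
        Charset cp866 = Charset.forName("IBM866"); // hypothetical choice: matches "chcp 866"
        // UTF-8 bytes rendered as code page 866 give the box-drawing garble.
        System.out.println(misdecode(folder, StandardCharsets.UTF_8, cp866));
    }
}
```

Swapping the two charset arguments for other pairs shows how each combination of writer encoding and console code page produces its own kind of garbage.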

Related:

@selhagani
Member

Hi @sergeevabc,

Thank you for reaching out to us about this.

Would you mind uploading a reproducer to a GitHub repository? That would allow me to test it on my end. Unfortunately, downloading JAR files directly is against our policy.

@sergeevabc
Author
sergeevabc commented May 19, 2025

> Would you mind uploading a reproducer to a GitHub repository?

A reproducer? You mean a minimal working example? I am an ordinary user, not a developer who could trim the app down to that state. @Hakky54, it's your toy and you have the skills: could you post a short hello-world-like piece of code that does nothing but print “Certificates exported to [PATH]”, to show these Oracle guys with policies what happens to Unicode in Windows binaries generated by native-image?

@Hakky54
Hakky54 commented May 19, 2025

So @sergeevabc discovered this issue when using certificate ripper. This tool can extract server certificates from the CLI. To reproduce the issue, follow these steps:

  1. Use Windows; any version is fine
  2. Create a directory named Проверка in the root (C:/) directory
  3. Clone the project, https://github.com/Hakky54/certificate-ripper.git, into the root directory (C:/)
  4. cd into certificate-ripper
  5. Run mvn clean install -DskipTests -Pnative-image
  6. On the command line, go to the directory C:\Проверка
  7. Run the following command: cmd /K ..\certificate-ripper\target\crip.exe export pem -u=https://google.com

Analyse the output.
The expected output is something like this:

Exported 3 certificates to C:\Проверка

but the actual output is:

Exported 3 certificates to C:\????????

It seems that GraalVM has trouble displaying the Cyrillic script.
During our investigation (see Hakky54/certificate-ripper#76), we found that the default charset is windows-1252 when building the native image. I managed to change that to UTF-8, but the text was still not displayed correctly. Setting it to UTF-8 gave the following output:

Exported 3 certificates to C:\ðƒÐÇð¥ð▓ðÁÐÇð║ð░

With plain Java, the folder name is shown correctly. With the natively compiled executable, the folder name is shown incorrectly even when the default charset is set to UTF-8.
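
One way to compare the JVM run with the native executable is to print the encoding-related settings from both. The sketch below is only a diagnostic, not a fix. The class name EncodingReport is hypothetical, and the listed system properties may be unset depending on the JDK version (stdout.encoding and stderr.encoding exist since JDK 19, native.encoding since JDK 17, sun.jnu.encoding is JDK-internal). Which of these values a native image captures at build time rather than at run time is something to check, not assume.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

final class EncodingReport {
    // Property keys to inspect; unset ones are reported as "(unset)".
    private static final String[] KEYS = {
        "file.encoding", "native.encoding", "stdout.encoding",
        "stderr.encoding", "sun.jnu.encoding"
    };

    // Collects the settings in a stable order so two runs can be compared line by line.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        for (String key : KEYS) {
            report.put(key, System.getProperty(key, "(unset)"));
        }
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it once with java and once as a native executable, under the same chcp, should show whether the two disagree about the output encoding.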

@selhagani
Member

Hi @Hakky54,

Thank you for sharing the reproducer with me.
To be sure that this is a GraalVM issue, I tested using both

Java(TM) SE Runtime Environment (build 24.0.1+9-30)
Java HotSpot(TM) 64-Bit Server VM (build 24.0.1+9-30, mixed mode, sharing)

java 24.0.1 2025-04-15
Java(TM) SE Runtime Environment Oracle GraalVM 24.0.1+9.1 (build 24.0.1+9-jvmci-b01)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 24.0.1+9.1 (build 24.0.1+9-jvmci-b01, mixed mode, sharing) 

on Windows 11 Pro.

My findings were different from those you shared.
When running the jar file without native image, I got:

PS C:\Проверка> java --version
java 24.0.1 2025-04-15
Java(TM) SE Runtime Environment (build 24.0.1+9-30)
Java HotSpot(TM) 64-Bit Server VM (build 24.0.1+9-30, mixed mode, sharing)
PS C:\Проверка> chcp
Active code page: 437
PS C:\Проверка> chcp 866
Active code page: 866
PS C:\Проверка> chcp
Active code page: 866
PS C:\Проверка> java -jar C:\Users\Soufiane\Desktop\supportIusses\11214\certificate-ripper\target\crip.jar export pem -u=https://yahoo.com/
Certificate ripper statistics:
- Certificate count
  * 3: https://yahoo.com/
         [cn=yahoocom_o=yahoo-holdings-inc_l=new-york_st=new-york_c=us]
         [cn=digicert-sha2-high-assurance-server-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
         [cn=digicert-high-assurance-ev-root-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
Extracted 3 certificates.
It has been exported to C:\Åα«óÑα¬á
PS C:\Проверка> chcp 1251
Active code page: 1251
PS C:\Проверка> java -jar C:\Users\Soufiane\Desktop\supportIusses\11214\certificate-ripper\target\crip.jar export pem -u=https://yahoo.com/
Certificate ripper statistics:
- Certificate count
  * 3: https://yahoo.com/
         [cn=yahoocom_o=yahoo-holdings-inc_l=new-york_st=new-york_c=us]
         [cn=digicert-sha2-high-assurance-server-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
         [cn=digicert-high-assurance-ev-root-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
Extracted 3 certificates.
It has been exported to C:\╧≡εΓσ≡Ωα
PS C:\Проверка> chcp 65001
Active code page: 65001
PS C:\Проверка> java -jar C:\Users\Soufiane\Desktop\supportIusses\11214\certificate-ripper\target\crip.jar export pem -u=https://yahoo.com/
Certificate ripper statistics:
- Certificate count
  * 3: https://yahoo.com/
         [cn=yahoocom_o=yahoo-holdings-inc_l=new-york_st=new-york_c=us]
         [cn=digicert-sha2-high-assurance-server-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
         [cn=digicert-high-assurance-ev-root-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
Extracted 3 certificates.
It has been exported to C:\Проверка

And when using native image I get similar results except for

C:\Проверка>chcp 65001
Active code page: 65001
C:\Проверка>cmd /K C:\Users\Soufiane\Desktop\supportIusses\11214\certificate-ripper\target\crip.exe export pem -u=https://yahoo.com/
Certificate ripper statistics:
- Certificate count
  * 3: https://yahoo.com/
         [cn=yahoocom_o=yahoo-holdings-inc_l=new-york_st=new-york_c=us]
         [cn=digicert-sha2-high-assurance-server-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
         [cn=digicert-high-assurance-ev-root-ca_ou=wwwdigicertcom_o=digicert-inc_c=us]
Extracted 3 certificates.
It has been exported to C:\Проверка

The fact that the jar file did not behave on my end as you described makes me doubt that this is a GraalVM issue.
Any idea why we might be getting different results? Perhaps using different Windows versions has something to do with it?
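
A possible lead on the differing results: the garbled names in this thread can be reproduced by encoding Проверка in one code page and decoding the bytes in another. The sketch below brute-forces a small, hand-picked list of charsets to find such a pair. The class name, the candidate list, and the idea that these six charsets are the relevant ones are all assumptions, and it needs a JDK with the extended charsets (jdk.charsets).

```java
import java.nio.charset.Charset;
import java.util.List;

final class MojibakeFinder {
    // Hand-picked candidates: the code pages mentioned in the thread plus common defaults.
    static final List<String> CANDIDATES = List.of(
        "UTF-8", "IBM866", "windows-1251", "IBM437", "IBM850", "windows-1252");

    // Returns "written->shown" for the first charset pair that turns
    // `clean` into `garbled`, or null if no candidate pair matches.
    static String explain(String clean, String garbled) {
        for (String written : CANDIDATES) {
            byte[] bytes = clean.getBytes(Charset.forName(written));
            for (String shown : CANDIDATES) {
                if (written.equals(shown)) continue; // same charset cannot garble
                if (new String(bytes, Charset.forName(shown)).equals(garbled)) {
                    return written + "->" + shown;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // ASCII-only source: the escapes spell the folder name and the garbled variants.
        String clean = "\u041F\u0440\u043E\u0432\u0435\u0440\u043A\u0430";
        String jarUnder866 = "\u00C5\u03B1\u00AB\u00F3\u00D1\u03B1\u00AC\u00E1";
        String jarUnder1251 = "\u2567\u2261\u03B5\u0393\u03C3\u2261\u03A9\u03B1";
        System.out.println(explain(clean, jarUnder866));
        System.out.println(explain(clean, jarUnder1251));
    }
}
```

With these candidates, the name printed by the jar under chcp 866 matches IBM866 bytes decoded as IBM437, and the one under chcp 1251 matches windows-1251 bytes decoded as IBM437. That would mean Java followed chcp in both runs while the text was still rendered as code page 437, but this is only a hypothesis about the rendering side, not a confirmed cause.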

@Hakky54
Hakky54 commented May 30, 2025

Thank you @selhagani for testing this on your side. I was surprised by your test results and did some investigation of my own. It seems that if I set chcp 65001 and then build the project to a native image, it runs well and has no issues with Unicode characters such as Проверка. The executables I distributed previously were probably not built while chcp 65001 was active, which explains the issue. So I would say there is no issue in GraalVM regarding Unicode support, and it seems I need to make adjustments on my side.

However, there is another topic the OP mentioned, and I am not sure whether it can be resolved. He wrote:

> The output is correct only when chcp 65001 is specified. What can be done to ensure that the output of the compiled exe is the same with any chcp

So imagine I have built this app with chcp 65001 active in my terminal. The resulting binary can display the Russian characters without any issue. However, someone else has a different code page active, say 866. When something in Russian needs to be printed, such as a folder name, it would not be human-readable, right? So the end user needs to set chcp to 65001 in their terminal to get the proper output. Any idea whether it can somehow just work out of the box, without the end user having to change chcp?
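
One application-side idea, offered as an untested sketch rather than a known fix: instead of relying on the default charset, wrap standard output in a PrintStream whose charset is looked up when the program starts. The example assumes the stdout.encoding system property (JDK 19+) reflects the active console code page at run time. Whether that holds inside a native image is exactly the open question here and would need to be verified. ConsoleOut, resolve and wrap are hypothetical names.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

final class ConsoleOut {
    // Looks up a charset by name; falls back when the name is missing,
    // malformed, or not supported by this runtime.
    static Charset resolve(String name, Charset fallback) {
        if (name == null || name.isBlank()) return fallback;
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return fallback;
        }
    }

    // Auto-flushing stream that encodes text with the given charset.
    static PrintStream wrap(OutputStream raw, Charset charset) {
        return new PrintStream(raw, true, charset);
    }

    public static void main(String[] args) {
        // Assumption: stdout.encoding carries the console's encoding at run time.
        Charset cs = resolve(System.getProperty("stdout.encoding"), Charset.defaultCharset());
        PrintStream out = wrap(new FileOutputStream(FileDescriptor.out), cs);
        out.println("Exported 3 certificates to C:\\\u041F\u0440\u043E\u0432\u0435\u0440\u043A\u0430");
    }
}
```

If the property turns out to be fixed at build time in the native executable, the same wrapper could take the charset name from a command-line option or environment variable instead, at the cost of asking the user for it.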
