No way to retrieving the "raw" filename without trying to probe its encoding by libarchive · Issue #2594 · libarchive/libarchive · GitHub
No way to retrieving the "raw" filename without trying to probe its encoding by libarchive #2594
Open
@BLumia

Description

Request an API to retrieve a raw filename without trying to probe its encoding, e.g. archive_entry_pathname_raw().

The original discussion:

I'm also interested in retrieving the raw filename so that I can pass it to the encoding-detection feature of ICU or uchardet, so I'll leave some comments here and see if we can make any progress.

Trying to detect and use a codepage is something the application program should do. If you insist, the library could also do that, but please make it optional so developers will have the ability to do this on their own.

End users might download ZIP files from the Internet that were created on computers with different locales, so when an OEM codepage is used, that codepage could be anything. Applications like Bandizip offer the ability to change the codepage from the UI, so if the app detects the wrong codepage, the user can easily switch to another one based on where they downloaded the ZIP file from, and it will then work as intended. If a "raw" pathname getter function is provided, the application developer can simply store the raw data and display the string based on the currently selected codepage/encoding.
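The "store the raw data, decode on display" idea could look roughly like the sketch below. It is application-side logic only and does not call libarchive: `raw_name` stands in for whatever bytes a raw-pathname getter would return, and `display_name` is a hypothetical helper name.

```python
# Hypothetical application-side sketch; nothing here is libarchive API.
# `raw_name` stands in for the bytes a raw-pathname getter would return.

def display_name(raw_name: bytes, codepage: str) -> str:
    """Decode the stored raw bytes with the currently selected codepage."""
    # errors="replace" keeps the UI usable even when the codepage is wrong.
    return raw_name.decode(codepage, errors="replace")

# Simulate an entry name written on a system using the Japanese CP932 codepage.
raw_name = "日本語.txt".encode("cp932")

# The app's first guess (here: the US OEM codepage) yields mojibake.
wrong_guess = display_name(raw_name, "cp437")

# The user switches the codepage in the UI; the same stored bytes are
# simply decoded again, with no need to re-read the archive.
user_choice = display_name(raw_name, "cp932")
```

Because the bytes are kept untouched, a wrong guess is never destructive: changing the codepage only changes how the same data is rendered.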

With libarchive, the user doesn't know the underlying representation.

I think it's okay to let users process the archive without knowing the underlying representation, but it would be much more useful to let advanced users make use of the underlying representation to do something better.

Introducing a way to retrieve the raw filename would mean we don't need to use setlocale, as mentioned in #587 (comment), to achieve that; setlocale might interfere with the other, non-libarchive parts of the program.

I considered having a "utf8" and a "raw" pathname on each entry object.

For the underlying representation I think just "raw" will be fine. You can do the UTF-8 conversion when archive_entry_pathname_utf8 is called, if preferred, and return NULL if the conversion fails. In that case, the user would know something went wrong and might be able to implement a fallback solution with the "raw" data.
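A minimal sketch of that "convert on demand, signal failure, fall back to raw" flow, written in Python as a stand-in for the C behavior. The function names are hypothetical, `None` models the NULL return, and strict UTF-8 decoding is an assumption about how the conversion would be validated.

```python
from typing import Optional

def pathname_utf8(raw_name: bytes) -> Optional[str]:
    """Return the name as UTF-8 text, or None (modeling NULL) on failure."""
    try:
        return raw_name.decode("utf-8")  # strict decoding: invalid bytes raise
    except UnicodeDecodeError:
        return None

def pathname_with_fallback(raw_name: bytes, fallback_codepage: str) -> str:
    """Application-side fallback that uses the raw bytes when UTF-8 fails."""
    name = pathname_utf8(raw_name)
    if name is not None:
        return name
    # The conversion failed, so the caller knows the name is not UTF-8 and
    # can apply its own policy, e.g. a user-selected or detected codepage.
    return raw_name.decode(fallback_codepage, errors="replace")
```

With this shape, a caller that only wants UTF-8 gets either a valid string or an explicit failure, while a caller that holds the raw bytes can still recover the name with its own encoding choice.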

Originally posted by @BLumia in #587
