Description
Request an API to retrieve a raw filename without trying to probe its encoding, e.g. archive_entry_pathname_raw().
The original discussion:
I'm also interested in retrieving the raw filename so that I can pass it to an encoding detector such as ICU or uchardet, so I'll leave some comments here and see if we can make any progress.
Trying to detect and use a codepage is something the application program should do. If you insist, the library could also do that, but please make it optional so developers will have the ability to do this on their own.
End users might download ZIP files from the Internet that were created on computers with different locales, so when the OEM codepage is used, that codepage could be anything. Applications like Bandizip let the user change the codepage from the UI, so if the app detects the wrong codepage, the user can easily switch to another one based on where the ZIP file came from, and it will then work as intended. If a "raw" pathname getter function is provided, application developers can simply store the raw data and display the string according to the currently selected codepage/encoding.
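A minimal sketch of the "store the raw bytes, decode on demand" idea, assuming the raw name has already been obtained by some means. It uses POSIX iconv rather than any libarchive API; decode_raw_name is a hypothetical helper name, and which codepage names are accepted depends on the platform's iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libarchive): decode raw archive bytes
 * as `codepage` into NUL-terminated UTF-8.
 * Returns 0 on success, -1 if the codepage is unknown, the bytes are not
 * valid in that codepage, or `out` is too small. */
static int decode_raw_name(const char *raw, size_t raw_len,
                           const char *codepage,
                           char *out, size_t out_size)
{
    assert(raw != NULL && codepage != NULL && out != NULL && out_size > 0);

    iconv_t cd = iconv_open("UTF-8", codepage);
    if (cd == (iconv_t)-1)
        return -1;                   /* codepage not supported here */

    char *in = (char *)raw;          /* iconv() does not modify the input */
    size_t in_left = raw_len;
    char *dst = out;
    size_t out_left = out_size - 1;  /* reserve room for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                   /* invalid input or buffer too small */

    *dst = '\0';
    return 0;
}
```

When the user picks a different codepage in the UI, the application would call the helper again on the same stored bytes with the new codepage name; the raw data itself never changes.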
With libarchive, the user doesn't know the underlying representation.
I think it's okay to let users process the archive without knowing the underlying representation, but it would be much more useful to also let advanced users work with the underlying representation to do something better.
Introducing a way to retrieve the raw filename means we don't need to use setlocale, as mentioned in #587 (comment), to achieve that, since setlocale might interfere with the non-libarchive parts of the program.
I considered having a "utf8" and a "raw" pathname on each entry object.
For the underlying representation I think just "raw" will be fine. You can do the UTF-8 conversion when archive_entry_pathname_utf8 is called, if preferred, and return NULL if the conversion fails. In that case, the user would know that something went wrong and could implement a fallback based on the "raw" data.
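The fallback pattern described above could look roughly like the following sketch. Everything prefixed demo_ is a hypothetical stand-in, not libarchive code: the entry stores only the raw bytes, the UTF-8 getter returns NULL when those bytes are not well-formed UTF-8 (a simplification, since a real conversion might also consult a charset option), and the application falls back to an escaped rendering of the raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry that keeps only the raw, NUL-terminated bytes. */
struct demo_entry {
    const char *raw;
};

/* Returns 1 if s is well-formed UTF-8 (no overlongs, surrogates,
 * or code points above U+10FFFF), 0 otherwise. */
static int is_valid_utf8(const unsigned char *s)
{
    while (*s) {
        unsigned char c = *s;
        size_t n;                 /* number of continuation bytes */
        unsigned long cp, min;    /* code point and smallest legal value */
        if (c < 0x80) { s++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;
        s++;
        for (size_t i = 0; i < n; i++, s++) {
            if ((*s & 0xC0) != 0x80) return 0;  /* also catches early NUL */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

static const char *demo_pathname_raw(const struct demo_entry *e)
{
    return e->raw;
}

/* "Converts" lazily: hands back the raw bytes if they are already UTF-8,
 * otherwise NULL so the caller knows the conversion failed. */
static const char *demo_pathname_utf8(const struct demo_entry *e)
{
    return is_valid_utf8((const unsigned char *)e->raw) ? e->raw : NULL;
}

/* Application-side fallback: prefer UTF-8, else escape non-ASCII bytes. */
static void demo_display_name(const struct demo_entry *e,
                              char *out, size_t out_size)
{
    assert(e != NULL && out != NULL && out_size > 0);

    const char *utf8 = demo_pathname_utf8(e);
    if (utf8 != NULL) {
        snprintf(out, out_size, "%s", utf8);
        return;
    }

    size_t pos = 0;
    out[0] = '\0';
    const unsigned char *p = (const unsigned char *)demo_pathname_raw(e);
    for (; *p; p++) {
        int n = (*p < 0x80)
            ? snprintf(out + pos, out_size - pos, "%c", *p)
            : snprintf(out + pos, out_size - pos, "\\x%02X", (unsigned)*p);
        if (n < 0 || (size_t)n >= out_size - pos)
            break;                /* stop once the buffer is full */
        pos += (size_t)n;
    }
}
```

An application with an encoding detector or a codepage picker would replace the escaping branch with its own decoding of the raw bytes; the point of the sketch is only that a NULL return leaves the raw data available for that decision.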