Case-insensitive ext4
Posted Mar 28, 2019 8:39 UTC (Thu) by nim-nim (subscriber, #34454)
In reply to: Case-insensitive ext4 by clugstj
Parent article: Case-insensitive ext4
So any shared filesystem will need to export to userspace the encoding used for each part of its tree (either a single encoding for everything, or separate encodings per subtree).
Casing is a separate problem, but once you get past the encoding issue, casing becomes less hard to tackle.
Posted Mar 28, 2019 15:58 UTC (Thu)
by nybble41 (subscriber, #55106)
Not much less. Casing rules depend not just on encoding but also on locale, and while it may be practical to enforce a single universal encoding and normalization scheme, you're definitely not going to get away with enforcing a single universal locale.
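To make the locale point concrete, here is a minimal Python sketch. It assumes the Turkish/Azeri convention in which "I" lowercases to dotless "ı" and "İ" lowercases to "i"; `lower_turkish()` is a hypothetical helper written for illustration, not an API from any library or from the kernel patches under discussion.

```python
# Illustrative only: lower_turkish() is a hypothetical helper, not a real API.
DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def lower_turkish(s: str) -> str:
    """Lowercase using the Turkish/Azeri rule: I -> dotless i, dotted capital I -> i."""
    return s.replace("I", DOTLESS_I).replace(DOTTED_CAP_I, "i").lower()


name = "FILE.TXT"
default_lower = name.lower()         # locale-independent Unicode mapping
turkish_lower = lower_turkish(name)  # Turkish-style mapping

print(default_lower)
print(turkish_lower)
print(default_lower == turkish_lower)
```

Both strings use the same encoding and the same normal form, yet the two lowercasings disagree, so a case-insensitive lookup would match under one locale and miss under the other.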
The logical way to handle normalization is to simply disallow non-normalized filenames. The kernel doesn't change the encoding or compare different normal forms; it just verifies that the names of new files are in a particular normal form and returns an error if they aren't. Since all names are already in the same normal form, comparisons reduce to exact binary matches. The equivalent for case would be to disallow either lowercase or uppercase characters in filenames (assuming you could even clearly define what is "uppercase" or "lowercase", which depends on the locale). People put up with that in the DOS era but I don't think it would be considered acceptable today.
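A rough userspace sketch of that "verify, don't convert" policy follows. It assumes NFC as the required normal form and uses `ValueError` as a stand-in for whatever error the kernel would return; `check_new_name()` is a hypothetical name, not an existing interface.

```python
import unicodedata


def check_new_name(name: str, form: str = "NFC") -> str:
    """Accept a name only if it is already in the required normal form.

    Nothing is converted: a non-normalized name is rejected outright.
    ValueError is a stand-in for the error a filesystem would return.
    """
    if unicodedata.normalize(form, name) != name:
        raise ValueError(f"filename is not in {form}: {name!r}")
    return name


composed = "caf\u00e9"     # 'e with acute' as one precomposed code point (NFC)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (NFD)

check_new_name(composed)   # accepted unchanged

try:
    check_new_name(decomposed)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Because every accepted name is already in one normal form, looking a name up afterwards is a plain equality test with no normalization step at comparison time.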
The odds that encoding or normalization would be permitted to vary per-filesystem or per-subtree are negligible. Applications aren't prepared to deal with that, nor should they be expected to do so. Any conversions needed for shared filesystems should be handled at the lowest layers of the filesystem, between the storage or network and the kernel.
Posted Mar 29, 2019 10:52 UTC (Fri)
by nim-nim (subscriber, #34454)
That's the part people object to, because they are used to the simplicity of pushing encoding problems somewhere else with "filenames are streams of bytes". That was not true even for the original UNIX, whose filename bytes were 7-bit ASCII and nothing else.
But 7-bit ASCII is useless in a modern i18n world. So you need to record other pivot encoding(s) in filesystems¹.
¹ Record, rather than reproduce the mistake of the original UNIX, which assumed there was a single encoding that would never evolve, so there was no need to make it explicit. That was an easy mistake to make in the simpler computer age they lived in; it is an inexcusable mistake to make today.
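As a small illustration of why "filenames are streams of bytes" only pushes the problem elsewhere, the Python sketch below decodes one byte string under several encodings. The filename and the choice of codecs are arbitrary examples, not taken from any real filesystem.

```python
# One on-disk name, stored as raw bytes with no recorded encoding.
raw = b"caf\xe9.txt"

# Outside 7-bit ASCII, the bytes alone do not say which characters they mean.
print(raw.isascii())

as_latin1 = raw.decode("latin-1")  # one plausible reading
as_cp1251 = raw.decode("cp1251")   # a different, equally "valid" reading

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # The same bytes are not even well-formed under a third encoding.
    valid_utf8 = False

print(as_latin1 == as_cp1251)
print(valid_utf8)
```

Without the encoding recorded somewhere, every reader of the directory has to guess which of these interpretations the writer intended.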