Improve BCP-47 matching in name table #930
Unicode defines an algorithm to minimize BCP47 language identifiers, with supporting data in CLDR. For example, this will transform all of … ICU has implemented this algorithm, so we could call ICU. Alternatively, we could write a pure Python implementation somewhere in fontTools.misc. I think we should do the latter, because the algorithm isn't all that complicated, and ICU would add a rather hefty dependency to fontTools. I wouldn't recommend blindly stripping language tags until we find a match in the old-style enums. The very point of BCP47 is to go beyond limited enums, and the reason for adding …
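For reference, here is a minimal Python sketch of the "maximize, then strip" idea behind the CLDR Remove Likely Subtags algorithm (UTS #35). It is simplified under stated assumptions: `LIKELY_SUBTAGS` is a tiny hypothetical excerpt rather than the real CLDR table, variants and extensions are ignored, and `split_tag`, `add_likely_subtags` and `minimize_subtags` are hypothetical helper names, not an existing fontTools API.

```python
# Hypothetical excerpt of CLDR likelySubtags data; the real table is far larger.
# Keys/values use "_" as in the CLDR file; "und" means "undetermined language".
LIKELY_SUBTAGS = {
    "fa": "fa_Arab_IR",
    "und_IR": "fa_Arab_IR",
    "en": "en_Latn_US",
    "sr": "sr_Cyrl_RS",
    "und": "en_Latn_US",
}


def split_tag(tag):
    """Split e.g. 'fa-Arab-IR' or 'fa_IR' into (language, script, region); missing parts are ''."""
    parts = tag.replace("_", "-").split("-")
    language, script, region = parts[0].lower(), "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return language, script, region


def add_likely_subtags(tag):
    """Return the maximized (language, script, region), or None if no data matches."""
    language, script, region = split_tag(tag)
    # Simplified lookup order: lang_script_region, lang_region, lang_script, lang.
    for lang, scr, reg in [(language, script, region), (language, "", region),
                           (language, script, ""), (language, "", "")]:
        key = "_".join(p for p in (lang, scr, reg) if p)
        if key in LIKELY_SUBTAGS:
            m_lang, m_scr, m_reg = split_tag(LIKELY_SUBTAGS[key])
            # Keep subtags the caller supplied; fill the missing ones from the data.
            return (m_lang if language == "und" else language,
                    script or m_scr, region or m_reg)
    return None


def minimize_subtags(tag):
    """Drop script/region subtags that add_likely_subtags would restore anyway."""
    maximal = add_likely_subtags(tag)
    if maximal is None:
        return tag  # no data for this tag: leave it unchanged
    language, script, region = maximal
    # Try the shortest candidates first, preferring to keep region over script.
    for trial in (language, f"{language}-{region}", f"{language}-{script}"):
        if add_likely_subtags(trial) == maximal:
            return trial
    return "-".join(p for p in maximal if p)
```

With this excerpt, `minimize_subtags("fa-IR")` returns `"fa"`, while `"sr-Latn-RS"` keeps its script (`"sr-Latn"`) because Latin is not the likely script for Serbian in the data.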
I agree, a fontTools.bcp47 module would be nice.
Here are two independent Python implementations. They both seem a bit heavy to me, but I'm putting the links here for reference.
I see that varLib's fvar builder is not using that …
Agree, I think the current function is OK for now. It can always be improved later.
The … The … If we un-comment that line and use the former instead of the latter, we would no longer add both sets of nameIDs for fvar axis names, but only one set (most often, the Windows ones). Would that be OK? @robmck-ms reported an issue with OSX requiring platformID=1 names for variable fonts: #683. Is it only the …
Uhm, why not try and see what works?
In the context of #683 I think … As it stands, …
I agree, let's make it add both by default.
I can make a PR.
Thanks!
…elname' now that the addMultilingualName method also adds Mac names by default, we can use it in varLib instead of addName. The language identifiers are expected to be minimized, i.e. they do not contain default script/region subtags, until we implement the minimizeSubtags algorithm from ICU/CLDR: fonttools#930
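Checking whether an identifier "contains default script/region subtags" first requires telling its subtags apart. A minimal sketch that classifies subtags by their shape, following the RFC 5646 syntax but simplified: only 2-3 letter language subtags are accepted, extlangs, extensions and private-use subtags are rejected rather than parsed, and `parse_tag` is a hypothetical helper.

```python
import re

LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def parse_tag(tag):
    """Return (language, script, region, variants) for a simple BCP 47 tag."""
    subtags = tag.replace("_", "-").split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    language, rest = subtags[0].lower(), subtags[1:]
    script = region = None
    # Optional subtags must appear in this order: script, region, variants.
    if rest and SCRIPT.fullmatch(rest[0]):
        script = rest.pop(0).title()
    if rest and REGION.fullmatch(rest[0]):
        region = rest.pop(0).upper()
    variants = []
    while rest and VARIANT.fullmatch(rest[0]):
        variants.append(rest.pop(0).lower())
    if rest:
        # Extlangs ("zh-yue"), extensions ("u-...") and private use ("x-...") land here.
        raise ValueError(f"unsupported subtags in {tag!r}: {rest}")
    return language, script, region, variants
```

For example, `parse_tag("sr_latn_rs")` yields `("sr", "Latn", "RS", [])`, normalizing case and the `_` separator along the way.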
Here are a few useful links, in case we work on this someday. We want an equivalent of ICU …
The algorithm is described here: … The CLDR "likelySubtags" data can be found here: …
Here's the upstream source for the likelySubtags data: https://www.unicode.org/repos/cldr/trunk/common/supplemental/likelySubtags.xml
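Loading that data could be as simple as the sketch below. It assumes each mapping is a `<likelySubtag from="..." to="..."/>` element, as in recent CLDR releases (worth re-checking against the file itself); the inline `SAMPLE` is a made-up excerpt and `load_likely_subtags` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt shaped like CLDR's likelySubtags.xml (assumed layout).
SAMPLE = """<supplementalData>
  <likelySubtags>
    <likelySubtag from="fa" to="fa_Arab_IR"/>
    <likelySubtag from="und_IR" to="fa_Arab_IR"/>
    <likelySubtag from="sr" to="sr_Cyrl_RS"/>
  </likelySubtags>
</supplementalData>"""


def load_likely_subtags(root):
    """Map each 'from' tag to its maximized 'to' tag, e.g. 'fa' -> 'fa_Arab_IR'."""
    return {el.get("from"): el.get("to") for el in root.iter("likelySubtag")}


LIKELY_SUBTAGS = load_likely_subtags(ET.fromstring(SAMPLE))
```

For the real file, `load_likely_subtags(ET.parse("likelySubtags.xml").getroot())` would build the full table, assuming a local copy saved under that name.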
I think we want this in HarfBuzz as well. Maybe something @dscorbett can look into?
And here's the upstream source for aliasing deprecated tags (https://www.unicode.org/repos/cldr/trunk/common/supplemental/supplementalMetadata.xml). Note that ISO 639, and therefore also the IANA language subtag registry, have a concept of macrolanguages. When ICU sees a language subtag …
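A similar sketch for language aliases, assuming entries of the form `<languageAlias type="iw" replacement="he" reason="deprecated"/>` in supplementalMetadata.xml. Only single-subtag aliases are applied; the `SAMPLE` data and the helper names are hypothetical, and treating a macrolanguage alias such as `cmn` to `zh` like a deprecated one is an assumption here, since the note about ICU's behaviour is cut off.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt shaped like the <alias> section of supplementalMetadata.xml.
SAMPLE = """<supplementalData>
  <metadata>
    <alias>
      <languageAlias type="iw" replacement="he" reason="deprecated"/>
      <languageAlias type="in" replacement="id" reason="deprecated"/>
      <languageAlias type="cmn" replacement="zh" reason="macrolanguage"/>
    </alias>
  </metadata>
</supplementalData>"""


def load_language_aliases(root):
    """Collect simple alias -> replacement pairs; multi-subtag entries are skipped."""
    aliases = {}
    for el in root.iter("languageAlias"):
        src, dst = el.get("type"), el.get("replacement")
        if src and dst and not any(c in src + dst for c in "_ "):
            aliases[src] = dst
    return aliases


ALIASES = load_language_aliases(ET.fromstring(SAMPLE))


def canonicalize_language(tag, aliases=ALIASES):
    """Replace a deprecated or aliased language subtag, keeping the rest of the tag."""
    parts = tag.replace("_", "-").split("-")
    parts[0] = aliases.get(parts[0].lower(), parts[0].lower())
    return "-".join(parts)
```

Canonicalizing before minimizing means `iw-IL` and `he-IL` end up at the same key.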
The likelySubtags data maps Latin script in Iran to Turkmen. In reality, it would most probably just be a Latin transcription of Persian. I assume other scripts/countries have the same issue...
But that's something for CLDR, not us... |
Bug reports welcome: https://unicode.org/cldr/trac/newticket
Currently, the name table thinks "fa-IR" cannot be encoded in Windows names, because its language table only has a "fa" entry. How should this be resolved? Keep dropping the last subtag of the language code until there's a match?
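The "keep dropping the last subtag" approach is essentially RFC 4647 Lookup: drop the last subtag (and any singleton left dangling) until the remaining tag is in the table. A minimal sketch, with `WINDOWS_LANGUAGE_CODES` as a hypothetical three-entry excerpt of a BCP 47 to Windows LCID table and `lookup_windows_langid` as a hypothetical helper. Keep in mind the caveat from the top of the thread: blind truncation discards subtags that BCP47 exists to preserve, so this reads best as a fallback after minimization, not a replacement for it.

```python
# Hypothetical excerpt of a BCP 47 -> Windows LCID table (keys lowercase).
WINDOWS_LANGUAGE_CODES = {
    "fa": 0x0429,
    "en": 0x0409,
    "de": 0x0407,
}


def lookup_windows_langid(tag, table=WINDOWS_LANGUAGE_CODES):
    """Return (matched_tag, langID) using RFC 4647-style truncation, or None."""
    subtags = tag.lower().replace("_", "-").split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in table:
            return candidate, table[candidate]
        subtags.pop()
        # Also drop a singleton (e.g. "u", "x") left dangling at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None
```

With this table, `lookup_windows_langid("fa-IR")` falls back to `("fa", 0x0429)`, and a tag with no match at any length returns `None`.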