Home · Mazurs/mt-words Wiki · GitHub
Rūdolfs Mazurs edited this page May 12, 2018 · 4 revisions

Overview of internals

Processing a translation unit

  1. Find variables, URLs etc., separating the string into translatable and non-translatable fragments
  2. Process each translatable fragment:
    1. Find and remove any accelerator symbols
    2. Find the words in the fragment (using the regular-expression class \w)
    3. Detect words that stay unchanged in translation (e.g., "GNOME") [future: a list of unchanged words that should be translated anyway?]
    4. Replace each word with its translation:
      1. Determine the case and convert the word to lowercase
      2. Search for the word in the dictionary
        • if found, convert the translation to the source case and insert it
        • if not found, leave the source word unchanged and mark the string
    5. Insert the removed accelerator symbols back into the fragment
  3. Concatenate the fragments into a string
  4. Write the new translated string into the translation unit
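
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the accelerator symbol `_`, the placeholder pattern, the `UNCHANGED` set, all function names, and the flat dictionary (lowercase word mapped to one translation string) are assumptions made for the example. Reinserting the accelerator before the first word is also a simplification.

```python
import re

ACCELERATOR = "_"          # assumed marker; real formats also use "&" or "~"
UNCHANGED = {"GNOME"}      # hypothetical list of words kept as-is
WORD_RE = re.compile(r"\w+")
# Hypothetical pattern for non-translatable pieces: printf variables and URLs.
PLACEHOLDER_RE = re.compile(r"(%\(\w+\)s|%[sd]|https?://\S+)")


def detect_case(word):
    """Classify a word as 'upper', 'title' or 'lower' (the fallback)."""
    if word.isupper() and len(word) > 1:
        return "upper"
    if word[:1].isupper():
        return "title"
    return "lower"


def apply_case(text, case):
    """Convert a lowercase translation to the case of the source word."""
    if case == "upper":
        return text.upper()
    if case == "title":
        return text[:1].upper() + text[1:]
    return text


def translate_fragment(fragment, dictionary):
    """Return (translated_fragment, has_unknown_words)."""
    # Step 2.1: note and remove the first accelerator symbol.
    had_accelerator = ACCELERATOR in fragment
    clean = fragment.replace(ACCELERATOR, "", 1)
    unknown = False

    def replace(match):
        nonlocal unknown
        word = match.group(0)
        if word in UNCHANGED:                        # step 2.3
            return word
        case = detect_case(word)                     # step 2.4.1
        translation = dictionary.get(word.lower())   # step 2.4.2
        if translation is None:
            unknown = True     # keep the source word and mark the string
            return word
        return apply_case(translation, case)

    result = WORD_RE.sub(replace, clean)             # step 2.2 finds the words
    if had_accelerator:
        # Simplification: put the accelerator before the first word.
        first = WORD_RE.search(result)
        pos = first.start() if first else 0
        result = result[:pos] + ACCELERATOR + result[pos:]
    return result, unknown


def translate_string(source, dictionary):
    """Steps 1 and 3: split, translate each fragment, concatenate."""
    parts = PLACEHOLDER_RE.split(source)
    out, marked = [], False
    for index, part in enumerate(parts):
        if index % 2 == 1:     # odd items are the captured placeholders
            out.append(part)
            continue
        translated, unknown = translate_fragment(part, dictionary)
        out.append(translated)
        marked = marked or unknown
    return "".join(out), marked
```

Because the split pattern uses a capturing group, `re.split` returns translatable and non-translatable pieces in alternating order, so concatenating the results rebuilds the string with variables and URLs untouched.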

Dictionary conceptual structure

+----+ ¹    +---------------+
|Word| ----<| Translation   |
+----+      +---------------+
            |is_problematic |
            |prevalence     |
            |needs_review   |
            |tags [optional]|
            +---------------+

A word can have many translations, which can be single words or phrases. Translations can have these properties:

  • is_problematic: indicates that using this translation may warrant sentence review
  • prevalence: indicates how likely this translation is to be more appropriate than the alternatives (only useful if there are alternative translations)
  • needs_review: indicates that the translation is new and not approved yet
  • tags: optional list of additional strings that might convey useful information
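
One way to model this structure in Python is sketched below. The class name `Translation`, the type alias `Dictionary`, and the field name `transl` for the translated text are illustrative choices. The default for `needs_review` is an assumption, since the page does not state one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Translation:
    transl: str                          # translated word or phrase
    is_problematic: bool = False         # using it may warrant sentence review
    prevalence: Optional[int] = None     # only meaningful among alternatives
    needs_review: bool = False           # assumed default, not stated on the page
    tags: List[str] = field(default_factory=list)  # optional extra strings


# One source word maps to one or more translations.
Dictionary = Dict[str, List[Translation]]
```

Using `field(default_factory=list)` gives every translation its own empty tag list instead of sharing one mutable default.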

Example in code:

{"tulkojums": [
    {"transl": "translation", "is_problematic": True, "prevalence": 10, "tags": ["A", "B"]},
    {"transl": "short one", "is_problematic": False, "prevalence": 1},
]}

If is_problematic is omitted, it is assumed to be False. If prevalence is omitted when multiple translations are available, any one of them can be chosen. In this example, "translation" is picked because it has the higher prevalence (10 versus 1).
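
The selection rule described here could look like the sketch below. The function name `pick_translation` is hypothetical. Treating a missing prevalence as 0 is an assumption; the page only says that any candidate may be chosen in that case.

```python
def pick_translation(candidates):
    """Return (text, is_problematic) for the preferred candidate, or None."""
    if not candidates:
        return None
    # Assumption: a missing prevalence counts as 0. On ties, max() keeps
    # the first candidate, which satisfies "any one of them can be chosen".
    best = max(candidates, key=lambda c: c.get("prevalence", 0))
    # A missing is_problematic is treated as False, as stated above.
    return best["transl"], best.get("is_problematic", False)


entry = {"tulkojums": [
    {"transl": "translation", "is_problematic": True, "prevalence": 10,
     "tags": ["A", "B"]},
    {"transl": "short one", "is_problematic": False, "prevalence": 1},
]}

chosen = pick_translation(entry["tulkojums"])
```

For the entry above, `chosen` holds the text "translation" together with its `is_problematic` flag, so the caller can mark the string for sentence review.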

This software depends on the translate-toolkit libraries and should work on Ubuntu 18.04 with the python3-translate package installed.
