
TestLinktrails.test_has_linktrail fails for hrwiki
Closed, Resolved · Public · BUG REPORT

Description

18:43:44 _____ TestLinktrails.test_has_linktrail (site=APISite('hr', 'wikipedia')) ______
18:43:44 
18:43:44 obj = APISite('hr', 'wikipedia')
18:43:44 
18:43:44     @wraps(fn)
18:43:44     def wrapper(obj: object, *, force=False) -> Any:
18:43:44         cache_name = '_' + fn.__name__
18:43:44         if force:
18:43:44             with suppress(AttributeError):
18:43:44                 delattr(obj, cache_name)
18:43:44         try:
18:43:44 >           return getattr(obj, cache_name)
18:43:44 
18:43:44 pywikibot/tools/__init__.py:771: 
18:43:44 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
18:43:44 
18:43:44 self = APISite('hr', 'wikipedia'), name = '_linktrail'
18:43:44 
18:43:44     def __getattr__(self, name: str):
18:43:44         """Delegate undefined methods calls to the Family object.
18:43:44     
18:43:44         .. versionchanged:: 9.0
18:43:44            Only delegate to public Family methods which have ``code`` as
18:43:44            first parameter.
18:43:44         """
18:43:44         if not name.startswith('_'):
18:43:44             obj = getattr(self.family, name, None)
18:43:44             if inspect.ismethod(obj):
18:43:44                 params = inspect.signature(obj).parameters
18:43:44                 if params:
18:43:44                     parameter = next(iter(params))
18:43:44                     if parameter == 'code':
18:43:44                         method = functools.partial(obj, self.code)
18:43:44                         if hasattr(obj, '__doc__'):
18:43:44                             method.__doc__ = obj.__doc__
18:43:44                         return method
18:43:44     
18:43:44 >       raise AttributeError(f'{type(self).__name__} instance has no '
18:43:44                              f'attribute {name!r}') from None
18:43:44 E       AttributeError: APISite instance has no attribute '_linktrail'. Did you mean: 'linktrail'?
18:43:44 
18:43:44 pywikibot/site/_basesite.py:217: AttributeError
18:43:44 
18:43:44 During handling of the above exception, another exception occurred:
18:43:44 
18:43:44 self = <tests.site_tests.TestLinktrails testMethod=test_has_linktrail>
18:43:44 
18:43:44     def test_has_linktrail(self):
18:43:44         """Verify that every code has a linktrail.
18:43:44     
18:43:44         Test all smallest wikis and the others randomly.
18:43:44         """
18:43:44         size = 20
18:43:44         small_wikis = self.site.family.languages_by_size[-size:]
18:43:44         great_wikis = self.site.family.languages_by_size[:-size]
18:43:44         great_wikis = random.sample(great_wikis, size)
18:43:44         for code in sorted(small_wikis + great_wikis):
18:43:44             site = pywikibot.Site(code, self.family)
18:43:44             with self.subTest(site=site):
18:43:44 >               self.assertIsInstance(site.linktrail(), str)
18:43:44 
18:43:44 tests/site_tests.py:1021: 
18:43:44 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
18:43:44 pywikibot/tools/__init__.py:773: in wrapper
18:43:44     val = fn(obj)
18:43:44 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
18:43:44 
18:43:44 self = APISite('hr', 'wikipedia')
18:43:44 
18:43:44     @cached
18:43:44     def linktrail(self) -> str:
18:43:44         """Build linktrail regex from siteinfo linktrail.
18:43:44     
18:43:44         Letters that can follow a wikilink and are regarded as part of
18:43:44         this link. This depends on the linktrail setting in LanguageXx.php
18:43:44     
18:43:44         .. versionadded:: 7.3
18:43:44     
18:43:44         :return: The linktrail regex.
18:43:44         """
18:43:44         unresolved_linktrails = {
18:43:44             'br': '(?:[a-zA-ZàâçéèêîôûäëïöüùñÇÉÂÊÎÔÛÄËÏÖÜÀÈÙÑ]'
18:43:44                   "|[cC]['’]h|C['’]H)*",
18:43:44             'ca': "(?:[a-zàèéíòóúç·ïü]|'(?!'))*",
18:43:44             'kaa': "(?:[a-zıʼ’“»]|'(?!'))*",
18:43:44         }
18:43:44         linktrail = self.siteinfo['general']['linktrail']
18:43:44         if linktrail == '/^()(.*)$/sD':  # empty linktrail
18:43:44             return ''
18:43:44     
18:43:44         match = re.search(r'\((?:\:\?|\?\:)?\[(?P<pattern>.+?)\]'
18:43:44                           r'(?P<letters>(\|.)*)\)?\+\)', linktrail)
18:43:44         if not match:
18:43:44             with suppress(KeyError):
18:43:44                 return unresolved_linktrails[self.code]
18:43:44 >           raise KeyError(f'"{self.code}": No linktrail pattern extracted '
18:43:44                            f'from "{linktrail}"')
18:43:44 E           KeyError: '"hr": No linktrail pattern extracted from "/^(\\p{L}+)(.*)$/sDu"'
18:43:44 
18:43:44 pywikibot/site/_apisite.py:880: KeyError

Source: https://integration.wikimedia.org/ci/job/pywikibot-core-tox-deeptest-py312/138/console

Event Timeline

The pattern was changed recently in 1081429 due to T360745. But I have no clue what this regex means: '/^(\\p{L}+)(.*)$/sDu', mainly the \\p{L} part and how it can be converted to a Python regex. But I found this: https://stackoverflow.com/questions/17595979/how-to-implement-pl-in-python-regex
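
To make the failure concrete, here is a minimal sketch that runs the extraction regex from the traceback above against two linktrail strings. The `/^([a-z]+)(.*)$/sD` value is only an illustrative bracket-style linktrail, and `extract` is a hypothetical helper name, not part of Pywikibot.

```python
import re

# Extraction regex copied from the traceback (pywikibot/site/_apisite.py).
EXTRACT = re.compile(r'\((?:\:\?|\?\:)?\[(?P<pattern>.+?)\]'
                     r'(?P<letters>(\|.)*)\)?\+\)')


def extract(linktrail):
    """Return the character-class body, or None if nothing is extracted."""
    match = EXTRACT.search(linktrail)
    return match['pattern'] if match else None


bracket_style = '/^([a-z]+)(.*)$/sD'  # illustrative value (assumption)
hr_style = r'/^(\p{L}+)(.*)$/sDu'     # value reported in the KeyError

print(extract(bracket_style))  # a-z
print(extract(hr_style))       # None
```

The extraction regex insists on a literal `[` right after the opening parenthesis, so a pattern built on `\p{L}` can never match it, and the code falls through to the KeyError.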

Xqt triaged this task as High priority.
Xqt changed the subtype of this task from "Task" to "Bug Report".

I think you'll have better Python/PHP compatibility if you simply switch from re.search to regex.search.

regex is backwards-compatible with the standard re.

I see you already have it installed:
21:33:25 regex (/src/.tox/deeptest-py312/lib/python3.12/site-packages/regex) = 2.5.147

Thank you for this hint. The regex package is installed together with the wikitextparser package, but neither is mandatory for Pywikibot. Adding it as a dependency would be a breaking change for the current release, and I have no clue what happens on Toolforge if Pywikibot requires it as a whole with the next major release. Also, this issue only occurs with the hr language code.
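
One way to keep regex optional, sketched here purely as an assumption and not as what Pywikibot does, is to use it when it happens to be importable and fall back to the standard re module otherwise. `LETTERS` and `leading_letters` are hypothetical names.

```python
import re

try:
    import regex  # third-party and optional; may be missing
except ImportError:
    regex = None

if regex is not None:
    # regex understands Unicode property classes such as \p{L}.
    LETTERS = regex.compile(r'\p{L}+')
else:
    # Standard-library approximation: word characters minus digits and "_".
    LETTERS = re.compile(r'[^\W\d_]+')


def leading_letters(text):
    """Return the run of letters at the start of text ('' if none)."""
    match = LETTERS.match(text)
    return match.group() if match else ''


print(leading_letters('ički grad'))
```

The drawback of this design is that behaviour may differ slightly between installations, depending on which branch is taken.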

I don't think we can get it exactly without regex. [^\W\d_] with re.U could be an approximation for \p{L} based on https://stackoverflow.com/a/32040699. (Read the comments.)
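
A small sketch of why `[^\W\d_]` is only an approximation: it compares the class against the Unicode general category of each character, which is what `\p{L}` tests. In Python 3, str patterns are Unicode-aware by default, so re.U is implied. The helper names are hypothetical.

```python
import re
import unicodedata

APPROX_LETTER = re.compile(r'[^\W\d_]')


def is_letter(char):
    """True if the Unicode general category starts with L (Lu, Ll, Lo, ...)."""
    return unicodedata.category(char).startswith('L')


def approx_matches(char):
    """True if the standard-library approximation accepts the character."""
    return APPROX_LETTER.fullmatch(char) is not None


# Columns: character, general category, real letter?, accepted by approximation?
for char in 'aČđß5_²':
    print(repr(char), unicodedata.category(char),
          is_letter(char), approx_matches(char))
```

Plain and accented letters agree in both columns, while a character such as `²` (category No) is accepted by the approximation because `\w` covers numeric characters that `\d` does not.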

After a quick test, I think we could live with \w (with re.U) too if \p{L} is a problem for you guys in Pywikibot; it still covers a great number of accented characters.

This does include things like [[link]]42 → [[link|link42]] and [[link]]_trail → [[link|link_trail]], but that shouldn't be unexpected. There's no regular use case that I can think of with \ds and _s.
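
The difference can be sketched with a toy substitution that folds trailing characters into the link label. `absorb_trail` is a hypothetical helper written for illustration; it is not MediaWiki's or Pywikibot's implementation.

```python
import re


def absorb_trail(wikitext, trail_class):
    """Fold characters matching trail_class after ]] into the link label."""
    pattern = re.compile(r'\[\[([^\[\]|]+)\]\]((?:%s)+)' % trail_class)
    return pattern.sub(r'[[\1|\1\2]]', wikitext)


# With \w, digits and underscores become part of the link.
print(absorb_trail('[[link]]42', r'\w'))         # [[link|link42]]
print(absorb_trail('[[link]]_trail', r'\w'))     # [[link|link_trail]]

# With the letters-only approximation they are left alone.
print(absorb_trail('[[link]]42', r'[^\W\d_]'))   # [[link]]42
print(absorb_trail('[[link]]ovi', r'[^\W\d_]'))  # [[link|linkovi]]
```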

The reason why I chose \p{L} is because I wanted to see if it could be used for (many) other languages that need it. All these regex patterns were, more or less, c&p'd from one MessagesXX.php to another a long time ago, though things are not really working. MessagesEn.php did include a \p{L&}, but that failed for some other reasons at the time.

So if you're not willing to experiment with \p{L} and if anyone reading this can give me a quick +2, I'll prepare a patch.

Change #1085592 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] extract linktrail for hr-wiki

https://gerrit.wikimedia.org/r/1085592

Change #1085592 merged by Xqt:

[pywikibot/core@master] [bugfix] extract linktrail for hr-wiki

https://gerrit.wikimedia.org/r/1085592