Property talk:P9070
identifier for an article in the Internet Encyclopedia of Ukraine
List of violations of this constraint: Database reports/Constraint violations/P9070#Unique value, SPARQL (every item), SPARQL (by value)
Format “[A-Z]\\[A-Z]\\[A-Za-z0-9]+”: value must be formatted using this pattern (PCRE syntax). List of violations of this constraint: Database reports/Constraint violations/P9070#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P9070#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P9070#Scope, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P9070#Entity types
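The format constraint above can be checked locally with a short Python sketch. It assumes the whole value must match the pattern, so it uses `re.fullmatch` rather than a substring search; the helper name `is_valid_p9070` is hypothetical. In the pattern, `\\` matches one literal backslash. Note that a value in the longer form proposed in the discussion below, such as pages\H\I\historyofukraine, would not match this pattern, so the constraint would presumably need updating as well.

```python
import re

# Pattern copied from the P9070 format constraint; "\\" matches one literal backslash.
# Assumption: the entire value must match, hence fullmatch instead of search.
P9070_FORMAT = re.compile(r"[A-Z]\\[A-Z]\\[A-Za-z0-9]+")


def is_valid_p9070(value: str) -> bool:
    """Hypothetical helper: True if value matches the P9070 format pattern in full."""
    return P9070_FORMAT.fullmatch(value) is not None


# "H\\I\\historyofukraine" is the Python literal for H\I\historyofukraine.
print(is_valid_p9070("H\\I\\historyofukraine"))        # True
print(is_valid_p9070("pages\\H\\I\\historyofukraine"))  # False: starts with lowercase "p"
```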
This property is being used by: Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Backslash decoding problem
There’s a problem with how IDs are parsed to generate URLs. The formatter URL contains a literal backslash, which is not URL percent-encoded when the URL is generated. I presume this is because backslashes aren’t handled properly by a simple regex replacement.
Here’s an example to illustrate the problem:
- The formatter URL is
http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\$1
- Wikidata item: history of Ukraine (Q210701) has the ID
H\I\historyofukraine
- The generated link should be
http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\H\I\historyofukraine.htm
- And it should be URL-encoded and sent as http://www.encyclopediaofukraine.com/display.asp?linkpath=pages%5CH%5CI%5Chistoryofukraine.htm, with every backslash <\> changed to <%5C>
But what actually gets sent is http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\H%5CI%5Chistoryofukraine, with the first backslash left as a literal character. This happens to work for me, presumably because the server tolerates it. But it is unintended, inconsistent and messy, and I believe it is technically a malformed URL, since RFC 3986 does not allow an unencoded backslash.
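The behaviour described above can be reproduced with a small Python model. It is only a sketch under a stated assumption: that the software percent-encodes the substituted ID (here with `urllib.parse.quote(..., safe="")`) but inserts the formatter text around `$1` unchanged. The function name `format_url` is hypothetical and not part of Wikibase.

```python
from urllib.parse import quote

# Formatter URL for P9070 as quoted in this thread; "\\" is one literal backslash.
FORMATTER = "http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\\$1"


def format_url(formatter: str, ident: str) -> str:
    """Hypothetical model of the observed behaviour: the ID is percent-encoded,
    but the formatter text around $1 is inserted unchanged."""
    return formatter.replace("$1", quote(ident, safe=""))


sent = format_url(FORMATTER, "H\\I\\historyofukraine")
print(sent)
# http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\H%5CI%5Chistoryofukraine

# Encoding the whole linkpath value instead yields only %5C, no literal backslash.
print(quote("pages\\H\\I\\historyofukraine", safe=""))
# pages%5CH%5CI%5Chistoryofukraine
```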
I propose correcting this by broadening the definition of the ID so that every backslash is part of the ID and therefore gets encoded. It may be neater and safer to make the ID the entire value of the linkpath parameter, for example pages\H\I\historyofukraine.
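A minimal sketch of the proposed scheme, assuming a hypothetical formatter URL ending in `linkpath=$1` and the same assumed model as before in which only the ID is percent-encoded. The conversion helper `migrate_id` is also hypothetical: it simply prefixes each current ID with pages\ to obtain the proposed value.

```python
from urllib.parse import quote

# Hypothetical formatter for the proposal: the ID supplies the entire linkpath value.
PROPOSED_FORMATTER = "http://www.encyclopediaofukraine.com/display.asp?linkpath=$1"


def migrate_id(old_id: str) -> str:
    """Hypothetical conversion from a current ID (e.g. H\\I\\...) to the proposed form."""
    return "pages\\" + old_id


def format_url(formatter: str, ident: str) -> str:
    """Assumed model: percent-encode the ID and insert it verbatim into the formatter."""
    return formatter.replace("$1", quote(ident, safe=""))


new_id = migrate_id("H\\I\\historyofukraine")
print(new_id)  # pages\H\I\historyofukraine
print(format_url(PROPOSED_FORMATTER, new_id))
# http://www.encyclopediaofukraine.com/display.asp?linkpath=pages%5CH%5CI%5Chistoryofukraine
```

Under this model every backslash ends up inside the ID, so none survives unencoded in the generated link.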
Unfortunately, this requires changing the ID on over 300 items. I am glad to make the update and do the work. Is there a recommended order of steps for the changeover?
Please let me know if this sounds good. —Michael Z. 20:41, 17 February 2021 (UTC)
- The encoding is being worked on at the moment; see phabricator:T271126. I'm not sure if it will change anything for backslashes. Ghouston (talk) 21:14, 17 February 2021 (UTC)
- I think it probably will, since the "rawurlencode" PHP function does seem to encode "all non-alphanumeric characters except -_.~": https://www.php.net/manual/en/function.rawurlencode.php. Ghouston (talk) 21:18, 17 February 2021 (UTC)
- Could the backslash in the formatter URL be replaced with %5C instead? There seems to be nowhere to test this, as Sandbox-External identifier (P2536) is stuck with a cached formatter, and formatter URLs are not converted into links at test.wikidata.org. Peter James (talk) 13:12, 18 February 2021 (UTC)
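The replies in this thread can be tied together in a short Python sketch. It assumes, as a hypothetical model, that only the substituted ID is encoded, using a re-implementation of the rule documented for PHP's rawurlencode (encode everything except ASCII letters, digits and -_.~), and that the formatter text is inserted verbatim. Under that model, encoding the ID alone does not touch the formatter's own backslash, while writing that backslash pre-encoded as %5C gives a fully encoded URL without changing any existing IDs. The names `rawurlencode_like` and `format_url` are illustrative only, not Wikibase or PHP APIs.

```python
def rawurlencode_like(s: str) -> str:
    """Approximation of PHP rawurlencode: percent-encode every UTF-8 byte except
    ASCII letters, digits and the unreserved marks - _ . ~"""
    unreserved = set(
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
    )
    return "".join(
        chr(b) if b in unreserved else f"%{b:02X}" for b in s.encode("utf-8")
    )


def format_url(formatter: str, ident: str) -> str:
    """Hypothetical model: encode only the ID, keep the formatter text as written."""
    return formatter.replace("$1", rawurlencode_like(ident))


ident = "H\\I\\historyofukraine"  # Python literal for H\I\historyofukraine

# Current formatter: its own backslash is never encoded under this model.
current = "http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\\$1"
# Suggested alternative: write that backslash pre-encoded as %5C.
suggested = "http://www.encyclopediaofukraine.com/display.asp?linkpath=pages%5C$1"

print(format_url(current, ident))
# http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\H%5CI%5Chistoryofukraine
print(format_url(suggested, ident))
# http://www.encyclopediaofukraine.com/display.asp?linkpath=pages%5CH%5CI%5Chistoryofukraine
```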