UllmannBot

Talk page for 'bot (or pseudo-bot) run by User:Robert Ullmann

See user page for task description.

Eumhun

The bot uses the parameter eumhun, but the template expects emhun. Example taken from 齋: {{ko-hanja|hangeul=재|eumhun=엄숙할 재, 집 재, 상복 자, 재계할 재, 공부방 재|rv=jae|mr=chae|y=cay}}

Either fix the template to use eumhun, or make the bot use emhun. FWIW, only eumhun is correct by the ROK MCT's 2000 Revised Romanization. Thanks – Dustsucker 22:56, 24 November 2006 (UTC)

Um, I know the correct spelling ... it worked when I coded them. Must have then fixed the bot and somehow not managed to fix the template? *sigh* Robert Ullmann 23:23, 24 November 2006 (UTC)

Trad/simplified characters

Currently your bot puts trad and simplified characters on separate lines:

貶: decrease, lower; censure, criticize
贬: decrease, lower; censure, criticize

I think it should put them on the same line, something like this:

貶 / 贬: decrease, lower; censure, criticize

Kappa 02:19, 8 December 2006 (UTC)

That would be cool, if it had the information. Some of the entries have that information from Nanshu's attempt to interpret the Unihan database, but he also sometimes got it wrong. And it isn't that simple, of course; look at this case (at biān):

There is the simplified form, the shinjitai, a traditional form Z-axis variant (presumably less common) that isn't considered to have been simplified to the first form, and the traditional form that does correspond to the first. But I had to look all that up to confirm it, and I don't know how to tell the bot that much. I could probably get to:

or just assume that it ought to combine any consecutive characters that have the same definition:

边, 辺, 邉, 邊: edge, margin, side, border

although that might have some odd effects. (Besides having the shinjitai and the less common form in between.) Look at 唄 (at bài), what does "pathaka" mean? There are two different definitions. And the forms aren't always consecutive in UCS sort order.

It really ought to be something like:

邊／边: edge, margin, side, border
辺 (shinjitai): edge, margin, side, border
邉 (less common trad. form): edge, margin, side, border

But that is way out beyond what the bot can do (barring a complete analysis of some external DB). And it still doesn't really explain what the shinjitai is doing listed under Mandarin Pinyin. Sorting the common characters to the top would be useful. (If an entry exists, the bot leaves the lines in order, only adding missing definitions, then adds any missing characters at the end.) Robert Ullmann 05:04, 8 December 2006 (UTC)
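The parenthetical update rule above (keep existing lines in order, fill in missing definitions, append missing characters at the end) might be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `merge_lines` and the data shapes (a list of pairs for the page, a dict for the generated data) are assumptions.

```python
def merge_lines(existing, generated):
    """Merge generated definitions into the lines already on the page.

    existing:  list of (character, definition) pairs in page order;
               definition is "" when the line has none yet (assumed shape).
    generated: dict mapping character -> definition (assumed shape).
    """
    merged = []
    seen = set()
    # Keep the existing lines in their current order, filling only gaps.
    for char, definition in existing:
        merged.append((char, definition or generated.get(char, "")))
        seen.add(char)
    # Characters not yet on the page are added at the end.
    for char, definition in generated.items():
        if char not in seen:
            merged.append((char, definition))
    return merged


existing = [("邊", "edge, margin, side, border"), ("边", "")]
generated = {
    "边": "edge, margin, side, border",
    "辺": "edge, margin, side, border",
}
for char, definition in merge_lines(existing, generated):
    print(f"{char}: {definition}")
```

Because existing lines are never reordered, any manual sorting of common characters to the top would survive a later run under this scheme.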

OK, the bot doesn't have enough information ATM, so some manual adjustments will be good. I'm inclined to remove shinjitai entries entirely. Kappa 05:20, 8 December 2006 (UTC)

Consider (at biāo)

标: mark, symbol, label, sign; stand the bole of a tree
標: a mark, symbol, label, sign; standard

# [[标]]: [[mark]], [[symbol]], [[label]], [[sign]]; stand the bole of a tree
# [[標]]: a mark, symbol, label, sign; [[standard]]

The bot can know these are a sim/tra pair; the info is in the entries. But then what does it do? (;-) This is pretty common, as people have wikilinked the entries in various ways, or fixed the definitions in one and not the other. (And the definitions are not always the same, especially when another, rarer, traditional character simplifies to the same form.) Still thinking about what might be done a bit better. Robert Ullmann 06:16, 8 December 2006 (UTC)

Please look at biǎn (and biān). I've combined any that are together, with the same definition. This is better, and I don't think we can get to perfect ;-) Thanks for looking at these; it is not easy sometimes to get anyone to check on what you are doing. Robert Ullmann 08:27, 8 December 2006 (UTC)
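Combining adjacent characters that share a definition, as described in this thread, could look something like the sketch below. It is a guess at the approach, not the bot's code; `combine_consecutive` and the list-of-pairs input are assumptions. Definitions are compared as exact strings, so a pair like 标 / 標, whose wording differs, stays on separate lines.

```python
from itertools import groupby


def combine_consecutive(entries):
    """Join runs of adjacent characters whose definitions are identical.

    entries: list of (character, definition) pairs in page order (assumed shape).
    """
    combined = []
    # groupby only merges neighbours, so non-adjacent duplicates stay separate.
    for definition, run in groupby(entries, key=lambda pair: pair[1]):
        chars = ", ".join(char for char, _ in run)
        combined.append(f"{chars}: {definition}")
    return combined


entries = [
    ("边", "edge, margin, side, border"),
    ("辺", "edge, margin, side, border"),
    ("邉", "edge, margin, side, border"),
    ("邊", "edge, margin, side, border"),
    ("标", "mark, symbol, label, sign; stand the bole of a tree"),
    ("標", "a mark, symbol, label, sign; standard"),
]
for line in combine_consecutive(entries):
    print(line)
```

Grouping only consecutive lines mirrors the caveat raised earlier in the thread: forms that are not adjacent in sort order are not merged.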

`{{t}}` again

Hi, I saw you are misusing your bot to experiment with your new versions of {t}. That’s ok with me, but it seems to miss some stuff: on die, the Catalan word is not linked to its section, and neither is the Spanish. I suppose this is because the entry for morir contains only Spanish, but eventually it is to contain Catalan as well. I do not really know how to handle this. Maybe just like you do: leave out the ls, and the robot will add it when the Catalan information is eventually entered. H. (talk) 15:37, 1 April 2007 (UTC)

It should get its own bot name at some point ;-) Yes, it only adds the name for the section link when needed, and will update it when another section is added. Robert Ullmann 13:30, 30 August 2007 (UTC)

`{{t}}` once again

Hi Robert, I thought the bot was going to introduce the t template as well, not only updating it. Are you working on this? Will it happen in the near future? Also, is there a tag to place to request an update run? H. (talk) 13:15, 30 August 2007 (UTC)

This is the first time in a month or two I've gone back to this; there are a lot more t templates than when experimenting before. The update run should immediately (day or so) follow an XML dump.

Introducing the template has a lot of tricky cases; the simple ones are not hard, but a very large percentage are complex. The most serious problem is that a bot can't consistently tell what is the FL word, what is some sort of gloss, and what is something else, even when it is "obvious" to us humans. The existing program isn't intended to do that. Robert Ullmann 13:42, 30 August 2007 (UTC)

translation-language

Hi Robert,

Why do you remove |lang=foo from calls to {{t}}?

–Ruakh_TALK 14:17, 30 August 2007 (UTC)

I don't know exactly which entry you are referring to, but the bot code sets lang if and only if the template needs a #section reference. In the majority of cases lang= isn't needed, and users shouldn't worry about it either way. (If we could get a language specific version of the #language parser function, we'd lose this parameter completely.) I had called it ls, but Connel asked to change it to lang=; it would be better if it was ls or X or something so users wouldn't worry about it ;-). Robert Ullmann 14:25, 30 August 2007 (UTC)

It was aircraft. And while it doesn't really make a huge difference for Hebrew (generally only Aramaic can appear above Hebrew, so it's not like you have to scroll past twenty language sections), it seems that in general, language-segment fragment identifiers are either beneficial or neutral. I'm not saying UllmannBot should add them, necessarily, but it certainly seems wrong to remove them when human editors add them. –Ruakh_TALK 16:08, 30 August 2007 (UTC)

Um, the idea is that the parameter (whatever it is) is just automated by the bot; otherwise you get humans going to a lot of trouble adding it when not needed. The bot is replacing lang= in every t template it sees; but of course that replacement is often a no-op, not changing it. (Then, if the page text is unchanged, it doesn't save, of course.) And when we get some parser function or something, it will be stripping all of them. Or there may be some place in between; I have an idea or two. Robert Ullmann 16:29, 30 August 2007 (UTC)
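The "recompute lang= on every t template, save only if the text changed" behaviour could be sketched along these lines. Everything here is an assumption made for illustration: the call shape `{{t|code|word|...}}`, the function name `normalize_lang`, the `needs_section` callback, and above all the rule that a section link is needed when the target page has more than one language section. The thread only says the bot adds the name "when needed".

```python
import re

# Matches a simple, non-nested {{t|...}} call (assumed call shape).
T_TEMPLATE = re.compile(r"\{\{t\|([^{}]*)\}\}")


def normalize_lang(wikitext, needs_section):
    """Drop any existing lang= and re-add it only where a section link is needed.

    needs_section(code, word) is a hypothetical callback that returns the
    language name to link to, or None when no #section link is needed.
    """
    def rewrite(match):
        params = [p for p in match.group(1).split("|") if not p.startswith("lang=")]
        if len(params) < 2:
            return match.group(0)  # unexpected shape: leave it alone
        name = needs_section(params[0], params[1])
        if name:
            params.append(f"lang={name}")
        return "{{t|" + "|".join(params) + "}}"

    return T_TEMPLATE.sub(rewrite, wikitext)


# Hypothetical lookup data: language sections present on each target page.
NAMES = {"es": "Spanish", "ca": "Catalan"}
SECTIONS = {"morir": ["Spanish"]}


def needs_section(code, word):
    # Assumed rule: link to the section only if the page has several of them.
    return NAMES.get(code) if len(SECTIONS.get(word, [])) > 1 else None


text = "* Spanish: {{t|es|morir|lang=Spanish}}"
new_text = normalize_lang(text, needs_section)
if new_text != text:
    print("would save:", new_text)
```

Under this scheme a human-added lang= is recomputed like any other, which is the behaviour being questioned in this thread.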

Request

Since it's my understanding you correct language names, why not also substitute language templates? See drinking water. DAVilla 14:46, 4 September 2007 (UTC)

AF (AutoFormat) does that. If you'd added the trans-top gloss and just left the language templates, it would have fixed all of them. Robert Ullmann 15:03, 4 September 2007 (UTC)

User:Tbot

Template {t} update task transferred to Tbot.

I has a favor

I'm not sure if this is entirely possible, but if it is it would be pretty awesome. What I want to do is go through everything that links to {{ro-nounform}} and change the templates to {{ro-noun-def}}. One of the problems is that I changed the parameters to simplify it and it's kind of a bitch to do by hand. I basically just need "gend=x|num=s" changed to "1=xs" and little stuff like that. If it's possible to do this, I'll give you a couple more details :) – [ ric | opiaterein ] – 19:28, 29 September 2007 (UTC)

I can do this; I recently wrote and ran a bot that did something fairly similar for French. Just let me know the details. –Ruakh_TALK 20:03, 29 September 2007 (UTC)

If it can be spec'd rigorously, it isn't hard; one of us can do it. Robert Ullmann 21:07, 29 September 2007 (UTC)
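As a starting point for such a spec, the one rule stated in the request ("gend=x|num=s" becomes "1=xs", with the template renamed) could be sketched as below. The function name `convert_nounform` is hypothetical, and the handling of any other parameters is an assumption: they are passed through unchanged, since the remaining details were never given here.

```python
import re

# Matches a simple, non-nested {{ro-nounform|...}} call.
NOUNFORM = re.compile(r"\{\{ro-nounform\|([^{}]*)\}\}")


def convert_nounform(wikitext):
    """Rewrite {{ro-nounform|gend=x|num=s}} as {{ro-noun-def|1=xs}}."""
    def rewrite(match):
        params = match.group(1).split("|")
        named = dict(p.split("=", 1) for p in params if "=" in p)
        if "gend" not in named or "num" not in named:
            return match.group(0)  # not covered by the stated rule: leave for a human
        # Assumption: parameters other than gend/num carry over unchanged.
        rest = [p for p in params if p.split("=", 1)[0] not in ("gend", "num")]
        first = "1=" + named["gend"] + named["num"]
        return "{{ro-noun-def|" + "|".join([first] + rest) + "}}"

    return NOUNFORM.sub(rewrite, wikitext)


print(convert_nounform("{{ro-nounform|gend=f|num=s}}"))
```

Calls that do not carry both gend and num are left untouched, so anything outside the stated rule stays visible for manual review.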

Wiktionary:Index to templates/languages

I think we may need to run an update to Wiktionary:Index to templates/languages as it seems a little out of date. Regards --Williamsayers79 21:44, 4 February 2008 (UTC)

Bot Strangeness from July 5 2007

While running some validation code against an offline copy of the wiktionary, I noticed that this bot made some questionable edits to zhì, zhí, zhī, zhǐ and zú. All of the edits were on July 5 2007, and resulted in duplication of most of the article. I thought you might want to know in case it's a bug you can find and fix. I've been away from the wiktionary too long (several years) to feel comfortable making edits right now, or I'd have just fixed them myself. By the way, I'm writing a substantially improved wiktionary module that should be compatible with pywikipedia. It checks for most of the things that you seem to detect. I'm writing it for my own purposes, but I'd be happy to share it with you. If you're interested, let me know. -- CoryCohen2 04:01, 28 March 2008 (UTC)

I see, thanks for pointing that out; User:Jusjih had added the audio in the wrong (nonstandard) place; it should be in a Pronunciation section. The bot code found the Pinyin header and headword template correctly, but then it wasn't followed by the definition lines as expected. I'll fix those. You might be interested in looking at User:AutoFormat/code. Robert Ullmann 07:08, 28 March 2008 (UTC)

Language code list

Would it be possible the next time you generate Wiktionary:Index to templates/languages to include the dialect and language-family codes in the "etyl:*" area? Could they be added as separate tables? Thanks. --Bequw → ¢ • τ 19:22, 11 November 2009 (UTC)

Never mind, did it myself. --Bequw → ¢ • τ 21:46, 19 January 2010 (UTC)

lacking conjugation

Hi. I have to bug you again for creation of the following pages, for our conjobots: User:Rising Sun/German verbs needing conjugation, User:Rising Sun/Spanish verbs needing conjugation, User:Rising Sun/Latin verbs needing conjugation and User:Rising Sun/Italian verbs needing conjugation. --Rising Sun talk? contributions 13:21, 16 May 2010 (UTC)

Well, don't now, as he's been blocked. I've moved his subpage to User:Mglovesfun/French verbs needing conjugation; could you update it? Ideally I'd like an AWB-loadable text file so I can 'fix' them that way. Mglovesfun (talk) 08:25, 18 September 2010 (UTC)
