User Details
- User Since
- Oct 15 2019, 4:02 PM (267 w, 3 d)
- Availability
- Available
- IRC Nick
- rzl
- LDAP User
- RLazarus
- MediaWiki User
- RLazarus (WMF) [ Global Accounts ]
Wed, Nov 27
Thanks @Clement_Goubert! Yeah, --sort-by=.metadata.creationTimestamp is my go-to for ordering.
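A minimal sketch of that flag in use, assuming the jobs of interest live in the mw-script namespace. The jobs_by_age helper name is made up for illustration, and KUBECTL is overridable so the command can be dry-run with echo.

```shell
# List jobs in a namespace, oldest first.
# jobs_by_age is a hypothetical helper; set KUBECTL=echo to print the
# arguments instead of contacting a cluster.
jobs_by_age() {
    namespace="$1"
    "${KUBECTL:-kubectl}" get jobs --namespace "$namespace" \
        --sort-by=.metadata.creationTimestamp
}
```

Calling `jobs_by_age mw-script` would then list the namespace's jobs in creation order.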
Tue, Nov 26
Drive-by: The reason is that sudo doesn't preserve the original environment by default -- so in
sudo FOO=bar command
the command sees $FOO, but in
FOO=bar sudo command
it doesn't.
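The same distinction can be reproduced without root by using env -i as a stand-in for sudo's environment reset. This is only an analogy: what sudo actually keeps or accepts depends on the local sudoers policy, and the helper names here are made up for illustration.

```shell
# Stand-in for "FOO=bar sudo command": the variable is set for the wrapper,
# which then wipes the environment before running the command.
var_before_wrapper() {
    FOO=bar env -i /bin/sh -c 'echo "${FOO:-unset}"'
}

# Stand-in for "sudo FOO=bar command": the assignment is an argument to the
# wrapper, which sets it in the command's environment after the reset.
var_after_wrapper() {
    env -i FOO=bar /bin/sh -c 'echo "${FOO:-unset}"'
}
```

Here var_before_wrapper prints "unset" and var_after_wrapper prints "bar".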
Thu, Nov 14
Fri, Nov 8
As currently implemented, if the shell dies (or the network is disconnected, or anything else interrupts the stream) then stdin is closed in the container.
Tue, Nov 5
This is now supported, and documented at https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Input_from_a_file.
Thu, Oct 31
Oct 30 2024
(For the avoidance of doubt: We'll need some form of solution to this problem before turning off the mwmaint hosts, but I'm not working on it as a mwscript-k8s feature.)
Oct 29 2024
Oct 25 2024
Oct 24 2024
Sorry this happened. Unfortunately it's kind of working as intended -- not because it's supposed to be hard to kill a job when you want to kill it, but because the job is supposed to keep working after the mwscript-k8s launcher terminates. (Thus preventing the "oops, I forgot to start it in a tmux and now I'm stuck" scenario.)
The shape of this sounds right to me. Similarly we can have the mw-script helmfiles gate on the same conftool value for defense-in-depth, but also read it in the wrapper script and exit early with a friendlier error message.
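A rough sketch of the wrapper-side check, assuming the gate reaches the wrapper as a plain "enabled"/"disabled" string. How the value is actually read from conftool is not shown; check_gate, its argument, and the message text are all hypothetical.

```shell
# check_gate is a hypothetical helper: it takes the already-fetched gate
# value and fails with a readable message when launching is disabled.
check_gate() {
    gate_value="$1"
    if [ "$gate_value" != "enabled" ]; then
        echo "Launching maintenance scripts is currently disabled; please try again later." >&2
        return 1
    fi
    return 0
}
```

The wrapper would run something like `check_gate "$value" || exit 1` before invoking helmfile, so the user sees this message rather than a helmfile failure.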
Oct 22 2024
Note that while the announcement is only for "manual maintenance scripts", it's probably safe to assume invoking mwscript from other scripts that invoke mwscript falls under a similar umbrella.
This is ready to use, and documented (including the JSON output format) at https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Shelling_out_to_mwscript-k8s. @EBernhardson please give this a try and let me know how it works for you! Happy to iterate as needed.
Oct 17 2024
Thanks -- that was in reference to:
Oct 16 2024
Oct 15 2024
Instead of
Oct 10 2024
This is now supported!
Ha, attachAccount.php specifically blocks this from working:
Oct 9 2024
A mwscript-k8s flag to log to SAL is on my to-do list -- I hadn't gotten around to filing a task, thanks.
Oct 8 2024
And if for whatever reason, we end up with a different namespace from the currently implemented as systemd timers recurring scripts
Oct 7 2024
This is MWScript.php behavior, and it's actually unchanged:
If this changed in mwscript-k8s, it's unintended (but definitely not impossible) -- I'll dig into it today and get back to you.
Oct 4 2024
Oct 2 2024
rzl@deploy1003:~$ mwscript-k8s --attach -- shell.php
⏳ Starting shell.php on Kubernetes as job mw-script.codfw.9m47rjcq ...
⏳ Waiting for the container to start...
🚀 Job is running.
ℹ️ Expecting a prompt but don't see it? Due to a race condition, the beginning of the output might be missing. Try pressing enter.
📜 Attached to stdin/stdout:
error: unable to upgrade connection: container mediawiki-9m47rjcq-app not found in pod mw-script.codfw.9m47rjcq-fw2t7_mw-script
☠️ Command failed with status 1: /usr/bin/kubectl attach --quiet job/mw-script.codfw.9m47rjcq --container mediawiki-9m47rjcq-app -it
For logs (may not work) run:
K8S_CLUSTER=codfw KUBECONFIG=/etc/kubernetes/mw-script-deploy-codfw.config kubectl logs -f job/mw-script.codfw.9m47rjcq mediawiki-9m47rjcq-app
Oct 1 2024
Sep 27 2024
Sep 26 2024
Thanks! In general you shouldn't need to do this, even if the job was a mistake. Kubernetes cleans up the job automatically a week after it terminates -- whether it completed or failed -- and there's nothing wrong with leaving it until then.
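Kubernetes offers this kind of cleanup through the Job field .spec.ttlSecondsAfterFinished; whether mw-script jobs rely on exactly that field is an assumption here. A small sketch of the arithmetic and of how the value could be inspected, with job_ttl as a hypothetical helper:

```shell
# A week expressed in seconds, the unit ttlSecondsAfterFinished uses.
week_in_seconds() {
    echo $((7 * 24 * 60 * 60))
}

# Print a job's configured TTL (empty if the field is not set).
# job_ttl is a hypothetical helper; KUBECTL can be overridden for a dry run.
job_ttl() {
    "${KUBECTL:-kubectl}" get "job/$1" --namespace mw-script \
        -o jsonpath='{.spec.ttlSecondsAfterFinished}'
}
```

If the assumption holds, job_ttl on a recent job would report the same number that week_in_seconds prints.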
The image version is now copied from the mw-web deployment.
Sep 9 2024
Sep 5 2024
Sep 4 2024
Aug 29 2024
So, mwscript-k8s is still in medium-early development: some snags are still expected, and it shouldn't be anyone's primary workflow yet. There'll be wider announcements when it's ready for adoption (Soon™ but Not Yet™). In the meantime, thanks for giving it a try; feedback is still welcome, just don't panic when it isn't ready to use full-time.
Aug 22 2024
Thanks both! Sorry for missing the earlier task.
Aug 20 2024
Aug 16 2024
See also T359130 for the cookbook work. We aren't as far as I expected we'd be, so we can revisit which of those steps for cronjobs-on-k8s need to be accomplished before the switchover, but I agree it's a good idea to add this to mw-cli-wrapper.py in the meantime and start using it.
Aug 15 2024
Aug 14 2024
This looks related to https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1060867 -- at a guess, should
Jul 30 2024
Yeah, the usual standard, for any time you change something on a debug host, is a note in #wikimedia-operations to say something like "I'm grabbing mwdebug1002 to test an Apache config change" -- that ensures no one else is testing a conflicting change at the same time, such as (for example) a deployment during a deployment window. :)
From -serviceops IRC logs, I think this was during the time @Ottomata was working on an unrelated Apache config change, and testing it on a mwdebug host. That would explain why the tests failed on one host only.
Jul 26 2024
I haven't dug for logs, but workers were saturated and latency was up: https://grafana.wikimedia.org/goto/h67ix_uSg
Jul 22 2024
Someone in serviceops probably knows the answer to this but I don't, at least not confidently. Here's a rough stab:
Jul 17 2024
Jul 15 2024
Jul 11 2024
Jul 9 2024
Script output is visible through kubectl logs, and mwscript-k8s can be invoked with -f to immediately start tailing the script output (under the hood, it just invokes that kubectl command). If you don't launch with -f it prints out the kubectl command so you can copy and paste it.
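For reference, a sketch of what that printed command looks like. The pattern is inferred from a single launcher run on codfw, so the kubeconfig path and the container naming are assumptions and may not hold in general; logs_command is a hypothetical helper that only prints the command.

```shell
# logs_command prints (does not run) a kubectl logs command for a
# mwscript-k8s job. The path and naming scheme are inferred from one
# observed example, not from the tool's source.
logs_command() {
    cluster="$1"   # e.g. codfw
    job_id="$2"    # e.g. 9m47rjcq
    printf 'K8S_CLUSTER=%s KUBECONFIG=/etc/kubernetes/mw-script-deploy-%s.config kubectl logs -f job/mw-script.%s.%s mediawiki-%s-app\n' \
        "$cluster" "$cluster" "$cluster" "$job_id" "$job_id"
}
```

For example, `logs_command codfw 9m47rjcq` prints the command for job mw-script.codfw.9m47rjcq.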
Jul 4 2024
This is fixed, thanks again for testing!
Jul 3 2024
Thanks for the report! It's actually not because of the successful exit; the script handles that.
Jul 2 2024
rzl@deploy1002:~$ echo 'https://office.wikimedia.org/wiki/User:RLazarus_(WMF)' | ./mwscript-k8s --attach -- purgeList.php
⏳ Starting purgeList.php on Kubernetes...
[snip]
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Attaching to stdin/stdout:
Purging 1 urls
Done!
Disregard the above scap, I got too carried away with "never run helmfile across all mw deployments, use scap instead" but obviously that rule doesn't apply here. :)
Jul 1 2024
Jun 10 2024
May 1 2024
Apr 24 2024
Thanks. At present the controller monitors all namespaces, but ignores pods other than in mw-script. So if I were estimating memory usage I'd base it on the total number of pod events in the cluster, not just in the namespace.
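A toy illustration of why the estimate should use the cluster-wide count. The input format (one "namespace pod-name" pair per line standing in for pod events) and the summarize_events helper are invented for this sketch; the real controller's event handling is not shown.

```shell
# Input: one "namespace pod-name" pair per line, standing in for pod events.
# Every event is received (and costs something); only mw-script events are
# acted on.
summarize_events() {
    awk '{ received++ }
         $1 == "mw-script" { handled++ }
         END { printf "received=%d handled=%d\n", received, handled }'
}
```

Feeding it events from three namespaces, only one of them mw-script, reports three received and one handled, and it is the received count that tracks the work the controller does.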
Apr 17 2024
That sounds reasonable! Note for the future that helm diff has a --suppress-output-line-regex which does exactly what you'd like it to do, but it's not available in the version we're currently running.
Apr 16 2024
If we're really worried about that race condition, is it plausible to do this?
Apr 4 2024
Clinic duty SRE here -- I/F, can you start investigating this at the MTA end? Triaging this to High in case it's widespread, but feel free to decrease if it turns out it's not.
rzl@mwmaint1002:~$ ldapsearch -x cn=wmf | grep ospingou
member: uid=ospingou,ou=people,dc=wikimedia,dc=org
Apr 3 2024
@AndyRussG Welcome back!