Upgrade s5 to Bullseye
Closed, Resolved, Public

Description

  • dbstore1003 T299481
  • db2137
  • db2128
  • db2123 (master)
  • db2113
  • db2111
  • db2101 T299876
  • db2094
  • db2089
  • db2075
  • db1161 sanitarium master
  • db1154 sanitarium host
  • db1150 T299876
  • db1144
  • db1130 (master)
  • db1113
  • db1110
  • db1100
  • db1096
  • clouddb1021 T299480
  • clouddb1020 T299480
  • clouddb1016 T299480

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui added a project: DBA.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description.

Change 758288 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] s5 codfw hosts: Disable notifications

https://gerrit.wikimedia.org/r/758288

Change 758288 merged by Marostegui:

[operations/puppet@production] s5 codfw hosts: Disable notifications

https://gerrit.wikimedia.org/r/758288

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2137.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2128.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2137.codfw.wmnet with OS bullseye completed:

  • db2137 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310654_marostegui_32737_db2137.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
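
Each report above points at a per-run log file whose name seems to encode the start time, the user, a numeric identifier, and the short hostname. The following is a minimal Python sketch for splitting such a path into its parts. The field meanings are inferred from the paths shown in this task and are assumptions; `ReimageLog`, `run_id`, and `parse_reimage_log` are hypothetical names, not part of spicerack.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    """Fields inferred from the log file name (assumed layout)."""
    started: datetime  # assumed: YYYYMMDDHHMM prefix of the file name
    user: str          # assumed: user who launched the cookbook
    run_id: int        # assumed: numeric identifier of the run
    host: str          # short hostname, e.g. "db2137"


# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host>.out
LOG_NAME = re.compile(
    r"(?P<ts>\d{12})_(?P<user>[^_/]+)_(?P<run_id>\d+)_(?P<host>[^_./]+)\.out$"
)


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into its components."""
    match = LOG_NAME.search(path)
    if match is None:
        raise ValueError(f"unrecognised reimage log name: {path!r}")
    return ReimageLog(
        started=datetime.strptime(match["ts"], "%Y%m%d%H%M"),
        user=match["user"],
        run_id=int(match["run_id"]),
        host=match["host"],
    )
```

Sorting the parsed entries by `started` would give the order in which the reimages were launched, which can differ from the order in which the reports were posted here.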

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2113.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2128.codfw.wmnet with OS bullseye completed:

  • db2128 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310655_marostegui_477_db2128.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2111.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2075.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2113.codfw.wmnet with OS bullseye completed:

  • db2113 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310729_marostegui_23914_db2113.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2111.codfw.wmnet with OS bullseye completed:

  • db2111 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310732_marostegui_24247_db2111.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 758419 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/758419

Change 758419 merged by Marostegui:

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/758419

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db2123.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2075.codfw.wmnet with OS bullseye completed:

  • db2075 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310739_marostegui_26861_db2075.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

All codfw s5 core hosts upgraded to Bullseye.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db2123.codfw.wmnet with OS bullseye completed:

  • db2123 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310809_marostegui_4894_db2123.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 758465 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1154: Disable notifications

https://gerrit.wikimedia.org/r/758465

Change 758465 merged by Marostegui:

[operations/puppet@production] db1154: Disable notifications

https://gerrit.wikimedia.org/r/758465

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1154.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1154.eqiad.wmnet with OS bullseye completed:

  • db1154 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201311306_marostegui_12113_db1154.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 758719 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1110: Disable notifications

https://gerrit.wikimedia.org/r/758719

Mentioned in SAL (#wikimedia-operations) [2022-02-01T06:21:11Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1110 for reimage T300473', diff saved to https://phabricator.wikimedia.org/P19716 and previous config saved to /var/cache/conftool/dbconfig/20220201-062111-marostegui.json

Change 758719 merged by Marostegui:

[operations/puppet@production] db1110: Disable notifications

https://gerrit.wikimedia.org/r/758719

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1110.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1110.eqiad.wmnet with OS bullseye completed:

  • db1110 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010624_marostegui_28187_db1110.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Failed to get Netbox script results, try manually: https://netbox.wikimedia.org/api/extras/job-results/2404824/

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1110.eqiad.wmnet with OS bullseye executed with errors:

  • db1110 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010624_marostegui_28187_db1110.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Failed to get Netbox script results, try manually: https://netbox.wikimedia.org/api/extras/job-results/2404824/
    • The reimage failed, see the cookbook logs for the details
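
The reports in this task share one shape: a header bullet with the host and an overall status (WARN or FAIL), followed by one bullet per step. As a rough sketch, a report pasted as plain text could be scanned for the steps that signal trouble. The prefixes used below are taken only from messages that appear in this task; `ReimageReport`, `parse_report`, and `PROBLEM_PREFIXES` are hypothetical names, and the cookbook may emit other messages that this sketch does not recognise.

```python
import re
from typing import List, NamedTuple


class ReimageReport(NamedTuple):
    host: str
    status: str          # overall status from the header bullet, e.g. "WARN"
    steps: List[str]     # every step line, in order
    problems: List[str]  # steps matching a known problem prefix


# Header bullet such as "• db1110 (FAIL)".
HEADER = re.compile(r"^\s*•\s*(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")
# Any other bullet is treated as a step.
STEP = re.compile(r"^\s*•\s*(?P<text>.+?)\s*$")

# Assumption: these prefixes cover only the problem messages seen in this task.
PROBLEM_PREFIXES = (
    "Failed",
    "Unable",
    "The reimage failed",
    "Icinga status is not optimal",
)


def parse_report(text: str) -> ReimageReport:
    """Parse one pasted cookbook report into host, status, and steps."""
    host = status = None
    steps: List[str] = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header and host is None:
            host, status = header["host"], header["status"]
            continue
        step = STEP.match(line)
        if step and host is not None:
            steps.append(step["text"])
    if host is None or status is None:
        raise ValueError("no '• <host> (<STATUS>)' header found")
    problems = [s for s in steps if s.startswith(PROBLEM_PREFIXES)]
    return ReimageReport(host, status, steps, problems)
```

Applied to the db1110 report above, the `problems` list would contain the Icinga, Netbox, and "The reimage failed" lines, which is usually enough to decide whether a rerun or a manual check is needed.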

Change 758782 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1100: Disable notifications

https://gerrit.wikimedia.org/r/758782

Mentioned in SAL (#wikimedia-operations) [2022-02-01T08:10:51Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1100 for reimage T300473', diff saved to https://phabricator.wikimedia.org/P19747 and previous config saved to /var/cache/conftool/dbconfig/20220201-081050-marostegui.json

Change 758782 merged by Marostegui:

[operations/puppet@production] db1100: Disable notifications

https://gerrit.wikimedia.org/r/758782

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1100.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1100.eqiad.wmnet with OS bullseye executed with errors:

  • db1100 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1100.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1100.eqiad.wmnet with OS bullseye completed:

  • db1100 (WARN)
    • Downtimed on Icinga
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010833_marostegui_21361_db1100.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Ladsgroup subscribed.

According to Puppet, db1144 is not a backup source; it is only a core multiinstance host. I upgraded it to Bullseye as part of T302950.

Change 770875 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1166: Disable notifications

https://gerrit.wikimedia.org/r/770875

Change 770875 merged by Marostegui:

[operations/puppet@production] db1161: Disable notifications

https://gerrit.wikimedia.org/r/770875

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1161.eqiad.wmnet with OS bullseye

Only the master is pending (T303798).

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1161.eqiad.wmnet with OS bullseye completed:

  • db1161 (WARN)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202203150813_marostegui_1181908_db1161.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 775722 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1130: Disable notifications

https://gerrit.wikimedia.org/r/775722

Change 775722 merged by Marostegui:

[operations/puppet@production] db1130: Disable notifications

https://gerrit.wikimedia.org/r/775722

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1130.eqiad.wmnet with OS bullseye

Marostegui updated the task description.

Old master done

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1130.eqiad.wmnet with OS bullseye completed:

  • db1130 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204040511_marostegui_2744843_db1130.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB