lmata (Leo Mata)
SRE

Projects (11)

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Monday

  • Clear sailing ahead.

User Details

User Since
May 14 2020, 7:26 PM (238 w, 1 d)
Availability
Available
IRC Nick
lmata
LDAP User
LMata
MediaWiki User
LMata (WMF)

Recent Activity

Tue, Nov 26

andrea.denisse awarded T343020: Converting MediaWiki Metrics to StatsLib a Love token.
Tue, Nov 26, 5:51 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q2), Observability-Metrics

Wed, Nov 20

lmata updated subscribers of T371244: VictorOps paged batphone immediately rather than after 5m.

Case 3622388 created on Splunk support. Added @andrea.denisse, @colewhite, and @herron as watchers in case support responds while I'm away.

Wed, Nov 20, 6:52 PM · SRE Observability, SRE-OnFire, SRE
lmata assigned T380022: Alert in need of triage: JobUnavailable to tappof.
Wed, Nov 20, 3:13 PM · SRE Observability (FY2024/2025-Q2), sre-alert-triage
lmata edited projects for T379831: upgrade oauth2-proxy, added: Observability-Tracing; removed observability.
Wed, Nov 20, 3:09 PM · Observability-Tracing

Mon, Nov 18

lmata moved T372437: Harden corto systemd service from Inbox to Radar on the SRE Observability board.
Mon, Nov 18, 1:13 PM · SRE Observability, SRE-OnFire
lmata edited projects for T372437: Harden corto systemd service, added: SRE Observability; removed Incident Tooling.
Mon, Nov 18, 1:13 PM · SRE Observability, SRE-OnFire

Thu, Nov 14

lmata added a project to T355837: Add Prometheus support to statsd.js via mw.track(): MediaWiki-Engineering.
Thu, Nov 14, 4:57 PM · MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), MediaWiki-Platform-Team, MediaWiki-Engineering, Patch-For-Review, Event-Platform, Data-Engineering, Grafana, MediaWiki-extensions-WikimediaEvents, Observability-Metrics

Wed, Nov 13

lmata added a project to T367211: Log unactionable errors to statslib/prometheus and set alert instead of using logstash: SRE Observability.
Wed, Nov 13, 3:09 PM · Growth-Team (Maintenance), SRE Observability, observability
lmata moved T376790: Split the permission to access Logstash from the cn=wmf and cn=nda groups from Inbox to Radar on the SRE Observability board.
Wed, Nov 13, 3:09 PM · SRE Observability, Infrastructure-Foundations, SRE
lmata moved T367211: Log unactionable errors to statslib/prometheus and set alert instead of using logstash from Inbox to Radar on the SRE Observability board.
Wed, Nov 13, 3:08 PM · Growth-Team (Maintenance), SRE Observability, observability

Sat, Nov 9

lmata added a comment to T369122: On-call batphone escalation configuration holidays FY2024/25.

overrides for the week in case they are reset with batphone change

Sat, Nov 9, 10:46 PM · SRE Observability (FY2024/2025-Q2)
lmata updated the task description for T369122: On-call batphone escalation configuration holidays FY2024/25.
Sat, Nov 9, 10:46 PM · SRE Observability (FY2024/2025-Q2)

Fri, Nov 8

lmata awarded T359293: Alert in need of triage: ProbeDown (instance centrallog1002:6514) a Like token.
Fri, Nov 8, 8:17 PM · SRE Observability, sre-alert-triage

Thu, Nov 7

lmata added a comment to T378837: PHP Logstash errors should distinguish between mobile and desktop domain.

Hi @Jdlrobson, We reviewed this task during this week's team meeting. It wasn't clear to us if/that MW is emitting the (client|desktop) domain data to the logs.

Thu, Nov 7, 11:35 PM · SRE Observability
lmata awarded T308467: implementing an incident response workflow automation tool for SRE a Love token.
Thu, Nov 7, 10:59 PM · Incident Tooling, SRE-OnFire

Nov 6 2024

lmata added a comment to T355837: Add Prometheus support to statsd.js via mw.track().

FWIW, with the event platform consolidation proposal, metrics owners do not have to choose. All of these metrics will go both to Prometheus and to the Data Lake automatically.

Nov 6 2024, 3:42 PM · MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), MediaWiki-Platform-Team, MediaWiki-Engineering, Patch-For-Review, Event-Platform, Data-Engineering, Grafana, MediaWiki-extensions-WikimediaEvents, Observability-Metrics

Nov 5 2024

lmata edited projects for T351710: ossl rsyslog errors post-migration, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:16 PM · SRE Observability (FY2024/2025-Q2), Observability-Logging, User-fgiunchedi, Patch-For-Review, SRE, observability
lmata edited projects for T368867: Investigate high rate of log indexing failures, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Logging
lmata edited projects for T320594: Flapping probes for centrallog2001, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Logging
lmata edited projects for T321808: Port all Icinga checks to Prometheus/Alertmanager, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Alerting
lmata edited projects for T352756: Gap in metrics rendered from Thanos Rules, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Metrics, Machine-Learning-Team
lmata edited projects for T353457: Karma UI shows duplicate alerts, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), User-fgiunchedi, cloud-services-team, Observability-Alerting
lmata edited projects for T358037: Migrate wikibase.repo.dispatchChangesJob* to statslib, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), MediaWiki-Platform-Team, Patch-For-Review, Observability-Metrics
lmata edited projects for T354255: Alert in need of triage: AlertLintProblem (instance localhost:9123), added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), sre-alert-triage
lmata edited projects for T266886: Augment NEL reports with a computed timestamp-of-generation, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Logging, Data-Engineering-Icebox, Analytics
lmata edited projects for T348796: MediaWiki: Define new metric type - Histogram, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Observability-Metrics, MediaWiki-libs-Stats
lmata edited projects for T359385: Migrate MediaWiki.arclamp to statslib, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Observability-Metrics
lmata edited projects for T359267: Migrate MediaWiki.timing to statslib, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
lmata edited projects for T364240: Investigate making TimingMetric unit-aware, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Patch-For-Review, MediaWiki-libs-Stats
lmata edited projects for T368088: upgrade prometheus-ipmi-exporter to 1.8.0, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Infrastructure-Foundations, Packaging
lmata edited projects for T367370: Shift frack alerting to use alertmanager instead of icinga, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Fundraising-Backlog, fundraising-tech-ops
lmata edited projects for T369854: Occasional SLOMetricAbsent alerts, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Metrics
lmata edited projects for T370153: Move kafka-mirror Prometheus-based alerts from Icinga to alerts.git, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review
lmata edited projects for T370506: Replace check_ripe_atlas with prometheus alert, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Observability-Alerting
lmata edited projects for T370157: Port lists monitoring alerts to Alertmanager, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Alerting
lmata edited projects for T370526: Remove load_average check for ms-be/thanos-be, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), SRE-swift-storage, Observability-Alerting
lmata edited projects for T370530: Clean up "git repo needs merge" checks, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting
lmata edited projects for T370772: Prometheus eqiad/codfw hw expansion architecture options, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Metrics
lmata edited projects for T371080: Port disk space check for hadoop worker to Alertmanager, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE Observability (FY2024/2025-Q2), Observability-Alerting
lmata edited projects for T371083: Port profile::opensearch::monitoring::base_checks to Prometheus/Alertmanager, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Alerting
lmata edited projects for T371087: Configure Prometheus instance centrally, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Metrics
lmata edited projects for T371485: Grafana MySQL charts can be inconsistent when zooming out, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:11 PM · SRE Observability (FY2024/2025-Q2), Data-Persistence, Grafana
lmata edited projects for T371885: Gaps in Grafana graphs using Thanos, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:10 PM · SRE Observability (FY2024/2025-Q2), serviceops, MW-on-K8s, Grafana, Observability-Metrics
lmata edited projects for T373995: CPU thermal throttling: saturation panel isn't working as expected, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:10 PM · SRE Observability (FY2024/2025-Q2)
lmata moved T366292: Determine and implement steps needed to facilitate read-only graphite in production from Inbox to Up Next on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · SRE Observability (FY2024/2025-Q2), Observability-Metrics
lmata moved T350694: Infrastructure Foundation Alerts to migrate from Epics In Progress to Radar on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Infrastructure-Foundations, Observability-Alerting
lmata moved T343020: Converting MediaWiki Metrics to StatsLib from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q2), Observability-Metrics
lmata moved T350592: EPIC: migrate in use metrics and dashboards to statslib from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata moved T372457: Remove librenms -> graphite integration, replace with gnmi from Inbox to Up Next on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · Cloud-VPS, SRE Observability (FY2024/2025-Q2), cloud-services-team
lmata moved T369122: On-call batphone escalation configuration holidays FY2024/25 from Inbox to In Progress on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:07 PM · SRE Observability (FY2024/2025-Q2)
lmata moved T353912: Observability Bookworm upgrades from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:06 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review
lmata moved T356386: Move all o11y services to discovery.wmnet from Inbox to Up Next on the SRE Observability (FY2024/2025-Q2) board.
Nov 5 2024, 5:06 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Observability-Metrics
lmata edited projects for T365265: Create a per-release deployment of statsd-exporter for mw-on-k8s, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:06 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, MW-on-K8s, serviceops, Observability-Metrics
lmata edited projects for T350694: Infrastructure Foundation Alerts to migrate, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:05 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Infrastructure-Foundations, Observability-Alerting
lmata edited projects for T288622: All Prometheus based alerts move from Icinga to alert manager exclusively, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Nov 5 2024, 5:04 PM · SRE Observability (FY2024/2025-Q2)
lmata moved T365265: Create a per-release deployment of statsd-exporter for mw-on-k8s from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, MW-on-K8s, serviceops, Observability-Metrics
lmata moved T372418: Put the alert1002 and alert2002 hosts in production from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T372607: Decommission the alert1001 and alert2001 hosts from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE, ops-eqiad, ops-codfw, DC-Ops, decommission-hardware, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T373651: Put logging-sd[12]00[1-4] in service from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Observability-Logging
lmata moved T371403: New VictorOps user request from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1)
lmata moved T326657: Add prometheus-https load balancer from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata moved T370636: Grafana dashboard for RL startup manifest size is not updated since March 2024 from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), MediaWiki-libs-Stats, MediaWiki-Platform-Team
lmata moved T366710: Switch k8s logs to their own kafka topics from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Observability-Logging
lmata moved T366571: enable navtiming and statsv services on systemd to start at boot from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata moved T366573: Make sure burrow starts at boot from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata moved T356788: thanos-query probedown due to OOM of both eqiad titan frontends from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Sustainability (Incident Followup), SRE, observability
lmata moved T359261: mw.track() Migrate MediaWiki.TwoColConflict.* to statslib from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), SRE Observability (FY2024/2025-Q1), Two-Column-Edit-Conflict-Merge, Observability-Metrics
lmata moved T368186: ecs-k8s indices are unmanaged from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), Patch-For-Review, Observability-Logging
lmata moved T357614: Alert in need of triage: Number of requests triggering circuit breakers due to excessive memory usage (instance graphite1005) from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:03 PM · SRE Observability (FY2024/2025-Q1), sre-alert-triage
lmata moved T359640: mediawiki_resourceloader_build_seconds_bucket big metric on Prometheus ops from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), Patch-For-Review, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata moved T342451: ECS labels field: OpenSearch attempts to detect field type by content from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), Observability-Logging
lmata moved T373980: Hosts using nftables are not reachable via ssh from alert[12]002. Reboot needed. from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · collaboration-services, Infrastructure-Foundations, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T350597: Audit and prioritize metrics for conversion to statslib that are used for graphite-based alerting from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), User-fgiunchedi, Observability-Metrics
lmata moved T374340: No ntp query ACL for new alert hosts from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · Traffic, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T374821: Replace or delete dumps_store_load_average from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · Data-Platform-SRE (2024.09.28 - 2024.10.18), SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T374860: Retire mw_wikiversion_difference check from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · serviceops, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T374540: Degraded RAID on prometheus1008 from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), SRE, ops-eqiad, DC-Ops
lmata moved T375085: mtail 3.0.0~rc50-1+b6 leaks memory on centrallog2002 from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1)
lmata moved T375138: The prometheus-alertmanager service does not start at boot from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · Patch-For-Review, SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T375139: The corto service fails to start after the alert hosts failover from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata moved T269333: Switch default Grafana datasource to Thanos from Up next to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata moved T361257: Document Icinga Migration Strategy and Communicate to SRE Teams from Up next to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1)
lmata moved T375447: Put logging-hd[12]00[45] in service from Inbox to Done on the SRE Observability (FY2024/2025-Q1) board.
Nov 5 2024, 5:02 PM · SRE Observability (FY2024/2025-Q1), Observability-Logging
lmata added a comment to T379043: Login through Grafana using the login link do not work.

Hi @Peter, thanks for the report. We're investigating; in the meantime, please try https://grafana-rw.wikimedia.org while we address this issue.

Nov 5 2024, 3:07 PM · SRE Observability (FY2024/2025-Q2), Grafana, Observability-Metrics

Nov 4 2024

lmata added a comment to T378650: Corto: Bot needs a registered nick.

I'm not sure I follow; I read those perms as "no one outside the wmf can post". Unless we're willing to change that, I think we'll need a different address.

Nov 4 2024, 12:57 PM · Incident Tooling, SRE-OnFire

Nov 1 2024

lmata added a comment to T378650: Corto: Bot needs a registered nick.

[NOTICE] root@wikimedia.org has too many accounts registered.

Whelp.

Nov 1 2024, 9:40 PM · Incident Tooling, SRE-OnFire
lmata added a comment to T378650: Corto: Bot needs a registered nick.

I've registered cortobot on Libera Chat. It's currently registered against my Wikimedia email address; suggestions for something more generally available are welcome (root@wikimedia.org maybe?).

Nov 1 2024, 8:00 PM · Incident Tooling, SRE-OnFire
lmata added a comment to T355837: Add Prometheus support to statsd.js via mw.track().

Thank you for the proposal; it seems we have some options forward. To move us towards a decision, I’d like to raise a couple of points for consideration:

Nov 1 2024, 3:21 PM · MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), MediaWiki-Platform-Team, MediaWiki-Engineering, Patch-For-Review, Event-Platform, Data-Engineering, Grafana, MediaWiki-extensions-WikimediaEvents, Observability-Metrics

Oct 31 2024

lmata triaged T357384: Simplify Grafana failovers as Medium priority.
Oct 31 2024, 9:27 PM · Observability-Metrics
lmata edited projects for T356386: Move all o11y services to discovery.wmnet, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:26 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Observability-Metrics
lmata moved T350694: Infrastructure Foundation Alerts to migrate from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q1) board.
Oct 31 2024, 9:26 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review, Infrastructure-Foundations, Observability-Alerting
lmata moved T288622: All Prometheus based alerts move from Icinga to alert manager exclusively from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q1) board.
Oct 31 2024, 9:26 PM · SRE Observability (FY2024/2025-Q2)
lmata edited projects for T353912: Observability Bookworm upgrades, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:25 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review
lmata removed a project from T234565: Standardize the logging format: SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:25 PM · Observability-Logging, Patch-For-Review
lmata removed a project from T288621: Logs and events produced by the WMF are consumed using the Elastic Common Schema by OpenSearch: SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:25 PM · Observability-Logging
lmata moved T353912: Observability Bookworm upgrades from Inbox to Epics In Progress on the SRE Observability (FY2024/2025-Q1) board.
Oct 31 2024, 9:24 PM · SRE Observability (FY2024/2025-Q2), Patch-For-Review
lmata edited projects for T372242: Alert on unscrapable pods, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:24 PM · SRE Observability (FY2024/2025-Q2), serviceops, Kubernetes
lmata edited projects for T373510: opensearch general purpose dashboard, added: Observability-Logging; removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:23 PM · Observability-Logging, Observability-Alerting
lmata edited projects for T373523: add args to disable/enable shards relocation before/after each restart operated by sre.opensearch.roll-restart-reboot, added: Observability-Logging; removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:23 PM · Observability-Logging
lmata edited projects for T373814: differentiate prometheus jobs for scraping elastic and open search metrics, added: Observability-Logging, Observability-Metrics; removed SRE Observability (FY2024/2025-Q1).
Oct 31 2024, 9:22 PM · Observability-Metrics, Observability-Logging