SRE · Group
Active · Public

Recent Activity

Fri, Jul 12

Dzahn added a comment to T351202: stewards1001 / stewards2001: automatically subscribe stewards to mailman lists (was: Enable API access for Mailman3).

@Urbanecm https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053399 creates a defined type to sync the members of any list

Fri, Jul 12, 11:20 PM · Patch-For-Review, User-Urbanecm, collaboration-services, SRE, Wikimedia-Mailing-lists, Stewards-Onboarding-Tool
wiki_willy added a comment to T363576: Broadcom NICs with recent firmware fail to reimage.

Thanks for testing this out @Papaul. Since it appears that upgrading the WMF environment to PXELINUX version 6.04 may fix this issue, who would be the best person to help us get that upgraded?

Fri, Jul 12, 10:58 PM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
thcipriani added a comment to T363957: deployment_server bullseye - mw-cgroup.service: Failed .

Thanks for documenting this. I ran into the same thing in deployment prep (T327742); a reboot also fixed it there.

Fri, Jul 12, 9:00 PM · serviceops, SRE
gerritbot added a comment to T351202: stewards1001 / stewards2001: automatically subscribe stewards to mailman lists (was: Enable API access for Mailman3).

Change #1053399 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman3: add defined type to sync list members (WIP)

https://gerrit.wikimedia.org/r/1053399

Fri, Jul 12, 8:17 PM · Patch-For-Review, User-Urbanecm, collaboration-services, SRE, Wikimedia-Mailing-lists, Stewards-Onboarding-Tool
Jhancock.wm reassigned T367804: Q1:rack/setup/install frand200[12] from Jhancock.wm to Papaul.

iDRAC, BIOS, and password are set. Ports are as follows.

Fri, Jul 12, 5:32 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
Jhancock.wm reassigned T367816: Q1:rack/setup/install fransc2001 from Jhancock.wm to Papaul.

@Papaul
iDRAC, BIOS, and new password are set. Ports are as follows.
ETH0 <-> FASW-C8A eth-0/0/15
ETH1 <-> FASW-C8B eth-0/0/15

Fri, Jul 12, 5:29 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053823 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/cookbooks@master] WIP: switchdc: prepare mediawiki cache warmup for bare-metal turndown

https://gerrit.wikimedia.org/r/1053823

Fri, Jul 12, 5:28 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
Jhancock.wm claimed T367819: Q1:rack/setup/install franio200[1-3].
Fri, Jul 12, 4:30 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T367816: Q1:rack/setup/install fransc2001.
Fri, Jul 12, 4:29 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
Jhancock.wm claimed T367804: Q1:rack/setup/install frand200[12].
Fri, Jul 12, 4:29 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
Papaul added a comment to T363576: Broadcom NICs with recent firmware fail to reimage.

I checked on sretest2001: it's trying to boot with PXELINUX version 6.03.

Fri, Jul 12, 4:10 PM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053805 merged by jenkins-bot:

[operations/deployment-charts@master] cxserver: update outdated comments on chart values

https://gerrit.wikimedia.org/r/1053805

Fri, Jul 12, 3:56 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053801 merged by jenkins-bot:

[operations/software/spicerack@master] mediawiki: update siteinfo URL to use mw-api-int

https://gerrit.wikimedia.org/r/1053801

Fri, Jul 12, 3:49 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
ssingh added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Final (famous last words) form:

Fri, Jul 12, 2:50 PM · Patch-For-Review, SRE, Traffic
ssingh added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Thanks for the feedback @Joe!

Fri, Jul 12, 2:43 PM · Patch-For-Review, SRE, Traffic
Jhancock.wm claimed T367816: Q1:rack/setup/install fransc2001.
Fri, Jul 12, 2:16 PM · SRE, fundraising-tech-ops, ops-codfw, DC-Ops
Joe added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

A couple of notes:

Fri, Jul 12, 2:04 PM · Patch-For-Review, SRE, Traffic
gerritbot added a comment to T367439: No unicast IP ranges announced to peers from eqdfw.

Change #1053935 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Adjust route generation for Anycast ranges at eqord

https://gerrit.wikimedia.org/r/1053935

Fri, Jul 12, 1:58 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
Papaul added a comment to T362824: Q#:rack/setup/install dbproxy200[5-8].

@Marostegui thank you for checking. You are right, it looks like the host still has its IPv6 address, or we removed it in Netbox only after the re-image.

2: ens1f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 7c:c2:55:97:5c:ce brd ff:ff:ff:ff:ff:ff
    altname enp138s0f0np0
    inet 10.192.23.11/24 brd 10.192.23.255 scope global ens1f0np0
       valid_lft forever preferred_lft forever
    inet6 2620:0:860:113:10:192:23:11/64 scope global
       valid_lft 2591992sec preferred_lft 604792sec
    inet6 fe80::7ec2:55ff:fe97:5cce/64 scope link
       valid_lft forever preferred_lft forever

I will try to re-image it again, but in the meantime can you try to remove that IPv6 entry and reboot the server?
Thank you

Fri, Jul 12, 1:54 PM · SRE, ops-codfw, Data-Persistence, DC-Ops
gerritbot added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Change #1053929 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] P:dns::auth::update: maintain admin_state via confd

https://gerrit.wikimedia.org/r/1053929

Fri, Jul 12, 1:40 PM · Patch-For-Review, SRE, Traffic
fgiunchedi added a comment to T369826: 10gbit nic option for centrallog2002.

Thank you @Jhancock.wm, that's great! Please let me know a day and time next week that would work for you.

Fri, Jul 12, 1:37 PM · SRE, ops-codfw, DC-Ops
Jhancock.wm reassigned T362824: Q#:rack/setup/install dbproxy200[5-8] from Jhancock.wm to Papaul.
Fri, Jul 12, 1:36 PM · SRE, ops-codfw, Data-Persistence, DC-Ops
fgiunchedi added a comment to T369825: 10gbit nic option for centrallog1002.

@wiki_willy Yes, I was able to locate one. @fgiunchedi is there an estimated time and date for us to bring the server down and install it?

Fri, Jul 12, 1:36 PM · SRE, ops-eqiad, DC-Ops
Jhancock.wm added a comment to T362824: Q#:rack/setup/install dbproxy200[5-8].

dbproxy2006 temp 1G -> B7 lsw port 47
dbproxy2007 temp 1G -> C7 asw port 43
dbproxy2008 temp 1G -> D4 asw port 43

Fri, Jul 12, 1:35 PM · SRE, ops-codfw, Data-Persistence, DC-Ops
Stashbot added a comment to T367439: No unicast IP ranges announced to peers from eqdfw.

Mentioned in SAL (#wikimedia-operations) [2024-07-12T13:10:20Z] <topranks> pushing updated BGP policy to cr2-eqord and cr2-eqdfw to announce Anycast ranges from network pops (T367439)

Fri, Jul 12, 1:10 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
gerritbot added a comment to T367439: No unicast IP ranges announced to peers from eqdfw.

Change #1052086 merged by jenkins-bot:

[operations/homer/public@master] Announce Anycast ranges from Network POPs

https://gerrit.wikimedia.org/r/1052086

Fri, Jul 12, 1:09 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
cmooney closed T365996: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f1-eqiad as Resolved.
Fri, Jul 12, 11:37 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
cmooney closed T365996: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f1-eqiad , a subtask of T348977: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2, as Resolved.
Fri, Jul 12, 11:37 AM · Infrastructure-Foundations, netops, SRE
ArielGlenn closed T368911: Request for Kerb credentials for Ariel Glenn as Resolved.

Hey Daniel, I'd just assumed that getting added to the analytics-privatedata-users group would be redundant so thanks for catching that.

Fri, Jul 12, 11:07 AM · SRE, SRE-Access-Requests, Data-Engineering
Krd added a comment to T359901: VRT wiki fails to create account.

The problem is occurring right now: I created one account and cannot create another one.

Fri, Jul 12, 10:18 AM · serviceops, SRE
phaultfinder updated the task description for T368766: ManagementSSHDown.
Fri, Jul 12, 9:52 AM · SRE, DC-Ops, ops-eqiad
gerritbot added a comment to T366882: Move GitLab behind the CDN.

Change #1053879 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: switch gitlab-replica-b from iptables to nftables

https://gerrit.wikimedia.org/r/1053879

Fri, Jul 12, 9:12 AM · Patch-For-Review, Release-Engineering-Team, Traffic, collaboration-services, SRE
gerritbot added a comment to T366882: Move GitLab behind the CDN.

Change #1053877 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: replace ferm::service with firewall::service

https://gerrit.wikimedia.org/r/1053877

Fri, Jul 12, 8:57 AM · Patch-For-Review, Release-Engineering-Team, Traffic, collaboration-services, SRE
gerritbot added a comment to T366882: Move GitLab behind the CDN.

Change #1053306 merged by Jelto:

[operations/puppet@production] gitlab: switch gitlab-replica-b from iptables to nftables

https://gerrit.wikimedia.org/r/1053306

Fri, Jul 12, 8:12 AM · Patch-For-Review, Release-Engineering-Team, Traffic, collaboration-services, SRE
Marostegui added a comment to T362824: Q#:rack/setup/install dbproxy200[5-8].

@Papaul I cannot access the host remotely via SSH, but the host is up and has network connectivity. I've connected via the Supermicro iDRAC, and I think it is related to DNS:

root@dbproxy2005:~# host dbproxy2005.codfw.wmnet
Host dbproxy2005.codfw.wmnet not found: 3(NXDOMAIN)
Fri, Jul 12, 5:58 AM · SRE, ops-codfw, Data-Persistence, DC-Ops
phaultfinder updated the task description for T368766: ManagementSSHDown.
Fri, Jul 12, 5:52 AM · SRE, DC-Ops, ops-eqiad
Papaul updated subscribers of T363576: Broadcom NICs with recent firmware fail to reimage.

@wiki_willy I did more tests on the PXE boot issue we are having with the 10G Dell NIC by taking one of our decommissioned servers and putting a 10G NIC inside. I connected the server in my lab and PXE-booted it with the firmware at version 22.21.06.80:

Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:10	22.21.06.80
Broadcom Adv. Dual 10Gb Ethernet - 00:0A:F7:F0:0C:11	22.21.06.80

The server was able to PXE boot without an issue. Below is the output from my install server:

Jul 11 22:52:34 install1001 dhcpd[3408477]: DHCPDISCOVER from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:34 install1001 dhcpd[3408477]: DHCPOFFER on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 dhcpd[3408477]: DHCPREQUEST for 10.192.64.21 (10.192.16.5) from 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 dhcpd[3408477]: DHCPACK on 10.192.64.21 to 00:0a:f7:f0:0c:10 via 10.192.64.1
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2070
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/pxelinux.0 to 10.192.64.21:2071
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/ldlinux.c32 to 10.192.64.21:49152
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/ttys1-115200 to 10.192.64.21:49153
Jul 11 22:52:38 install1001 atftpd[529]: Serving bullseye-installer/pxelinux.cfg/boot.txt to 10.192.64.21:49154
Jul 11 22:52:48 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/linux to 10.192.64.21:49155
Jul 11 22:52:49 install1001 atftpd[529]: Serving bullseye-installer/debian-installer/amd64/initrd.gz to 10.192.64.21:49156

What I also saw on the server console while booting was that in my lab environment I am using "PXELINUX 6.04", whereas in the WMF environment we are using "PXELINUX 6.03". I will have to double-check this tomorrow when I am back online.

Fri, Jul 12, 4:47 AM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
Papaul updated subscribers of T363576: Broadcom NICs with recent firmware fail to reimage.
Fri, Jul 12, 4:02 AM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053819 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] mediawiki-cache-warmup: prepare for bare-metal turndown

https://gerrit.wikimedia.org/r/1053819

Fri, Jul 12, 1:03 AM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Thu, Jul 11

gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053809 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] kserve-inference: update references to deprecated services in fixtures

https://gerrit.wikimedia.org/r/1053809

Thu, Jul 11, 10:43 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053808 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] wikifeeds: update references to deprecated services

https://gerrit.wikimedia.org/r/1053808

Thu, Jul 11, 10:43 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053807 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] push-notifications: update references to deprecated services

https://gerrit.wikimedia.org/r/1053807

Thu, Jul 11, 10:43 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053806 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mobileapps: update references to deprecated services

https://gerrit.wikimedia.org/r/1053806

Thu, Jul 11, 10:43 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053805 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] cxserver: update outdated comments on chart values

https://gerrit.wikimedia.org/r/1053805

Thu, Jul 11, 10:43 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
Papaul added a comment to T362824: Q#:rack/setup/install dbproxy200[5-8].

@Marostegui as we discussed this morning, I was able to install dbproxy2005 with the workaround of using the 1G NIC for the install and switching to 10G afterwards. Please check that everything looks good on dbproxy2005 so I can proceed with the others.

Thu, Jul 11, 10:33 PM · SRE, ops-codfw, Data-Persistence, DC-Ops
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053804 had a related patch set uploaded (by Scott French; author: Scott French):

[mediawiki/services/example-node-api@master] example-node-api: remove deprecated service from example prod config

https://gerrit.wikimedia.org/r/1053804

Thu, Jul 11, 10:28 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
Papaul added a comment to T362824: Q#:rack/setup/install dbproxy200[5-8].

@Jhancock.wm I think you missed @Marostegui's comment about not setting IPv6 for those hosts. I fixed it.

Thu, Jul 11, 10:26 PM · SRE, ops-codfw, Data-Persistence, DC-Ops
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053803 had a related patch set uploaded (by Scott French; author: Scott French):

[mediawiki/services/mobileapps@master] mobileapps: remove deprecated services from example prod config

https://gerrit.wikimedia.org/r/1053803

Thu, Jul 11, 10:26 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053802 had a related patch set uploaded (by Scott French; author: Scott French):

[mediawiki/services/wikifeeds@master] wikifeeds: remove deprecated services from example prod config

https://gerrit.wikimedia.org/r/1053802

Thu, Jul 11, 10:25 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
gerritbot added a comment to T367949: Spin down api_appserver and appserver clusters.

Change #1053801 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/software/spicerack@master] mediawiki: update siteinfo URL to use mw-api-int

https://gerrit.wikimedia.org/r/1053801

Thu, Jul 11, 10:23 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s