User Details
- User Since
- Jul 24 2019, 8:11 PM (259 w, 2 d)
- Availability
- Available
- LDAP User
- Jclark-ctr
- MediaWiki User
- Jclark-ctr
Thu, Jul 11
Wed, Jul 3
@Papaul if you get a chance can you look at this one?
Tue, Jul 2
@BTullis when you get a chance, can you update the files? These are ready to be imaged and handed over.
duplicate of T362033
@VRiley-WMF can you update this with the 2nd network connection and then hand it over to @cmooney?
Mon, Jul 1
Fri, Jun 28
cloudcephosd1039
2nd cable serial#20220008 port 1
cloudcephosd1040
2nd cable serial#20220043 port 5
cloudcephosd1041
2nd cable serial#20220011 port 7
Did the mgmt IP address get updated during any maintenance you performed?
Reseated PSU
duplicate T362033
duplicate T362033
Thu, Jun 27
@BTullis can you update the preseed.yaml and site.pp files for these servers?
@akosiaris please update the site.pp file for this server
Tue, Jun 25
@MoritzMuehlenhoff would you be able to update site.pp file for this server?
Thu, Jun 20
No faults since Jun 9th
Have not seen any errors return on this; closing this ticket
duplicate T362033
duplicate T362033
Updated iDRAC on all servers listed to iDRAC Firmware Version 7.00.00.171
Replaced cable
Tue, Jun 18
@Clement_Goubert did you need just the iDRAC updated? We can do that easily. The BIOS update requires a reboot.
Mon, Jun 17
@Marostegui Updated iDRAC and BIOS firmware
resolved with T367071
Jun 11 2024
@MoritzMuehlenhoff after replacing the failed drive it looked like it might boot, but it still fails. It might need to be reimaged. I do not have root access, so I am unable to proceed past this.
The failed drive was replaced as well
@MoritzMuehlenhoff Replaced DIMM.
@MoritzMuehlenhoff Can I take the server down to replace the DIMM?
DIMM B1
BankLabel: B
CacheSize: Information Not Available
CurrentOperatingSpeed: 2400 MHz
DeviceDescription: DIMM B1
DeviceType: Memory
FQDD: DIMM.Socket.B1
InstanceID: DIMM.Socket.B1
LastSystemInventoryTime: 2024-06-10T19:54:17
LastUpdateTime: 2022-02-02T00:40:40
ManufactureDate: Mon Sep 10 07:00:00 2018 UTC
Manufacturer: Micron Technology
MemoryTechnology: DRAM
MemoryType: DDR-4
Model: DDR4 DIMM
NonVolatileSize: Information Not Available
PartNumber: 36ASF4G72PZ-2G6E1
PrimaryStatus: Ok
Rank: Double Rank
RemainingRatedWriteEndurance: Information Not Available
SerialNumber: 1E5C734E
Size: 32768 MB
Speed: 2666 MHz
SystemEraseCapability: Not Supported
VolatileSize: 32768 MB
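For comparing inventory entries like the DIMM B1 record across tickets, a small parser can help. This is only a sketch, assuming the fields are laid out one `Key: Value` pair per line; the function names `parse_inventory` and `mhz` and the shortened sample text are illustrative and not part of any Dell tool.

```python
def parse_inventory(text):
    """Parse 'Key: Value' lines into a dict.

    Splits on the first colon only, so values that contain colons
    (timestamps, for example) stay intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def mhz(value):
    """Turn a value such as '2666 MHz' into the integer 2666."""
    number, unit = value.split()
    if unit != "MHz":
        raise ValueError(f"unexpected unit: {unit!r}")
    return int(number)


# Shortened sample using a few fields from the DIMM B1 record.
SAMPLE = """\
FQDD: DIMM.Socket.B1
LastSystemInventoryTime: 2024-06-10T19:54:17
PrimaryStatus: Ok
Speed: 2666 MHz
CurrentOperatingSpeed: 2400 MHz
"""

dimm = parse_inventory(SAMPLE)
rated = mhz(dimm["Speed"])
current = mhz(dimm["CurrentOperatingSpeed"])
print(f"{dimm['FQDD']}: rated {rated} MHz, running at {current} MHz")
```

Splitting on the first colon only is the one design choice that matters here, since the timestamp fields would otherwise be cut apart.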
The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
This server is out of warranty. Will check decom servers to see if we have any suitable DIMMs.
Jun 6 2024
Installed cross connect; link came up on port. Cable ID #5229
Jun 5 2024
Replaced broken cable; server went 2 weeks without the fault returning
Jun 4 2024
manually updated firmware
iDRAC Firmware Version 7.00.00.171
BIOS Version 2.21.1
Jun 3 2024
May 30 2024
@akosiaris kafka-main1010 has imaged but is still failing the cookbook for me. Would you be able to try that one?
May 29 2024
@dcaro the drive was listed as ready in iDRAC. Converted it to non-RAID; it should be visible now.
May 28 2024
Replaced failed drive
@akosiaris kafka-main1010 is still failing with the same issue
May 24 2024
I was able to correct the DHCP issue for kafka-main1010, but the image still fails
@akosiaris did you have this issue with other servers?
May 23 2024
Relabeled servers
You have successfully submitted request SR191070960.
Ordered replacement drive. Will update when it arrives.
@aborrero I am stuck right now. I did attempt to reimage with no luck. I am unsure what version of grub we have installed, but it looks like the same issue as this bug. @Papaul do you have any insight on this? https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008
The replacement cable arrived yesterday, after multiple back-and-forths with Dell. Can we leave this open for 1 more week to make sure the error does not return? Leave the server running and I will reach out to you next week for downtime for the replacement.
Replaced drive
@Marostegui I do have a spare disk. Can I swap it at any time?
May 22 2024
May 21 2024
May 15 2024
May 14 2024
@akosiaris could you please update the preseed.yaml file? I did take care of the site.pp file for codfw and eqiad.
kafka-main1010
Rack: E 5
U 26
Cableid : 2013339101771
Port : 6
@Volans I also see this as a learning opportunity; most of these are just logs. Some dcops members are very light on Linux, and we could be expanding their knowledge so they become more valuable members of the team. I do love cookbooks, but sometimes they fail, and it would be nice if we could continue to teach and train coworkers.
May 13 2024
@Volans The main purpose is gathering debug information. I would prefer to grep the dmesg/log files instead of searching through the entire output. The mdadm commands would allow us to one day rebuild failed software RAIDs.
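As a rough illustration of the grep-style approach described here, the sketch below filters log lines down to the ones that mention software RAID or disk errors. The keyword pattern and the sample lines are made up for the example and are not taken from any of these servers; tune the pattern to whatever the hardware actually logs.

```python
import re

# Hypothetical keyword list: md device names, "raid"/"raid1", and I/O errors.
PATTERN = re.compile(r"\b(md\d+|raid\d*|I/O error)\b", re.IGNORECASE)


def matching_lines(lines, pattern=PATTERN):
    """Return only the lines that match the pattern, like a simple grep."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Made-up sample lines standing in for kernel or system log output.
SAMPLE_LOG = [
    "[   12.345] md/raid1:md0: Disk failure on sdb1, disabling device.",
    "[   12.400] eth0: link up",
    "[   13.001] blk_update_request: I/O error, dev sdb, sector 2048",
    "[   14.000] systemd[1]: Started Session 1.",
]

hits = matching_lines(SAMPLE_LOG)
for line in hits:
    print(line)
```

The same function works on an open file object, since it only iterates over lines.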
May 9 2024
@Marostegui you can put the server back in rotation. I uploaded multiple photos to Dell yesterday, and they replied this morning requesting the part number so they can send the correct part.
I attached the photo that was sent to Dell. I do not expect the part to arrive until next week. We can work on this at a later date.
May 8 2024
I believe we are good to reimage the server; the OS looks corrupt. Could you wait until tomorrow to put it back in production, while I wait for Dell to respond on whether they will send out a new cable?
I am powering it up now and will check iDRAC.
Replaced backplane: cable that connects RAID card <-> backplane / power control board. I did find a cable with a loose pin on the power control board (not replaced) and will be reaching out to Dell regarding it. It has been reseated in its connector and should be fine for the time being.
May 7 2024
On Friday Dell agreed to replace the backplane and cables. They shipped out Monday; expected arrival is Tuesday.
May 2 2024
@akosiaris iDRAC has stayed up for 4 days now; possibly relocating it to a different port helped. We won't know until it is put in use again. This server is out of warranty. If it fails again, could we look at swapping it with a decom server?
@andrea.denisse We have been having a few issues with software RAIDs, and we are trying to pinpoint which slots these drives are in. iDRAC is not listing the drives. I will message you for assistance.
Apr 30 2024
@Marostegui "At the creation of the ticket I requested to not repeat any troubleshooting steps that were not effective"
Followed up with Dell again; they should be sending out parts shortly
iDRAC is still up after almost 24 hours. I did move the iDRAC port on the switch to a different group of ports and will monitor it.