Dmesg full of I/O errors, smart ok, four disks affected
I'm working on a remote server (Dell PowerEdge) that was a new install. It has four mechanical drives (2 TB each) and two SSDs (250 GB each). One SSD contains the OS (RHEL 7), and the four mechanical disks will eventually hold an Oracle database.
Trying to create a software RAID array led to the disks constantly being marked as faulty. Checking dmesg shows a slew of errors like the following:
[127491.711407] blk_update_request: I/O error, dev sde, sector 3907026080
[127491.719699] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127491.719717] sd 0:0:4:0: [sde] Sense Key : Aborted Command [current]
[127491.719726] sd 0:0:4:0: [sde] Add. Sense: Logical block guard check failed
[127491.719734] sd 0:0:4:0: [sde] CDB: Read(32)
[127491.719742] sd 0:0:4:0: [sde] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127491.719750] sd 0:0:4:0: [sde] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127491.719757] blk_update_request: I/O error, dev sde, sector 3907026080
[127491.719764] Buffer I/O error on dev sde, logical block 488378260, async page read
[127497.440222] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.440240] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.440249] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.440258] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.440266] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.440273] sd 0:0:5:0: [sdf] CDB[10]: 00 01 a0 00 00 01 a0 00 00 00 00 00 00 00 00 08
[127497.440280] blk_update_request: I/O error, dev sdf, sector 106496
[127497.901432] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.901449] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.901458] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.901467] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.901475] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.901482] sd 0:0:5:0: [sdf] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127497.901489] blk_update_request: I/O error, dev sdf, sector 3907026080
[127497.911003] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.911019] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.911029] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.911037] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.911045] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.911052] sd 0:0:5:0: [sdf] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127497.911059] blk_update_request: I/O error, dev sdf, sector 3907026080
[127497.911067] Buffer I/O error on dev sdf, logical block 488378260, async page read
These errors occur on all four of the mechanical disks (sdc/sdd/sde/sdf). smartctl passes all four disks on both the long and short self-tests. I'm currently running badblocks (write-mode test, ~35 hrs in, probably another 35 to go).
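For reference, the checks mentioned above look roughly like this; a minimal sketch assuming smartmontools and e2fsprogs are installed, with /dev/sdX standing in for each disk (the badblocks -w run is destructive):

smartctl -t short /dev/sdX    # queue the short self-test (minutes)
smartctl -t long /dev/sdX     # queue the extended self-test (hours)
smartctl -a /dev/sdX          # overall health verdict plus self-test log
badblocks -wsv /dev/sdX       # destructive write-mode surface scan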
These are the causes I've suspected/considered from my research:
- Failed HDDs - it seems unlikely that four "refurbished" disks would all be DOA, doesn't it?
- Storage controller issue (bad cable?) - that seems like it would affect the SSDs too.
- Kernel issue - the only change to the stock kernel was the addition of kmod-oracleasm. I really don't see how it would cause these faults; ASM isn't set up at all. A quick check is sketched just below this list.
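To rule the module in or out (assuming it is named oracleasm, as shipped by kmod-oracleasm), the first question is whether it is loaded at all:

lsmod | grep -i oracleasm     # no output means the module is not loaded
modinfo oracleasm             # module metadata; errors out if not installed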
Another noteworthy event: when trying to zero the disks (part of early troubleshooting) with dd if=/dev/zero of=/dev/sdX, each disk produced these errors:
dd: writing to ‘/dev/sdc’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.70583 s, 32.0 MB/s
dd: writing to ‘/dev/sdd’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.70417 s, 32.0 MB/s
dd: writing to ‘/dev/sde’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.71813 s, 31.7 MB/s
dd: writing to ‘/dev/sdf’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.71157 s, 31.9 MB/s
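Incidentally, the dd output agrees with the kernel log on where the failure happens. dd defaults to a 512-byte block size, so with 106496 complete records written, the failed write starts at byte offset 106496 × 512 = 54,525,952, i.e. LBA 106496 (with 512-byte sectors), exactly the sector reported for sdf in the blk_update_request error above. A quick way to confirm the arithmetic:

echo $((106496 * 512))    # 54525952, matching the bytes-copied count from dd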
If anyone here could share some insight as to what might be causing this, I'd be grateful. I'm inclined to follow Occam's razor and go straight for the HDDs; my only doubt stems from the unlikelihood of four HDDs failing out of the box.
I will be driving to the site tomorrow for a physical inspection and to report my assessment of this machine to the higher-ups. If there's something I should physically inspect (beyond cables/connections/power supply), please let me know.
Thanks.
redhat hard-drive io
asked Jun 17 at 11:52 by Scu11y (new contributor)
When you say SMART "ok", do you just mean the overall health? Are any individual raw counters for reallocated or pending sectors non-zero? Drives don't immediately declare themselves failed on the first bad sector, even though it is unreadable. Use smartctl -x /dev/sda or something. But it's highly suspicious that it's the same LBA on all disks.
– Peter Cordes
Jun 18 at 6:42
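For anyone following along, the raw counters that comment refers to can be pulled out directly; a minimal sketch, assuming an ATA disk that exposes the standard attribute names:

smartctl -x /dev/sda | grep -Ei 'reallocated|pending|uncorrect'
# a non-zero RAW_VALUE for Reallocated_Sector_Ct or Current_Pending_Sector
# would point at genuine media defects rather than a transport problem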
1 Answer
Your dd tests show all four disks failing at the same LBA address. As it is extremely improbable that four disks would all fail at exactly the same location, I strongly suspect a controller or cabling issue.
answered Jun 17 at 12:18 by shodanshok
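A quick way to see that pattern from the kernel log itself, relying on the blk_update_request message format shown in the question:

dmesg | awk '/blk_update_request/ {print $(NF-2), $NF}' | sort | uniq -c
# prints a count per (device, sector) pair; the same sector recurring
# across sdc/sdd/sde/sdf suggests a shared-path problem, not the media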
It's difficult to tell without further testing. Anyway, the first thing I would check/replace is the cables attaching the controller to the backplane.
– shodanshok
Jun 17 at 12:33
High data-rate cables, such as 6/12 Gb/s SATA/SAS ones, are not only about electrical continuity, but mainly about signal cleanliness and low noise. Try to physically clean the connectors and reseat the cables. If the error persists, try changing them and, finally, try a different controller.
– shodanshok
Jun 17 at 13:21
Same-LBA seems unlikely to be a cabling issue. Unless the data in that sector just happens to be some worst-case bit-sequence for some scrambling (to prevent extended runs of all-zero defeating self-clocking) or ECC over the SATA/SAS link. I'm not sure what encoding that link uses. Controller is plausible though; same LBA on each of multiple disks needs some kind of common factor explanation.
– Peter Cordes
Jun 18 at 6:40
@djsmiley2k It is unlikely that all four dd writes ended up cached at the same failing RAM address. Moreover, the PERC's DRAM is ECC-protected and, while ECC RAM also fails, it is relatively uncommon. That said, the controller can be the source of the issues, so if changing cables does not help, the OP should try swapping the controller.
– shodanshok
Jun 18 at 12:55
Well my friends, you were right. Cables and controller swapped, and now 600 GB into a dd zeroing pass with no errors so far. Looks like everything's working correctly now. Thanks again for all the knowledge you've shared. I'm always grateful to this community for your expertise and willingness to share it. :)
– Scu11y
Jun 19 at 21:27
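For reference, the kind of zeroing pass described in that comment, sketched with progress reporting; destructive, assumes GNU coreutils 8.24+ for status=progress, and /dev/sdX is a placeholder:

dd if=/dev/zero of=/dev/sdX bs=1M status=progress    # larger blocks plus a live byte counter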