
Dmesg full of I/O errors, smart ok, four disks affected


I'm working on a remote server (Dell PowerEdge) that was a new install. It has four 2 TB mechanical drives and two 250 GB SSDs. One SSD contains the OS (RHEL 7), and the four mechanical disks are eventually going to hold an Oracle database.



Trying to create a software RAID array led to disks constantly being marked as faulty. Checking dmesg shows a slew of the following errors:



[127491.711407] blk_update_request: I/O error, dev sde, sector 3907026080
[127491.719699] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127491.719717] sd 0:0:4:0: [sde] Sense Key : Aborted Command [current]
[127491.719726] sd 0:0:4:0: [sde] Add. Sense: Logical block guard check failed
[127491.719734] sd 0:0:4:0: [sde] CDB: Read(32)
[127491.719742] sd 0:0:4:0: [sde] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127491.719750] sd 0:0:4:0: [sde] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127491.719757] blk_update_request: I/O error, dev sde, sector 3907026080
[127491.719764] Buffer I/O error on dev sde, logical block 488378260, async page read
[127497.440222] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.440240] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.440249] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.440258] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.440266] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.440273] sd 0:0:5:0: [sdf] CDB[10]: 00 01 a0 00 00 01 a0 00 00 00 00 00 00 00 00 08
[127497.440280] blk_update_request: I/O error, dev sdf, sector 106496
[127497.901432] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.901449] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.901458] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.901467] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.901475] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.901482] sd 0:0:5:0: [sdf] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127497.901489] blk_update_request: I/O error, dev sdf, sector 3907026080
[127497.911003] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[127497.911019] sd 0:0:5:0: [sdf] Sense Key : Aborted Command [current]
[127497.911029] sd 0:0:5:0: [sdf] Add. Sense: Logical block guard check failed
[127497.911037] sd 0:0:5:0: [sdf] CDB: Read(32)
[127497.911045] sd 0:0:5:0: [sdf] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[127497.911052] sd 0:0:5:0: [sdf] CDB[10]: e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08
[127497.911059] blk_update_request: I/O error, dev sdf, sector 3907026080
[127497.911067] Buffer I/O error on dev sdf, logical block 488378260, async page read
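As a cross-check on the log above, the two CDB lines can be decoded by hand. The following is a minimal sketch, assuming the standard SBC layout of a READ(32) command (service action in bytes 8-9, RDPROTECT in the top three bits of byte 10, LBA in bytes 12-19, expected reference tag in bytes 20-23, transfer length in bytes 28-31). The function name and the field names are made up for illustration.

```python
def decode_read32_cdb(hex_bytes: str) -> dict:
    """Decode selected fields of a 32-byte READ(32) CDB given as a hex string.

    Field offsets are an assumption based on the SBC READ(32) layout.
    """
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 32 or cdb[0] != 0x7F:
        raise ValueError("expected a 32-byte variable-length CDB (opcode 0x7f)")
    return {
        "service_action": int.from_bytes(cdb[8:10], "big"),
        "rdprotect": cdb[10] >> 5,
        "lba": int.from_bytes(cdb[12:20], "big"),
        "expected_ref_tag": int.from_bytes(cdb[20:24], "big"),
        "transfer_length": int.from_bytes(cdb[28:32], "big"),
    }


# CDB[00] and CDB[10] as logged for sde, joined into one 32-byte command.
SDE_CDB = (
    "7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00 "
    "e8 e0 7c a0 e8 e0 7c a0 00 00 00 00 00 00 00 08"
)

fields = decode_read32_cdb(SDE_CDB)
print(fields["lba"])              # 3907026080, the sector in the blk_update_request line
print(fields["transfer_length"])  # 8 blocks
```

Under that reading, the command carries a non-zero RDPROTECT value and an expected reference tag equal to the low 32 bits of the LBA, which would be consistent with the "Logical block guard check failed" sense data referring to protection-information checking rather than to a media error.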


These errors occur on all four mechanical disks (sdc/sdd/sde/sdf). smartctl passed all four disks on both the long and short self-tests. I'm currently running badblocks (write-mode test, ~35 hrs in, probably another 35 to go).



The following are the causes I've suspected/considered after some research:



  • Failed HDDs - Seems unlikely that 4 "refurbished" disks would be DOA, doesn't it?



  • Storage controller issue (bad cable?) - Seems like it would affect the SSDs too?



  • Kernel issue - The only change to the stock kernel was the addition of kmod-oracleasm. I really don't see how it would cause these faults; ASM isn't set up at all.


Another noteworthy event occurred when I tried to zero the disks (part of early troubleshooting): running $ dd if=/dev/zero of=/dev/sdX yielded these errors:



dd: writing to ‘/dev/sdc’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.70583 s, 32.0 MB/s
dd: writing to ‘/dev/sdd’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.70417 s, 32.0 MB/s
dd: writing to ‘/dev/sde’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.71813 s, 31.7 MB/s
dd: writing to ‘/dev/sdf’: Input/output error
106497+0 records in
106496+0 records out
54525952 bytes (55 MB) copied, 1.71157 s, 31.9 MB/s
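The numbers in the dd output and in dmesg can be tied together with a little arithmetic. This is a rough sketch that assumes 512-byte logical sectors (which is also dd's default block size) and a 4096-byte block for the "Buffer I/O error" message; the helper names are illustrative only.

```python
SECTOR_SIZE = 512   # bytes per logical sector (assumed; also dd's default block size)
PAGE_SIZE = 4096    # bytes per buffered block (assumed)


def dd_failure_sector(records_out: int, block_size: int = SECTOR_SIZE) -> int:
    """Sector at which dd's first failed write started."""
    return records_out * block_size // SECTOR_SIZE


def buffer_block_to_sector(logical_block: int) -> int:
    """Convert a 'Buffer I/O error ... logical block N' number to a 512-byte sector."""
    return logical_block * (PAGE_SIZE // SECTOR_SIZE)


print(106496 * SECTOR_SIZE)               # 54525952, the byte count dd reported
print(dd_failure_sector(106496))          # 106496, also logged for sdf in dmesg
print(buffer_block_to_sector(488378260))  # 3907026080, the other failing sector
```

With those assumptions, every dd run stopped at sector 106496, the same sector that dmesg reports for sdf.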


If anyone here could share some insight as to what might be causing this, I'd be grateful. I'm inclined to follow Occam's razor here and go straight for the HDDs; my only doubt stems from the unlikelihood of four HDDs failing out of the box.



I will be driving to the site tomorrow for a physical inspection and to report my assessment of this machine to the higher-ups. If there's something I should physically inspect (beyond cables/connections/power supply), please let me know.



Thanks.





























  • When you say SMART "ok", do you just mean the overall health? Are any individual raw counters for reallocated or pending sectors non-zero? Drives don't immediately declare themselves failed on the first bad sector, even though it is unreadable. Use smartctl -x /dev/sda or something. But it's highly suspicious that it's the same LBA on all disks.

    – Peter Cordes
    Jun 18 at 6:42


















redhat hard-drive io






asked Jun 17 at 11:52









Scu11y












1 Answer
Your dd tests show all four disks failing at the same LBA. Since it is extremely improbable that four disks would all fail at exactly the same location, I strongly suspect a controller or cabling issue.
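The same check can be run against the dmesg excerpt from the question. Here is a small sketch that groups the block-layer errors by sector and keeps the sectors reported by more than one device; the regular expression is written against the `blk_update_request` lines quoted in the question, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the block-layer error lines quoted in the question.
ERROR_RE = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")


def devices_by_sector(log_text: str) -> dict:
    """Map each failing sector to the sorted list of devices that reported it."""
    result = defaultdict(set)
    for dev, sector in ERROR_RE.findall(log_text):
        result[int(sector)].add(dev)
    return {sector: sorted(devs) for sector, devs in result.items()}


# Three of the lines from the question's dmesg output.
LOG = """\
[127491.711407] blk_update_request: I/O error, dev sde, sector 3907026080
[127497.440280] blk_update_request: I/O error, dev sdf, sector 106496
[127497.901489] blk_update_request: I/O error, dev sdf, sector 3907026080
"""

shared = {s: d for s, d in devices_by_sector(LOG).items() if len(d) > 1}
print(shared)  # {3907026080: ['sde', 'sdf']}
```

A sector that shows up on several independent drives points at something the drives have in common rather than at the media itself.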
























  • 1





    It's difficult to tell without further testing. Anyway, the first thing I would check/replace is the cables connecting the controller to the backplane.

    – shodanshok
    Jun 17 at 12:33






  • 4





    High data-rate cables, such as 6/12 Gb/s SATA/SAS ones, are not only about electrical continuity but mainly about signal clarity and low noise. Try to physically clean the connectors and reseat the cables. If the error persists, try replacing them and, finally, try a different controller.

    – shodanshok
    Jun 17 at 13:21







  • 2





    Same-LBA seems unlikely to be a cabling issue. Unless the data in that sector just happens to be some worst-case bit-sequence for some scrambling (to prevent extended runs of all-zero defeating self-clocking) or ECC over the SATA/SAS link. I'm not sure what encoding that link uses. Controller is plausible though; same LBA on each of multiple disks needs some kind of common factor explanation.

    – Peter Cordes
    Jun 18 at 6:40







  • 3





    @djsmiley2k It is unlikely that all four dd runs ended up cached at the same failing RAM address. Moreover, the PERC's DRAM is ECC-protected and, while ECC RAM can also fail, that is relatively uncommon. That said, the controller can be the source of the issue, so if changing cables does not help, the OP should try swapping the controller.

    – shodanshok
    Jun 18 at 12:55







  • 2





    Well my friends, you were right. Cables + controllers swapped and now 600GB into a dd zeroing process and no errors thus far. Looks like everything's working correctly now. Thanks again for all the knowledge you've shared. I'm always grateful to this community for your expertise and willingness to share it. :)

    – Scu11y
    Jun 19 at 21:27













Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);






Scu11y is a new contributor. Be nice, and check out our Code of Conduct.









draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f971722%2fdmesg-full-of-i-o-errors-smart-ok-four-disks-affected%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









14














Your dd tests show the four disks all failing at the same LBA address. As it is extremely improbable that four disks all fail at the exact same location, I strongly suspect it is due to controller or cabling issues.






share|improve this answer


















  • 1





    It's difficult to tell without further testing. Anyway, the first think I would control/replace is the cables attaching the controller to the backplane.

    – shodanshok
    Jun 17 at 12:33






  • 4





    High data-rate cables, as 6/12 Gbs SATA/SAS ones, are not only about electrical continuity, but mainly about signal clearness and low noise. Try to physically clear the connectors and reseat the cables. If the error persists, try changing them and, finally, try a different controller.

    – shodanshok
    Jun 17 at 13:21







  • 2





    Same-LBA seems unlikely to be a cabling issue. Unless the data in that sector just happens to be some worst-case bit-sequence for some scrambling (to prevent extended runs of all-zero defeating self-clocking) or ECC over the SATA/SAS link. I'm not sure what encoding that link uses. Controller is plausible though; same LBA on each of multiple disks needs some kind of common factor explanation.

    – Peter Cordes
    Jun 18 at 6:40







  • 3





    @djsmiley2k It is difficult that all four ddended cached on the same, failing RAM address. Moreover, PERC's DRAM is ECC protected and, while ECC RAM also fails, it is relatively uncommon. That said, the controller can be the source of the issues so, if changing cables does not help, the OP should try swapping the controller.

    – shodanshok
    Jun 18 at 12:55







  • 2





    Well my friends, you were right. Cables + controllers swapped and now 600GB into a dd zeroing process and no errors thus far. Looks like everything's working correctly now. Thanks again for all the knowledge you've shared. I'm always grateful to this community for your expertise and willingness to share it. :)

    – Scu11y
    Jun 19 at 21:27















14














Your dd tests show the four disks all failing at the same LBA address. As it is extremely improbable that four disks all fail at the exact same location, I strongly suspect it is due to controller or cabling issues.






share|improve this answer


















  • 1





    It's difficult to tell without further testing. Anyway, the first think I would control/replace is the cables attaching the controller to the backplane.

    – shodanshok
    Jun 17 at 12:33






  • 4





    High data-rate cables, as 6/12 Gbs SATA/SAS ones, are not only about electrical continuity, but mainly about signal clearness and low noise. Try to physically clear the connectors and reseat the cables. If the error persists, try changing them and, finally, try a different controller.

    – shodanshok
    Jun 17 at 13:21







    A same-LBA failure seems unlikely to be a cabling issue, unless the data in that sector just happens to be some worst-case bit sequence for the scrambling (used to prevent extended runs of zeros from defeating self-clocking) or for the ECC over the SATA/SAS link. I'm not sure what encoding that link uses. The controller is plausible, though; the same LBA failing on each of multiple disks needs some kind of common-factor explanation.

    – Peter Cordes
    Jun 18 at 6:40







    @djsmiley2k It is unlikely that all four dd runs ended up cached at the same failing RAM address. Moreover, the PERC's DRAM is ECC-protected and, while ECC RAM can also fail, that is relatively uncommon. That said, the controller can be the source of the issue, so if changing the cables does not help, the OP should try swapping the controller.

    – shodanshok
    Jun 18 at 12:55







    Well my friends, you were right. Cables + controllers swapped and now 600GB into a dd zeroing process and no errors thus far. Looks like everything's working correctly now. Thanks again for all the knowledge you've shared. I'm always grateful to this community for your expertise and willingness to share it. :)

    – Scu11y
    Jun 19 at 21:27













answered Jun 17 at 12:18 by shodanshok























