Wednesday, 31 December 2014

Hard-drive errors


My /home file system is JFS. It has switched to read-only mode several times already, and each time I had to reboot or remount it. I found this in /var/log/messages:



Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925711] ata2.00: configured for UDMA/133
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925755] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925759] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925763] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925770] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925778] 0e 5a b2 b8
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925782] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925785] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925815] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925817] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925820] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925825] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925833] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925836] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925839] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925863] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925865] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925868] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925872] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925879] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925882] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925885] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925908] ata2: EH complete
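
The hex bytes in the log are SCSI sense data, and decoding them by hand shows what the kernel left as "Unhandled sense code". The following is a minimal Python sketch, assuming the two hex lines of the first entry belong together as one 20-byte descriptor-format buffer (response code 0x72). The function name and the tiny lookup tables are illustrative and cover only the values that occur in this log; they are not a complete SCSI decoder.

```python
# Sense bytes copied from the first "Unhandled sense code" entry in the log.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 0e 5a b2 b8"

# Lookup tables limited to the values seen in this log (illustrative only).
SENSE_KEYS = {0x03: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_descriptor_sense(hex_string):
    """Decode descriptor-format sense data (response code 0x72 or 0x73)."""
    data = bytes.fromhex(hex_string)
    if (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {
        "sense_key": data[1] & 0x0F,
        "asc": data[2],
        "ascq": data[3],
        "lba": None,
    }
    # Byte 7 is the additional sense length; descriptors start at byte 8.
    end = min(len(data), 8 + data[7])
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        # Type 0x00 is the Information descriptor; bit 7 of its byte 2 is VALID.
        if (desc_type == 0x00 and desc_len == 0x0A
                and pos + 12 <= end and data[pos + 2] & 0x80):
            result["lba"] = int.from_bytes(data[pos + 4:pos + 12], "big")
        pos += 2 + desc_len
    return result


if __name__ == "__main__":
    info = decode_descriptor_sense(SENSE_HEX)
    print("sense key:", SENSE_KEYS.get(info["sense_key"], hex(info["sense_key"])))
    print("asc/ascq :", ASC_ASCQ.get((info["asc"], info["ascq"]), "unknown"))
    print("lba      :", info["lba"])
```

Read this way, the first entry is a medium error (sense key 3) with additional sense code 11h/04h, and its information field carries the sector address 0x0e5ab2b8. The other two entries have the same header but an all-zero information field.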

And smartctl -a /dev/sda gave me this:



SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   174   021    Pre-fail  Always       -       2008
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1005
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13675
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       998
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       810861
194 Temperature_Celsius     0x0022   106   091   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
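
To pick the sector-related counters out of a table like this without reading every row, something like the following Python sketch could be used. It assumes the ten-column layout shown above (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The function names are hypothetical, and the choice of IDs 5, 196, 197 and 198 as "sector-related" is an illustrative assumption, not a rule from smartctl.

```python
# A few rows copied from the smartctl output above.
SMART_ROWS = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 810861
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
"""

# Assumption: these IDs are the ones treated as sector-related here.
SECTOR_ATTRIBUTE_IDS = {5, 196, 197, 198}


def parse_attributes(text):
    """Turn smartctl attribute rows into dictionaries, skipping other lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def nonzero_sector_attributes(rows):
    """Return sector-related attributes whose raw value is a non-zero integer."""
    return [
        row for row in rows
        if row["id"] in SECTOR_ATTRIBUTE_IDS
        and row["raw"].isdigit()
        and int(row["raw"]) > 0
    ]


if __name__ == "__main__":
    for row in nonzero_sector_attributes(parse_attributes(SMART_ROWS)):
        print(row["id"], row["name"], "raw =", row["raw"])
```

For the rows above, the only match is Current_Pending_Sector with a raw value of 1.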

Hard-drive model:



Model Family: Western Digital Scorpio Blue Serial ATA (Adv. Format)
Device Model: WDC WD7500BPVT-24HXZT3
Serial Number: WD-WX91A91R4010
LU WWN Device Id: 5 0014ee 601b831c9
Firmware Version: 03.01A03

Update: I started another self-test (I ran the first one several months ago) and got some new results:



SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     13680         229857912
# 2  Extended offline    Completed without error       00%      9661         -
# 3  Extended offline    Completed: read failure       90%      9654         96004576
# 4  Extended offline    Completed: read failure       90%      9653         96004576

Entries #2 to #4 were already in the log. I followed these guides: Badblock HOWTO and Debug the Filesystem. The block no longer seems to be reported as bad, but the reallocated-sector counts have not increased either. The only attribute that has increased is Raw_Read_Error_Rate, after I wrote zeros to the bad block.
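
For reference, the Badblock HOWTO maps the failing LBA to a file system block with b = (int)((L - S) * 512 / B), where L is the LBA from the self-test log, S is the first sector of the partition and B is the file system block size in bytes. Below is a minimal Python sketch of that arithmetic. The partition start (2048) and block size (4096) are placeholder assumptions, not values from this machine; the real ones have to be read from the partition table and the file system.

```python
SECTOR_SIZE = 512  # logical sector size assumed by the HOWTO formula


def lba_to_fs_block(lba, partition_start, block_size):
    """Map a disk LBA to (file system block, sector offset inside that block)."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    byte_offset = (lba - partition_start) * SECTOR_SIZE
    block = byte_offset // block_size
    sector_in_block = (byte_offset % block_size) // SECTOR_SIZE
    return block, sector_in_block


if __name__ == "__main__":
    failing_lba = 229857912   # LBA_of_first_error from self-test #1
    partition_start = 2048    # hypothetical: first sector of the /home partition
    block_size = 4096         # hypothetical: file system block size in bytes
    block, offset = lba_to_fs_block(failing_lba, partition_start, block_size)
    print("file system block:", block, "sector within block:", offset)
```

With a 4096-byte block, eight consecutive LBAs map to the same file system block, so rewriting that one block covers all eight sectors.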


The question is: should I consider ordering a new hard drive?


