A data set containing hard drive failure data from Backblaze in the start-stop format used in the survival package.
hds
A data.frame with the following columns:
serial number of the hard disk to which the row belongs.
hard disk model.
manufacturer of the hard disk model.
start and stop times on the SMART 9 attribute scale.
1 if the hard disk fails at tstop.
hard disk size in terabytes.
the raw SMART attribute x value. E.g., smart_12 is the power cycle count.
1 if the SMART attribute x value is non-zero.
cumulative sum of the prefix ...
.
number of failures in the original data. A hard disk should fail only once, but this is not the case in the raw data.
number of records in the original source.
first and last date in the original source.
smallest and largest value of the SMART 9 attribute in the original source.
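The start-stop layout described above can be illustrated with a small sketch. The toy data frame and its column names (serial_number, tstart, fails, size_tb) are assumptions made for illustration and are not taken from hds; only tstop is named in this page. Each row covers one interval on the SMART 9 time scale, and the event indicator says whether the disk fails at the end of that interval.

```r
library(survival)

# Toy data in start-stop format. All column names except tstop are
# hypothetical and chosen only for this illustration.
toy <- data.frame(
  serial_number = c("A", "A", "B"),
  tstart        = c(0, 100, 0),
  tstop         = c(100, 250, 300),
  fails         = c(0, 1, 0),
  size_tb       = c(4, 4, 8)
)

# Each row is at risk on the interval (tstart, tstop]; fails is 1 only on
# the row whose interval ends with a failure.
y <- with(toy, Surv(tstart, tstop, fails))
print(y)
```

An object built this way can be used as the response in model formulas from the survival package, for example in coxph.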
Details about the SMART attributes can be found at https://en.wikipedia.org/wiki/S.M.A.R.T. As stated in the original source:
"Reported stats for the same SMART stat can vary in meaning based on the drive manufacturer and the drive model. Make sure you are comparing apples-to-apples as drive manufacturers don't generally disclose what their specific numbers mean."
There are some notes at https://en.wikipedia.org/wiki/S.M.A.R.T. regarding which attributes have vendor-specific raw values. Further,
"The values in the files are the values reported by the drives. Sometimes, those values are out of whack. For example, in a few cases the RAW value of SMART 9 (Drive life in hours) reported a value that would make a drive 10+ years old, which was not possible. In other words, it's a good idea to have bounds checks when you process the data."
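A bounds check of the kind the quote recommends might look like the following minimal sketch. The column name smart_9, the toy values, and the ten-year upper bound are assumptions for illustration; this is not necessarily the check used when preparing hds.

```r
# Toy records; smart_9 is a hypothetical column holding the raw SMART 9
# value (power-on hours).
records <- data.frame(
  serial_number = c("A", "B", "C"),
  smart_9       = c(12000, 95000, -5)
)

# Assumed upper bound: ten years of continuous operation, in hours.
max_hours <- 10 * 365.25 * 24

# Keep rows whose value is present, non-negative, and below the bound.
in_bounds <- !is.na(records$smart_9) &
  records$smart_9 >= 0 &
  records$smart_9 <= max_hours

clean <- records[in_bounds, , drop = FALSE]
print(clean)
```

Whether out-of-range rows should be dropped, capped, or inspected by hand depends on the analysis; the sketch simply drops them.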
See this GitHub page for the processing steps: https://github.com/boennecd/backblaze_survival_analysis_prep.