Several datasets are embedded in this package for use as example decision tables. They can be accessed by typing data(RoughSetData). The following is a description of each dataset.
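As a minimal sketch of the access step, the snippet below loads the bundle and prints the size of every table in it. It assumes that the package shipping RoughSetData is installed under the name "RoughSets" and that RoughSetData loads as a named list of data-frame-like decision tables; the helper name table_dims is hypothetical and not part of the package.

```r
# Hypothetical helper: number of objects (rows) and attributes (columns)
# for each element of a list of data-frame-like tables.
table_dims <- function(tables) {
  t(vapply(tables,
           function(x) c(objects = nrow(x), attributes = ncol(x)),
           c(objects = 0, attributes = 0)))
}

# Assumption: the package is called "RoughSets"; skipped if it is not installed.
if (requireNamespace("RoughSets", quietly = TRUE)) {
  data("RoughSetData", package = "RoughSets")
  print(names(RoughSetData))       # names of the embedded decision tables
  print(table_dims(RoughSetData))  # one row per table
}
```

The column count reported for each table includes the decision attribute, so it should be one more than the number of conditional attributes stated in the descriptions.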
The hiring dataset
It is a simple dataset taken from Komorowski et al. (1999) in which all attributes have nominal values. It consists of eight objects with four conditional attributes and one decision attribute. The detailed description of each attribute is as follows:
Diploma: it has the following values: {"MBA", "MSc", "MCE"}.
Exprience: it has the following values: {"High", "Low", "Medium"}.
French: it has the following values: {"Yes", "No"}.
Reference: it has the following values: {"Excellent", "Good", "Neutral"}.
Decision: it is a decision attribute that contains the following values: {"Accept", "Reject"}.
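The value sets listed above can be checked against a loaded table with a small helper. This is only a sketch: the helper name attribute_values is hypothetical, the toy rows are made up for illustration and are not the actual hiring objects, and the element name hiring.dt is an assumption about how the table is stored inside RoughSetData.

```r
# Hypothetical helper: sorted distinct values of each attribute, as character.
attribute_values <- function(dt) {
  lapply(as.data.frame(dt), function(col) sort(unique(as.character(col))))
}

# Toy rows for illustration only; these are NOT the actual hiring objects.
toy <- data.frame(Diploma = c("MBA", "MSc", "MBA"),
                  French  = c("Yes", "No", "Yes"),
                  stringsAsFactors = FALSE)
print(attribute_values(toy))

# Assumption: the package is "RoughSets" and the table is RoughSetData$hiring.dt.
if (requireNamespace("RoughSets", quietly = TRUE)) {
  data("RoughSetData", package = "RoughSets")
  if (!is.null(RoughSetData$hiring.dt)) {
    print(attribute_values(RoughSetData$hiring.dt))
  }
}
```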
The housing dataset
This dataset was taken from the Boston housing dataset at the UCI Machine Learning Repository, available at http://www.ics.uci.edu. It was first created by Harrison and Rubinfeld (1978). It contains 506 objects with 13 conditional attributes and one decision attribute. Note that the housing dataset is a regression dataset, which means that the decision attribute has continuous values. The conditional attributes include both continuous and nominal attributes. The following is a description of each attribute:
CRIM: it is a continuous attribute that expresses the per capita crime rate by town. It has values in: [0.0062, 88.9762].
ZN: it is a continuous attribute that represents the proportion of residential land zoned for lots over 25,000 sq.ft. It has values in: [0, 100].
INDUS: it is a continuous attribute that shows the proportion of non-retail business acres per town. It has values in: [0.46, 27.74].
CHAS: it is a nominal attribute that represents the Charles River dummy variable. It has two values: 1 if the tract bounds the river and 0 otherwise.
NOX: it is a continuous attribute that shows the nitric oxides concentration (parts per 10 million). It has values in: [0.385, 0.871].
RM: it is a continuous attribute that explains the average number of rooms per dwelling. It has values in: [3.561, 8.78].
AGE: it is a continuous attribute that expresses the proportion of owner-occupied units built prior to 1940. It has values in: [2.9, 100].
DIS: it is a continuous attribute that shows weighted distances to five Boston employment centres. It has values in: [1.1296, 12.1265].
RAD: it is a nominal attribute that shows the index of accessibility to radial highways. It has integer values from 1 to 24.
TAX: it is a continuous attribute that shows the full-value property-tax rate per $10,000. It has values in: [187, 711].
PTRATIO: it is a continuous attribute that shows the pupil-teacher ratio by town. It has values in: [12.6, 22].
B: it is a continuous attribute that can be expressed by 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town. It has values in: [0.32, 396.9].
LSTAT: it is a continuous attribute that illustrates the percentage of lower status of the population. It has values in: [1.73, 37.97].
MEDV: it is a continuous attribute that shows the median value of owner-occupied homes in $1000's. It has values in: [5, 50].
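The intervals quoted above can be compared with the stored values by computing the minimum and maximum of every numeric attribute. The following is a sketch under assumptions: attribute_ranges is a hypothetical helper, the toy values are illustrative only, and housing.dt is an assumed element name inside RoughSetData. Attributes stored as factors (which may be the case for nominal ones such as CHAS) are skipped.

```r
# Hypothetical helper: min and max of every numeric attribute of a table.
attribute_ranges <- function(dt) {
  df <- as.data.frame(dt)
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  t(vapply(numeric_cols,
           function(col) range(col, na.rm = TRUE),
           c(min = 0, max = 0)))
}

# Toy values for illustration only; these are NOT rows of the housing data.
toy <- data.frame(RM = c(3.561, 6, 8.78), CHAS = factor(c(0, 1, 0)))
print(attribute_ranges(toy))

# Assumption: the package is "RoughSets" and the table is RoughSetData$housing.dt.
if (requireNamespace("RoughSets", quietly = TRUE)) {
  data("RoughSetData", package = "RoughSets")
  if (!is.null(RoughSetData$housing.dt)) {
    print(attribute_ranges(RoughSetData$housing.dt))
  }
}
```

The same helper applies unchanged to any of the other tables with continuous attributes.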
The wine dataset
This dataset is a classification dataset first introduced by Forina et al. (1988) and commonly used as a benchmark in the machine learning area. It is also available at the KEEL dataset repository (Alcala-Fdez, 2009), at http://www.keel.es/. It consists of 178 instances with 13 conditional attributes and one decision attribute, where all conditional attributes have continuous values. The description of each attribute is as follows:
alcohol: it has a range in: [11, 14.9].
malid_acid: it has a range in: [0.7, 5.8].
ash: it has a range in: [1.3, 3.3].
alcalinity_of_ash: it has a range in: [10.6, 30.0].
magnesium: it has a range in: [70, 162].
total_phenols: it has a range in: [0.9, 3.9].
flavanoids: it has a range in: [0.3, 5.1].
nonflavanoid_phenols: it has a range in: [0.4, 3.6].
proanthocyanins: it has a range in: [0.4, 3.6].
color_intensity: it has a range in: [1.2, 13.0].
hue: it has a range in: [0.4, 1.8].
od: it has a range in: [1.2, 4.0].
proline: it has a range in: [278, 1680].
class: it is a nominal decision attribute that has values: {1, 2, 3}.
The pima dataset
It was taken from the Pima Indians diabetes dataset, which is available at the KEEL dataset repository (Alcala-Fdez, 2009), at http://www.keel.es/. It was first created by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains 768 objects with 8 continuous conditional attributes. The description of each attribute is as follows:
preg: it represents number of times pregnant and has values in: [1, 17].
plas: it represents plasma glucose concentration at 2 hours in an oral glucose tolerance test and has values in: [0.0, 199.0].
pres: it represents diastolic blood pressure (mm Hg) and has values in: [0.0, 122.0].
skin: it represents triceps skin fold thickness (mm) and has values in: [0.0, 99.0].
insu: it represents 2-hour serum insulin (mu U/ml) and has values in: [0.0, 846.0].
mass: it represents body mass index (weight in kg/(height in m)^2) and has values in: [0.0, 67.1].
pedi: it represents diabetes pedigree function and has values in: [0.078, 2.42].
age: it represents age (years) and has values in: [21.0, 81.0].
class: it is a decision attribute and has values in: [1, 2].
M. Forina, E. Leardi, C. Armanino, and S. Lanteri, "PARVUS - An Extendible Package for Data Exploration, Classification and Correlation", Journal of Chemometrics, vol. 4, no. 2, p. 191 - 193 (1988).
D. Harrison, and D. L. Rubinfeld, "Hedonic Prices and the Demand for Clean Air", J. Environ. Economics & Management, vol.5, 81-102 (1978).
J. Alcala-Fdez, L. Sanchez, S. Garcia, M. J. del Jesus, S. Ventura, J. M. Garrell, J. Otero, C. Romero, J. Bacardit, V. M. Rivas, J. C. Fernandez, and F. Herrera, "KEEL: A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems", Soft Computing vol. 13, no. 3, p. 307 - 318 (2009).
J. Komorowski, Z. Pawlak, L. Polkowski, and A. Skowron, "Rough Sets: A Tutorial", In S. K. Pal and A. Skowron, editors, Rough Fuzzy Hybridization, A New Trend in Decision Making, pp. 3 - 98, Singapore, Springer (1999).