readMLData-package: Reading data from different sources in their original format.

Description

The package contains functions for maintaining and using a structure that describes a collection of machine learning data sets, and for reading those data sets into the R environment through a unified interface. See the functions prepareDSList() and dsRead().

Details

The data are not part of the package. The package must be given a path to a local copy of the data and a path to their description. The description of the data sets is a directory containing an XML file contents.xml and a subdirectory "scripts", which holds one R script per data set; each script reads its data set into R. The file contents.xml contains information on all the data sets, in particular their names for local identification, their public names, and the names of the files representing each data set. The name of the script for reading a data set is derived from its identification name. The complete list of the fields in contents.xml may be obtained using getFields().

For the simplest use of the package, reading the data sets, the functions prepareDSList() and dsRead() are sufficient. The remaining functions are useful for adding further data sets to the description. Use help(package=readMLData) or library(help=readMLData) to see the list of functions.
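The basic reading workflow can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes that the package is installed together with its example directories, that prepareDSList() takes the data path and the description path as its first two arguments, and that dsRead() accepts a numeric index into the list as the data set identifier.

```r
library(readMLData)

# Locate the example directories shipped with the installed package.
pathData <- system.file("exampleData", package = "readMLData")
pathDescription <- system.file("exampleDescription", package = "readMLData")

# Build the structure describing the available data sets.
# Assumption: data path first, description path second.
dsList <- prepareDSList(pathData, pathDescription)

# Read the first described data set.
# Assumption: a numeric index is accepted in place of an identification name.
dat <- dsRead(dsList, 1)

# Inspect the result.
str(dat)
```

For data stored elsewhere, the two paths would point to the local copy of the data and to the corresponding description directory.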

The fields that should be included in contents.xml are those with either usage=="obligatory" or usage=="optional" in the table produced by getFields(). Fields with usage=="additional" or usage=="computed" are added automatically by the function prepareDSList().
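The selection of fields by usage can be illustrated with a short sketch. It assumes that getFields() can be called without arguments and returns a data frame with a column named usage, as the comparisons above suggest; the variable names are illustrative only.

```r
library(readMLData)

# Table of all fields that may appear in contents.xml.
fields <- getFields()

# Fields that the author of a description supplies in contents.xml.
# Assumption: the table has a column called "usage".
userFields <- fields[fields$usage %in% c("obligatory", "optional"), ]

# Fields that prepareDSList() fills in automatically.
autoFields <- fields[fields$usage %in% c("additional", "computed"), ]

print(userFields)
print(autoFields)
```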

An example of a description directory, describing three UCI data sets, is in the exampleDescription subdirectory of the installed package. The data themselves are in the exampleData subdirectory. See http://www.cs.cas.cz/~savicky/readMLData/ for description files of further data sets from the UCI Machine Learning Repository.
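To see how a description directory is laid out, the example directory can be listed from R. The sketch below uses only base R functions and assumes that the package is installed; the variable names are illustrative.

```r
# Path to the example description directory of the installed package.
descDir <- system.file("exampleDescription", package = "readMLData")

# List contents.xml and the reading scripts, including subdirectories.
descFiles <- list.files(descDir, recursive = TRUE)
print(descFiles)

# List the files of the accompanying example data.
dataDir <- system.file("exampleData", package = "readMLData")
print(list.files(dataDir, recursive = TRUE))
```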

References

UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/.

Additional resources for the CRAN package readMLData, http://www.cs.cas.cz/~savicky/readMLData/.