Loads specified data sets, or lists the available data sets.
data(..., list = character(), package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"), envir = .GlobalEnv,
     overwrite = TRUE)
...: literal character strings or names.
list: a character vector.
package: a character vector giving the package(s) to look in for data sets, or NULL. By default, all packages in the search path are used, then the data subdirectory (if present) of the current working directory.
lib.loc: a character vector of directory names of R libraries, or NULL. The default value of NULL corresponds to all libraries currently known.
verbose: a logical. If TRUE, additional diagnostics are printed.
envir: the environment where the data should be loaded.
overwrite: logical: should existing objects of the same name in envir be replaced?
A character vector of all data sets specified (whether found or not), or information about all available data sets in an object of class "packageIQR" if none were specified.
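As a minimal sketch of the first kind of return value, the snippet below requests one data set that exists in the datasets package and one that does not. The name no_such_dataset_xyz is made up for illustration, and the scratch environment e is only there to keep the workspace clean.

```r
# Load into a scratch environment rather than the workspace.
e <- new.env()

# 'no_such_dataset_xyz' is a deliberately non-existent, illustrative name;
# data() warns about it, so the warning is suppressed here.
res <- suppressWarnings(
  data(list = c("USArrests", "no_such_dataset_xyz"),
       package = "datasets", envir = e)
)

print(res)  # the value is returned invisibly, so print it explicitly
ls(e)       # only the data set that was found has been created
```

The returned vector names everything that was requested, so it cannot by itself be used to tell which data sets were actually loaded; inspecting the target environment is one way to check.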
There is no requirement for data(foo) to create an object named foo (nor to create one object), although it much reduces confusion if this convention is followed (and it is enforced if datasets are lazy-loaded).
data() was originally intended to allow users to load datasets from packages for use in their examples, and as such it loaded the datasets into the workspace .GlobalEnv. This avoided having large datasets in memory when not in use: that need has been almost entirely superseded by lazy-loading of datasets.
The ability to specify a dataset by name (without quotes) is a convenience: in programming the datasets should be specified by character strings (with quotes).
Use of data within a function without an envir argument has the almost always undesirable side-effect of putting an object in the user's workspace (and indeed, of replacing any object of that name already there). It would almost always be better to put the object in the current evaluation environment by data(..., envir = environment()).
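The following sketch illustrates the envir = environment() pattern. The function name mean_murder_rate is hypothetical, and USArrests from the datasets package is used only because it is available in every R installation.

```r
# Hypothetical helper: loads the data set into its own evaluation frame.
mean_murder_rate <- function() {
  utils::data(list = "USArrests", package = "datasets",
              envir = environment())
  # Confirm that the object lives in this frame, not in an enclosing one.
  loaded_locally <- exists("USArrests", envir = environment(),
                           inherits = FALSE)
  list(loaded_locally = loaded_locally,
       mean_rate = mean(USArrests$Murder))
}

# Record whether the workspace already holds an object of that name.
had_global <- exists("USArrests", envir = globalenv(), inherits = FALSE)

out <- mean_murder_rate()
out$loaded_locally

# The call has not added or removed anything in the workspace.
identical(exists("USArrests", envir = globalenv(), inherits = FALSE),
          had_global)
```

The data set is specified as a character string via list, in line with the advice to avoid unquoted names in programming.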
However, two alternatives are usually preferable, both described in the ‘Writing R Extensions’ manual.
1. For sets of data, set up a package to use lazy-loading of data.
2. For objects which are system data, for example lookup tables used in calculations within the function, use a file R/sysdata.rda in the package sources or create the objects by R code at package installation time.
A sometimes important distinction is that the second approach places objects in the namespace but the first does not. So if it is important that the function sees mytable as an object from the package, it is system data and the second approach should be used. In the unusual case that a package uses a lazy-loaded dataset as a default argument to a function, that needs to be specified by ::, e.g., survival::survexp.us.
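A small sketch of the :: form for a default argument follows. It uses datasets::USArrests instead of survival::survexp.us so that it runs without extra packages, and the function name count_rows is hypothetical. Inside a real package the function would live in the package sources; here it is defined at top level purely for illustration.

```r
# Hypothetical function whose default argument is a lazy-loaded data set,
# referenced explicitly through its package with '::'.
count_rows <- function(d = datasets::USArrests) {
  nrow(d)
}

count_rows()                  # uses the lazy-loaded default
count_rows(datasets::women)   # any other data frame can be supplied
```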
This function creates objects in the envir environment (by default the user's workspace), replacing any which already existed unless overwrite = FALSE. data("foo") can silently create objects other than foo: there have been instances in published packages where it created or replaced .Random.seed and hence changed the seed for the session.
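One way to guard against these side effects, sketched below under the assumption that the overwrite argument shown in the usage is available in the running version of R, is to load into a scratch environment first and inspect what was created, or to pass overwrite = FALSE. The environment names and the placeholder string are illustrative.

```r
# 1. Inspect what a data() call would create, without touching the workspace.
scratch <- new.env()
data(list = "USArrests", package = "datasets", envir = scratch)
created <- ls(scratch, all.names = TRUE)  # all.names also shows dot-objects
created

# 2. Protect an existing object with overwrite = FALSE.
e <- new.env()
assign("USArrests", "existing user object", envir = e)
# data() warns that the object is not overwritten; suppressed for brevity.
suppressWarnings(
  data(list = "USArrests", package = "datasets", envir = e,
       overwrite = FALSE)
)
kept <- get("USArrests", envir = e)
kept

# 3. With the default overwrite = TRUE the object is replaced.
data(list = "USArrests", package = "datasets", envir = e)
is.data.frame(get("USArrests", envir = e))
```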
Currently, four formats of data files are supported:
1. files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
2. files ending .RData or .rda are load()ed.
3. files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE, as.is = FALSE), and hence result in a data frame.
4. files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ";", as.is = FALSE), and also result in a data frame.
If more than one matching file name is found, the first on this list is used. (Files with extensions .txt, .tab or .csv can be compressed, with or without further extension .gz, .bz2 or .xz.)
The data sets to be loaded can be specified as a set of character strings or names, or as the character vector list, or as both.
For each given data set, the first two types (.R or .r, and .RData or .rda files) can create several variables in the load environment, which might all be named differently from the data set. The third and fourth types will always result in the creation of a single variable with the same name (without extension) as the data set.
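The difference between the file types can be tried out with a throwaway data directory. The sketch below is only an illustration: the directory data_demo_project and the data set names mydata and several are hypothetical, and package = character(0) is used so that only the data directory of the working directory is searched.

```r
# Hypothetical project directory with a data/ subdirectory.
proj <- file.path(tempdir(), "data_demo_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# A .csv data file: data() reads it with sep = ";", so write semicolons.
writeLines(c("id;label", "1;x", "2;y"),
           file.path(proj, "data", "mydata.csv"))

# A .R data file: it is sourced, so it may create any number of objects.
writeLines(c("a <- 1:3", "b <- letters[1:3]"),
           file.path(proj, "data", "several.R"))

e <- new.env()
old_wd <- setwd(proj)   # data() looks in ./data of the working directory
data(list = c("mydata", "several"), package = character(0), envir = e)
setwd(old_wd)           # restore the previous working directory

sort(ls(e))    # one object from the .csv file, two from the .R file
str(e$mydata)  # a data frame; as.is = FALSE turns strings into factors
```

Note that a comma-separated file with the .csv extension would be read as a single column here, because the separator is fixed to a semicolon.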
If no data sets are specified, data lists the available data sets. It looks for a new-style data index in the Meta directory or, if this is not found, an old-style 00Index file in the data directory of each specified package, and uses these files to prepare a listing. If there is a data area but no index, available data files for loading are computed and included in the listing, and a warning is given: such packages are incomplete. The information about available data sets is returned in an object of class "packageIQR". The structure of this class is experimental.
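As a rough sketch of working with the listing programmatically: in current versions of R the returned object carries a character matrix named results with the columns Package, LibPath, Item and Title. Since the class structure is described as experimental, code relying on these component names should be treated as an assumption that may need adjusting.

```r
# List the data sets of one package instead of printing the full index.
idx <- data(package = "datasets")
class(idx)

# 'results' and its column names are an assumption about the current,
# experimental structure of "packageIQR" objects.
items <- idx$results[, c("Item", "Title"), drop = FALSE]
head(items)
```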
Where the datasets have a different name from the argument that should be used to retrieve them, the index will have an entry like beaver1 (beavers), which tells us that dataset beaver1 can be retrieved by the call data(beavers).
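The beavers entry can be checked directly. This short sketch loads it from the datasets package into a scratch environment (the name e is illustrative) and lists the objects that appear.

```r
e <- new.env()
data(list = "beavers", package = "datasets", envir = e)

# The argument 'beavers' yields objects with different names.
loaded <- sort(ls(e))
loaded
```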
If lib.loc and package are both NULL (the default), the data sets are searched for in all the currently loaded packages then in the data directory (if any) of the current working directory.
If lib.loc = NULL but package is specified as a character vector, the specified package(s) are searched for first amongst loaded packages and then in the default library/ies (see .libPaths).
If lib.loc is specified (and not NULL), packages are searched for in the specified library/ies, even if they are already loaded from another library.
To just look in the data directory of the current working directory, set package = character(0) (and lib.loc = NULL, the default).
help for obtaining documentation on data sets.
save for creating the second (.rda) kind of data, typically the most efficient one.
The ‘Writing R Extensions’ manual for considerations in preparing the data directory of a package.
require(utils)
data()                        # list all available data sets
try(data(package = "rpart"))  # list the data sets in the rpart package
data(USArrests, "VADeaths")   # load the data sets 'USArrests' and 'VADeaths'

## Alternatively
ds <- c("USArrests", "VADeaths")
data(list = ds)

help(USArrests)               # give information on data set 'USArrests'