Abbreviation: rd, rd.brief, Read2
Reads the contents of the specified data file into an R data table, what R calls a data frame. By default the format of the file is detected from its file type: comma or tab separated value text file from .csv, SPSS data file from .sav, SAS data file from .sas7bdat, R data file from .rda, and Excel file from .xls or .xlsx using Alexander Walker's openxlsx package. To read a fixed width formatted text data file, specify the required R widths option. Identify the data file either by browsing for the file on the local computer system with Read(), or by passing as the first argument a character string in the form of a path name or a web URL (except for .Rda files, which must be on the local computer system).
Any variable labels in a native SPSS or native R file are automatically included in the data frame. See the details section below for more information. Variable labels can also be added and modified individually with the lessR function label, and more comprehensively with the VariableLabels function.
The function provides feedback regarding the data that is read by invoking the lessR function details. The brief form of this function, invoked by default, lists only the input files, the variable name table, and any variable labels.
The lessR function corRead reads a correlation matrix.
Read(ref=NULL, format=NULL, in.lessR=FALSE, labels=NULL, widths=NULL, stringsAsFactors=FALSE,
missing="", n.mcut=1,
miss.show=30, miss.zero=FALSE, miss.matrix=FALSE,
max.lines=30, sheet=1,
brief=TRUE, quiet=getOption("quiet"),
fun.call=NULL, …)
rd(…)
rd.brief(…, brief=TRUE)
Read2(…, sep=";", dec=",")
ref: File reference, either omitted to browse for the data file, or (except for .Rda files) a full path name or web URL, included in quotes. A URL begins with http://.
format: Format of the data in the file, which by default is inferred from the file type of the file to read: .csv, .tsv or .txt read as a text file, .xls or .xlsx read as an Excel file, .sav read as an SPSS file, which also reads the variable labels if present, .sas7bdat read as a SAS file, and .rda read as a native R data file. If the data file does not have one of these file types, then set format explicitly.
in.lessR: If TRUE then the data file has been downloaded as part of the lessR package.
labels: [This is a legacy option in which the labels are part of the data file, replaced by the VariableLabels function to have labels in mylabels.] File name for the file of variable labels. Either a full path name, or just the file name if in the same directory as the data file, or no reference between the quotes, which allows the user to browse for the labels file. Or, if row2, then the labels are in the second line of the data file. Must be a literal string, not a character variable.
widths: Specifies the width of the successive columns for fixed width formatted data.
stringsAsFactors: Defaults to FALSE, so variables with at least one non-numeric data value are read as character strings instead of factors.
missing: Missing value code, which by default is literally a missing data value in the data table.
n.mcut: For the missing value analysis, list the row name and number of missing values if the number of missing values equals or exceeds this cutoff.
miss.show: For the missing value analysis, the number of rows, one row per observation, to list that have as many or more missing values as n.mcut.
miss.zero: For the missing value analysis, list the variable name or the row name even for values of 0. By default only variables and rows with missing data are listed.
miss.matrix: For the missing value analysis, if there is any missing data, list a version of the complete data table with a 0 for a non-missing value and a 1 for a missing value.
sep: Character that separates adjacent values in a text file of data.
dec: Character that serves as the decimal separator in a number.
max.lines: Maximum number of lines to list of the data and labels.
sheet: For Excel files, specifies the worksheet to read. The default is the first worksheet.
brief: If TRUE, display only the variable names table plus any variable labels.
quiet: If set to TRUE, no text output. The corresponding system default can be changed with the style function.
fun.call: Function call. Used with Rmd to pass the function call when obtained from the abbreviated function call rd.
…: Other parameter values defined for the R read functions, such as the read.table function for text files, with row.names and header.
The data frame that was read is returned, usually assigned the name mydata as in the examples below. This is the default name for the data frame input into the lessR data analysis functions.
By default Read reads text data files which are either comma delimited, csv, or tab-delimited data files, native Excel files of type .xls or .xlsx, native R files with file type of .rda, native SAS files with file type .sas7bdat, and native SPSS files with file type .sav. Invoke the widths option to allow for the reading of fixed width formatted data. Calls the lessR function details to provide feedback regarding details of the data frame that was read. By default, variables with non-numeric data values are read as character strings. To read them as factors, specify stringsAsFactors as TRUE, unless all the values of a variable are non-numeric and unique, in which case the variable is classified as a character string.
CREATE csv FILE
One way to create a csv data file is to enter the data into a text editor. A more structured method is to use a worksheet application such as MS Excel or LibreOffice Calc. Place the variable names in the first row of the worksheet. Each column of the worksheet contains the data for the corresponding variable. Each subsequent row contains the data for a specific observation, such as for a person or a company.
All numeric data in the worksheet should be displayed in the General format, so that the only non-digit character for a numeric data value is a decimal point. The General format removes all dollar signs and commas, for example, leaving only the pure number, stripped of these extra characters which R will not properly read as part of a numeric data value.
To create the csv file from a standard worksheet application such as Microsoft Excel or LibreOffice Calc, first convert any numeric data to general format to remove characters such as dollar signs and commas, and then under the File option, do a Save As and choose the csv format.
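The effect of leftover currency formatting can be seen in a small sketch with base R read.csv, which Read is described as calling. The file contents and the column names Name and Salary are made up for illustration, and the cleanup step with gsub is only one possible repair, not part of lessR.

```r
# Hypothetical two-column file in which Salary kept its currency formatting
path <- tempfile(fileext = ".csv")
writeLines(c('Name,Salary', 'Ann,"$1,200"', 'Bob,"$950"'), path)

d <- read.csv(path, stringsAsFactors = FALSE)
class(d$Salary)   # read as character because of the $ and the comma

# One possible repair after the fact: strip the extra characters, then convert
d$Salary <- as.numeric(gsub("[$,]", "", d$Salary))
class(d$Salary)   # now numeric
```

Converting the worksheet cells to the General format before saving avoids the need for this kind of repair.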
Call help(read.table) to view the other options that can also be implemented from Read.
MECHANICS
Specify the file with the Read function to read the data into a data frame. If no arguments are passed to the function, then interactively browse for the file.
Given a csv data file, or tab-delimited text file, read the data into an R data frame called mydata with Read. Because Read calls the standard R function read.csv, which serves as a wrapper for read.table, the usual options that work with read.table, such as row.names, also can be passed through the call to Read.
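As a rough illustration of such a pass-through option, the sketch below uses base R read.csv directly on a made-up file with columns ID, Age, and Score. With lessR loaded, the same row.names=1 argument would be given in the call to Read, as in the examples at the end of this page.

```r
# Hypothetical csv file whose first column is an ID field
path <- tempfile(fileext = ".csv")
writeLines(c("ID,Age,Score", "a01,34,7.5", "a02,41,6.0"), path)

# row.names=1 moves the first column out of the data and into the row names
d <- read.csv(path, row.names = 1)
rownames(d)
names(d)
```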
SPSS DATA
Relies upon read.spss from the foreign package to read data in the SPSS .sav format. If the file has a file type of .sav, that is, the file specification ends in .sav, then the format is automatically set to "SPSS". To invoke this option for a relevant data file of any file type, explicitly specify format="SPSS". Any variable labels in the SPSS file are read and stored in the resulting R data table (frame). However, SPSS allows value labels for integer variables, so to preserve the value labels in R the resulting variable is typed as a factor. To preserve the integer type, invoke the read.spss option use.value.labels=FALSE.
R DATA
Relies upon the standard R function load
. By convention only, data files in native R format have a file type of .rda
. To read a native R data file, if the file type is .rda
, the format
is automatically set to "R"
. To invoke this option for a relevant data file of any file type, explicitly specify format="R"
. Create a native R data file by saving the current data frame, usually mydata
, with the lessR
function Write
.
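The underlying mechanism can be sketched with base R alone. Here save stands in for the lessR Write function, which is an assumption made only to keep the sketch free of package dependencies; the small data frame is made up.

```r
# Small made-up data frame saved in native R format
mydata <- data.frame(x = 1:3, g = c("a", "b", "c"))
path <- tempfile(fileext = ".rda")
save(mydata, file = path)

# Remove the object, then restore it from the file
rm(mydata)
loaded <- load(path)   # load() returns the names of the restored objects
loaded
nrow(mydata)
```

Note that load restores the object under the name it had when saved, rather than returning the data frame itself.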
Excel DATA
Relies upon the function read.xlsx from Alexander Walker's openxlsx package. Files with a file type of .xlsx are assigned a format of "Excel". The read.xlsx parameter sheet specifies the ordinal position of the worksheet in the Excel file, with a default value of 1. The row.names parameter can only have a value of 1.
lessR DATA
lessR
has some data sets included with the package. Read
reads each such data set by specifying its name and setting in.lessR=TRUE
. (The older format="lessR"
is deprecated.) Also, each included data set begins with the prefix dat
, which can be deleted when specifying the name of the data set. This option is a replacement for the standard R data
function, offering the added information provided by Read
.
FIXED WIDTH FORMATTED DATA
Relies upon read.fwf
. Applies to data files in which the width of the column of data values of a variable is the same for each data value and there is no delimiter to separate adjacent data values. An example is a data file of Likert scale responses from 1 to 5 on a 50 item survey such that the data consist of 50 columns with no spaces or other delimiter to separate adjacent data values. To read this data set, invoke the widths
option of read.fwf
.
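A minimal sketch of the same idea with base R read.fwf follows, using a made-up file with a 5 column ID field, a 2 column Age field, and only 3 single column items instead of 50. The variable names are chosen here for illustration.

```r
# Two hypothetical records: 5-digit ID, 2-digit Age, three 1-digit responses
path <- tempfile(fileext = ".txt")
writeLines(c("0000125413", "0000231524"), path)

# widths gives the number of characters in each successive column
d <- read.fwf(path, widths = c(5, 2, rep(1, 3)),
              col.names = c("ID", "Age", paste0("Q", 1:3)))
d
```

Because the ID field contains only digits, it is read as a number and loses its leading zeros. Passing colClasses through to read.table is one way to keep it as a character string.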
MISSING DATA
By default, Read provides a list of each variable and each row with the display of the number of associated missing values, indicated by the standard R missing value code NA. When reading the data, Read automatically sets any empty values as missing. Note that this is different from the R default in read.table in which an empty value for a character string variable is treated as a regular data value. Any other valid value for any data type can be set to missing as well with the missing option. To mimic the standard R default for missing character values, set missing=NA.
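The difference from the base R default can be illustrated with read.csv and its na.strings argument. The file, the column names, and the -99 code are made up, and treating na.strings as the base R counterpart of the lessR missing option is an assumption of this sketch.

```r
# Hypothetical file with an empty character field and a -99 missing code
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Dept,Years", "Ann,,5", "Bob,SALE,-99"), path)

# Base R default: the empty character field is kept as an ordinary empty string
d1 <- read.csv(path, stringsAsFactors = FALSE)
is.na(d1$Dept[1])

# Treat both empty fields and the code -99 as missing
d2 <- read.csv(path, stringsAsFactors = FALSE, na.strings = c("", "-99"))
is.na(d2$Dept[1])
is.na(d2$Years[2])
```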
By default, miss.zero=FALSE, so the variable name or row name of variables or rows without missing data is not listed, which can appreciably reduce the amount of output for large data sets. To view the entire data table in terms of 0's and 1's for non-missing and missing data, respectively, invoke the miss.matrix=TRUE option.
VARIABLE LABELS
Unlike standard R, lessR provides for variable labels, which can be provided for some or all of the variables in a data frame. The variable labels are best stored in a separate data frame mylabels. The legacy approach is to store the variable labels directly with the data in the same data frame. The problem with this approach is that any transformations of the data with any function other than lessR transformation functions remove the variable labels. The option for reading the variable labels with the labels option of the Read statement is retained for compatibility.
There are, however, two situations in which it is necessary to read the variable labels into the same data frame as the data. The first is when the variable labels are embedded directly in a text or Excel data file as the second row of the data file. To accomplish this read, specify the labels="row2" option. The web survey application Qualtrics downloads csv files in this format. The second is when the data are read from an SPSS file, which retains the SPSS variable labels as part of the data file. The lessR data analysis functions will properly process these variable labels, but any non-lessR data transformations will remove the labels from the data frame. To retain the labels, copy them to the mylabels data frame with the VariableLabels function with the name of the data frame as the sole argument.
The lessR
functions that provide analysis, such as Histogram
for a histogram, automatically include the variable labels in their output, such as the title of a graph. Standard R functions can also use these variable labels by invoking the lessR
function label
, such as setting main=label(I4)
to put the variable label for a variable named I4 in the title of a graph.
Gerbing, D. W. (2014). R Data Analysis without Programming, Chapter 2, NY: Routledge.
Alexander Walker (2017). openxlsx: Read, Write and Edit XLSX Files. https://CRAN.R-project.org/package=openxlsx
read.csv, read.spss, read.xlsx, read.fwf, corRead, label, details, VariableLabels.
# NOT RUN {
# remove the # sign before each of the following Read statements to run
# to browse for a data file on the computer system, invoke Read with
# the ref argument empty
# mydata <- Read()
# abbreviated name
# mydata <- rd()
# reduced output to the console
# mydata <- rd.brief()
# browse for a file and then read the variable labels from
# the specified label file, here an Excel file with two columns,
# the first column of variable names and the second column the
# corresponding labels
# mydata <- Read(labels="employee_lbl.xlsx")
# same as above, but include standard read.csv options to indicate
# no variable names in first row of the csv data file
# and then provide the names
# also indicate that the first column is an ID field
# mydata <- Read(header=FALSE, col.names=c("X", "Y"), row.names=1)
# read a csv data file from the web
# mydata <- Read("http://web.pdx.edu/~gerbing/data/twogroup.csv")
# read a csv data file with -99 and XXX set to missing
# mydata <- Read(missing=c(-99, "XXX"))
# do not display any output
# mydata <- Read(quiet=TRUE)
# display full output
# mydata <- Read(brief=FALSE)
# read the built-in data set dataEmployee
mydata <- Read("Employee", in.lessR=TRUE)
# read a data file organized by columns, with a
# 5 column ID field, 2 column Age field
# and 75 single columns of data, no spaces between columns
# name the variables with lessR function: to
# the variable names are Q01, Q02, ..., Q74, Q75
# mydata <- Read(widths=c(5,2,rep(1,75)), col.names=c("ID", "Age", to("Q", 75)))
# }