PivotalR (version 0.1.18.5)

null.data: A Data Set with lots of NA values

Description

An example data.frame that is used by the examples in this user manual.

Usage

data(null.data)

Format

This data has 104 columns and 2000 rows.

Details

This data set contains many NA values. Using as.db.data.frame, one can copy the data set into the connected database, where all the NA values are converted into NULL values.

The MADlib wrapper functions such as madlib.lm and madlib.glm will throw an error if there are NULL values in the data, so one needs to clean up the data before using the regression functions supplied by MADlib.
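
The same clean-before-fitting idea can be tried locally without a database connection. The following is only a minimal sketch in base R, not PivotalR code: the data.frame toy and its values are hypothetical stand-ins for null.data, and complete.cases and lm are used in place of the database-side filtering and madlib.lm.

```r
## Toy local data.frame standing in for null.data (hypothetical values)
toy <- data.frame(
  y  = c(1.0, 2.1, NA, 3.9, 5.2, 6.1),
  x1 = c(1, 2, 3, NA, 5, 6),
  x2 = c(0.5, 0.1, 0.9, 0.4, 0.7, 0.2)
)

## count the NA values in each column
na_counts <- colSums(is.na(toy))

## keep only the rows that have no NA in any column
clean <- toy[complete.cases(toy), ]

## fit an ordinary linear regression on the cleaned rows
fit <- lm(y ~ ., data = clean)
```

Inspecting na_counts before filtering shows which columns are responsible for the dropped rows, which helps decide whether to drop rows or leave a sparse column out of the model.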

Examples

# NOT RUN {
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a table from the example data.frame "null.data"
delete("null_data", conn.id = cid)
x <- as.db.data.frame(null.data, "null_data", conn.id = cid, verbose = FALSE)

## ERROR, because of NULL values
fit <- madlib.lm(sf_mrtg_pct_assets ~ ris_asset + lncrcd + lnauto +
                 lnconoth + lnconrp + intmsrfv + lnrenr1a + lnrenr2a +
                 lnrenr3a, data = x)

## select columns
y <- x[,c("sf_mrtg_pct_assets","ris_asset", "lncrcd","lnauto",
          "lnconoth","lnconrp","intmsrfv","lnrenr1a","lnrenr2a",
          "lnrenr3a")]

dim(y)

## remove rows that have a NULL value in any of the 10 selected columns
for (i in 1:10) y <- y[!is.na(y[i]),]

dim(y)

fit <- madlib.lm(sf_mrtg_pct_assets ~ ., data = y)

fit

db.disconnect(cid, verbose = FALSE)
# }
