Learn R Programming

memisc (version 0.11-11)

aggregateDB: Aggregate database tables

Description

aggregateDB constructs a data frame of summaries from one or several data base tables conditional on given values of independent variables given by a formula.

Usage

aggregateDB(formula,
            table,
            connection,
            catalog="catalog",
            subset=NULL,
            names=NULL,
            showSQL=TRUE,
            allow.sql.eval=FALSE)

mergeSQL(x,y,by=NULL,by.x=by,by.y=by,all=FALSE,all.x=all,all.y=all)

Arguments

formula
a formula object with an expression yielding a numeric result on the left hand side and the conditioning variables, separated by +, on the right hand side. Interactions are ignored. The left hand side of the formula is optiona
table
a character vector giving the name of a data base table.
connection
a data base connection.
catalog
an optional character vector giving the name of a data base table that contains value labels.
subset
an optional expression specifying a subset of observations to be used.
names
an optional character vector giving new names to the variables in the resulting data frame; names of this character vector are the old names, values are the new names.
showSQL
logical; should the generated SQL statements be shown?
allow.sql.eval
logical; should, where possible, the data base server compute aggregate functions?
x,y
data base table names.
by,by.x,by.y
names of data base table columns by which table rows are joined
all,all.x,all.y
logical; should rows be used that contain NULL? See also merge.

Value

  • A data frame with conditional summaries and unique value combinations of conditioning variables.

Details

If an expression is given as left hand side of the formula, its value is computed for any combination of values of the values on the right hand side.

If no expression is given as left hand side, then the frequency counts for the respective value combinations of the right hand variables are computed.

If a single factor is on the left hand side, then the left hand side is translated into an appropriate call to table(). If a single numeric variable is on the left hand side, frequency counts weighted by this variable are computed.

mergeSQL is a little helper to conveniently construct JOINs of data base tables without the need to code SQL statements.

See Also

aggregate.formula, merge, packages DBI and RSQLite

Examples

Run this code
# This is a real life data management example from the author's work.
# It is the very reason for the existence of aggregatedDB
#
# First we have to 'plug in' the data base ...
library(RSQLite)
if(!exists("drv"))
  drv <- dbDriver("SQLite")
if(!exists("con"))
  con <- dbConnect(drv, dbname = "~/forschung/pieb.db3")
# This is a 400 MB SQLite data base that contains
# several tables contructed from cumulated eurobarometer
# survey data files, spanning the period from 1973 to 2002.


poldisc.data <- aggregateDB(table(poldisc)~nation+eb+educrec+sex+age+year,
            subset = educrec < 900 & sex < 900 & age < 900 & poldisc < 900,
            connection=con,
            allow.sql.eval=TRUE,
            table="ebtrend",
            names=c(educrec="education",sex="gender")
            )

polint.poldisc.data <- aggregateDB(percent(polint)~poldisc,
            subset=polint < 900 & poldisc < 900,
            connection=con,
            table=mergeSQL("political_interest","ebtrend",by=c("nation","eb","id")))

Run the code above in your browser using DataLab