pick.from.points: Pick Variable from Spatial Dataset

Description

These functions pick (i.e. interpolate without worrying too much about theory) values of a spatial variable from data stored in a data.frame, a point shapefile, or an ASCII or SAGA grid, using nearest neighbour or kriging interpolation. pick.from.points and [internal.]pick.from.ascii.grid are the core functions called by the different wrappers.

Usage

pick.from.points(
  data,
  src,
  pick,
  method = c("nearest.neighbour", "krige"),
  set.na = FALSE,
  radius = 200,
  nmin = 0,
  nmax = 100,
  sill = 1,
  range = radius,
  nugget = 0,
  model = vgm(sill - nugget, "Sph", range = range, nugget = nugget),
  log = rep(FALSE, length(pick)),
  X.name = "x",
  Y.name = "y",
  cbind = TRUE
)
pick.from.shapefile(data, shapefile, X.name = "x", Y.name = "y", ...)
pick.from.ascii.grid(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  method = c("nearest.neighbour", "krige"),
  cbind = TRUE,
  parallel = FALSE,
  nsplit,
  quiet = TRUE,
  ...
)
pick.from.ascii.grids(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  cbind = TRUE,
  quiet = TRUE,
  ...
)
internal.pick.from.ascii.grid(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  method = c("nearest.neighbour", "krige"),
  nodata.values = c(-9999, -99999),
  at.once,
  quiet = TRUE,
  X.name = "x",
  Y.name = "y",
  nlines = Inf,
  cbind = TRUE,
  range,
  radius,
  na.strings = "NA",
  ...
)
pick.from.saga.grid(
  data,
  filename,
  path,
  varname,
  prec = 7,
  show.output.on.console = FALSE,
  env = rsaga.env(),
  ...
)

Value

If cbind=TRUE, columns with the new, interpolated variables are added to the input data.frame data.

If cbind=FALSE, a data.frame only containing the new variables is returned (possibly coerced to a vector if only one variable is processed).
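
The return shapes described above, together with the set.na rule documented under Arguments, can be sketched in base R. This is only an illustration: merge_picked is a hypothetical helper, not part of RSAGA, and the package's internal logic may differ in detail.

```r
# Hypothetical helper (not part of RSAGA) sketching how picked values
# could be combined with the input data.frame.
merge_picked <- function(data, picked, set.na = FALSE, cbind = TRUE) {
  if (!cbind) {
    # Return only the new variables; a single column collapses to a vector
    return(if (ncol(picked) == 1) picked[[1]] else picked)
  }
  for (nm in names(picked)) {
    new <- picked[[nm]]
    if (nm %in% names(data) && !set.na) {
      # Keep existing values wherever the interpolator returned NA
      keep <- is.na(new)
      new[keep] <- data[[nm]][keep]
    }
    data[[nm]] <- new
  }
  data
}

d <- data.frame(x = 1:3, y = 1:3, elev = c(10, 20, 30))
picked <- data.frame(elev = c(11, NA, 33))

res_keep <- merge_picked(d, picked, set.na = FALSE) # NA does not overwrite 20
res_na   <- merge_picked(d, picked, set.na = TRUE)  # NA is passed through
res_vec  <- merge_picked(d, picked, cbind = FALSE)  # plain vector
```

With set.na=FALSE the existing value in the second row survives; with set.na=TRUE it is replaced by NA.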

Arguments

data: data.frame giving the coordinates (in columns specified by X.name, Y.name) of point locations at which to interpolate the specified variables or grid values
src: data.frame
pick: variables to be picked (interpolated) from src; if missing, use all available variables, except those specified by X.name and Y.name
method: interpolation method to be used; uses a partial match to the alternatives "nearest.neighbour" (currently the default) and "krige"
set.na: logical: if a column with a name specified in pick already exists in data, how should it be dealt with? set.na=FALSE (default) only overwrites existing data if the interpolator yields a non-NA result; set.na=TRUE passes NA values returned by the interpolator on to the results data.frame
radius: numeric value specifying the radius of the local neighborhood to be used for interpolation; defaults to 200 map units (presumably meters), or, in the functions for grid files, 2.5*cellsize.
nmin: numeric, for method="krige" only: see gstat::krige() function in package gstat
nmax: numeric, for method="krige" only: see gstat::krige() function in package gstat
sill: numeric, for method="krige" only: the overall sill parameter to be used for the variogram
range: numeric, for method="krige" only: the variogram range
nugget: numeric, for method="krige" only: the nugget effect
model: for method="krige" only: the variogram model to be used for interpolation; defaults to a spherical variogram with parameters specified by the range, sill, and nugget arguments; see gstat::vgm() in package gstat for details
log: logical vector, specifying for each variable in pick if interpolation should take place on the logarithmic scale (default: FALSE)
X.name: name of the variable containing the x coordinates
Y.name: name of the variable containing the y coordinates
cbind: logical: should the new variables be added to the input data.frame (cbind=TRUE, the default), or should they be returned as a separate vector or data.frame (cbind=FALSE)?
shapefile: point shapefile
...: arguments to be passed to pick.from.points, and to internal.pick.from.ascii.grid in the case of pick.from.ascii.grid
file: file name (relative to path, default file extension .asc) of an ASCII grid from which to pick a variable, or an open connection to such a file
path: optional path to file
varname: character string: a variable name for the variable interpolated from grid file file in pick.from.*.grid; if missing, variable name will be determined from filename by a call to create.variable.name()
prefix: an optional prefix to be added to the varname
parallel: logical (default: FALSE): enable parallel processing; requires additional packages such as doSNOW or doMC. See example below and plyr::ddply()
nsplit: split the data.frame data into nsplit disjoint subsets in order to increase efficiency by using plyr::ddply() in package plyr. The default seems to perform well in many situations.
quiet: logical: if FALSE, print information on the progress of grid processing on screen (only relevant if at.once=FALSE and method="nearest.neighbour")
nodata.values: numeric vector specifying grid values that should be converted to NA; in addition to the values specified here, the nodata value given in the input grid's header will be used
at.once: logical: should the grid be read as a whole or line by line? at.once=FALSE is useful for processing large grids that do not fit into memory; the argument is currently by default FALSE for method="nearest.neighbour", and it currently MUST be TRUE for all other methods (in these cases, TRUE is the default value); piecewise processing with at.once=FALSE is always faster than processing the whole grid at.once
nlines: numeric: stop after processing nlines lines of the input grid; useful for testing purposes
na.strings: passed on to scan()
filename: character: name of a SAGA grid file, default extension .sgrd
prec: numeric, specifying the number of digits to be used in converting a SAGA grid to an ASCII grid in pick.from.saga.grid
show.output.on.console: a logical (default: FALSE), indicates whether to capture the output of the command and show it on the R console (see system(), rsaga.geoprocessor()).
env: list: RSAGA geoprocessing environment created by rsaga.env()
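
To make the sill, range, and nugget arguments concrete, the following base R sketch evaluates a spherical semivariogram with the same parameterisation as the default model (partial sill = sill - nugget). The function name spherical_semivariance is hypothetical and is not part of RSAGA or gstat; it only illustrates the shape of the model.

```r
# Spherical semivariogram with total sill, range and nugget, mirroring the
# default vgm(sill - nugget, "Sph", range = range, nugget = nugget).
# Hypothetical helper for illustration only.
spherical_semivariance <- function(h, sill = 1, range = 200, nugget = 0) {
  psill <- sill - nugget            # partial sill
  hr <- pmin(h / range, 1)          # beyond the range the model is flat
  gamma <- nugget + psill * (1.5 * hr - 0.5 * hr^3)
  gamma[h == 0] <- 0                # semivariance is zero at lag 0
  gamma
}

h <- c(0, 50, 100, 200, 400)
g <- spherical_semivariance(h, sill = 1, range = 200, nugget = 0.2)
```

The semivariance jumps to the nugget just above lag 0, rises to the total sill at the range, and stays there for larger lags.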

Author

Alexander Brenning

Details

pick.from.points interpolates the variables defined by pick in the src data.frame to the locations provided by the data data.frame. Only nearest neighbour and ordinary kriging interpolation are currently available. This function is intended for 'data-rich' situations in which not much thought needs to be put into a geostatistical analysis of the spatial structure of a variable. In particular, this function is supposed to provide a simple, 'quick-and-dirty' interface for situations where the src data points are very densely distributed compared to the data locations.
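
The nearest neighbour case can be sketched in a few lines of base R. nn_pick is a hypothetical helper, not the package's implementation; in particular, returning NA when the closest source point lies farther away than radius is an assumption made for this illustration.

```r
# Hypothetical helper (not part of RSAGA): nearest-neighbour pick in base R.
nn_pick <- function(data, src, pick, radius = 200,
                    X.name = "x", Y.name = "y") {
  out <- data.frame(matrix(NA_real_, nrow = nrow(data), ncol = length(pick)))
  names(out) <- pick
  for (i in seq_len(nrow(data))) {
    # Euclidean distance from target point i to every source point
    d <- sqrt((src[[X.name]] - data[[X.name]][i])^2 +
              (src[[Y.name]] - data[[Y.name]][i])^2)
    j <- which.min(d)
    # Assumption: a nearest source point beyond 'radius' yields NA
    if (length(j) == 1 && d[j] <= radius) {
      for (v in pick) out[[v]][i] <- src[[v]][j]
    }
  }
  out
}

src <- data.frame(x = c(0, 100, 1000), y = c(0, 0, 0), elev = c(5, 7, 9))
targets <- data.frame(x = c(10, 90, 500), y = c(0, 0, 0))
res <- nn_pick(targets, src, pick = "elev", radius = 200)
```

The third target is 400 map units from its closest source point, so it receives NA under the radius of 200.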

pick.from.shapefile is a front-end of pick.from.points for point shapefiles.

pick.from.ascii.grid retrieves data values from an ASCII raster file using either nearest neighbour or ordinary kriging interpolation. The latter may not be possible for large raster data sets because the entire grid needs to be read into an R matrix. Split-apply-combine strategies are used to improve efficiency and allow for parallelization.
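
As a rough illustration of nearest neighbour picking from an ASCII raster, the sketch below writes a tiny ESRI ASCII grid to a temporary file and looks up the cell containing each point. pick_cell is a hypothetical helper that reads the whole grid at once; it assumes a six-line header with xllcorner/yllcorner keys and is not the package's implementation.

```r
# Write a tiny ESRI ASCII grid (3 columns x 2 rows, 10 m cells) to a temp file
asc <- tempfile(fileext = ".asc")
writeLines(c(
  "ncols 3", "nrows 2",
  "xllcorner 0", "yllcorner 0",
  "cellsize 10", "NODATA_value -9999",
  "1 2 3",
  "4 5 -9999"
), asc)

# Hypothetical helper (not part of RSAGA): value of the cell containing (x, y)
pick_cell <- function(file, x, y) {
  hdr <- read.table(file, nrows = 6, stringsAsFactors = FALSE)
  h <- setNames(as.numeric(hdr[[2]]), tolower(hdr[[1]]))
  z <- as.matrix(read.table(file, skip = 6))
  z[z == h[["nodata_value"]]] <- NA   # nodata value from the header becomes NA
  col <- floor((x - h[["xllcorner"]]) / h[["cellsize"]]) + 1
  # Row 1 of the file is the northernmost row
  row <- h[["nrows"]] - floor((y - h[["yllcorner"]]) / h[["cellsize"]])
  inside <- col >= 1 & col <= h[["ncols"]] & row >= 1 & row <= h[["nrows"]]
  out <- rep(NA_real_, length(x))     # points outside the grid stay NA
  out[inside] <- z[cbind(row[inside], col[inside])]
  out
}

vals <- pick_cell(asc, x = c(5, 25, 25, 100), y = c(15, 15, 5, 5))
```

The third point falls on the nodata cell and the fourth lies outside the grid, so both come back as NA.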

The optional parallelization of pick.from.ascii.grid computation requires the use of a parallel backend package such as doSNOW or doMC, and the parallel backend needs to be registered before calling this function with parallel=TRUE. The example section provides an example using doSNOW on Windows. I have seen 25-40% reduction in processing time by parallelization in some examples that I ran on a dual core Windows computer.

pick.from.ascii.grids performs multiple pick.from.ascii.grid calls. File path and prefix arguments may be specific to each file (i.e. each may be a character vector), but all interpolation settings will be the same for each file, limiting the flexibility a bit compared to individual pick.from.ascii.grid calls by the user. pick.from.ascii.grids currently processes the files sequentially (i.e. parallelization is limited to the pick.from.ascii.grid calls within this function).

pick.from.saga.grid is the equivalent to pick.from.ascii.grid for SAGA grid files. It simply converts the SAGA grid file to a (temporary) ASCII raster file and applies pick.from.ascii.grid.

internal.pick.from.ascii.grid is an internal 'workhorse' function that by itself would be very inefficient when the data.frame data is large. This function is called by pick.from.ascii.grid, which uses a split-apply-combine strategy implemented in the plyr package.
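
The split-apply-combine idea can be shown with base R alone. In this sketch pick_chunk is a hypothetical stand-in for the per-chunk worker, and the round-robin grouping is an assumption chosen for simplicity; the package itself relies on plyr::ddply() and may group rows differently.

```r
# Hypothetical stand-in for a per-chunk worker such as
# internal.pick.from.ascii.grid(); here it just adds a derived column.
pick_chunk <- function(chunk) {
  chunk$value <- chunk$x + chunk$y
  chunk
}

d <- data.frame(x = runif(10), y = runif(10))
nsplit <- 3

# Split: assign each row to one of nsplit disjoint groups (assumed rule)
grp <- rep(seq_len(nsplit), length.out = nrow(d))
pieces <- split(d, grp)

# Apply: process each piece independently (this step could run in parallel)
results <- lapply(pieces, pick_chunk)

# Combine: reassemble the pieces in the original row order
combined <- unsplit(results, grp)
```

Because each piece is processed independently, the apply step is the natural place to hand work to a parallel backend.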

References

Brenning, A. (2008): Statistical geocomputing combining R and SAGA: The example of landslide susceptibility analysis with generalized additive models. In: J. Boehner, T. Blaschke, L. Montanarella (eds.), SAGA - Seconds Out (= Hamburger Beitraege zur Physischen Geographie und Landschaftsoekologie, 19), 23-32.

Examples

if (FALSE) {
# assume that 'dem' is an ASCII grid and 'd' a data.frame with variables x and y
pick.from.ascii.grid(d, "dem")
# parallel processing on Windows using the doSNOW package:
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # dual-core processor
registerDoSNOW(cl) # register the backend before using parallel = TRUE
pick.from.ascii.grid(d, "dem", parallel = TRUE)
# produces two (ignorable) warning messages when using doSNOW
# typically 25-40% faster than the above on my dual-core notebook
stopCluster(cl)
}

if (FALSE) {
# use the meuse data (shipped with package sp) for some tests:
library(gstat)
data(meuse, package = "sp")
data(meuse.grid, package = "sp")
meuse.nn <- pick.from.points(data = meuse.grid, src = meuse,
    pick = c("cadmium", "copper", "elev"), method = "nearest.neighbour")
meuse.kr <- pick.from.points(data = meuse.grid, src = meuse,
    pick = c("cadmium", "copper", "elev"), method = "krige", radius = 100)
# it does make a difference:
plot(meuse.kr$cadmium, meuse.nn$cadmium)
plot(meuse.kr$copper, meuse.nn$copper)
plot(meuse.kr$elev, meuse.nn$elev)
}