These functions pick (i.e. interpolate without worrying too much about theory) values of spatial variables from data stored in a data.frame, a point shapefile, or an ASCII or SAGA grid, using nearest neighbour or kriging interpolation. pick.from.points and [internal.]pick.from.ascii.grid are the core functions that are called by the different wrappers.
pick.from.points(
data,
src,
pick,
method = c("nearest.neighbour", "krige"),
set.na = FALSE,
radius = 200,
nmin = 0,
nmax = 100,
sill = 1,
range = radius,
nugget = 0,
model = vgm(sill - nugget, "Sph", range = range, nugget = nugget),
log = rep(FALSE, length(pick)),
X.name = "x",
Y.name = "y",
cbind = TRUE
)
pick.from.shapefile(data, shapefile, X.name = "x", Y.name = "y", ...)
pick.from.ascii.grid(
data,
file,
path = NULL,
varname = NULL,
prefix = NULL,
method = c("nearest.neighbour", "krige"),
cbind = TRUE,
parallel = FALSE,
nsplit,
quiet = TRUE,
...
)
pick.from.ascii.grids(
data,
file,
path = NULL,
varname = NULL,
prefix = NULL,
cbind = TRUE,
quiet = TRUE,
...
)
internal.pick.from.ascii.grid(
data,
file,
path = NULL,
varname = NULL,
prefix = NULL,
method = c("nearest.neighbour", "krige"),
nodata.values = c(-9999, -99999),
at.once,
quiet = TRUE,
X.name = "x",
Y.name = "y",
nlines = Inf,
cbind = TRUE,
range,
radius,
na.strings = "NA",
...
)
pick.from.saga.grid(
data,
filename,
path,
varname,
prec = 7,
show.output.on.console = FALSE,
env = rsaga.env(),
...
)
If cbind=TRUE, columns with the new, interpolated variables are added to the input data.frame data.
If cbind=FALSE, a data.frame containing only the new variables is returned (possibly coerced to a vector if only one variable is processed).
data: data.frame giving the coordinates (in columns specified by X.name, Y.name) of point locations at which to interpolate the specified variables or grid values
src: data.frame containing the variables to be interpolated, with coordinates in the columns specified by X.name, Y.name
pick: variables to be picked (interpolated) from src; if missing, all available variables are used except those specified by X.name and Y.name
method: interpolation method to be used; uses a partial match to the alternatives "nearest.neighbour" (currently the default) and "krige"
set.na: logical: if a column with a name specified in pick already exists in data, how should it be dealt with? set.na=FALSE (default) only overwrites existing data if the interpolator yields a non-NA result; set.na=TRUE passes NA values returned by the interpolator on to the results data.frame
radius: numeric value specifying the radius of the local neighborhood to be used for interpolation; defaults to 200 map units (presumably meters), or, in the functions for grid files, 2.5*cellsize.
nmin: numeric, for method="krige" only: see gstat::krige()
nmax: numeric, for method="krige" only: see gstat::krige()
sill: numeric, for method="krige" only: the overall sill parameter of the variogram
range: numeric, for method="krige" only: the variogram range
nugget: numeric, for method="krige" only: the nugget effect
model: for method="krige" only: the variogram model to be used for interpolation; defaults to a spherical variogram with parameters specified by the range, sill, and nugget arguments; see gstat::vgm() for details
log: logical vector specifying for each variable in pick whether interpolation should take place on the logarithmic scale (default: FALSE)
X.name: name of the variable containing the x coordinates
Y.name: name of the variable containing the y coordinates
cbind: logical: should the new variables be added to the input data.frame (cbind=TRUE, the default), or should they be returned as a separate vector or data.frame (cbind=FALSE)?
shapefile: point shapefile
...: arguments to be passed to pick.from.points, and to internal.pick.from.ascii.grid in the case of pick.from.ascii.grid
file: file name (relative to path, default file extension .asc) of an ASCII grid from which to pick a variable, or an open connection to such a file
path: optional path to file
varname: character string: a variable name for the variable interpolated from grid file file in pick.from.*.grid; if missing, the variable name will be determined from the file name by a call to create.variable.name()
prefix: an optional prefix to be added to the varname
parallel: logical (default: FALSE): enable parallel processing; requires additional packages such as doSNOW or doMC. See the examples below and plyr::ddply()
nsplit: split the data.frame data into nsplit disjoint subsets in order to increase efficiency by using plyr::ddply() in package plyr. The default seems to perform well in many situations.
quiet: logical: if FALSE, information on the progress of grid processing is shown on screen (only relevant if at.once=FALSE and method="nearest.neighbour")
nodata.values: numeric vector specifying grid values that should be converted to NA; in addition to the values specified here, the nodata value given in the input grid's header will be used
at.once: logical: should the grid be read as a whole or line by line? at.once=FALSE is useful for processing large grids that do not fit into memory. The argument currently defaults to FALSE for method="nearest.neighbour", and it currently MUST be TRUE for all other methods (in these cases, TRUE is the default value). Piecewise processing with at.once=FALSE is always faster than processing the whole grid at once.
nlines: numeric: stop after processing nlines lines of the input grid; useful for testing purposes
na.strings: passed on to scan()
filename: character: name of a SAGA grid file, default extension .sgrd
prec: numeric, specifying the number of digits to be used in converting a SAGA grid to an ASCII grid in pick.from.saga.grid
show.output.on.console: a logical (default: FALSE) indicating whether to capture the output of the command and show it on the R console (see system(), rsaga.geoprocessor()).
env: list: RSAGA geoprocessing environment created by rsaga.env()
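The set.na rule for a column named in pick that already exists in data can be sketched as follows. merge_picked() is a hypothetical helper, not an RSAGA function. Here old stands for the existing column and new for the interpolator's output.

```r
# Hypothetical sketch of the documented set.na rule (not RSAGA code).
merge_picked <- function(old, new, set.na = FALSE) {
  if (set.na) {
    new                              # NA results are passed through as NA
  } else {
    ifelse(is.na(new), old, new)     # keep existing values where result is NA
  }
}

old <- c(1, 2, 3)
new <- c(10, NA, 30)
merge_picked(old, new, set.na = FALSE)  # 10  2 30
merge_picked(old, new, set.na = TRUE)   # 10 NA 30
```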
Alexander Brenning
pick.from.points interpolates the variables defined by pick in the src data.frame to the locations provided by the data data.frame. Only nearest neighbour and ordinary kriging interpolation are currently available. This function is intended for 'data-rich' situations in which not much thought needs to be put into a geostatistical analysis of the spatial structure of a variable. In particular, it is meant to provide a simple, 'quick-and-dirty' interface for situations where the src data points are very densely distributed compared to the data locations.
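The nearest neighbour idea behind pick.from.points can be illustrated in base R. nn_pick() is a hypothetical sketch, not how RSAGA implements it. It assumes Euclidean distances in map units and returns NA when no src point lies within radius.

```r
# Hypothetical base-R sketch of nearest-neighbour picking within a radius.
nn_pick <- function(data, src, var, radius = 200, X.name = "x", Y.name = "y") {
  vapply(seq_len(nrow(data)), function(i) {
    # Euclidean distance from target location i to every source point
    dist <- sqrt((src[[X.name]] - data[[X.name]][i])^2 +
                 (src[[Y.name]] - data[[Y.name]][i])^2)
    j <- which.min(dist)
    # no source point inside the search radius -> NA
    if (length(j) == 0 || dist[j] > radius) NA_real_ else src[[var]][j]
  }, numeric(1))
}

src  <- data.frame(x = c(0, 100, 1000), y = c(0, 0, 0), elev = c(10, 20, 30))
data <- data.frame(x = c(10, 90, 5000), y = c(0, 0, 0))
nn_pick(data, src, "elev")  # 10 20 NA
```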
pick.from.shapefile is a front-end of pick.from.points for point shapefiles.
pick.from.ascii.grid retrieves data values from an ASCII raster file using either nearest neighbour or ordinary kriging interpolation. The latter may not be possible for large raster data sets because the entire grid needs to be read into an R matrix. Split-apply-combine strategies are used to improve efficiency and allow for parallelization.
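Nearest neighbour picking from an ASCII grid row by row, as with at.once=FALSE, can be sketched as follows. pick_from_asc() is hypothetical and not RSAGA's code. It assumes an ESRI ASCII grid with exactly six "key value" header lines using xllcorner/yllcorner. Rows are stored from north to south, and other header variants are not handled.

```r
# Hypothetical sketch: nearest-neighbour picking from an ESRI ASCII grid,
# reading one grid row at a time through a connection.
asc <- tempfile(fileext = ".asc")
writeLines(c("ncols 3", "nrows 2", "xllcorner 0", "yllcorner 0",
             "cellsize 10", "NODATA_value -9999",
             "1 2 3", "4 -9999 6"), asc)

pick_from_asc <- function(x, y, grid_file, nodata.values = c(-9999, -99999)) {
  con <- file(grid_file, open = "r")
  on.exit(close(con))
  # parse the six header lines into a named numeric vector
  fields <- strsplit(readLines(con, n = 6), "[[:space:]]+")
  hdr <- setNames(as.numeric(vapply(fields, `[`, "", 2)),
                  tolower(vapply(fields, `[`, "", 1)))
  # grid cell containing each point; rows are counted from the top (north)
  col <- floor((x - hdr[["xllcorner"]]) / hdr[["cellsize"]]) + 1
  row <- hdr[["nrows"]] - floor((y - hdr[["yllcorner"]]) / hdr[["cellsize"]])
  out <- rep(NA_real_, length(x))
  for (i in seq_len(hdr[["nrows"]])) {
    z <- scan(con, nlines = 1, quiet = TRUE)   # only one grid row in memory
    # user-supplied nodata values plus the header's NODATA_value become NA
    z[z %in% c(nodata.values, hdr[["nodata_value"]])] <- NA
    hit <- which(row == i & col >= 1 & col <= hdr[["ncols"]])
    out[hit] <- z[col[hit]]
  }
  out
}

pick_from_asc(x = c(5, 15, 25, 100), y = c(5, 5, 15, 5), grid_file = asc)
# 4 NA 3 NA
```

The second point falls on a nodata cell, and the last point lies outside the grid, so both yield NA.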
The optional parallelization of pick.from.ascii.grid computation requires a parallel backend package such as doSNOW or doMC, and the parallel backend needs to be registered before calling this function with parallel=TRUE. The examples section includes one that uses doSNOW on Windows. I have seen a 25-40% reduction in processing time from parallelization in some examples that I ran on a dual-core Windows computer.
pick.from.ascii.grids performs multiple pick.from.ascii.grid calls. The file, path and prefix arguments may be specific to each file (i.e. each may be a character vector), but all interpolation settings will be the same for each file. This limits flexibility a bit compared to individual pick.from.ascii.grid calls by the user. pick.from.ascii.grids currently processes the files sequentially (i.e. parallelization is limited to the pick.from.ascii.grid calls within this function).
pick.from.saga.grid is the equivalent of pick.from.ascii.grid for SAGA grid files. It simply converts the SAGA grid file to a (temporary) ASCII raster file and applies pick.from.ascii.grid.
internal.pick.from.ascii.grid is an internal 'workhorse' function that by itself would be very inefficient for large data sets data. This function is called by pick.from.ascii.grid, which uses a split-apply-combine strategy implemented in the plyr package.
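The split-apply-combine strategy can be illustrated with base R's split(), lapply() and rbind() standing in for plyr::ddply(). This is a swapped-in technique, not RSAGA's code. pick_chunk() is a hypothetical placeholder for internal.pick.from.ascii.grid() that computes a dummy value, and the split into nsplit consecutive blocks is one plausible choice.

```r
# Base-R sketch of split-apply-combine (RSAGA itself uses plyr::ddply()).
# pick_chunk() is a hypothetical stand-in for the per-subset picking step.
pick_chunk <- function(chunk) {
  chunk$value <- chunk$x + chunk$y   # dummy "interpolation"
  chunk
}

d <- data.frame(x = 1:10, y = 101:110)
nsplit <- 3
# split: nsplit disjoint subsets of consecutive rows
group <- cut(seq_len(nrow(d)), breaks = nsplit, labels = FALSE)
parts <- lapply(split(d, group), pick_chunk)    # apply to each subset
result <- do.call(rbind, unname(parts))         # combine the results
sapply(split(d, group), nrow)                   # subset sizes 4, 3, 3
```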
Brenning, A. (2008): Statistical geocomputing combining R and SAGA: The example of landslide susceptibility analysis with generalized additive models. In: J. Boehner, T. Blaschke, L. Montanarella (eds.), SAGA - Seconds Out (= Hamburger Beitraege zur Physischen Geographie und Landschaftsoekologie, 19), 23-32.
grid.to.xyz(), vgm(), krige(), read.ascii.grid(), write.ascii.grid()
if (FALSE) {
# assume that 'dem' is an ASCII grid and d a data.frame with variables x and y
pick.from.ascii.grid(d, "dem")
# parallel processing on Windows using the doSNOW package:
require(doSNOW)
registerDoSNOW(cl <- makeCluster(2, type = "SOCK")) # DualCore processor
pick.from.ascii.grid(d, "dem", parallel = TRUE)
# produces two (ignorable) warning messages when using doSNOW
# typically 25-40% faster than the above on my DualCore notebook
stopCluster(cl)
}
if (FALSE) {
# use the meuse data for some tests:
require(gstat)
# the meuse data sets are provided by package sp
data(meuse, package = "sp")
data(meuse.grid, package = "sp")
meuse.nn = pick.from.points(data=meuse.grid, src=meuse,
pick=c("cadmium","copper","elev"), method="nearest.neighbour")
meuse.kr = pick.from.points(data=meuse.grid, src=meuse,
pick=c("cadmium","copper","elev"), method="krige", radius=100)
# it does make a difference:
plot(meuse.kr$cadmium,meuse.nn$cadmium)
plot(meuse.kr$copper,meuse.nn$copper)
plot(meuse.kr$elev,meuse.nn$elev)
}