Select samples from along an environmental gradient by splitting the gradient into discrete chunks and sample within each chunk. This allows a test set to be selected which covers the environmental gradient of the training set, for example.
splitSample(env, chunk = 10, take, nchunk,
fill = c("head", "tail", "random"),
maxit = 1000)
A numeric vector of indices of selected samples. This vector has
attribute lengths
which indicates how many samples were
actually chosen from each chunk.
numeric; vector of samples representing the gradient values.
numeric; number of chunks to split the gradient into.
numeric; how many samples to take from the gradient. Can not be missing.
numeric; number of samples per chunk. Must be a vector
of length chunk
and sum(chunk)
must equal
take
. Can be missing (the default), in which case some simple
heuristics are used to determine the number of samples chosen per
chunk. See Details.
character; the type of filling of chunks to perform. See Details.
numeric; maximum number of iterations in which to try to
sample take
observations. Basically here to stop the loop
going on forever.
Gavin L. Simpson
The gradient is split into chunk
sections and samples are
selected from each chunk to result in a sample of length
take
. If take
is divisible by chunk
without
remainder then there will an equal number of samples selected from
each chunk. Where chunk
is not a multiple of take
and
nchunk
is not specified then extra samples have to be allocated
to some of the chunks to reach the required number of samples
selected.
An additional complication is that some chunks of the gradient may
have fewer than nchunk
samples and therefore more samples need
to be selected from the remaining chunks until take
samples are
chosen.
If nchunk
is supplied, it must be a vector stating exactly how
many samples to select from each chunk. If chunk
is not
supplied, then the number of samples per chunk is determined as
follows:
An intial allocation of floor(take / chunk)
is assigned
to each chunk
If any chunks have fewer samples than this initial allocation,
these elements of nchunk
are reset to the number of samples
in those chunks
Sequentially an extra sample is allocated to each chunk with
sufficient available samples until take
samples are
selected.
Argument fill
controls the order in which the chunks are
filled. fill = "head"
fills from the low to the high end of the
gradient, whilst fill = "tail"
fills in the opposite
direction. Chunks are filled in random order if fill =
"random"
. In all cases no chunk is filled by more than one extra
sample until all chunks that can supply one extra sample are
filled. In the case of fill = "head"
or fill = "tail"
this entails moving along the gradient from one end to the other
allocating an extra sample to available chunks before starting along
the gradient again. For fill = "random"
, a random order of
chunks to fill is determined, if an extra sample is allocated to each
chunk in the random order and take
samples are still not
selected, filling begins again using the same random ordering. In
other words, the random order of chunks to fill is chosen only once.
data(swappH)
## take a test set of 20 samples along the pH gradient
test1 <- splitSample(swappH, chunk = 10, take = 20)
test1
swappH[test1]
## take a larger sample where some chunks don't have many samples
## do random filling
set.seed(3)
test2 <- splitSample(swappH, chunk = 10, take = 70, fill = "random")
test2
swappH[test2]
Run the code above in your browser using DataLab