hddplot (version 0.59)

divideUp: Partition data into multiple nearly equal subsets

Description

Randomly partition data into nset subsets of nearly equal size. If balanced=TRUE, the subsets are additionally required to be, as far as possible, balanced with respect to a classifying factor. The resulting subsets are suitable for defining the folds in a cross-validation.

Usage

divideUp(cl, nset = 2, seed = NULL, balanced=TRUE)

Value

a vector of indices, one per element of cl, each giving the subset (from 1 to nset) to which that observation is assigned
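
As a minimal sketch of how such an index vector might be used, the base R code below loops over the folds of a cross-validation. The foldid vector here is a stand-in built with sample() rather than a call to divideUp(), and both the data and the mean-only "model" are illustrative assumptions, not part of hddplot.

```r
# Illustrative data: 39 observations of a numeric response (assumed, not from hddplot)
set.seed(1)
y <- rnorm(39)
nset <- 10

# Stand-in for the value returned by divideUp(): one fold label per observation
foldid <- sample(rep(1:nset, length.out = length(y)))

# For each fold, fit on the other folds and predict the held-out observations
pred <- numeric(length(y))
for (k in 1:nset) {
  test <- foldid == k
  pred[test] <- mean(y[!test])   # toy "model": the mean of the training folds
}

# Cross-validated mean squared error
cv_mse <- mean((y - pred)^2)
```

Each observation is held out exactly once, because every element of foldid carries exactly one fold label.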

Arguments

cl

classifying factor, with one element per observation

nset

number of subsets into which to partition data

seed

optional random seed; if not NULL, it is passed to set.seed() so that results are reproducible

balanced

logical: should the subsets be, as far as possible, balanced with respect to the classifying factor?
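
The roles of nset and seed can be sketched in base R without the package. The helper fold_labels() below is hypothetical, written only for this illustration; it shuffles recycled subset labels after an optional set.seed() call, much as the unbalanced case does, and makes no attempt at balancing.

```r
# Hypothetical helper (not part of hddplot): shuffle recycled subset labels
fold_labels <- function(n, nset = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(1:nset, length.out = n))
}

# The same seed gives the same partition
a <- fold_labels(39, nset = 10, seed = 29)
b <- fold_labels(39, nset = 10, seed = 29)
identical(a, b)

# Subset sizes differ by at most one
table(a)
```

Because the labels 1 to nset are recycled to the length of the data before shuffling, no subset can be more than one observation larger than another.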

Author

John Maindonald

Examples

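library(hddplot)
## Class labels: three classes with 17, 14 and 8 observations
cl <- rep(1:3, c(17,14,8))
## Balanced assignment to 10 subsets
foldid <- divideUp(cl=cl, nset=10)
table(cl, foldid)
## Assignment without balancing
foldid <- divideUp(cl=cl, nset=10, balanced=FALSE)
table(cl, foldid)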


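## The function is currently defined as
function(cl = rep(1:3, c(7, 4, 8)), nset=2, seed=NULL, balanced=TRUE){
    if(!is.null(seed)) set.seed(seed)
    if(balanced){
        ## Sort observations by class, recycle a shuffled set of subset labels
        ## along the sorted classes, then map the labels back to the original order
        ord <- order(cl)
        ordcl <- cl[ord]
        gp0 <- rep(sample(1:nset), length.out=length(cl))
        gp <- unlist(split(gp0,ordcl), function(x)sample(x))
        gp[ord] <- gp
    } else {
        ## Shuffle recycled subset labels without regard to class
        gp <- sample(rep(1:nset, length.out=length(cl)))
    }
    as.vector(gp)
}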
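
In the definition above, the anonymous function is passed to unlist() as its second positional argument (recursive), so it is not applied to the pieces returned by split(). A sketch of a variant that does reshuffle the subset labels within each class, using lapply(), is given below. The name divideUpShuffled is hypothetical and this is not the package's code; it only illustrates one way a within-class shuffle could be written.

```r
# Hypothetical variant (not hddplot's code): shuffle subset labels within each class
divideUpShuffled <- function(cl, nset = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ord <- order(cl)      # positions of the observations, sorted by class
  ordcl <- cl[ord]
  # One random ordering of the subset labels, recycled over all observations
  gp0 <- rep(sample(1:nset), length.out = length(cl))
  # Reshuffle the labels separately inside each class; indexing with
  # sample.int() avoids sample()'s special handling of a single number
  shuffled <- unlist(
    lapply(split(gp0, ordcl), function(x) x[sample.int(length(x))]),
    use.names = FALSE
  )
  gp <- integer(length(cl))
  gp[ord] <- shuffled   # map the labels back to the original order
  gp
}

cl <- rep(1:3, c(17, 14, 8))
foldid <- divideUpShuffled(cl, nset = 10, seed = 1)
table(cl, foldid)
```

Shuffling only within a class leaves the per-class counts in each subset unchanged, so the balance with respect to cl is the same as before the shuffle.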