readApdUnits: Reads Affymetrix probe data (APD) as units (probesets)

Description

Reads Affymetrix probe data (APD) as units (probesets) by using the unit and group definitions in the corresponding Affymetrix CDF file.

If more than one APD file is read, all files are assumed to be of the same chip type and to share the same read map, if any. It is not possible to read APD files of different types at the same time.

Usage

# S3 method for default
readApdUnits(filenames, units=NULL, ..., transforms=NULL, cdf=NULL,
  stratifyBy=c("nothing", "pmmm", "pm", "mm"), addDimnames=FALSE, readMap="byMapType",
  dropArrayDim=TRUE, verbose=FALSE)

Value

A named list where the names correspond to the names of the units read. Each element of the list is in turn a list structure with groups (aka blocks).

Arguments

filenames: The filenames of the APD files. All APD files must be of the same chip type.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
...: Additional arguments passed to readApd().
transforms: A list of exactly length(filenames) functions. If NULL, no transformation is performed. Values read are passed through the corresponding transform function before being returned.
cdf: A character filename of a CDF file, or a CDF list structure. If NULL, findCdf searches for the CDF file, first starting from the current directory and then from the directory containing the first APD file.
stratifyBy: Argument passed to low-level method readCdfCellIndices.
addDimnames: If TRUE, dimension names are added to arrays, otherwise not. The size of the returned APD structure in bytes increases by 30-40% with dimension names.
readMap: A vector remapping cell indices to file indices. If "byMapType", the read map of the type given in the APD header is searched for and read. Specifying the read map explicitly is much faster than searching for it each time. If NULL, no map is used.
dropArrayDim: If TRUE and only one array is read, the elements of the group field do not have an array dimension.
verbose: See Verbose.
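
As a rough illustration of what the readMap and transforms arguments describe, the following base R sketch mimics the idea on a plain numeric vector. It does not call readApdUnits(), and it assumes that element i of the read map holds the file index of cell i. The names fileValues, cells, values and transformed, and all the numbers, are made up for the example.

```r
# Hypothetical values in the order they are stored on file (not real APD data)
fileValues <- c(10, 20, 30, 40, 50)

# A read map (assumed semantics): element i is the file index holding cell i
readMap <- c(5, 4, 3, 2, 1)

# Cell indices we want to read
cells <- c(1, 3)

# Remap cell indices to file indices, then look the values up
values <- fileValues[readMap[cells]]

# One transform function per file; here one file, transformed by log2
transforms <- list(function(x) log2(x))
transformed <- transforms[[1]](values)
```

With more than one file, the transforms list would hold one function per file, applied to that file's values only.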

Speed

Since the cell indices are semi-randomized across the array and within units (probesets), it is very unlikely that a read will consist of consecutive cells (which would be faster to read). Nevertheless, the speed of this method, which uses FileVector to read data, is comparable to that of readCelUnits, which uses the Fusion SDK (readCel) to read data.

Author

Henrik Bengtsson

Examples



library("R.utils") # Arguments

verbose <- Arguments$getVerbose(TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Scan for existing CEL files
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# a) Scan current directory for CEL files
files <- list.files(pattern="[.](cel|CEL)$")
files <- files[!file.info(files)$isdir]

if (length(files) > 0 && require("affxparser")) {
  # b) Corresponding APD filenames
  celNames <- files
  # Replace the CEL extension with ".apd" (same pattern as used in the scan above)
  apdNames <- gsub("[.](cel|CEL)$", ".apd", files)
 
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # 1. Copy the probe intensities from a CEL to an APD file
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  for (kk in 1) {
    verbose && enter(verbose, "Reading CEL file #", kk)
    cel <- readCel(celNames[kk])
    verbose && exit(verbose)
 
    if (!file.exists(apdNames[kk])) {
      verbose && enter(verbose, "Creating APD file #", kk)
      chipType <- cel$header$chiptype
      writeApd(apdNames[kk], data=cel$intensities, chipType=chipType)
      verbose && exit(verbose)
    }
 
    verbose && enter(verbose, "Verifying APD file #", kk)
    apd <- readApd(apdNames[kk])
    verbose && exit(verbose)
    stopifnot(identical(apd$intensities, cel$intensities))
  }
 
 
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # 2. Read a subset of the units
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  units <- c(1, 20:205)
  cel <- readCelUnits(celNames[1], units=units)
  apd <- readApdUnits(apdNames[1], units=units)
  stopifnot(identical(apd, cel))
 
 
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  # 3. The same, but stratified on PMs and MMs
  # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  apd <- readApdUnits(apdNames[1], units=units, stratifyBy="pmmm",
                                                addDimnames=TRUE)
} # if (length(files) > 0)
