03.ReadingData: Topic: Reading Microarray Data from Files
Description
This help page gives an overview of LIMMA functions used to read data from files.
Reading Target Information
The function readTargets is designed to help with organizing information about which RNA sample is hybridized to each channel on each array and which files store information for each array.
Reading Intensity Data
The first step in a microarray data analysis is to read into R the intensity data for each array provided by an image analysis program. This is done using the function read.maimages.
read.maimages optionally constructs quality weights for each spot using quality functions listed in QualityWeights.
If the data is two-color, then read.maimages produces an RGList object. If the data is one-color (single channel), then an EListRaw object is produced. In either case, read.maimages stores only the information required from each image analysis output file.
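As a minimal sketch of this workflow, the following writes a targets file and two tiny tab-delimited intensity files to a temporary directory and reads them back with readTargets and read.maimages. The file names, the column names (ID, R, G, Rb, Gb) and all intensity values are invented for illustration; real image analysis output would normally be read with a source-specific setting rather than source="generic".

```r
library(limma)

# Hypothetical toy data written to a temporary directory (all values invented)
datadir <- tempfile("limma_demo")
dir.create(datadir)

# Targets file: one row per array, naming the file and the RNA in each channel
targets_df <- data.frame(
  FileName = c("array1.txt", "array2.txt"),
  Cy3 = c("Reference", "Reference"),
  Cy5 = c("Treated", "Control")
)
write.table(targets_df, file.path(datadir, "Targets.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Two generic two-color intensity files with foreground and background columns
for (f in targets_df$FileName) {
  spots <- data.frame(
    ID = paste0("probe", 1:4),
    R  = c(1200, 850, 430, 2100),
    G  = c(1100, 900, 510, 1900),
    Rb = c(60, 55, 58, 62),
    Gb = c(50, 52, 49, 51)
  )
  write.table(spots, file.path(datadir, f),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read the targets, then the intensity data for the files they list
targets <- readTargets("Targets.txt", path = datadir)

RG <- read.maimages(
  targets$FileName,
  source = "generic",
  path = datadir,
  columns = list(R = "R", G = "G", Rb = "Rb", Gb = "Gb"),
  annotation = "ID"
)

class(RG)   # two-color input gives an RGList
dim(RG$R)   # spots by arrays
```

With source="generic" the columns argument must be supplied, because there is no predefined mapping from file columns to the R, G, Rb and Gb components.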
read.maimages uses utility functions removeExt, read.imagene and read.columns.
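As a small illustration of one of these utilities, removeExt strips a file extension that is common to all the names it is given. The file names below are hypothetical.

```r
library(limma)

# Hypothetical file names sharing the extension ".spot"
files <- c("slide1.spot", "slide2.spot", "slide.2.spot")

# Only the final, shared extension is removed; earlier dots are kept
array_names <- removeExt(files)
array_names
```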
There is also a series of utility functions that read the header information from image output files, including readGPRHeader, readImaGeneHeader and readGenericHeader.
read.ilmn reads probe or gene summary profile files from Illumina BeadChips and produces an EListRaw object. read.idat reads Illumina files in IDAT format and produces an EListRaw object.
detectionPValues can be used to add detection p-values.
The function as.MAList can be used to convert a marrayNorm object to an MAList object if the data was read and normalized using the marray and marrayNorm packages.
Reading the Gene List
Most image analysis software programs provide gene IDs as part of the intensity output files; GenePix, ImaGene and the Stanford Microarray Database all do this.
In other cases the probe ID and annotation information may be in a separate file.
The most common format for the probe annotation file is the GenePix Array List (GAL) file format.
The function readGAL reads information from a GAL file and produces a data frame with standard column names. The function getLayout extracts from the GAL-file data frame the print layout information for a spotted array.
The functions gridr, gridc, spotr and spotc use the extracted layout to compute grid positions and spot positions within each grid for each spot.
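The position functions can be sketched on a hand-built layout. In practice the layout would normally come from getLayout applied to the data frame returned by readGAL; here a plain list with the four layout components stands in for it, and the dimensions are invented for illustration.

```r
library(limma)

# Hypothetical print layout: 2 x 2 grids, each holding 3 rows x 4 columns of spots.
# This list is an assumed stand-in for the value returned by getLayout().
layout <- list(ngrid.r = 2, ngrid.c = 2, nspot.r = 3, nspot.c = 4)

# One row per spot: grid position on the array and spot position within the grid
positions <- data.frame(
  GridRow = gridr(layout),
  GridCol = gridc(layout),
  SpotRow = spotr(layout),
  SpotCol = spotc(layout)
)

nrow(positions)   # total number of spots on the array
head(positions)
```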
The function printorder calculates the print order, plate number and plate row and column position for each spot given information about the printing process.
The utility function getSpacing converts character strings specifying spacings of duplicate spots to numeric values.
The Australian Genome Research Facility often produces GAL files with composite probe IDs or names consisting of multiple strings separated by a delimiter. These can be separated into name and annotation information using strsplit2.
If each probe is printed more than once on the arrays in a regular pattern, then uniquegenelist will remove duplicate names from the GAL file or gene list.
Identifying Control Spots
The functions readSpotTypes and controlStatus assist with separating control spots from ordinary genes in the analysis and data exploration.
Manipulating Data Objects
cbind, rbind and merge allow different RGList or MAList objects to be combined. cbind combines data from different arrays assuming the layout of the arrays to be the same. merge can combine data even when the order of the probes on the arrays has changed.
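The difference between cbind and merge can be seen on two toy MAList objects whose probes are in a different order. The objects, probe names and values below are invented for illustration.

```r
library(limma)

# First toy MAList: probes a, b, c, d on arrays A1 and A2 (invented values)
M <- A <- matrix(11:14, 4, 2)
rownames(M) <- rownames(A) <- c("a", "b", "c", "d")
colnames(M) <- colnames(A) <- c("A1", "A2")
MA1 <- new("MAList", list(M = M, A = A))

# Second toy MAList: the same probes, stored in a different order
M <- A <- matrix(21:24, 4, 2)
rownames(M) <- rownames(A) <- c("b", "a", "d", "c")
colnames(M) <- colnames(A) <- c("B1", "B2")
MA2 <- new("MAList", list(M = M, A = A))

# cbind assumes the same layout, so rows are joined by position
by_position <- cbind(MA1, MA2)

# merge aligns the rows of MA2 to the probe order of MA1 before combining
by_name <- merge(MA1, MA2)

by_position$M
by_name$M
```

In this sketch cbind pairs probe "a" of MA1 with whatever sits in the first row of MA2, whereas merge pairs rows by probe name, so merge is the safer choice whenever the probe order may differ.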
merge uses the utility function makeUnique.
See Also
01.Introduction,
02.Classes,
03.ReadingData,
04.Background,
05.Normalization,
06.LinearModels,
07.SingleChannel,
08.Tests,
09.Diagnostics,
10.GeneSetTests,
11.RNAseq