Data Formats
Description
Data formats used in cubfits.
Format
All formats are simple default S3 lists or data frames.
Details
-
Format
b:
A named list A indexed by amino acid.
Each element A[[i]] is a list with elements
coefficients (coefficients of log(mu) and Delta.t),
coef.mat (the coefficients in matrix form), and
R (covariance matrix of the coefficients).
Note that coefficients and R typically match the output
of vglm() in the VGAM package.
Also, coef.mat and R may be missing in some cases.
e.g. A[[i]]$coef.mat is the regression beta matrix of the i-th
amino acid.
-
Format
bVec:
A vector containing all coefficients of a b object A.
Note that this is probably only used inside MCMC or for the output of
vglm() in the VGAM package.
e.g. do.call("c", lapply(A, function(x) x$coefficients)).
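The flattening from b to bVec can be tried on a toy object. This is only a sketch: the amino acid names and numeric values are made up, coef.mat and R are left out (they may be missing anyway), and the assumption that k synonymous codons give k - 1 log(mu) coefficients followed by k - 1 Delta.t coefficients is an illustration, not something this page states.

```r
# Toy b object for two amino acids (all values are invented).
# Assumption: k synonymous codons give k - 1 log(mu) coefficients
# followed by k - 1 Delta.t coefficients.
b <- list(
  A = list(coefficients = c(0.1, -0.2, 0.3, 0.5, -0.4, 0.2)),  # 4 codons
  C = list(coefficients = c(0.05, -0.3))                        # 2 codons
)

# Flatten to the bVec format, using the expression from the text.
bVec <- do.call("c", lapply(b, function(x) x$coefficients))

length(bVec)  # 8
```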
-
Format
n:
A named list A indexed by amino acid.
Each element A[[i]] is a vector of total
codon counts.
e.g. A[[i]][j] is the total for the j-th ORF and the i-th amino acid,
names(A)[i].
-
Format
n.list:
A named list A indexed by ORF.
Each element A[[i]] is a named list, indexed by amino acid,
of total counts.
e.g. A[[i]][[j]] is the total count of the
j-th amino acid in the i-th ORF.
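The n and n.list formats hold the same totals grouped in opposite orders. A minimal sketch of regrouping one into the other follows, assuming (this page does not say so) that the vectors in n carry ORF names; the ORF names and counts are invented.

```r
# Toy n object: total codon counts per ORF, grouped by amino acid.
# Assumption: each vector is named by ORF.
n <- list(
  A = c(orf1 = 10, orf2 = 7),
  C = c(orf1 = 3,  orf2 = 0)
)
orf.names <- names(n[[1]])

# Regroup by ORF: n.list[[i]][[j]] is the total count of the
# j-th amino acid in the i-th ORF.
n.list <- lapply(orf.names, function(orf) lapply(n, function(x) x[[orf]]))
names(n.list) <- orf.names

n.list[["orf2"]][["A"]]  # 7
```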
-
Format
phi.df:
A data frame A with two columns, ORF and phi.value.
e.g. A[i,] is the row for the i-th ORF.
-
Format
reu13.df:
A named list A indexed by amino acid.
Each element is a data frame summarizing ORFs and expression.
The data frame has four or five columns:
ORF, phi (expression), Pos (amino acid position),
Codon (synonymous codon), and
Codon.id (synonymous codon id, for computing only).
Note that Codon.id may be missing in some cases.
e.g. A[[i]][17,] is the 17th record of the i-th amino acid.
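A toy object in the reu13.df layout, built by hand, may make the column description concrete. The ORF names, expression values, positions, and codons are invented, and the optional Codon.id column is omitted.

```r
# Toy reu13.df object: one data frame per amino acid,
# one row per occurrence of that amino acid in an ORF.
reu13.df <- list(
  A = data.frame(
    ORF   = c("orf1", "orf1", "orf2"),
    phi   = c(1.2, 1.2, 0.4),       # expression of the ORF
    Pos   = c(2, 5, 3),             # amino acid position in the ORF
    Codon = c("GCA", "GCC", "GCA"), # synonymous codon used
    stringsAsFactors = FALSE
  ),
  C = data.frame(
    ORF   = c("orf1", "orf2"),
    phi   = c(1.2, 0.4),
    Pos   = c(7, 1),
    Codon = c("TGC", "TGT"),
    stringsAsFactors = FALSE
  )
)

reu13.df[["A"]][2, ]  # second record for amino acid A
```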
-
Format
reu13.list:
A named list A indexed by ORF.
Each element A[[i]] is a named list indexed by amino acid.
Each element A[[i]][[j]] of the nested list is a vector of
synonymous codon positions.
e.g. A[[i]][[j]][k] is the k-th synonymous codon position of
the j-th amino acid in the i-th ORF.
-
Format
scuo:
A data frame with 8 named columns:
AA (amino acid), ORF, C1, ..., C6,
where the C* columns hold codon counts.
-
Format
seq.string:
Default output of read.fasta() in the seqinr package.
A named list A indexed by ORF.
Each element of the list is a long string holding one ORF.
e.g. A[[i]][1] or A[[i]] is the sequence of
the i-th ORF.
-
Format
seq.data:
Converted from the seq.string format.
A named list A indexed by ORF.
Each element A[[i]] is a character vector,
and each element of that vector is a codon string.
e.g. A[[i]][j] is the j-th codon of the i-th ORF.
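The step from seq.string to seq.data amounts to cutting each sequence into three-letter codons. The sketch below does this in base R on invented sequences; to_codons is a hypothetical helper written for this example, not a cubfits function, and it assumes every sequence length is a multiple of three.

```r
# Toy seq.string object: one long string per ORF.
seq.string <- list(orf1 = "ATGGCAGCCTGCTAA", orf2 = "ATGTGTGCATAA")

# Hypothetical helper (not part of cubfits): cut a string into codons.
# Assumes nchar(s) is a multiple of 3.
to_codons <- function(s) {
  starts <- seq(1, nchar(s), by = 3)
  substring(s, starts, starts + 2)
}

# seq.data layout: one character vector of codons per ORF.
seq.data <- lapply(seq.string, to_codons)

seq.data[["orf1"]][2]  # "GCA"
```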
-
Format
phi.Obs:
A named vector A of observed expression values, possibly
with measurement error.
e.g. A[i] is the observed phi value of the i-th ORF.
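The phi.df and phi.Obs formats carry the same kind of information, so one can be sketched from the other. The example assumes that the names of phi.Obs are the ORF names, which this page does not state explicitly; the values are invented.

```r
# Toy phi.df object: one row per ORF.
phi.df <- data.frame(
  ORF       = c("orf1", "orf2", "orf3"),
  phi.value = c(1.2, 0.4, 2.5),
  stringsAsFactors = FALSE
)

# Assumption: phi.Obs is named by ORF.
phi.Obs <- setNames(phi.df$phi.value, phi.df$ORF)

phi.Obs["orf2"]  # named element with value 0.4
```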
-
Format
y:
A named list A indexed by amino acid.
Each element A[[i]] is a matrix
with ORFs in rows and synonymous codons in columns.
The entries of the matrix are codon counts.
e.g. A[[i]][j, k] is the count for the i-th amino acid,
j-th ORF, and k-th synonymous codon.
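A small hand-built y object shows the indexing. The row and column names are assumptions made for readability, and the counts are invented. If n is read as the per-ORF total over synonymous codons, it should equal the row sums of y; that relation is an inference from the two descriptions, not a statement from this page.

```r
# Toy y object: one count matrix per amino acid,
# ORFs in rows and synonymous codons in columns.
y <- list(
  A = matrix(c(4, 3, 2, 1,
               1, 0, 5, 1), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"),
                             c("GCA", "GCC", "GCG", "GCT"))),
  C = matrix(c(2, 1,
               0, 0), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"), c("TGC", "TGT")))
)

y[["A"]]["orf2", "GCG"]  # 5

# Assumed relation: totals per ORF and amino acid (the n layout).
n <- lapply(y, rowSums)
n[["A"]]  # orf1 = 10, orf2 = 7
```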
-
Format
y.list:
A named list A indexed by ORF.
Each element A[[i]] is a named list
indexed by amino acid.
Each element A[[i]][[j]] of that list is a codon count vector.
e.g. A[[i]][[j]][k] is the count for the i-th ORF,
j-th amino acid, and k-th synonymous codon.
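For comparison, here is a toy object in the y.list layout, with invented counts and codon names added only for readability. Summing each count vector gives per-ORF, per-amino-acid totals arranged like n.list; that link is an assumption drawn from the descriptions above.

```r
# Toy y.list object: ORF -> amino acid -> codon count vector.
y.list <- list(
  orf1 = list(A = c(GCA = 4, GCC = 3, GCG = 2, GCT = 1),
              C = c(TGC = 2, TGT = 1)),
  orf2 = list(A = c(GCA = 1, GCC = 0, GCG = 5, GCT = 1),
              C = c(TGC = 0, TGT = 0))
)

y.list[["orf2"]][["A"]][3]  # count of the 3rd synonymous codon (GCG): 5

# Assumed relation: summing over codons gives totals in the n.list layout.
totals <- lapply(y.list, function(aa) lapply(aa, sum))
totals[["orf1"]][["A"]]  # 10
```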