This function is used to build new models with users' own data.
build_model(
  lncRNA.seq,
  mRNA.seq,
  frequencies.file,
  SS.features = FALSE,
  lncRNA.format = "DNA",
  mRNA.format = "DNA",
  parallel.cores = 2,
  folds.num = 10,
  seed = 1,
  gamma.range = (2^seq(-5, 0, 1)),
  cost.range = c(1, 4, 8, 16, 24, 32),
  ...
)
Returns an SVM model.
Long non-coding RNA sequences. Can be a FASTA file loaded by the
seqinr package or a secondary structure sequences file
(Dot-Bracket Notation) obtained from function
run_RNAfold. If lncRNA.seq is a secondary structure
sequences file, parameter lncRNA.format should be set to "SS".
mRNA sequences. Can be a FASTA file loaded by read.fasta or a
secondary structure sequences file (Dot-Bracket Notation) obtained from function
run_RNAfold. If mRNA.seq is a secondary structure sequences
file, parameter mRNA.format should be set to "SS".
String, or a list obtained from function
make_frequencies. Enter a species name ("human",
"mouse" or "wheat") to use the pre-built frequencies files, or supply
a user's own frequencies file (please refer to function
make_frequencies for more information).
Logical. If SS.features = TRUE, secondary structure
features will be used to build the model. In this case, lncRNA.seq and
mRNA.seq should be secondary structure sequences (Dot-Bracket Notation)
obtained from function run_RNAfold, and parameters
lncRNA.format and mRNA.format should both be set to "SS".
String. Defines the format of lncRNA.seq: "DNA"
for DNA sequences or "SS" for secondary structure sequences. A model
with secondary structure features (SS.features = TRUE) can be built only
when both mRNA.format and lncRNA.format are set to "SS".
String. Defines the format of mRNA.seq: "DNA" for DNA
sequences or "SS" for secondary structure sequences. When this
parameter is set to "DNA", only a model without secondary structure
features can be built. In this case, parameter SS.features should be
set to FALSE.
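The constraints between SS.features, lncRNA.format and mRNA.format
described above can be sketched as a small check. This is an illustration
only: check_formats is a hypothetical helper, not a function of this
package, and build_model itself may handle invalid combinations differently.

```r
# Hypothetical helper, not part of LncFinder: checks that the format
# arguments are compatible with SS.features, following the rules above.
check_formats <- function(SS.features = FALSE,
                          lncRNA.format = "DNA",
                          mRNA.format = "DNA") {
  valid <- c("DNA", "SS")
  if (!lncRNA.format %in% valid || !mRNA.format %in% valid) {
    stop('lncRNA.format and mRNA.format must each be "DNA" or "SS".')
  }
  # Secondary structure features need "SS" input for both sequence sets.
  if (SS.features && !(lncRNA.format == "SS" && mRNA.format == "SS")) {
    stop('SS.features = TRUE requires lncRNA.format = "SS" and mRNA.format = "SS".')
  }
  invisible(TRUE)
}

# Consistent combinations pass silently.
check_formats(SS.features = FALSE, lncRNA.format = "DNA", mRNA.format = "DNA")
check_formats(SS.features = TRUE, lncRNA.format = "SS", mRNA.format = "SS")

# An inconsistent combination is rejected; the error message is captured here.
res <- tryCatch(
  check_formats(SS.features = TRUE, lncRNA.format = "SS", mRNA.format = "DNA"),
  error = function(e) conditionMessage(e)
)
print(res)
```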
Integer. The number of cores for parallel computation.
The default is 2; set it to -1 to run
this function with all cores. During svm tuning, if
parallel.cores is greater than folds.num (the number of folds
for cross-validation), parallel.cores will be set to
folds.num automatically.
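The rule above can be sketched as follows. This is a minimal illustration
under stated assumptions: resolve_cores and its available argument are
hypothetical names, and the package's internal logic may differ in detail.

```r
# Hypothetical helper, not LncFinder's code: resolves the number of worker
# cores from parallel.cores and folds.num as described above.
resolve_cores <- function(parallel.cores = 2, folds.num = 10,
                          available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(available)) available <- 1L
  # -1 means "use every available core".
  cores <- if (parallel.cores == -1) available else parallel.cores
  # Never use more workers than there are cross-validation folds.
  min(cores, folds.num)
}

resolve_cores(parallel.cores = 2, folds.num = 10)                  # 2
resolve_cores(parallel.cores = 16, folds.num = 10)                 # 10
resolve_cores(parallel.cores = -1, folds.num = 10, available = 8)  # 8
```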
Integer. Specify the number of folds for cross-validation.
(Default: 10)
Integer. Used to set the seed for cross-validation. (Default: 1)
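To illustrate why a fixed seed makes cross-validation reproducible, the
sketch below assigns samples to folds in base R. It is an assumption about
the general technique, not the package's internal code; assign_folds is a
hypothetical helper.

```r
# Hypothetical sketch, not LncFinder's internal code: assign n samples to
# folds.num cross-validation folds, reproducibly for a given seed.
assign_folds <- function(n, folds.num = 10, seed = 1) {
  set.seed(seed)
  # Repeat the fold labels 1..folds.num to length n, then shuffle them.
  sample(rep(seq_len(folds.num), length.out = n))
}

folds_a <- assign_folds(n = 25, folds.num = 10, seed = 1)
folds_b <- assign_folds(n = 25, folds.num = 10, seed = 1)

identical(folds_a, folds_b)  # TRUE: same seed, same partition
table(folds_a)               # fold sizes differ by at most one
```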
The range of gamma. (Default: 2 ^ seq(-5, 0, 1))
The range of cost. (Default: c(1, 4, 8, 16, 24, 32))
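The default ranges define a grid of candidate (gamma, cost) pairs. The
sketch below enumerates that grid in base R, assuming an exhaustive grid
search in which every pair is evaluated once per fold; the exact procedure
used by svm_tune should be checked in its own documentation.

```r
# Default search ranges, copied from the usage of build_model.
gamma.range <- 2^seq(-5, 0, 1)
cost.range  <- c(1, 4, 8, 16, 24, 32)

# Every (gamma, cost) pair, assuming an exhaustive grid search.
grid <- expand.grid(gamma = gamma.range, cost = cost.range)

nrow(grid)       # 36 candidate parameter pairs
nrow(grid) * 10  # 360 model fits if each pair is fitted once per fold (folds.num = 10)
```

Narrowing either range reduces the number of fits proportionally, which can
help when training sets are large.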
Additional arguments passed to function svm_tune for customised SVM model training.
HAN Siyu
This function is used to build a new model with users' own sequences.
Users can then pass the new model to function lnc_finder to predict
sequences.
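The workflow described here might look like the sketch below. The function
is only defined, not run, because it needs the LncFinder package and real
sequence data. train_and_predict and its arguments are hypothetical names,
and the arguments passed to lnc_finder are assumptions that should be
verified against its help page.

```r
# Sketch only: defined but not executed here.
# `train_and_predict`, `lnc.seqs`, `mrna.seqs` and `new.seqs` are
# hypothetical names introduced for illustration.
train_and_predict <- function(lnc.seqs, mrna.seqs, new.seqs) {
  # Train on DNA sequences with the pre-built human frequencies.
  my_model <- LncFinder::build_model(
    lncRNA.seq       = lnc.seqs,
    mRNA.seq         = mrna.seqs,
    frequencies.file = "human",
    SS.features      = FALSE,
    lncRNA.format    = "DNA",
    mRNA.format      = "DNA",
    parallel.cores   = 2,
    folds.num        = 10,
    seed             = 1
  )
  # Assumed lnc_finder arguments; verify against its help page.
  LncFinder::lnc_finder(
    new.seqs,
    SS.features      = FALSE,
    format           = "DNA",
    frequencies.file = "human",
    svm.model        = my_model,
    parallel.cores   = 2
  )
}
```

The same frequencies.file and SS.features settings are used for training
and prediction so that both steps extract the same features.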
For the details of frequencies.file, please refer to function
make_frequencies.
For the details of the features, please refer to function
extract_features.
For the details of svm tuning, please refer to function svm_tune.
Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.
make_frequencies, lnc_finder, extract_features, svm_tune, svm.