This function builds a new SVM model from users' own data.
build_model(
  lncRNA.seq,
  mRNA.seq,
  frequencies.file,
  SS.features = FALSE,
  lncRNA.format = "DNA",
  mRNA.format = "DNA",
  parallel.cores = 2,
  folds.num = 10,
  seed = 1,
  gamma.range = (2^seq(-5, 0, 1)),
  cost.range = c(1, 4, 8, 16, 24, 32),
  ...
)
Returns an SVM model.
lncRNA.seq: Long non-coding RNA sequences. Can be a FASTA file loaded with the seqinr package, or a secondary structure sequence file (Dot-Bracket Notation) obtained from function run_RNAfold. If lncRNA.seq is a secondary structure sequence file, parameter lncRNA.format should be set to "SS".
mRNA.seq: mRNA sequences. Can be a FASTA file loaded by read.fasta, or secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold. If mRNA.seq is a secondary structure sequence file, parameter mRNA.format should be set to "SS".
frequencies.file: String or a list obtained from function make_frequencies. Input the species name "human", "mouse" or "wheat" to use the pre-built frequencies files, or assign the user's own frequencies file (please refer to function make_frequencies for more information).
SS.features: Logical. If SS.features = TRUE, secondary structure features will be used to build the model. In this case, lncRNA.seq and mRNA.seq should be secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold, and parameters lncRNA.format and mRNA.format should both be set to "SS".
lncRNA.format: String. Defines the format of lncRNA.seq. Can be "DNA" for DNA sequences or "SS" for secondary structure sequences. A model with secondary structure features (SS.features = TRUE) can be built only when both mRNA.format and lncRNA.format are set to "SS".
mRNA.format: String. Defines the format of mRNA.seq. Can be "DNA" for DNA sequences or "SS" for secondary structure sequences. When this parameter is set to "DNA", only a model without secondary structure features can be built, so parameter SS.features should be set to FALSE.
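The constraints among SS.features, lncRNA.format and mRNA.format described above can be summarised in a small check. The following is only an illustrative sketch in base R: the helper name check_formats is hypothetical and is not part of the package, whose internal validation may differ.

```r
# Hypothetical helper (not part of LncFinder): checks that the format
# arguments are consistent with the SS.features setting described above.
check_formats <- function(SS.features, lncRNA.format, mRNA.format) {
  formats <- c(lncRNA.format, mRNA.format)
  if (!all(formats %in% c("DNA", "SS"))) {
    stop("formats must be \"DNA\" or \"SS\"")
  }
  # Secondary structure features need both inputs in Dot-Bracket Notation.
  if (SS.features && !all(formats == "SS")) {
    stop("SS.features = TRUE requires both formats to be \"SS\"")
  }
  invisible(TRUE)
}

# Both calls pass: DNA input without SS features, SS input with them.
check_formats(SS.features = FALSE, lncRNA.format = "DNA", mRNA.format = "DNA")
check_formats(SS.features = TRUE,  lncRNA.format = "SS",  mRNA.format = "SS")
```

Calling the helper with SS.features = TRUE and either format set to "DNA" raises an error, which mirrors the rule stated for the two format parameters.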
parallel.cores: Integer. The number of cores for parallel computation. The default is 2; set it to -1 to run this function with all cores. During SVM tuning, if parallel.cores is greater than folds.num (the number of folds for cross-validation), parallel.cores is automatically set to folds.num.
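The core-count rule above can be illustrated with a short base R sketch. The helper resolve_cores is hypothetical and only mirrors the documented behaviour; how the package detects the available cores internally is an assumption here (detectCores from the parallel package is used for illustration).

```r
library(parallel)

# Hypothetical helper (not part of LncFinder): resolves the number of
# worker cores following the rule described above.
resolve_cores <- function(parallel.cores = 2, folds.num = 10) {
  if (parallel.cores == -1) {
    detected <- detectCores()
    # detectCores() can return NA on some platforms; fall back to 1.
    parallel.cores <- if (is.na(detected)) 1L else detected
  }
  # Never use more cores than there are cross-validation folds.
  min(parallel.cores, folds.num)
}

resolve_cores(parallel.cores = 2, folds.num = 10)  # 2
resolve_cores(parallel.cores = 8, folds.num = 5)   # 5
```

In practice this means that requesting more cores than folds.num brings no speed-up during tuning, since each fold is the unit of parallel work.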
folds.num: Integer. The number of folds for cross-validation. (Default: 10)
seed: Integer. Used to set the seed for cross-validation. (Default: 1)
gamma.range: The range of gamma. (Default: 2 ^ seq(-5, 0, 1))
cost.range: The range of cost. (Default: c(1, 4, 8, 16, 24, 32))
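To make the default search ranges concrete, the sketch below expands them in base R. It assumes that tuning evaluates every gamma with every cost (an exhaustive grid); please refer to function svm_tune for how the search is actually carried out.

```r
# Default search ranges taken from the usage above.
gamma.range <- 2^seq(-5, 0, 1)           # six values, from 2^-5 up to 2^0
cost.range  <- c(1, 4, 8, 16, 24, 32)    # six values

# Assumption: every gamma is paired with every cost (exhaustive grid).
grid <- expand.grid(gamma = gamma.range, cost = cost.range)

nrow(grid)  # 36 candidate (gamma, cost) pairs
head(grid)
```

Under that assumption, the number of model fits grows with the product of both range lengths and folds.num, so narrowing either range shortens tuning.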
...: Additional arguments passed to function svm_tune for customised SVM model training.
HAN Siyu
This function is used to build a new model with users' own sequences. Users can then use function lnc_finder to predict sequences with the new model.
For the details of frequencies.file, please refer to function make_frequencies.
For the details of the features, please refer to function extract_features.
For the details of SVM tuning, please refer to function svm_tune.
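A minimal end-to-end sketch of the workflow described above follows. It assumes that the LncFinder package is installed and that demo_DNA.seq is a demo dataset shipped with it. Splitting the demo sequences into "lncRNA" and "mRNA" groups is purely illustrative, and the lnc_finder arguments shown (format, svm.model) are assumptions that should be checked against that function's own help page.

```r
# The block is skipped when LncFinder is not installed.
my_model <- NULL
if (requireNamespace("LncFinder", quietly = TRUE)) {
  library(LncFinder)

  # Assumption: demo_DNA.seq is a demo dataset shipped with the package.
  data(demo_DNA.seq)
  Seqs <- demo_DNA.seq

  # Small ranges and few folds keep this toy run short; the split into
  # lncRNA and mRNA sequences is arbitrary and for illustration only.
  my_model <- build_model(
    lncRNA.seq = Seqs[1:5], mRNA.seq = Seqs[6:10],
    frequencies.file = "human", SS.features = FALSE,
    lncRNA.format = "DNA", mRNA.format = "DNA",
    parallel.cores = 2, folds.num = 2, seed = 1,
    gamma.range = 2^seq(-5, -1, 2),
    cost.range = c(2, 6, 12, 20)
  )

  # Predict with the new model (argument names assumed, see lnc_finder).
  result <- lnc_finder(Seqs, SS.features = FALSE, format = "DNA",
                       frequencies.file = "human", svm.model = my_model,
                       parallel.cores = 2)
}
```

For real data, the default gamma.range and cost.range together with a larger folds.num are the more natural starting point; the reduced values here only shorten the demonstration.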
Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.
make_frequencies, lnc_finder, extract_features, svm_tune, svm.