feamiR (version 0.1.0)

Classification and Feature Selection for microRNA/mRNA Interactions

Description

Comprises a pipeline for predicting microRNA/mRNA interactions, as detailed in Williams, Calinescu, Mohorianu (2020). Its input consists of [a] a messenger RNA (mRNA) dataset (either in fasta format, focused on 3' UTRs, or in gtf format; for the latter, the sequences of the 3' UTRs are generated using the genomic coordinates), [b] a microRNA dataset (in fasta format, retrieved from miRBase) and [c] an interaction dataset (in csv format, from miRTarBase). To characterise and predict microRNA/mRNA interactions, we use [a] statistical analyses based on Chi-squared and Fisher exact tests and [b] Machine Learning classifiers (decision trees, random forests and support vector machines). To enhance the accuracy of the classifiers, we also employ feature selection approaches used in conjunction with the classifiers. The feature selection approaches include a voting scheme for decision trees, a measure based on the Gini index for random forests, and forward feature selection and Genetic Algorithms on SVMs. The pipeline also includes a novel approach based on embryonic Genetic Algorithms, which combines and optimises the forward feature selection and Genetic Algorithms. All analyses, including the classification and feature selection, are applicable to the microRNA seed features (default), the full microRNA features and/or flanking features on the mRNA. The sets of features can be combined.
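The statistical step mentioned above can be illustrated with base R alone. The sketch below does not call any feamiR function; it only shows how a Fisher exact test and a Chi-squared test might be applied to a single binary feature against the interaction label. The data frame, the feature name seed_pos1_A and all values are invented for illustration.

```r
# Toy data (made up for illustration): one row per candidate miRNA/mRNA pair.
# seed_pos1_A is a hypothetical binary feature; interaction is the class label.
pairs <- data.frame(
  seed_pos1_A = c(1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1),
  interaction = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1)
)

# 2x2 contingency table of feature value against class label.
tab <- table(feature = pairs$seed_pos1_A, label = pairs$interaction)

# Fisher's exact test is suited to small counts; the chi-squared test relies
# on a large-sample approximation and may warn on a table this small.
fisher_res <- fisher.test(tab)
chisq_res <- suppressWarnings(chisq.test(tab))

print(tab)
cat("Fisher p-value:", fisher_res$p.value, "\n")
cat("Chi-squared p-value:", chisq_res$p.value, "\n")
```

A small p-value would suggest that the feature is distributed differently between interacting and non-interacting pairs, which is the kind of signal the classifiers can then exploit.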

Install

install.packages('feamiR')

Monthly Downloads

40

Version

0.1.0

License

GPL-2

Maintainer

Eleanor Williams

Last Published

January 19th, 2021

Functions in feamiR (0.1.0)

preparedataset

Dataset preparation. This step performs all the preparation needed for feamiR analysis, taking a set of mRNAs, a set of miRNAs and an interaction dataset and creating corresponding positive and negative datasets for ML modelling.
forwardfeatureselection

Forward Feature Selection. Performs forward feature selection on the given list of features, placing them in order of discriminative power using a given model on the given dataset up to the accuracy plateau.
feamiR

feamiR: Classification and feature selection for microRNA/mRNA interactions
decisiontree

Decision tree. Trains a decision tree on the given training dataset and uses it to predict the classification of the test dataset. The resulting accuracy, sensitivity and specificity are returned, as well as a tree summary.
geneticalgorithm

Standard Genetic Algorithm. Implements a standard genetic algorithm using GA package (ga) with a fitness function specialised for feature selection.
rfgini

Random Forest cumulative MeanDecreaseGini feature selection. Implements a feature selection approach based on cumulative MeanDecreaseGini using Random Forests trained on multiple subsamples.
randomforest

Random Forest. Trains a random forest on the training dataset and uses it to predict the classification of the test dataset. The resulting accuracy, sensitivity and specificity are returned, as well as a summary of the importance of features in the dataset.
dtreevoting

Decision tree voting scheme. Implements a feature selection approach based on Decision Trees, using a voting scheme across the top levels of trees trained on multiple subsamples.
eGA

Embryonic Genetic Algorithm. Feature selection based on Embryonic Genetic Algorithms. It performs feature selection by maintaining an ongoing set of 'good' features, which is improved run by run. It outputs training and test accuracy, sensitivity and specificity, and a list of <=k features.
runallmodels

Run all models. Trains and tests Decision Tree, Random Forest and SVM models on 100 subsamples and provides a summary of the results, so that the best model can be selected. The kernel chosen with selectsvmkernel and the number of trees chosen with selectrfnumtrees should be used for the SVM and Random Forest respectively. This function can also be used to inform feature selection, using a Decision Tree voting scheme and a Random Forest measure based on the Gini index.
svmsigmoid

Sigmoid SVM. Implements a sigmoid SVM using the general svm function (for ease of use in feature selection).
selectsvmkernel

Tuning SVM kernel. Trains SVMs with a range of kernels (linear, polynomial degree 2, 3 and 4, radial and sigmoid) using cross validation, so the optimal kernel can be chosen (using the resulting plots). If specified (by showplots=FALSE), the plots are saved as jpegs.
selectrfnumtrees

Tuning number of trees hyperparameter. Trains random forests with a range of numbers of trees, using cross validation, so the optimal number can be identified (using the resulting plot).
svmpolynomial4

Polynomial degree 4 SVM. Implements a polynomial degree 4 SVM using the general svm function (for ease of use in feature selection).
svmradial

Radial SVM. Implements a radial SVM using the general svm function (for ease of use in feature selection).
svmpolynomial3

Polynomial degree 3 SVM. Implements a polynomial degree 3 SVM using the general svm function (for ease of use in feature selection).
svmpolynomial2

Polynomial degree 2 SVM. Implements a polynomial degree 2 SVM using the general svm function (for ease of use in feature selection).
svm

SVM
svmlinear

Linear SVM. Implements a linear SVM using the general svm function (for ease of use in feature selection).
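The idea behind forward feature selection, as described in the forwardfeatureselection entry above, can be sketched in base R. This is not the feamiR implementation and it does not use the package's function signatures: the helper names evaluate and forward_select, the min_gain threshold, the use of logistic regression (glm) as a stand-in for the "given model", and the toy data are all assumptions made for illustration.

```r
set.seed(1)

# Toy dataset (made up): three binary features; only f1 and f2 carry signal.
n <- 200
label <- rep(c(0, 1), each = n / 2)
f1 <- ifelse(runif(n) < 0.80, label, 1 - label)  # agrees with the label ~80% of the time
f2 <- ifelse(runif(n) < 0.65, label, 1 - label)  # weaker signal
f3 <- rbinom(n, 1, 0.5)                          # pure noise
data <- data.frame(f1, f2, f3, label)

train_idx <- sample(n, 0.8 * n)
train <- data[train_idx, ]
holdout <- data[-train_idx, ]

# Hypothetical helper: fit the stand-in model on the given features and
# return accuracy, sensitivity and specificity on the held-out rows.
evaluate <- function(features, train, holdout) {
  fit <- glm(reformulate(features, response = "label"),
             data = train, family = binomial)
  pred <- as.integer(predict(fit, newdata = holdout, type = "response") > 0.5)
  tp <- sum(pred == 1 & holdout$label == 1)
  tn <- sum(pred == 0 & holdout$label == 0)
  fp <- sum(pred == 1 & holdout$label == 0)
  fn <- sum(pred == 0 & holdout$label == 1)
  c(accuracy = (tp + tn) / nrow(holdout),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Hypothetical helper: greedily add the feature that raises accuracy most,
# stopping once the gain falls below min_gain (the accuracy plateau).
forward_select <- function(candidates, train, holdout, min_gain = 0.001) {
  selected <- character(0)
  best_acc <- 0
  while (length(candidates) > 0) {
    accs <- vapply(candidates, function(f) {
      evaluate(c(selected, f), train, holdout)[["accuracy"]]
    }, numeric(1))
    best <- names(accs)[which.max(accs)]
    if (accs[[best]] - best_acc < min_gain) break
    selected <- c(selected, best)
    best_acc <- accs[[best]]
    candidates <- setdiff(candidates, best)
  }
  list(features = selected, accuracy = best_acc)
}

result <- forward_select(c("f1", "f2", "f3"), train, holdout)
print(result$features)
print(evaluate(result$features, train, holdout))
```

The order of result$features reflects the order in which features were added, which is one simple reading of "order of discriminative power". In practice the selection would be better scored on a validation split kept separate from the final test set, so that the reported accuracy is not biased by the selection itself.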