biogram package

This package contains tools for the extraction and analysis of n-grams (sequences of n items) derived from biological sequences (proteins or nucleic acids). To counter the curse of dimensionality that n-gram feature spaces bring, biogram uses the Quick Permutation Test (QuiPT) for fast feature filtering.
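To make the idea of an n-gram concrete, here is a minimal base-R sketch that counts contiguous n-grams in a single sequence. It is an illustration only: `count_ngrams_sketch` is a hypothetical helper written for this example and is not part of biogram's API, which offers its own functions for this purpose (see the function index further down).

```r
# Count contiguous n-grams in one sequence.
# Hypothetical helper for illustration; base R only, not biogram's API.
count_ngrams_sketch <- function(sequence, n) {
  stopifnot(is.character(sequence), length(sequence) == 1, n >= 1)
  len <- nchar(sequence)
  if (len < n) {
    return(table(character(0)))  # sequence too short: no n-grams
  }
  starts <- seq_len(len - n + 1)
  ngrams <- substring(sequence, starts, starts + n - 1)
  table(ngrams)
}

# Bigrams of a short nucleotide sequence.
bigram_counts <- count_ngrams_sketch("ACGTACGA", 2)
print(bigram_counts)

# The number of possible n-grams grows as alphabet_size^n.
# For a 20-letter amino-acid alphabet and n = 1..4:
n_possible <- 20^(1:4)
print(n_possible)
```

The second part shows why filtering matters: the number of candidate features grows exponentially with n, so most of them will be absent or uninformative in any realistic dataset.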

Installation

biogram is available on CRAN, so installation is as simple as:

install.packages("biogram")

You can install the latest development version of the code using the devtools R package.

# Install devtools, if you haven't already.
install.packages("devtools")

library(devtools)
install_github("michbur/biogram")

To cite biogram, type:

citation("biogram")

or use: Michal Burdukiewicz, Piotr Sobczyk and Chris Lauber (2016). biogram: N-Gram Analysis of Biological Sequences. R package version 1.3. https://cran.r-project.org/package=biogram

Monthly Downloads: 305
Version: 1.6.3
License: GPL-3

Maintainer: Michal Burdukiewicz
Last Published: March 31st, 2020

Functions in biogram (1.6.3)

calc_kl: Calculate KL divergence of features
as.data.frame.feature_test: Coerce feature_test object to a data frame
calc_ig: Calculate IG for single feature
calc_ed: Calculate encoding distance
binarize: Binarize
biogram-package: biogram - analysis of biological sequences using n-grams
aaprop: Normalized amino acids properties
calc_criterion: Calculate value of criterion
add_1grams: Add 1-grams
calc_cs: Calculate Chi-squared-based measure
check_criterion: Check chosen criterion
calc_pi: Calculate partition index
create_ngrams: Get all possible n-Grams
full2simple: Convert encoding from full to simple format
cluster_reg_exp: Clustering of sequences based on regular expression
n2l: Convert numbers to letters
criterion_distribution: criterion_distribution class
gap_ngrams: Gap n-grams
count_total: Count total number of n-grams
count_specified: Count specified n-grams
calc_si: Compute similarity index
list2matrix: Convert list of sequences to matrix
distr_crit: Compute criterion distribution
table_ngrams: Tabulate n-grams
generate_single_region: Generate single region
degenerate_ngrams: Degenerate n-grams
degenerate: Degenerate protein sequence
generate_sequence: Generate sequence
is_ngram: Validate n-gram
test_features: Permutation test for feature selection
code_ngrams: Code n-grams
write_fasta: Write FASTA files
construct_ngrams: Construct and filter n-grams
count_multigrams: Detect and count multiple n-grams in sequences
encoding2df: Convert encoding to data frame
create_feature_target: Create feature according to given contingency matrix
l2n: Convert letters to numbers
decode_ngrams: Decode n-grams
count_ngrams: Count n-grams in sequences
get_ngrams_ind: Get indices of n-grams
generate_single_unigram: Generate single unigram
fast_crosstable: 2d cross-tabulation
create_encoding: Create encoding
generate_unigrams: Generate unigrams
feature_test: feature_test class
read_fasta: Read FASTA files
cut.feature_test: Categorize tested features
position_ngrams: Position n-grams
ngrams2df: n-grams to data frame
regenerate: Regenerate n-grams
plot.criterion_distribution: Plot criterion distribution
print.feature_test: Print tested features
simple2full: Convert encoding from simple to full format
human_cleave: Human signal peptides cleavage sites
regional_param: regional_param class
validate_encoding: Validate encoding
write_encoding: Write encodings to a file
seq2ngrams: Extract n-grams from sequence
summary.feature_test: Summarize tested features
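
Several entries in this index (calc_ig, test_features) concern scoring a binary feature against a binary target and testing that score by permutation. The base-R sketch below shows what such a test measures, under stated assumptions: the criterion is information gain in bits, and the p-value comes from brute-force shuffling of the target. The names `entropy_bits`, `info_gain` and `permutation_p_value` are hypothetical helpers written for this example. This is not QuiPT and not biogram's implementation; it only illustrates the quantity being tested.

```r
# Shannon entropy (in bits) of a binary 0/1 vector.
entropy_bits <- function(x) {
  p <- mean(x)
  probs <- c(p, 1 - p)
  probs <- probs[probs > 0]  # treat 0 * log2(0) as 0
  -sum(probs * log2(probs))
}

# Information gain of a binary feature with respect to a binary target:
# H(target) minus the feature-weighted conditional entropy.
info_gain <- function(target, feature) {
  conditional <- 0
  for (value in c(0, 1)) {
    idx <- feature == value
    if (any(idx)) {
      conditional <- conditional + mean(idx) * entropy_bits(target[idx])
    }
  }
  entropy_bits(target) - conditional
}

# Brute-force permutation test: shuffle the target, recompute the criterion,
# and report how often the shuffled value reaches the observed one.
permutation_p_value <- function(target, feature, times = 2000) {
  observed <- info_gain(target, feature)
  permuted <- replicate(times, info_gain(sample(target), feature))
  # The add-one correction keeps the estimate strictly above zero.
  (sum(permuted >= observed) + 1) / (times + 1)
}

set.seed(1)
target <- rep(c(1, 0), each = 50)

# A feature that matches the target except for 10 flipped positions.
informative <- target
flipped <- c(1:5, 51:55)
informative[flipped] <- 1 - informative[flipped]

# A feature unrelated to the target.
noise <- rbinom(100, 1, 0.5)

p_informative <- permutation_p_value(target, informative)
p_noise <- permutation_p_value(target, noise)
print(c(informative = p_informative, noise = p_noise))
```

Running thousands of shuffles per feature becomes expensive when the feature space contains many thousands of n-grams, which is the cost a fast filtering test is meant to avoid.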