Computes descriptors (fingerprints and/or CDK physical descriptors) from SMILES strings. The output can optionally be scaled (continuous descriptors, e.g. physical descriptors) or re-cast (binary descriptors, e.g. fingerprints).
get_descriptor(smis = c("C1=CC=C(C=C1)O"), desctypes = c("standard"),
scale = FALSE, scale_init = FALSE, mdesc = 0, sddesc = 1, quiet = FALSE)
smis: a character vector of SMILES strings (default "C1=CC=C(C=C1)O", the canonical SMILES of phenol).
desctypes: a character vector specifying the fingerprint and/or physical descriptor types to compute (default "standard"). Available fingerprint types are "standard", "extended", "graph", "hybridization", "maccs", "estate", "pubchem", "kr", "shortestpath" and "circular". Available physical descriptor types are "constitutional", "topological" and "electronic".
scale: set to TRUE (default FALSE) to scale the physical descriptors only (i.e. the continuous features) to mean 0 and standard deviation 1.
scale_init: set to TRUE (default FALSE) to compute and keep the mean and standard deviation of each descriptor during a first scaling. Once the descriptors of a training set have been computed and scaled, these means and standard deviations must remain fixed when computing descriptors for test and/or validation sets. In that case, set scale_init to FALSE.
mdesc: a scalar (default 0) or a vector of means used for post-scaling the physical descriptors.
sddesc: a scalar (default 1) or a vector of standard deviations used for post-scaling the physical descriptors.
quiet: set to TRUE (default FALSE) to suppress console output.
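The scale / scale_init arguments follow the usual train/test standardisation pattern: statistics are estimated once on the training set and then reused unchanged on new data. The base R sketch below illustrates that pattern on hypothetical toy matrices. It does not call get_descriptor(), and the correspondence to scale_init, mdesc and sddesc noted in the comments is an assumption drawn from the argument descriptions, not from the function's source.

```r
# Minimal sketch (base R only) of train/test standardisation.
# ASSUMPTION: toy matrices stand in for continuous (physical) descriptors;
# this is not get_descriptor() output and not its internal implementation.
train <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40),
                ncol = 2,
                dimnames = list(NULL, c("desc_a", "desc_b")))
test <- matrix(c(2.5, 25), ncol = 2,
               dimnames = list(NULL, c("desc_a", "desc_b")))

# First scaling (assumed analogous to scale_init = TRUE):
# estimate per-column means and standard deviations and keep them.
train_scaled <- scale(train)
mdesc  <- attr(train_scaled, "scaled:center")  # column means
sddesc <- attr(train_scaled, "scaled:scale")   # column standard deviations

# Later scaling (assumed analogous to scale_init = FALSE):
# reuse the stored training statistics instead of re-estimating them.
test_scaled <- scale(test, center = mdesc, scale = sddesc)
print(test_scaled)
```

Re-estimating the mean and standard deviation on the test set would leak information about it into the preprocessing step and make its scaled values inconsistent with those the model was trained on, which is why the training statistics are kept fixed.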
Value: the computed descriptor(s), together with the associated means and standard deviations used for scaling.
descriptors <- get_descriptor(smis = "C1=CC=C(C=C1)O", desctypes = c("standard","topological"))