
ChemmineR (version 2.24.2)

cmp.search: Search a descriptor database for compounds similar to query compound

Description

Given the descriptor of a query compound and a database of compound descriptors, search the database for compounds that are similar to the query. The output can be limited either by a similarity cutoff or by a maximum number of compounds to return. The function can also return the similarity scores together with the compounds.

Usage

cmp.search(db, query, type = 1, cutoff = 0.5, return.score = FALSE, quiet = FALSE, mode = 1, visualize = FALSE, visualize.browse = TRUE, visualize.query = NULL)

Arguments

db
The compound descriptor database returned by 'cmp.parse'.
query
The query descriptor, which is usually returned by 'cmp.parse1'.
type
Format of the result: position indices (type=1), a named vector with compound IDs (type=2), or a data frame (type=3).
cutoff
The cutoff similarity (when cutoff <= 1) or the maximum number of compounds to be returned (when cutoff > 1).
return.score
Whether to return similarity scores. If set to TRUE, a data frame will be returned; otherwise, only the compounds' indices in the database will be returned in the order of decreasing scores.
quiet
Whether to disable progress information.
mode
Mode used when computing similarity scores. This value is passed to 'cmp.similarity'.
visualize
visualize.browse
visualize.query

Value

When 'return.score' is set to FALSE, a vector of matching compounds' indices in the database will be returned. Otherwise, a data frame will be returned:
ids
The indices of matching compounds in the database.
scores
The similarity scores between the matching compounds and the query compound.

Details

'cmp.search' goes through all compound descriptors in the database and calculates the similarity between the query compound and each database compound. When 'cutoff' is a similarity score (cutoff <= 1), the compounds with a similarity score higher than the cutoff are returned. When 'cutoff' is a maximum number of compounds N (cutoff > 1), the N compounds with the highest similarity scores are returned.
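
The two meanings of 'cutoff' described above can be sketched in base R without ChemmineR. This is only an illustration under stated assumptions, not the package's implementation: 'toy_db', 'toy_query', 'tanimoto' and 'toy_search' are hypothetical names, the descriptors are made-up integer sets, a set-based Tanimoto coefficient stands in for whatever 'cmp.similarity' computes, and the strict '>' comparison simply follows the wording "higher than the cutoff".

```r
# Toy descriptor "database": each compound is a set of integer-coded features.
# The values are invented for illustration only.
toy_db <- list(
  cmpA = c(1, 2, 3, 4),
  cmpB = c(1, 2, 3, 5),
  cmpC = c(1, 6, 7),
  cmpD = c(8, 9)
)
toy_query <- c(1, 2, 3, 4)

# Assumed similarity measure: Tanimoto coefficient on sets,
# i.e. |A intersect B| / |A union B|.
tanimoto <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical helper that mimics the two interpretations of 'cutoff'.
toy_search <- function(db, query, cutoff = 0.5) {
  # Score every database entry against the query
  scores <- vapply(db, tanimoto, numeric(1), b = query)
  # Rank by decreasing similarity
  ranked <- sort(scores, decreasing = TRUE)
  if (cutoff <= 1) {
    # cutoff is a similarity threshold: keep scores above it
    ranked[ranked > cutoff]
  } else {
    # cutoff is a maximum count: keep the top N entries
    head(ranked, cutoff)
  }
}

toy_search(toy_db, toy_query, cutoff = 0.5)  # similarity threshold
toy_search(toy_db, toy_query, cutoff = 3)    # top 3 compounds
```

With a threshold of 0.5 only the two closest toy compounds are kept, whereas cutoff = 3 keeps the three best-ranked ones regardless of how low the third score is.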

References

Chen X and Reynolds CH (2002). "Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients", in J Chem Inf Comput Sci.

See Also

cmp.parse1, cmp.parse, cmp.search, cmp.cluster, cmp.similarity, sdf.visualize

Examples

## Load the package
library(ChemmineR)

## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set, as provided by the package
data(apset) 
db <- apset
query <- db[1]

## Optionally, save the db for future use
save(db, file="db.rda", compress=TRUE)

## Search for similar compounds using similarity cutoff
cmp.search(db, query, cutoff=0.2, type=1) # returns index
cmp.search(db, query, cutoff=0.2, type=2) # returns named vector
cmp.search(db, query, cutoff=0.2, type=3) # returns data frame

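## In a later session, you may load the saved db and run the search:
load("db.rda")
query <- db[1] # redefine the query if starting from a fresh session
cmp.search(db, query, cutoff=3) # cutoff > 1: returns the 3 most similar compounds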
## you may also use the loaded db to do clustering:
cmp.cluster(db, cutoff=0.35)
