solr_mlt: Solr "more like this" search

Description

Search for documents similar to those matched by a query, using Solr's "more like this" (MoreLikeThis) feature.

Usage

solr_mlt(q = "*:*", fq = NULL, mlt.count = NULL, mlt.fl = NULL,
  mlt.mintf = NULL, mlt.mindf = NULL, mlt.minwl = NULL, mlt.maxwl = NULL,
  mlt.maxqt = NULL, mlt.maxntp = NULL, mlt.boost = NULL, mlt.qf = NULL,
  fl = NULL, wt = "json", start = 0, rows = NULL, key = NULL, base = NULL,
  callopts = list(), raw = FALSE, parsetype = "df", concat = ",",
  verbose = TRUE)

Arguments

q

Query terms. Defaults to '*:*', which matches everything.

fq

Filter query. This does not affect the search, only what gets returned.

mlt.count

The number of similar documents to return for each result. Default is 5.

mlt.fl

The fields to use for similarity. NOTE: if possible, these should have a stored TermVector. DEFAULT_FIELD_NAMES = new String[] {"contents"}

mlt.mintf

Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. DEFAULT_MIN_TERM_FREQ = 2

mlt.mindf

Minimum Document Frequency - words that do not occur in at least this many docs will be ignored. DEFAULT_MIN_DOC_FREQ = 5

mlt.minwl

Minimum word length below which words will be ignored. DEFAULT_MIN_WORD_LENGTH = 0

mlt.maxwl

Maximum word length above which words will be ignored. DEFAULT_MAX_WORD_LENGTH = 0

mlt.maxqt

Maximum number of query terms that will be included in any generated query. DEFAULT_MAX_QUERY_TERMS = 25

mlt.maxntp

Maximum number of tokens to parse in each example doc field that is not stored with TermVector support. DEFAULT_MAX_NUM_TOKENS_PARSED = 5000

mlt.boost

[true/false] Whether the query will be boosted by the interesting term relevance. DEFAULT_BOOST = false

mlt.qf

Query fields and their boosts using the same format as that used in DisMaxQParserPlugin. These fields must also be specified in mlt.fl.

fl

Fields to return. We force 'id' to be returned so that there is a unique identifier with each record.

wt

Data type returned. Defaults to 'json'.

start

Record to start at. Defaults to 0, the beginning.

rows

Number of records to return. Defaults to 10.

key

API key, if needed.

base

URL endpoint.

callopts

Call options passed on to httr::GET

raw

(logical) If TRUE, returns raw data in format specified by wt param

parsetype

(character) One of 'list' or 'df' (the default).

concat

(character) Character used to concatenate the elements of fields that have more than one value. Note that this only works reliably when the data format is JSON (wt='json'). Parsing is more complicated for XML, but you can do that on your own.

verbose

If TRUE (default), the URL used for the call is printed to the console.
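
As a rough illustration of how the mlt.* arguments above could end up as Solr request parameters, the sketch below builds a query string in base R. The helper build_mlt_query is hypothetical and is not part of the package; it only assumes that arguments left as NULL are dropped and that mlt=true switches the MoreLikeThis component on.

```r
# Hypothetical helper (not the package's implementation): a sketch of how
# NULL arguments could be dropped and the rest sent as Solr parameters.
build_mlt_query <- function(q = "*:*", mlt.fl = NULL, mlt.count = NULL,
                            mlt.mintf = NULL, mlt.mindf = NULL) {
  args <- list(q = q, mlt = "true", mlt.fl = mlt.fl, mlt.count = mlt.count,
               mlt.mintf = mlt.mintf, mlt.mindf = mlt.mindf)
  # Keep only the arguments the caller actually set
  args <- Filter(Negate(is.null), args)
  # URL-encode each value and join the name=value pairs with "&"
  pairs <- vapply(names(args), function(nm) {
    paste0(nm, "=", utils::URLencode(as.character(args[[nm]]), reserved = TRUE))
  }, character(1))
  paste(pairs, collapse = "&")
}

query <- build_mlt_query(q = "ecology", mlt.fl = "abstract", mlt.count = 2)
print(query)
```

Arguments that are not set never reach the server in this sketch, so Solr's own defaults (such as DEFAULT_MIN_TERM_FREQ) would apply.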

Value

XML, JSON, a list, or data.frame
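
To make the effect of concat on the data.frame output more concrete, here is a minimal sketch of collapsing a multi-valued field into a single cell. The record and the helper collapse_doc are made up for illustration; the package's own parser may work differently.

```r
# Made-up record with a multi-valued field, as a parsed JSON doc might look
doc <- list(id = "doc1", author = c("A. Smith", "B. Jones"))

# Hypothetical helper: collapse every field to one string per record
collapse_doc <- function(doc, concat = ",") {
  vals <- lapply(doc, function(x) paste(x, collapse = concat))
  # Each field is now length 1, so the record fits in a single row
  as.data.frame(vals, stringsAsFactors = FALSE)
}

row <- collapse_doc(doc, concat = ";")
print(row)
```

With parsetype = 'list' no such collapsing would be needed, since a list element can hold a vector of any length.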

References

See http://wiki.apache.org/solr/MoreLikeThis for more information.

Examples

## Not run: 
# url <- 'http://api.plos.org/search'
# 
# solr_mlt(q='*:*', mlt.count=2, mlt.fl='abstract', fl='score', base=url,
#    fq="doc_type:full")
# solr_mlt(q='*:*', rows=2, mlt.fl='title', mlt.mindf=1, mlt.mintf=1, fl='alm_twitterCount',
#    base=url)
# solr_mlt(q='title:"ecology" AND body:"cell"', mlt.fl='title', mlt.mindf=1, mlt.mintf=1,
#    fl='counter_total_all', rows=5, base=url)
# solr_mlt(q='ecology', mlt.fl='abstract', fl='title', rows=5, base=url)
# solr_mlt(q='ecology', mlt.fl='abstract', fl=c('score','eissn'), rows=5, base=url)
# 
# # get raw data, and parse later if needed
# out <- solr_mlt(q='ecology', mlt.fl='abstract', fl='title', rows=2, base=url,
#    raw=TRUE)
# library(rjson)
# fromJSON(out)
# solr_parse(out, "df")
# ## End(Not run)
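
The start and rows arguments can be combined to page through a larger result set. The sketch below only computes the zero-based start offsets; page_starts is a hypothetical helper, and the commented call shows one way it might be used with solr_mlt.

```r
# Hypothetical helper: zero-based start offsets needed to cover `total`
# records when fetching `rows` records per request
page_starts <- function(total, rows = 10) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = rows)
}

starts <- page_starts(25, rows = 10)
print(starts)

# Not run: one request per page, assuming `url` is set as in the examples
# pages <- lapply(starts, function(s)
#   solr_mlt(q = 'ecology', mlt.fl = 'abstract', start = s, rows = 10, base = url))
```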