psel: Preference selection

Description

Evaluates a preference on a given dataset, i.e., returns the maximal elements of the dataset for a given preference order.

Usage

psel(df, pref, ...)

psel.indices(df, pref, ...)

Arguments

df

A data frame or, for a grouped preference selection, a grouped data frame. See below for details.

pref

The preference order constructed via complex_pref and base_pref. All variables occurring in the definition of pref must be either columns

...

Additional parameters for Top(-Level)-k selections, e.g. top, at_least, top_level and show_level; see Top-k preference selection.

Top-k preference selection

For a given top value "k" the k best elements and their level values are returned. The level values are determined as follows:

All the maxima of a dataset w.r.t. a preference have level 1.

The maxima of the remainder, i.e. the dataset without the level-1 maxima, have level 2. The n-th iteration of "Take the maxima from the remainder" leads to tuples of level n.
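
The level definition above can be sketched as an explicit loop. This is only an illustration: peel_levels is a hypothetical helper that is not part of rPref, and it assumes that psel.indices returns positional row indices of the data frame it is given. rPref computes level values internally for Top-k selections, so this loop is not how the package itself works.

```r
library(rPref)

# Hypothetical helper (not part of rPref): assign level values by
# repeatedly taking the maxima of the remaining rows.
peel_levels <- function(df, pref) {
  level <- integer(nrow(df))        # level value for every row of df
  remaining <- seq_len(nrow(df))    # positions of rows without a level yet
  n <- 0L
  while (length(remaining) > 0) {
    n <- n + 1L
    # Maxima of the remainder; indices are relative to the subset
    idx <- psel.indices(df[remaining, , drop = FALSE], pref)
    level[remaining[idx]] <- n      # these tuples have level n
    remaining <- remaining[-idx]    # continue with the rest
  }
  level
}

lev <- peel_levels(mtcars, low(mpg) * low(hp))
table(lev)  # number of tuples per level
```

The rows with level 1 are exactly the maxima that a plain psel call on the full dataset returns.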

Grouped preference selection

With psel it is also possible to perform a preference selection where the maxima are calculated for every group separately. The groups have to be created with group_by from the dplyr package. The preference selection preserves the grouping, i.e., the groups are restored after the preference selection. For example, the summarize function from dplyr then refers to the set of maxima of each group. This can be used, e.g., to calculate the number of maxima in each group; see the examples below. A {top, at_least, top_level} preference selection is applied to each group separately. A top = k selection returns the k best tuples for each group. Hence, if there are 3 groups in df, each containing at least 2 elements, and we have top = 2, then 6 tuples will be returned.
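
As a small illustration of the counting argument above, the following sketch applies top = 2 to mtcars grouped by cyl. It assumes the standard mtcars dataset, which has three cyl groups (4, 6 and 8 cylinders) with more than two rows each; the variable name best2 is chosen here for illustration only.

```r
library(rPref)
library(dplyr)

# Select the 2 best tuples (lowest mpg) within each cyl group
best2 <- psel(group_by(mtcars, cyl), low(mpg), top = 2)

nrow(best2)             # 3 groups with top = 2 tuples each
summarise(best2, n())   # the grouping is preserved: one count per cyl group
```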

Parallel computation

On multicore machines the preference selection runs in parallel using a divide-and-conquer approach. If you prefer a single-threaded computation, use the following code to deactivate parallel computation within rPref:

options(rPref.parallel = FALSE)

If this option is not set, rPref will use parallel computation by default.
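
If the option should only apply to a single call, one possible pattern is to store the previous setting and restore it afterwards. This is a minimal sketch using base R's options mechanism; the variable names old and sky are illustrative.

```r
library(rPref)

# options() returns the previous values of the options it changes
old <- options(rPref.parallel = FALSE)

# This selection runs single-threaded
sky <- psel(mtcars, low(mpg) * low(hp))

# Restore the previous setting (an unset option becomes unset again)
options(old)
```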

Details

The difference between the two variants of the preference selection is:

The psel function returns a subset of the dataset which are the maxima according to the given preference.
The function psel.indices returns just the row indices of the maxima (except Top-k queries with show_level = 1, see Top-k preference selection). Hence psel(df, pref) is equivalent to df[psel.indices(df, pref), ] for non-grouped data frames.
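
The relationship between the two variants can be checked on a non-grouped data frame. The sketch below compares row names rather than whole data frames, which is a choice made here so that the check does not depend on the order of the returned rows; the variable names are illustrative.

```r
library(rPref)

p <- low(mpg) * low(hp)

res <- psel(mtcars, p)          # data frame containing the maxima
idx <- psel.indices(mtcars, p)  # row indices of the maxima

# Both variants should select the same rows of mtcars
setequal(rownames(res), rownames(mtcars[idx, ]))
```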

Examples

library(rPref)

# Skyline and Top-K/At-least skyline
psel(mtcars, low(mpg) * low(hp))
psel(mtcars, low(mpg) * low(hp), top = 5)
psel(mtcars, low(mpg) * low(hp), at_least = 5)

# Visualize the skyline in a plot
sky1 <- psel(mtcars, high(mpg) * high(hp))
plot(mtcars$mpg, mtcars$hp)
points(sky1$mpg, sky1$hp, lwd=3)

# Grouped preference with dplyr
library(dplyr)
psel(group_by(mtcars, cyl), low(mpg))

# Return size of each maxima group
summarise(psel(group_by(mtcars, cyl), low(mpg)), n())
