arules (version 1.6-8)

itemCoding: Item Coding --- Conversion between Item Labels and Column IDs

Description

The order in which items are stored in an itemMatrix is called the item coding. The following generic functions and S4 methods are used to translate between the binary representation in the itemMatrix format (used in transactions, rules and itemsets), item labels and numeric item IDs (i.e., the column numbers in the binary representation).

Usage

encode(x, ...)
# S4 method for list
encode(x, itemLabels, itemMatrix = TRUE)
# S4 method for character
encode(x, itemLabels, itemMatrix = TRUE)
# S4 method for numeric
encode(x, itemLabels, itemMatrix = TRUE)

compatible(x, y)

recode(x, ...)
# S4 method for itemMatrix
recode(x, itemLabels = NULL, match = NULL)
# S4 method for itemsets
recode(x, itemLabels = NULL, match = NULL)
# S4 method for rules
recode(x, itemLabels = NULL, match = NULL)

decode(x, ...)
# S4 method for list
decode(x, itemLabels)
# S4 method for numeric
decode(x, itemLabels)

Arguments

x

a vector or a list of vectors of character strings (for encode) or of numeric (for decode), or an object of class itemMatrix (for recode).

itemLabels

a vector of character strings used for coding, where the position of an item label in the vector gives the item's column ID. Alternatively, an itemMatrix, transactions or associations object can be specified, in which case the item labels of that object are used.

itemMatrix

logical; if TRUE, an object of class itemMatrix is returned; otherwise, an object of the same class as x is returned.

y

an object of class itemMatrix, transactions or associations whose item coding is compared to that of x.

match

deprecated: use itemLabels instead.

...

further arguments.

Value

recode always returns an object of class itemMatrix.

For encode with itemMatrix = TRUE an object of class itemMatrix is returned. Otherwise the result is of the same type as x, e.g., a list or a vector.

Details

Item compatibility: If you deal with several datasets or different subsets of the same dataset and want to combine or compare the found itemsets or rules, then you need to make sure that all transaction sets have a compatible item coding. That is, the sparse matrices representing the items have columns for the same items in exactly the same order. The coercion to transactions with as(x, "transactions") creates the item coding by adding items in the order in which they are encountered in the dataset. This can lead to different item codings (different order, missing items) even for only slightly different datasets. You can use the method compatible to check whether two sets have the same item coding.

If you work with many sets, then you should first define a common item coding by creating a vector with all possible item labels and then use either encode to create transactions or recode to make a different set compatible.
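As a minimal sketch of this workflow, the code below defines a common item coding up front and uses encode to build two compatible transaction sets. The label vector commonLabels and the two small datasets basket1 and basket2 are made up for illustration and are not part of arules.

```r
library(arules)

# Hypothetical label vector defining the common item coding;
# the position of each label is its column ID.
commonLabels <- c("bread", "butter", "milk", "beer")

# Two small, made-up datasets that each use only some of the items.
basket1 <- list(c("milk", "bread"), c("beer"))
basket2 <- list(c("butter", "bread"))

# encode() builds itemMatrix objects that share the columns of commonLabels;
# coercion then turns them into transactions.
trans1 <- as(encode(basket1, itemLabels = commonLabels), "transactions")
trans2 <- as(encode(basket2, itemLabels = commonLabels), "transactions")

# Both sets now report the same item labels in the same order.
itemLabels(trans1)
compatible(trans1, trans2)
```

Because the coding is fixed before any data is read, items that do not occur in a given dataset still get a column, which is what keeps the sets comparable.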

The following functions help with creating and changing item codings to make them compatible.

encode converts from readable item labels to an itemMatrix using a given coding. With this method it is possible to create several compatible itemMatrix objects (i.e., use the same binary representation for items) from data.

decode converts from the column IDs used in the itemMatrix representation to item labels. decode is used by LIST.

recode recodes an itemMatrix object so its coding is compatible with another itemMatrix object specified in itemLabels (i.e., the columns are reordered to match).
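The relationship between encode and decode can be illustrated with a small round trip that works on column IDs rather than on an itemMatrix. This is only a sketch: the coding vector codingLabels and the item names are hypothetical.

```r
library(arules)

# Hypothetical item coding: "a" has column ID 1, "b" has 2, "c" has 3.
codingLabels <- c("a", "b", "c")

# With itemMatrix = FALSE, encode() returns column IDs instead of an
# itemMatrix; the result has the same type as the input (here a list).
ids <- encode(list(c("a", "c"), "b"),
  itemLabels = codingLabels, itemMatrix = FALSE)
ids

# decode() maps the column IDs back to item labels.
decoded <- decode(ids, itemLabels = codingLabels)
decoded
```

Working with plain IDs like this can be convenient for inspecting a coding, while itemMatrix = TRUE (the default) is what is needed to build transactions, itemsets or rules.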

See Also

LIST, associations-class, itemMatrix-class

Examples

# NOT RUN {
data("Adult")

## Example 1: Manual decoding
## Extract the item coding as a vector of item labels.
iLabels <- itemLabels(Adult)
head(iLabels)

## get undecoded list (itemIDs); named idList to avoid masking base::list
idList <- LIST(Adult[1:5], decode = FALSE)
idList

## decode itemIDs by replacing them with the appropriate item label
decode(idList, itemLabels = iLabels)


## Example 2: Manually create an itemMatrix using iLabels as the common item coding
data <- list(
    c("income=small", "age=Young"),
    c("income=large", "age=Middle-aged")
    )

# Option a: encode to match the item coding in Adult
iM <- encode(data, itemLabels = Adult)
iM
inspect(iM)
compatible(iM, Adult)

# Option b: coercion plus recode to make it compatible with Adult
#           (note: the coding has 115 item columns after recode)
iM <- as(data, "itemMatrix")
iM
compatible(iM, Adult)

iM <- recode(iM, itemLabels = Adult)
iM
compatible(iM, Adult)


## Example 3: use recode to make itemMatrices compatible
## select first 100 transactions and all education-related items
sub <- Adult[1:100, itemInfo(Adult)$variables ==  "education"]
itemLabels(sub)
image(sub)

## After choosing only a subset of items (columns), the item coding is now 
## no longer compatible with the Adult dataset
compatible(sub, Adult)

## recode to match Adult again
sub.recoded <- recode(sub, itemLabels = Adult)
image(sub.recoded)


## Example 4: manually create 2 new transactions for the Adult data set
##            Note: check itemLabels(Adult) to see the available labels for items
twoTransactions <- as(
    encode(list(
        c("age=Young", "relationship=Unmarried"), 
        c("age=Senior")
      ), itemLabels = Adult),
    "transactions")

twoTransactions
inspect(twoTransactions)


## Example 5: Use a common item coding

# coercion to transactions will produce different item codings
trans1 <- as(list(
        c("age=Young", "relationship=Unmarried"), 
        c("age=Senior")
      ), "transactions")
trans1

trans2 <- as(list(
        c("age=Middle-aged", "relationship=Married"), 
        c("relationship=Unmarried", "age=Young")
      ), "transactions")
trans2

compatible(trans1, trans2)

# produce common item coding (all item labels in the two sets)
commonItemLabels <- union(itemLabels(trans1), itemLabels(trans2))
commonItemLabels

trans1 <- recode(trans1, itemLabels = commonItemLabels)
trans1
trans2 <- recode(trans2, itemLabels = commonItemLabels)
trans2

compatible(trans1, trans2)


## Example 6: manually create a rule and calculate interest measures
aRule <- new("rules", 
  lhs = encode(list(c("age=Young", "relationship=Unmarried")), 
    itemLabels = Adult),
  rhs = encode(list(c("income=small")), 
    itemLabels = Adult)
)

quality(aRule) <- interestMeasure(aRule, 
  measure = c("support", "confidence", "lift"), transactions = Adult)

inspect(aRule)
# }