QCAGUI (version 2.4)

calibrate: Calibrate raw data to crisp or fuzzy sets

Description

This function transforms (calibrates) raw data to either crisp or fuzzy set values, using the direct method of calibration.

Usage

calibrate(x, type = "crisp", thresholds = NA, include = TRUE, logistic = TRUE, idm = 0.95, ecdf = FALSE, above = 1, below = 1, ...)

Arguments

x
A numerical causal condition.
type
Calibration type, either "crisp" or "fuzzy".
thresholds
A vector of (named) thresholds.
include
Logical, include threshold(s) in the set (type = "crisp" only).
logistic
Calibrate to fuzzy sets using the logistic function.
idm
The set inclusion degree of membership for the logistic function.
ecdf
Calibrate to fuzzy sets using the empirical cumulative distribution function of the raw data.
above
Numeric (non-negative), determines the shape above crossover.
below
Numeric (non-negative), determines the shape below crossover.
...
Additional parameters, mainly for backwards compatibility.

Value

A numeric vector of set membership scores, either crisp (starting from 0 with increments of 1), or fuzzy numeric values between 0 and 1.

Details

Calibration is a transformational process from raw numerical data (interval or ratio level of measurement) to set membership scores, based on a certain number of qualitative anchors.

When type = "crisp", the process is similar to recoding the original values into a number of categories defined by the number of thresholds. For one threshold, the calibration produces two categories (intervals): 0 if below, 1 if above. For two thresholds, the calibration produces three categories: 0 if below the first threshold, 1 if in the interval between the thresholds, and 2 if above the second threshold, and so on.

The include argument decides whether a value equal to a certain threshold is included in the interval above or left in the interval below.
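The crisp recoding described above can be illustrated in base R with findInterval(). This is a minimal sketch of the idea, not the package's implementation; the helper name crisp_sketch is hypothetical, and mapping include to the left.open argument is an assumption about how ties at a threshold are resolved.

```r
# Hypothetical helper: recode raw values into 0, 1, 2, ... by counting
# how many thresholds each value has passed.
crisp_sketch <- function(x, thresholds, include = TRUE) {
  # include = TRUE : a value equal to a threshold goes to the interval above
  # include = FALSE: a value equal to a threshold stays in the interval below
  findInterval(x, sort(thresholds), left.open = !include)
}

heights <- c(160, 170, 175, 180, 190)

with_ties_above <- crisp_sketch(heights, c(170, 180))
with_ties_below <- crisp_sketch(heights, c(170, 180), include = FALSE)

print(with_ties_above)  # 0 1 1 2 2
print(with_ties_below)  # 0 0 1 1 2
```

The two results differ only at 170 and 180, the values that sit exactly on a threshold.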

When type = "fuzzy", calibration produces fuzzy set membership scores, using three anchors for the increasing or decreasing s-shaped distributions (including the logistic function), and six anchors for the increasing or decreasing bell-shaped distributions.

The argument thresholds can be specified either as a simple numeric vector, or as a named numeric vector. If used as a named vector, for the first category of s-shaped distributions, the names of the thresholds should be:

"e"
for the full set exclusion
"c"
for the set crossover
"i"
for the full set inclusion

For the second category of bell-shaped distributions, the names of the thresholds should be:

"e1"
for the first (left) threshold for full set exclusion
"c1"
for the first (left) threshold for set crossover
"i1"
for the first (left) threshold for full set inclusion
"i2"
for the second (right) threshold for full set inclusion
"c2"
for the second (right) threshold for set crossover
"e2"
for the second (right) threshold for full set exclusion

If used as a simple numerical vector, the order of the values matters.

If e < c < i, then the membership function is increasing from e to i. If i < c < e, then the membership function is decreasing from i to e.

The same applies to the bell-shaped distribution: if e1 < c1 < i1 <= i2 < c2 < e2, then the membership function is first increasing from e1 to i1, then flat between i1 and i2, and then decreasing from i2 to e2. In contrast, if i1 < c1 < e1 <= e2 < c2 < i2, then the membership function is first decreasing from i1 to e1, then flat between e1 and e2, and finally increasing from e2 to i2.
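To make the six-anchor ordering concrete, the following base R sketch interpolates linearly through the anchors, assigning 0 to exclusion thresholds, 0.5 to crossovers and 1 to inclusion thresholds. It only illustrates the linear case (below = above = 1) and is not the package's code; the helper name bell_sketch is hypothetical.

```r
# Hypothetical helper: piecewise-linear membership through named anchors.
# Names starting with "e" map to 0, "c" to 0.5 and "i" to 1.
bell_sketch <- function(x, thresholds) {
  anchor_scores <- c(e = 0, c = 0.5, i = 1)[substr(names(thresholds), 1, 1)]
  # rule = 2 holds the outermost score constant beyond the outer anchors;
  # ties = mean merges coinciding anchors such as i1 == i2
  approx(x = thresholds, y = unname(anchor_scores),
         xout = x, rule = 2, ties = mean)$y
}

raw <- c(150, 160, 165, 175, 180, 190, 200)

bell <- c(e1 = 155, c1 = 165, i1 = 175, i2 = 175, c2 = 185, e2 = 195)
inverse_bell <- c(i1 = 155, c1 = 165, e1 = 175, e2 = 175, c2 = 185, i2 = 195)

bell_scores <- bell_sketch(raw, bell)
inverse_scores <- bell_sketch(raw, inverse_bell)

print(bell_scores)     # 0.00 0.25 0.50 1.00 0.75 0.25 0.00
print(inverse_scores)  # 1.00 0.75 0.50 0.00 0.25 0.75 1.00
```

With i1 equal to i2 the shape is triangular; moving them apart produces a flat top, that is, a trapezoid.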

When logistic = TRUE (the default), the argument idm specifies the inclusion degree of membership for the logistic function. If logistic = FALSE, the function returns linear s-shaped or bell-shaped distributions (curved using the arguments below and above), unless the argument ecdf is activated.
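The role of idm can be illustrated with a small base R sketch of a logistic calibration. It assumes that the log-odds are scaled so that the membership score equals idm at the inclusion threshold, 1 - idm at the exclusion threshold and 0.5 at the crossover. The helper name logistic_sketch and its argument names are hypothetical, and the package's own code may differ in detail.

```r
# Hypothetical helper: logistic membership scores from three anchors.
logistic_sketch <- function(x, excl, cross, incl, idm = 0.95) {
  log_odds <- log(idm / (1 - idm))
  # TRUE where x lies on the same side of the crossover as the inclusion anchor
  inclusion_side <- (x - cross) * (incl - cross) >= 0
  # separate scaling on each side, so asymmetric anchors are handled too
  scale <- ifelse(inclusion_side,
                  log_odds / (incl - cross),
                  log_odds / (cross - excl))
  1 / (1 + exp(-(x - cross) * scale))
}

anchors <- c(165, 175, 185)

tall  <- logistic_sketch(anchors, excl = 165, cross = 175, incl = 185)
short <- logistic_sketch(anchors, excl = 185, cross = 175, incl = 165)

print(tall)   # 0.05 0.50 0.95
print(short)  # 0.95 0.50 0.05
```

Raising idm toward 1 steepens the curve, because the same distance from the crossover has to cover a larger range of log-odds.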

If there is no prior knowledge on the shape of the distribution, the argument ecdf asks the computer to determine the underlying distribution of the empirical, observed points, and the calibrated measures are found along that distribution.

Both logistic and ecdf arguments can be used only for s-shaped distributions (using 3 thresholds), and they are mutually exclusive.
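The idea of reading membership scores off the empirical distribution can be sketched with the base R function ecdf(). The rescaling used here (observations between the outer thresholds only, with the crossover pinned to 0.5) is an assumption made for illustration and should not be read as the package's algorithm; the helper name ecdf_sketch is hypothetical.

```r
# Hypothetical helper: membership scores taken from the empirical CDF
# of the observations lying between the exclusion and inclusion thresholds.
ecdf_sketch <- function(x, excl, cross, incl) {
  stopifnot(excl < cross, cross < incl)
  inside <- x >= excl & x <= incl
  Fx <- ecdf(x[inside])       # empirical CDF of the in-range points
  p <- Fx(x)
  p_cross <- Fx(cross)
  # rescale so that the crossover maps to a score of 0.5
  scores <- ifelse(x <= cross,
                   p / p_cross / 2,
                   1 - (1 - p) / (1 - p_cross) / 2)
  scores[x < excl] <- 0       # fully out of the set
  scores[x > incl] <- 1       # fully in the set
  scores
}

raw <- c(150, 160, 170, 175, 180, 190, 200)
ecdf_scores <- ecdf_sketch(raw, excl = 155, cross = 175, incl = 195)
print(round(ecdf_scores, 3))
```

Because the scores follow the observed points, the resulting curve is steep where the data are dense and flat where they are sparse.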

The parameters below and above (active only when both logistic and ecdf are deactivated) establish the degree of concentration and dilation (convex or concave shape) between the threshold and crossover:

0 < below < 1
dilates in a concave shape below the crossover
below = 1
produces a linear shape (neither convex, nor concave)
below > 1
concentrates in a convex shape below the crossover
0 < above < 1
dilates in a concave shape above the crossover
above = 1
produces a linear shape (neither convex, nor concave)
above > 1
concentrates in a convex shape above the crossover

Usually, below and above have equal values, unless specific reasons exist to make them different.
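The effect of below and above can be explored with a small base R sketch for an increasing s-shape. It assumes that each half of the curve is a power of the relative distance to the nearest outer threshold, which reduces to a straight line when the exponent is 1. The formula, the helper name s_shape_sketch and its argument names are illustrative assumptions, not taken from the package source.

```r
# Hypothetical helper: increasing s-shaped scores with adjustable curvature.
s_shape_sketch <- function(x, excl, cross, incl, below = 1, above = 1) {
  stopifnot(excl < cross, cross < incl)
  scores <- numeric(length(x))            # values at or below excl stay 0
  lower <- x > excl & x <= cross
  upper <- x > cross & x < incl
  scores[lower] <- ((x[lower] - excl) / (cross - excl))^below / 2
  scores[upper] <- 1 - ((incl - x[upper]) / (incl - cross))^above / 2
  scores[x >= incl] <- 1
  scores
}

raw <- c(150, 160, 175, 190, 200)

linear <- s_shape_sketch(raw, excl = 155, cross = 175, incl = 195)
curved <- s_shape_sketch(raw, excl = 155, cross = 175, incl = 195,
                         below = 2, above = 2)

print(linear)  # 0.000 0.125 0.500 0.875 1.000
print(curved)  # 0.00000 0.03125 0.50000 0.96875 1.00000
```

In this sketch, exponents larger than 1 push scores toward 0 below the crossover and toward 1 above it, while exponents between 0 and 1 pull them toward 0.5.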

References

Thiem, A.; Dusa, A. (2013) Qualitative Comparative Analysis with R: A User's Guide. New York: Springer.

Thiem, A. (2014) “Membership Function Sensitivity of Descriptive Statistics in Fuzzy-Set Relations.” International Journal of Social Research Methodology vol.17, no.6, pp.625-642.

Examples

if (require("QCA")) {

# generate heights for 100 people
# with an average of 175cm and a standard deviation of 10cm
set.seed(12345)
x <- rnorm(n = 100, mean = 175, sd = 10)


cx <- calibrate(x, thresholds = 175)
plot(x, cx, main="Binary crisp set using 1 threshold",
     xlab = "Raw data", ylab = "Calibrated data", yaxt="n")
axis(2, at = 0:1)


cx <- calibrate(x, thresholds = c(170, 180))
plot(x, cx, main="3 value crisp set using 2 thresholds",
     xlab = "Raw data", ylab = "Calibrated data", yaxt="n")
axis(2, at = 0:2)


# calibrate to an increasing, s-shaped fuzzy-set
cx <- calibrate(x, type = "fuzzy", thresholds = "e=165, c=175, i=185")
plot(x, cx, main = "Membership scores in the set of tall people", 
     xlab = "Raw data", ylab = "Calibrated data")

     
# calibrate to a decreasing, s-shaped fuzzy-set
cx <- calibrate(x, type = "fuzzy", thresholds = "i=165, c=175, e=185")
plot(x, cx, main = "Membership scores in the set of short people", 
     xlab = "Raw data", ylab = "Calibrated data")


# when not using the logistic function, linear increase
cx <- calibrate(x, type = "fuzzy", logistic = FALSE,
      thresholds = "e=165, c=175, i=185")
plot(x, cx, main = "Membership scores in the set of tall people", 
     xlab = "Raw data", ylab = "Calibrated data")


# tweaking the parameters "above" and "below" the crossover,
# at value 3.5 approximates a logistic distribution, when e=155 and i=195
cx <- calibrate(x, type = "fuzzy", logistic = FALSE, above = 3.5, below = 3.5,
      thresholds = "e=155, c=175, i=195")
plot(x, cx, main = "Membership scores in the set of tall people", 
     xlab = "Raw data", ylab = "Calibrated data")


# calibrate to a bell-shaped fuzzy set
cx <- calibrate(x, type = "fuzzy", below = 3, above = 3,
      thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")


# calibrate to an inverse bell-shaped fuzzy set
cx <- calibrate(x, type = "fuzzy", below = 3, above = 3,
      thresholds = "i1=155, c1=165, e1=175, e2=175, c2=185, i2=195")
plot(x, cx, main = "Membership scores in the set of non-average height",
     xlab = "Raw data", ylab = "Calibrated data")


# the default values of "above" and "below" will produce a triangular shape
cx <- calibrate(x, type = "fuzzy",
      thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")


# different thresholds to produce a linear trapezoidal shape
cx <- calibrate(x, type = "fuzzy",
      thresholds = "e1=155, c1=165, i1=172, i2=179, c2=187, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")


# larger values of above and below will increase membership in or out of the set
cx <- calibrate(x, type = "fuzzy", below = 10, above = 10,
      thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")


# while extremely large values will produce virtually crisp results
cx <- calibrate(x, type = "fuzzy", below = 10000, above = 10000,
      thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Binary crisp scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data", yaxt="n")
axis(2, at = 0:1)
abline(v = c(165, 185), col = "red", lty = 2)

# check if crisp
round(cx, 0)


# using the empirical cumulative distribution function
# requires manually setting logistic to FALSE
cx <- calibrate(x, type = "fuzzy", logistic = FALSE, ecdf = TRUE,
      thresholds = "e=155, c=175, i=195")
plot(x, cx, main = "Membership scores in the set of tall people", 
     xlab = "Raw data", ylab = "Calibrated data")
}