DFA: Discriminant function analysis

Description

Produces SPSS- and SAS-like output for linear discriminant function analysis.

Usage

DFA(data, groups, variables, plot = TRUE, predictive = TRUE, priorprob = 'SIZES', covmat_type = 'within', CV = TRUE, verbose = TRUE)

Value

If verbose = TRUE, the displayed output includes descriptive statistics for the groups, tests of univariate and multivariate normality, the results of tests of the homogeneity of the group variance-covariance matrices, eigenvalues & canonical correlations, Wilks' lambda & peel-down statistics, raw and standardized discriminant function coefficients, structure coefficients, functions at group centroids, one-way ANOVA tests of group differences in scores on each discriminant function, one-way ANOVA tests of group differences in scores on each original DV, significance tests for group differences on the original DVs according to Bird & Hadzi-Pavlovic (2014), a plot of the group means on the standardized discriminant functions, and extensive output from predictive discriminant function analyses (if requested).

The returned output is a list with the following elements:

rawCoef: canonical discriminant function coefficients
structCoef: structure coefficients
standCoef: standardized coefficients
standCoefSPSS: standardized coefficients from SPSS
centroids: unstandardized canonical discriminant functions evaluated at the group means
centroidSDs: group standard deviations on the unstandardized functions
centroidsZ: standardized canonical discriminant functions evaluated at the group means
centroidSDsZ: group standard deviations on the standardized functions
DFAscores: scores on the discriminant functions
anovaDFoutput: One-way ANOVAs using the scores on a discriminant function as the DV
anovaDVoutput: One-way ANOVAs on the original DVs
MFWER1.sigtest: Significance tests when controlling the MFWER by (only) carrying out multiple t tests
MFWER2.sigtest: Significance tests for the two-stage approach to controlling the MFWER
dfa_class: The predicted group classifications
posteriors: The posterior probabilities for the predicted group classifications
freqs_OR: Cross-tabulation of the original and predicted group memberships
PropOrigCorrect: Proportion of original grouped cases correctly classified
chi_square_OR: Chi-square test of independence
PressQ_OR: Press's Q significance test of classification accuracy for original vs. predicted group memberships
rowfreqs_OR: Row frequencies
colfreqs_OR: Column frequencies
cellprops_OR: Cell proportions
rowprops_OR: Row-based proportions
colprops_OR: Column-based proportions
kappas_cvo_OR: Agreement (kappas) between the predicted and original group memberships
dfa_class_CV: Classifications from leave-one-out cross-validations
freqs_CV: Cross-tabulation of the cross-validated and predicted group memberships
PropCrossValCorrect: Proportion of cross-validated grouped cases correctly classified
chi_square_CV: Chi-square test of independence
PressQ_CV: Press's Q significance test of classification accuracy for cross-validated vs. predicted group memberships
rowfreqs_CV: Row frequencies
colfreqs_CV: Column frequencies
cellprops_CV: Cell proportions
rowprops_CV: Row-based proportions
colprops_CV: Column-based proportions
kappas_cvoCV: Agreement (kappas) between the cross-validated and original group memberships
kappas_CVP: Agreement (kappas) between the cross-validated and predicted group memberships
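
As a rough illustration of the classification-accuracy elements listed above, the following base R sketch computes the proportion of correct classifications, Press's Q, and the row, column, and cell proportions from a cross-tabulation. The table is made-up data, and the Press's Q formula used here, Q = (N - nK)^2 / (N(K - 1)) for N cases, n correct classifications, and K groups, referred to a chi-square distribution with 1 df, is the common textbook form. It is an assumption, not something confirmed here, that DFA computes these quantities in exactly this way.

```r
# Hypothetical cross-tabulation: rows = original groups, columns = predicted groups
freqs <- matrix(c(8, 1, 1,
                  2, 7, 1,
                  0, 2, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(original  = c('A', 'B', 'C'),
                                predicted = c('A', 'B', 'C')))

N         <- sum(freqs)        # total number of cases
n_correct <- sum(diag(freqs))  # diagonal cells are correct classifications
K         <- nrow(freqs)       # number of groups

# Proportion of cases correctly classified
prop_correct <- n_correct / N

# Press's Q (textbook form; assumed, not taken from the DFA source code)
press_q   <- (N - n_correct * K)^2 / (N * (K - 1))
press_q_p <- pchisq(press_q, df = 1, lower.tail = FALSE)

# Cell, row-based, and column-based proportions
cellprops <- prop.table(freqs)
rowprops  <- prop.table(freqs, margin = 1)
colprops  <- prop.table(freqs, margin = 2)

print(round(c(prop_correct = prop_correct, press_q = press_q, p = press_q_p), 4))
```

A large Q relative to the chi-square critical value suggests that classification accuracy exceeds what would be expected by chance.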

Arguments

data: A dataframe where the rows are cases & the columns are the variables.
groups: The name of the groups variable in the dataframe, e.g., groups = 'Group'.
variables: The names of the continuous variables in the dataframe that will be used in the DFA, e.g., variables = c('varA', 'varB', 'varC').
plot: Should a plot of the mean standardized discriminant function scores for the groups be produced? The options are: TRUE (default) or FALSE.
predictive: Should a predictive DFA be conducted? The options are: TRUE (default) or FALSE.
priorprob: If predictive = TRUE, how should the prior probabilities of group membership be set? The options are:
'EQUAL' for equal prior probabilities across the groups; or
'SIZES' (default) for prior probabilities proportional to the sizes of the groups in the dataframe.
covmat_type: The kind of covariance matrix to be used for a predictive DFA. The options are:
'within' (for the pooled within-groups covariance matrix, which is the default) or
'separate' (for separate-groups covariance matrices).
CV: If predictive = TRUE, should cross-validation (leave-one-out cross-validation) analyses also be conducted? The options are: TRUE (default) or FALSE.
verbose: Should detailed results be displayed in the console? The options are: TRUE (default) or FALSE.
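
To make the two priorprob options concrete, here is a minimal base R sketch of how equal and size-based prior probabilities could be derived from a grouping variable. The group vector and the object names prior_equal and prior_sizes are hypothetical and are not part of the DFA function.

```r
# Hypothetical grouping variable with unequal group sizes
group <- factor(c(rep('NT', 10), rep('BT', 12), rep('CBT', 8)))

n_groups <- nlevels(group)

# 'EQUAL': every group receives the same prior probability
prior_equal <- setNames(rep(1 / n_groups, n_groups), levels(group))

# 'SIZES': priors are proportional to the observed group sizes
prior_sizes <- prop.table(table(group))

print(prior_equal)
print(prior_sizes)
```

With unequal group sizes, the 'SIZES' priors shift classifications toward the larger groups, whereas the 'EQUAL' priors treat all groups as equally likely beforehand.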

Author

Brian P. O'Connor

Details

The predictive DFA option using separate-groups covariance matrices (which is often called 'quadratic DFA') is conducted following the procedures described by Rencher (2002). The covariance matrices in this case are based on the scores on the continuous variables. In contrast, the 'separate-groups' option in SPSS involves use of the group scores on the discriminant functions (not the original continuous variables), which can produce different classifications.
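
The idea of classifying with separate-groups covariance matrices computed from the original continuous variables can be sketched in base R as follows. This is an illustration of the standard quadratic classification rule (log prior, minus half the log determinant of the group covariance matrix, minus half the squared Mahalanobis distance to the group mean), using the built-in iris data and size-based priors. It is not the DFA function's internal code, and the object names are hypothetical.

```r
# Quadratic (separate-covariance) classification on the built-in iris data
X      <- as.matrix(iris[, 1:4])      # the original continuous variables
group  <- iris$Species
lev    <- levels(group)
priors <- prop.table(table(group))    # priors based on group sizes

# Quadratic classification score for each case (rows) and group (columns)
scores <- sapply(lev, function(g) {
  Xg  <- X[group == g, , drop = FALSE]
  S_g <- cov(Xg)                      # covariance matrix of this group only
  d2  <- mahalanobis(X, center = colMeans(Xg), cov = S_g)
  log_det <- as.numeric(determinant(S_g, logarithm = TRUE)$modulus)
  log(as.numeric(priors[g])) - 0.5 * log_det - 0.5 * d2
})

# Posterior probabilities: exponentiate and normalize within each case
post <- exp(scores - apply(scores, 1, max))  # subtract row maxima for stability
post <- post / rowSums(post)

# Assign each case to the group with the highest score
pred <- factor(lev[max.col(scores, ties.method = 'first')], levels = lev)

print(table(original = group, predicted = pred))
```

Because each group has its own covariance matrix, the resulting decision boundaries are quadratic rather than linear in the original variables.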

See the documentation for the GROUP.DIFFS function for information on the interpretation of the Bayes factors and effect sizes that are produced for the group comparisons.

References

Bird, K. D., & Hadzi-Pavlovic, D. (2014). Controlling the maximum familywise Type I error rate in analyses of multivariate experiments. Psychological Methods, 19(2), 265-280.

Manly, B. F. J., & Navarro Alberto, J. A. (2017). Multivariate statistical methods: A primer (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.

Rencher, A. C. (2002). Methods of Multivariate Analysis (2nd ed.). New York, NY: John Wiley & Sons.

Sherry, A. (2006). Discriminant analysis in counseling research. Counseling Psychologist, 34, 661-683.

Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics (7th ed.). New York, NY: Pearson.

Examples


DFA_Field <- DFA(data = data_DFA_Field, 
    groups = 'Group', 
    variables = c('Actions','Thoughts'),
    predictive = TRUE, priorprob = 'SIZES', 
    covmat_type = 'separate', # 'separate' is the better choice for these data
    verbose = TRUE)

# \donttest{

# plots of posterior probabilities by group
# hoping to see correct separations between cases from different groups

# first, display the posterior probabilities
print(cbind(round(DFA_Field$posteriors[1:3],3), DFA_Field$posteriors[4]))

# group NT vs CBT
plot(DFA_Field$posteriors$posterior_NT, DFA_Field$posteriors$posterior_CBT, 
     pch = 16, col = c('red', 'blue', 'green')[DFA_Field$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Original Group Memberships',
     xlab='Posterior Probability of Being in Group NT',
     ylab='Posterior Probability of Being in Group CBT' )
legend(x=.8, y=.99, c('CBT','BT','NT'), cex=1.2, col=c('red', 'blue', 'green'), pch=16, bty='n')

# group NT vs BT
plot(DFA_Field$posteriors$posterior_NT, DFA_Field$posteriors$posterior_BT, 
     pch = 16, col = c('red', 'blue', 'green')[DFA_Field$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Group Membership',
     xlab='Posterior Probability of Being in Group NT',
     ylab='Posterior Probability of Being in Group BT' )
legend(x=.8, y=.99, c('CBT','BT','NT'), cex=1.2,col=c('red', 'blue', 'green'), pch=16, bty='n')

# group CBT vs BT
plot(DFA_Field$posteriors$posterior_CBT, DFA_Field$posteriors$posterior_BT, 
     pch = 16, col = c('red', 'blue', 'green')[DFA_Field$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Group Membership',
     xlab='Posterior Probability of Being in Group CBT',
     ylab='Posterior Probability of Being in Group BT' )
legend(x=.8, y=.99, c('CBT','BT','NT'), cex=1.2, col=c('red', 'blue', 'green'), pch=16, bty='n')


DFA_Sherry <- DFA(data = data_DFA_Sherry, 
                  groups = 'Group',
                  variables = c('Neuroticism','Extroversion','Openness', 
                                'Agreeableness','Conscientiousness'),
                  predictive = TRUE, priorprob = 'SIZES', 
                  covmat_type='separate', 
                  verbose = TRUE)

# plots of posterior probabilities by group
# hoping to see correct separations between cases from different groups

# first, display the posterior probabilities
print(cbind(round(DFA_Sherry$posteriors[1:3],3), DFA_Sherry$posteriors[4]))

# group 1 vs 2
plot(DFA_Sherry$posteriors$posterior_1, DFA_Sherry$posteriors$posterior_2, 
     pch = 16, cex = 1, col = c('red', 'blue', 'green')[DFA_Sherry$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Original Group Memberships',
     xlab='Posterior Probability of Being in Group 1',
     ylab='Posterior Probability of Being in Group 2' )
legend(x=.8, y=.99, c('1','2','3'), cex=1.2, col=c('red', 'blue', 'green'), pch=16, bty='n')

# group 1 vs 3
plot(DFA_Sherry$posteriors$posterior_1, DFA_Sherry$posteriors$posterior_3, 
     pch = 16, col = c('red', 'blue', 'green')[DFA_Sherry$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Group Membership',
     xlab='Posterior Probability of Being in Group 1',
     ylab='Posterior Probability of Being in Group 3' )
legend(x=.8, y=.99, c('1','2','3'), cex=1.2,col=c('red', 'blue', 'green'), pch=16, bty='n')

# group 2 vs 3
plot(DFA_Sherry$posteriors$posterior_2, DFA_Sherry$posteriors$posterior_3, 
     pch = 16, col = c('red', 'blue', 'green')[DFA_Sherry$posteriors$Group],
     xlim=c(0,1), ylim=c(0,1),
     main = 'DFA Posterior Probabilities by Group Membership',
     xlab='Posterior Probability of Being in Group 2',
     ylab='Posterior Probability of Being in Group 3' )
legend(x=.8, y=.99, c('1','2','3'), cex=1.2, col=c('red', 'blue', 'green'), pch=16, bty='n')


    
# data from Tabachnick & Fidell (2019, p. 307)
table9.1 <- '
1  87  5 31 6.4
1  97  7 36 8.3
1 112  9 42 7.2
2 102 16 45 7.0
2  85 10 38 7.6
2  76  9 32 6.2
3 120 12 30 8.4
3  85  8 28 6.3
3  99  9 27 8.2'
table9.1 <- data.frame(read.table(text=table9.1, 
                       col.names=c('group','perf','info','verbexp','age')))

DFA(data = table9.1, 
    groups = 'group', 
    variables = c('perf','info','verbexp','age'),
    predictive = TRUE, priorprob = 'SIZES', covmat_type='within', 
    verbose = TRUE)  
# }
