preprocessing_choice_regression: Preprocessing Choice Regressions

Description

Assesses the effects of preprocessing decisions on an outcome variable via regression.

Usage

preprocessing_choice_regression(Y, choices, dataset = "UK",
  base_case_index = 128)

Arguments

Y

A numeric vector (usually of length 128) containing the outcome variable. This should be the preText (or other) score for each preprocessing specification.

choices

A 128 x 7 data.frame of preprocessing choices, as returned in the `$choices` field of the `factorial_preprocessing()` output.

dataset

The name to be given to the data we are analyzing.

base_case_index

An optional argument giving the index of the base case row, which is removed from the choices data before performing the regression. Defaults to 128.

Value

A data.frame
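
To make the arguments above concrete, here is a minimal base-R sketch of the kind of regression they describe. It is not the package's implementation: it assumes an ordinary least squares fit via `lm()`, uses simulated scores, and builds a hypothetical stand-in for the `$choices` table whose column names (`step_1` to `step_7`) are made up for illustration.

```r
# Hypothetical stand-in for the `$choices` table: a full factorial grid of
# 7 binary preprocessing decisions (2^7 = 128 rows). Column names are invented.
set.seed(1)
choices <- expand.grid(rep(list(c(TRUE, FALSE)), 7))
names(choices) <- paste0("step_", 1:7)

# Simulated outcome: one score per preprocessing specification (not real data).
Y <- 0.05 * choices$step_1 - 0.03 * choices$step_4 +
  rnorm(nrow(choices), sd = 0.01)

# Drop the base-case row from both the choices and the outcome before fitting.
base_case_index <- 128
fit_data <- as.data.frame(lapply(choices[-base_case_index, ], as.numeric))
fit_data$Y <- Y[-base_case_index]

# Assumed model: OLS of the score on the seven choice indicators.
fit <- lm(Y ~ ., data = fit_data)
coef_table <- as.data.frame(summary(fit)$coefficients)
print(round(coef_table, 3))
```

Each row of `coef_table` other than the intercept estimates how much switching one preprocessing step on shifts the score, holding the other steps fixed. The data.frame actually returned by `preprocessing_choice_regression()` may be laid out differently.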

Examples

# NOT RUN {
# Note: preText() already calls this function and returns its output
# in the results.
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
# run preText
preText_results <- preText(
    preprocessed_documents,
    dataset_name = "UK Manifestos",
    distance_method = "cosine",
    num_comparisons = 100,
    verbose = TRUE)
# get regression results
reg_results <- preprocessing_choice_regression(
     preText_results$preText_scores$preText_score,
     preprocessed_documents$choices,
     dataset = "UK Manifestos",
     base_case_index = 128)
# }