exportDDI: Export to a DDI metadata file

Description

This function creates an XML file structured according to DDI Codebook version 2.5.

Usage

exportDDI(codebook, file = "", embed = TRUE, OS = "", indent = 4)

Arguments

codebook

A list object containing the metadata, or a path to a directory containing such objects, for batch processing.

file

Either a character string naming a file, or a connection open for writing. "" indicates output to the console.

embed

Embed the CSV datafile in the XML file, if present.

OS

The target operating system, which determines the end-of-line (EOL) character(s).

indent

Indent width, in number of spaces

Value

An XML file containing DDI version 2.5 metadata.

Details

The information object (the codebook argument) is essentially a list with two main list components:

- fileDscr, if the data is provided in a subcomponent named datafile

- dataDscr, having as many components as the number of variables in the (meta)data. For each variable, there should be a mandatory subcomponent called label (that contains the variable's label) and, if the variable is of a categorical type, another subcomponent called values.
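The two-component layout described above can be sketched in base R. This is only an illustration: the subcomponent names labels and na_values follow the Examples section of this page, and storing a data frame under datafile is an assumption, not something this page confirms.

```r
# Hypothetical minimal metadata list with both main components
codebook <- list(
    fileDscr = list(
        # assumption: the data is a data frame stored under `datafile`
        datafile = data.frame(ID = 1:3, V1 = c(0, 1, -99))
    ),
    dataDscr = list(
        ID = list(label = "Questionnaire ID", type = "num"),
        V1 = list(
            label = "Label for the first variable",
            labels = c("No" = 0, "Yes" = 1, "Not answered" = -99),
            na_values = -99,
            type = "cat"
        )
    )
)

# Every variable should carry the mandatory `label` subcomponent
has_label <- vapply(
    codebook$dataDscr,
    function(v) !is.null(v$label),
    logical(1)
)
has_label
```

A check like this can be run before exporting, so that a missing label is noticed early.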

Additional information about the variables can be specified as further subcomponents, combining DDI-specific data with other information that might not be covered by DDI:

- measurement is the equivalent of the specific DDI attribute nature of the var element, and it accepts these values: "nominal", "ordinal", "interval", "ratio", "percent", and "other".

- type is useful for multiple reasons. First, if the variable is numerical, it differentiates between the discrete and contin values of the attribute intrvl of the same DDI element var. Second, it helps identify pure string variables (containing text), when the subcomponent type is equal to "char". It is also used for the subelement varFormat of the element var. Finally, it differentiates between pure categorical ("cat") and pure numerical ("num") variables, as well as mixed ones. Among the mixed ones, "numcat" refers to a numerical variable with very few values (such as the number of children), for which it is possible to produce a table of frequencies alongside the numerical summaries. There are also categorical variables that can be interpreted as numeric ("catnum"), such as a Likert-type response scale with 7 values, where numerical summaries are routinely computed along with the usual table of frequencies.

- missing is an important subcomponent, indicating which of the values in the variable are going to be treated as missing values, and it is going to be exported as the attribute missing of the DDI subelement catgry.
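The accepted values listed above lend themselves to a quick sanity check. The sketch below uses a hypothetical helper, check_variable(), which is not part of the package. The set of type values is limited to the ones named in this section, and the package may accept others.

```r
# Values as listed in this section
measurement_values <- c("nominal", "ordinal", "interval", "ratio", "percent", "other")
# assumption: only the type values mentioned on this page
type_values <- c("num", "cat", "char", "numcat", "catnum")

# check_variable() is a hypothetical helper, not the package's own code;
# it returns a character vector of problems (empty when none are found)
check_variable <- function(v) {
    problems <- character(0)
    if (is.null(v$label)) {
        problems <- c(problems, "missing label")
    }
    if (!is.null(v$measurement) && !v$measurement %in% measurement_values) {
        problems <- c(problems, "unknown measurement")
    }
    if (!is.null(v$type) && !v$type %in% type_values) {
        problems <- c(problems, "unknown type")
    }
    problems
}

ok  <- check_variable(list(label = "Number of children",
                           type = "numcat", measurement = "ratio"))
bad <- check_variable(list(type = "numeric", measurement = "nominal"))
```

Here ok is an empty character vector, while bad reports the missing label and the unrecognized type.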

Many more attributes and DDI elements could be added to the information object; future versions of this function will likely expand on them.

For the moment, only DDI codebook version 2.5 is exported, but DDI Lifecycle is also possible.

The argument OS can be one of: "windows" (default), "Windows", "Win", "win", "MacOS", "Darwin", "Apple", "Mac", "mac", "Linux", "linux".

The end of line separator changes only when the target OS is different from the running OS.
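To illustrate the idea of choosing an end-of-line separator by target OS, here is a minimal base R sketch. The helper eol_for() is hypothetical and is not the package's implementation. It assumes that an empty OS means "use the running OS", and it relies on the usual convention that Windows uses CRLF while macOS and Linux use LF.

```r
# eol_for() is a hypothetical helper, not the package's own code
eol_for <- function(OS = "") {
    # assumption: an empty string falls back to the running OS
    if (!nzchar(OS)) {
        OS <- Sys.info()[["sysname"]]
    }
    # Windows uses carriage return + line feed; macOS and Linux use line feed
    if (tolower(OS) %in% c("windows", "win")) "\r\n" else "\n"
}

eol_win   <- eol_for("Windows")
eol_linux <- eol_for("Linux")
```

Lines joined with paste(..., collapse = eol_for("Windows")) would then carry Windows line endings regardless of the machine running the code.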

The argument indent controls how many spaces are used to indent the subelements in the XML file.
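As a usage sketch, the OS and indent arguments from the Usage section can be combined as below. The calls run only if exportDDI() is available in the session, and the one-variable codeBook and the temporary output path are illustrative choices.

```r
# Small metadata list, modelled on the Examples section
codeBook <- list(dataDscr = list(
    ID = list(
        label = "Questionnaire ID",
        type = "num",
        measurement = "interval"
    )
))

# Guard: run only when the function is available (package loaded)
if (exists("exportDDI")) {
    # print the XML to the console, indenting subelements by 2 spaces
    exportDDI(codeBook, file = "", indent = 2)

    # write to a temporary file, targeting Linux line endings
    out <- file.path(tempdir(), "codebook.xml")
    exportDDI(codeBook, file = out, OS = "Linux")
}
```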

Examples


# NOT RUN {
codeBook <- list(dataDscr = list(
ID = list(
    label = "Questionnaire ID",
    type = "num",
    measurement = "interval"
),
V1 = list(
    label = "Label for the first variable",
    labels = c(
        "No"             =  0, 
        "Yes"            =  1,
        "Not applicable" = -97,
        "Not answered"   = -99),
    na_values = c(-99, -97),
    type = "cat",
    measurement = "nominal"
),
V2 = list(
    label = "Label for the second variable",
    labels = c(
        "Very little"    =  1, 
        "Little"         =  2,
        "So, so"         =  3,
        "Much"           =  4,
        "Very much"      =  5,
        "Don't know"     = -98),
    na_values = c(-98),
    type = "cat",
    measurement = "ordinal"
),
V3 = list(
    label = "Label for the third variable",
    labels = c(
        "First answer"   = "A", 
        "Second answer"  = "B",
        "Don't know"     = -98),
    na_values = c(-98),
    type = "cat",
    measurement = "nominal"
),
V4 = list(
    label = "Number of children",
    labels = c(
        "Don't know"     = -98,
        "Not answered"   = -99),
    na_values = c(-99, -98),
    type = "numcat",
    measurement = "ratio"
),
V5 = list(
    label = "Political party reference",
    type = "char",
    txt = "When the respondent indicated his political party reference, his/her open response
was recoded on a scale of 1-99 with parties with a left-wing orientation coded on the low end
of the scale and parties with a right-wing orientation coded on the high end of the scale.
Categories 90-99 were reserved miscellaneous responses."
)))

exportDDI(codeBook, file = "codebook.xml")
# }