read.tridas: Read Tree Ring Data Standard (TRiDaS) File

Description

This function reads in a TRiDaS format XML file. Measurements, derived series and various kinds of metadata are supported.

Usage

read.tridas(fname, ids.from.titles = FALSE,
            ids.from.identifiers = TRUE, combine.series = TRUE,
            trim.whitespace = TRUE, warn.units = TRUE)

Value

A list with a variable number of components according to the contents of the input file. The possible list components are:

measurements

A data.frame or a list of data.frames with the series in columns and the years as rows. Contains measurements (<measurementSeries>) with known years. The series IDs are the column names and the years are the row names. The series IDs are derived from <title> elements in the input file. Each unique combination of <project>, <object>, <unit>, <taxon>, and <variable> gets a separate data.frame.
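The measurements component can be either a single data.frame or a list of them, so downstream code often normalizes it first. The sketch below is a minimal illustration. It assumes only the layout described above: series in columns and years as character row names. `as_frame_list()` is a hypothetical helper, not part of dplR, and `toy` is a made-up stand-in for the measurements component of a `read.tridas()` result.

```r
# Hypothetical helper (not part of dplR): always return a list of
# data.frames, whether 'x' is a single data.frame or already a list.
as_frame_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Made-up stand-in for the 'measurements' component:
# two series (columns) and three years (row names)
toy <- data.frame(A = c(1.2, 1.5, NA),
                  B = c(NA, 0.9, 1.1),
                  row.names = c("1901", "1902", "1903"))

frames <- as_frame_list(toy)

# Years are stored as character row names; convert them for arithmetic
years <- as.integer(rownames(frames[[1]]))

# Series IDs are the column names
series_ids <- colnames(frames[[1]])

# First and last year with a non-missing value, one column per series
span <- sapply(frames[[1]], function(v) range(years[!is.na(v)]))
```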

ids

A data.frame or a list of data.frames with columns named "tree", "core", "radius", and "measurement", together giving a unique numeric ID for each column of the data.frame(s) in measurements.

If !combine.series && (ids.from.titles || ids.from.identifiers), some rows may be non-unique.
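Before using ids as keys, it can help to find the non-unique rows mentioned above. This base-R sketch runs on a made-up ids table. The values are illustrative only and do not come from a real file.

```r
# Made-up stand-in for the 'ids' component: one row per column of a
# measurements data.frame
ids <- data.frame(tree = c(1, 1, 2),
                  core = c(1, 1, 1),
                  radius = c(1, 1, 1),
                  measurement = c(1, 1, 1))

# TRUE for each row whose (tree, core, radius, measurement) combination
# already appeared in an earlier row
dup <- duplicated(ids)

# TRUE for every row that shares its combination with any other row
shared <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
```

Here rows 1 and 2 share the combination (1, 1, 1, 1). Those two columns of measurements would therefore not be distinguishable by their IDs alone.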

titles

A data.frame or a list of data.frames with columns named "tree", "core", "radius", and "measurement", containing the <title> hierarchy of each column of the data.frame(s) in measurements.

wood.completeness

A data.frame or a list of data.frames containing wood completeness information. Column names are a subset of the following, almost self-explanatory set: "pith.presence", "heartwood.presence", "sapwood.presence",
"last.ring.presence", "last.ring.details", "bark.presence",
"n.sapwood", "n.missing.heartwood", "n.missing.sapwood",
"missing.heartwood.foundation", "missing.sapwood.foundation",
"n.unmeasured.inner", "n.unmeasured.outer".

unit

A character vector giving the unit of the measurements. Length equals the number of data.frames in measurements.

project.id

A numeric vector giving the project ID (i.e., the position of the corresponding <project> element) of the measurements in each data.frame in measurements. Length equals the number of data.frames.

project.title

A character vector giving the title of the project of each data.frame in measurements. Length equals the number of data.frames.

site.id

A data.frame giving the site ID (position of <object> element(s) within a <project>) of each data.frame in measurements. May have several columns to reflect the possibly nested <object> elements.

site.title

A data.frame giving the site (<object>) title of each data.frame in measurements. May have several columns to reflect the possibly nested <object> elements.
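Because nested <object> elements are spread over several columns, you may want a single label per data.frame. The sketch below assumes that shallower nesting leaves NA in the deeper columns. The column names `level1` and `level2` are hypothetical, since the actual column names are not given here.

```r
# Made-up stand-in for the 'site.title' component: one column per level
# of <object> nesting (column names are hypothetical)
site_title <- data.frame(level1 = c("Forest", "Forest"),
                         level2 = c("Plot 1", NA),
                         stringsAsFactors = FALSE)

# Join the non-missing levels of each row into a path-like label
site_path <- apply(site_title, 1, function(row) {
  paste(row[!is.na(row)], collapse = " / ")
})
```

The same approach applies to site.id if you need a single composite site key.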

taxon

A data.frame showing the taxonomic name for each data.frame in measurements. Contains some of the following columns: "text", "lang", "normal", "normalId", "normalStd". The first two are a free-form name and its language, and the rest are related to a normalized name.

variable

A data.frame showing the measured variable of each data.frame in measurements. Contains some of the following columns: "text", "lang", "normal", "normalId", "normalStd", "normalTridas". The first two are a free-form name and its language, and the rest are related to a normalized name.

undated

A list of measurements with unknown years, together with metadata. Elements are a subset of the following:

data

A numeric vector or a list of such vectors containing measurement series

unit

A character vector giving the unit of the measurements. Length equals the number of measurement series in undated$data

ids

A data.frame with columns named "tree", "core", "radius", and "measurement", together giving a numeric ID for each measurement series in undated$data. The rows are guaranteed to be unique only when comparing measurement series with the same project.id and site.id, but not if ids.from.titles || ids.from.identifiers.

titles

A data.frame with columns named "tree", "core", "radius", and "measurement", containing the <title> hierarchy of each measurement series in undated$data

project.id

A numeric vector giving the project ID of each measurement series in undated$data

project.title

A character vector giving the project title of each measurement series in undated$data

site.id

A data.frame giving the site ID of each measurement series in undated$data

site.title

A data.frame giving the site title of each measurement series in undated$data

variable

A data.frame containing the variable of each measurement series in undated$data

taxon

A data.frame containing taxonomic names of each measurement series in undated$data

wood.completeness

A data.frame containing wood completeness information of each measurement series in undated$data

derived

A list of calculated series of values, together with metadata. Elements are a subset of the following:

data

A numeric vector or a list of such vectors containing calculated series of values.

link

A list of data.frames, one for each series in derived$data, giving links to the measurements used to form the corresponding derived series. Each data.frame has a subset of the following columns: "idRef" (reference to a series in the same file), "xLink" (URI), "identifier", and "domain" (identifier and its domain, not necessarily in the same file).
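Each data.frame in link contains only some of the possible columns, so check which columns are present before you use them. This is a minimal sketch on a made-up link list. The idRef, identifier, and domain values are invented for illustration.

```r
# Made-up stand-in for derived$link: one data.frame per derived series,
# each containing only the link columns present in the file
link <- list(
  data.frame(idRef = c("m1", "m2"), stringsAsFactors = FALSE),
  data.frame(identifier = "X-17", domain = "example.org",
             stringsAsFactors = FALSE)
)

# In-file references used by each derived series;
# character(0) when a series has no idRef links
in_file_refs <- lapply(link, function(d) {
  if ("idRef" %in% names(d)) as.character(d$idRef) else character(0)
})

# Number of source links of any kind for each derived series
n_links <- vapply(link, nrow, integer(1))
```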

project.id

A numeric vector giving the project ID of each derived series in derived$data

id

A numeric vector giving the ID (order of appearance in the project) of each derived series in derived$data

title

A character vector giving the title of each derived series in derived$data

project.title

A character vector giving the project title of each derived series in derived$data

unit

A character vector giving the unit of the derived series. Length equals the number of series in derived$data.

standardizing.method

A character vector giving the standardizing method of the derived series. Length equals the number of series in derived$data.

variable

A data.frame containing the variable of each series in derived$data

type

A data.frame containing the type of various entities, and metadata related to each type element. Contents are NA where the metadata is not applicable (e.g., no tree.id when the type element refers to a project). Columns are a subset of the following:

text

The text of the type element

lang

The language of the text

normal

The normalized name of the type

normalId

The ID value of the type in the standard dictionary

normalStd

The name of the standard

project.id

The ID of the project

site.id

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the ID of the site where the <type> element appeared.

tree.id

The ID of the tree

core.id

The ID of the core

derived.id

The ID of the derived series

project.title

The title of the project

site.title

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the title of the site where the <type> element appeared.

tree.title

The title of the tree

core.title

The title of the core

derived.title

The title of the derived series
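Metadata that does not apply is NA, so you can tell which entity a <type> element belongs to by checking which ID columns are filled in. The sketch below uses a made-up type table with only a few of the possible columns. A real result may also contain site.id and core.id columns, and the sketch checks only the lower-level ID columns that are present.

```r
# Made-up stand-in for the 'type' component (only some columns shown)
type <- data.frame(text = c("Research project", "Living tree"),
                   project.id = c(1, 1),
                   tree.id = c(NA, 3),
                   derived.id = c(NA, NA),
                   stringsAsFactors = FALSE)

# Rows whose <type> element belongs to a tree: tree.id is filled in
tree_types <- type[!is.na(type$tree.id), ]

# Rows describing the project itself: none of the lower-level ID
# columns that happen to be present has a value
lower <- intersect(c("tree.id", "core.id", "derived.id"), names(type))
project_types <- type[rowSums(!is.na(type[lower])) == 0, ]
```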

comments

A data.frame containing comments to various entities, and metadata related to each comments element. Contents are NA where the metadata is not applicable. Columns are a subset of the following:

text

The text of the comments element

project.id

The ID of the project

site.id

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the ID of the site.

tree.id

The ID of the tree

core.id

The ID of the core

radius.id

The ID of the radius

measurement.id

The ID of the measurement series

derived.id

The ID of the derived series

project.title

The title of the project

site.title

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the title of the site.

tree.title

The title of the tree

core.title

The title of the core

radius.title

The title of the radius

measurement.title

The title of the measurement series

derived.title

The title of the derived series

identifier

A data.frame containing identifiers of various entities, and metadata related to each identifier element. Contents are NA where the metadata is not applicable. Columns are a subset of the following:

text

The text of the identifier element

domain

The domain which the identifier is applicable to

project.id

The ID of the project

site.id

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the ID of the site.

tree.id

The ID of the tree

core.id

The ID of the core

radius.id

The ID of the radius

measurement.id

The ID of the measurement series

derived.id

The ID of the derived series

project.title

The title of the project

site.title

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the title of the site.

tree.title

The title of the tree

core.title

The title of the core

radius.title

The title of the radius

measurement.title

The title of the measurement series

derived.title

The title of the derived series

remark

A list of remarks concerning individual measured or derived values, with some of the following items:

measurements

Remarks related to measurements with a known year. A data.frame with the following columns:

text

The remark

frame

Index to a data.frame in measurements

row

Index to a row of the data.frame

col

Index to a column of the data.frame

undated

Remarks related to measurements without a known year. A data.frame with the following columns:

text

The remark

series

Index to a series in undated$data

idx

Index to a value in the series

derived

Remarks related to derived values. A data.frame with the following columns:

text

The remark

series

Index to a series in derived$data

idx

Index to a value in the series
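The indices in remark can be used to retrieve the value each remark refers to. This is a minimal sketch on made-up data. It assumes that frame is 1 when measurements is a single data.frame, which is not stated above. The remark texts are invented.

```r
# Made-up stand-ins for 'measurements' and 'remark$measurements'
measurements <- data.frame(A = c(1.2, 1.5), B = c(0.8, 0.9),
                           row.names = c("1901", "1902"))
remarks <- data.frame(text = "fire scar", frame = 1L, row = 2L, col = 2L,
                      stringsAsFactors = FALSE)

# Normalize to a list so that 'frame' can always be used as a list index
# (assumption: frame == 1 refers to a lone data.frame)
frames <- if (is.data.frame(measurements)) list(measurements) else measurements

# Value, year (row name) and series ID (column name) of each remark
remarks$value <- mapply(function(f, r, k) frames[[f]][r, k],
                        remarks$frame, remarks$row, remarks$col)
remarks$year <- mapply(function(f, r) rownames(frames[[f]])[r],
                       remarks$frame, remarks$row)
remarks$series <- mapply(function(f, k) colnames(frames[[f]])[k],
                         remarks$frame, remarks$col)

# Undated and derived remarks use 'series' and 'idx' instead
undated_data <- list(c(0.5, 0.7, 0.6))
undated_remarks <- data.frame(text = "uncertain ring", series = 1L, idx = 3L,
                              stringsAsFactors = FALSE)
undated_remarks$value <- mapply(function(s, i) undated_data[[s]][i],
                                undated_remarks$series, undated_remarks$idx)
```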

laboratory

A data.frame or a list of data.frames with one item per project. Each data.frame contains information about the research laboratories involved in the project. Columns are a subset of the following:

name

Name of the laboratory

acronym

Acronym of the name

identifier

Identifier

domain

Domain which the identifier is applicable to

addressLine1

Address

addressLine2

Another address line

cityOrTown

City or town

stateProvinceRegion

State, province or region

postalCode

Postal code

country

Country

research

A data.frame or a list of data.frames with one item per project. Each data.frame contains information about the systems in which the research project is registered. Columns are the following:

identifier

Identifier

domain

Domain which the identifier is applicable to

description

General description

altitude

A data.frame containing the altitude of trees. Columns are the following:

metres

The altitude in metres

project.id

The ID of the project

site.id

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the ID of the site.

tree.id

The ID of the tree

project.title

The title of the project

site.title

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the title of the site.

tree.title

The title of the tree

preferred

A data.frame containing links to preferred measurement series. Columns are a subset of the following:

idRef

Reference to a series in the same file

xLink

URI

identifier

Identifier of a series not necessarily in the same file

domain

Domain which the identifier is applicable to

project.id

The ID of the project

site.id

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the ID of the site.

tree.id

The ID of the tree

project.title

The title of the project

site.title

One or more columns with this prefix, depending on the maximum depth of the <object> hierarchy. Gives the title of the site.

tree.title

The title of the tree

Arguments

fname: character vector giving the file name of the TRiDaS file.
ids.from.titles: logical flag indicating whether to override the (tree, core, radius, measurement) structure imposed by the element hierarchy (element, sample, radius, measurementSeries) of the file. If TRUE, measurement series will be rearranged by matching titles in the file at the aforementioned four levels of the hierarchy. Defaults to FALSE, i.e. the element hierarchy of the file will be used.
ids.from.identifiers: logical flag indicating whether to (partially) override the element hierarchy of the file. If TRUE, measurement series will be grouped according to matching identifiers at the measurementSeries level, where identifiers are available. The changes caused by this option are applied on top of the structure imposed by the file or computed from matching titles if ids.from.titles == TRUE. Defaults to TRUE.
combine.series: logical flag indicating whether to combine two or more measurement series with the same set of (tree, core, radius, measurement) ID numbers. Each set of combined measurement series will be represented by one column of a resulting data.frame. Overlapping years of combined series do not produce a warning. If several data points are available for a given year, the function chooses one in a rather arbitrary manner. This option can only have an effect when ids.from.titles || ids.from.identifiers. Defaults to TRUE.
trim.whitespace: logical flag indicating whether to replace repeated white spaces in the text content of the file with only one space. Defaults to TRUE, i.e. excess white space will be trimmed from the text.
warn.units: logical flag indicating whether to warn about unitless measurements and “strange” units. The function expects measurements in units that can be converted to millimetres. Defaults to TRUE: warnings will be given. For example, density measurements will trigger warnings, which can be disabled by setting this option to FALSE.
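The effect described for trim.whitespace can be approximated in base R. The sketch below is an approximation of the described behaviour, not dplR's internal code. `collapse_ws()` is a hypothetical helper. The description above does not say whether leading or trailing white space is removed entirely, and this sketch does not attempt to remove it.

```r
# Approximation (not dplR's code): replace each run of white-space
# characters (spaces, tabs, newlines) with a single space
collapse_ws <- function(x) gsub("[[:space:]]+", " ", x)

txt <- "Quercus   robur\n\t(pedunculate oak)"
clean <- collapse_ws(txt)
```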

Author

Mikko Korpela

Details

The Tree Ring Data Standard (TRiDaS) is described in Jansma et al. (2010).

The parameters used for rearranging (ids.from.titles, ids.from.identifiers) and combining (combine.series) measurement series only affect the four lowest levels of document structure: element, sample, radius, measurementSeries. Series are not reorganized or combined at the upper structural levels (project, object).

References

Jansma, E., Brewer, P. W., and Zandhuis, I. (2010) TRiDaS 1.1: The tree-ring data standard. Dendrochronologia, 28(2), 99–130.
