aux_organisecubefiles: Convert Datacube files and organise them in directory structure

Description

The function converts Omnirecs/Digos Datacube files to mseed or sac files and organises them in a coherent directory structure (see Details for the available structures). The conversion depends on the externally provided gipptools software package (see Details).

Usage

aux_organisecubefiles(
  station,
  input,
  output,
  gipptools,
  format = "sac",
  pattern = "eseis",
  component = "BH",
  mode = "dir-wise",
  fringe = "constant",
  cpu,
  verbose = TRUE
)

Value

A set of converted and organised seismic files written to disk.

Arguments

station: Data frame with seismic station information. See aux_stationinfofile. This data frame can also be provided manually. In that case it must contain two elements: a first vector with the Cube IDs, and a second vector with the corresponding station IDs, which will be used in the meta information and file names. If the argument is omitted, the Cube IDs will be used as station IDs.
input: Character value, path to directory where the Cube files to be processed are stored.
output: Character value, path to directory where output data is written to.
gipptools: Character value, path to gipptools or cubetools directory.
format: Character value, output file format. One out of "mseed" and "sac". Default is "sac".
pattern: Character value, file organisation scheme keyword. One out of "eseis" and "seiscomp". Default is "eseis". See details and read_data for further information.
component: Character value, component code and output file extension prefix. The prefix is assumed to comprise two characters, the first describing the band code, the second the instrument code. Default is "BH", i.e., broadband and high gain sensor. The spatial component ("E", "N", "Z") will be added automatically. See Details for a tabular overview of common band codes and instrument codes.
mode: Character value, mode of file conversion. One out of "file-wise" and "dir-wise". Default is "dir-wise" (see Usage). See Details for further important information.
fringe: Character value, option to handle data outside the GPS-tagged time span. One out of "skip", "nominal" or "constant". Default is "constant".
cpu: Numeric value, fraction of CPUs to use for parallel processing. If omitted, one CPU is used.
verbose: Logical value, option to enable extended screen output of cubetools operations. Default is TRUE (see Usage). This option might not work with Windows operating systems.
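The station table only has to map Cube IDs to station IDs. The following is a minimal sketch of that lookup, assuming the Cube IDs sit in the first column and the station IDs in the second. map_station_ids() is a hypothetical helper, not part of 'eseis', and its fallback to the Cube ID for loggers missing from the table is an assumption, not documented behaviour of aux_organisecubefiles.

```r
## Hypothetical helper (not part of 'eseis'): return the station ID for
## each Cube ID. Assumption: column 1 holds Cube IDs, column 2 station IDs.
map_station_ids <- function(cube_ids, station = NULL) {
  ## no table supplied: Cube IDs double as station IDs
  if (is.null(station)) {
    return(cube_ids)
  }
  idx <- match(cube_ids, station[[1]])
  out <- as.character(station[[2]][idx])
  ## assumed fallback: keep the Cube ID if it is not listed in the table
  out[is.na(idx)] <- cube_ids[is.na(idx)]
  out
}

station <- data.frame(logger  = c("A1A", "A1B"),
                      station = c("ST1", "ST2"))
map_station_ids(c("A1B", "A1A", "A1C"), station)
## [1] "ST2" "ST1" "A1C"
```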

Author

Michael Dietze

Details

The function converts seismic data from the binary Cube file format to mseed (cf. read_mseed) or sac (cf. read_sac) and organises the resulting files into a consistent structure that 'eseis' expects for convenient data handling (cf. read_data).

Currently, two data structure schemes are supported: "eseis" and "seiscomp". In the "eseis" case, the daily Cube files are cut into hourly files and organised in directories structured by four digit year and three digit Julian day numbers. The hourly files are placed in the respective Julian day directory and named after the following scheme: STATION.YEAR.JULIANDAY.HOUR.MINUTE.SECOND.COMPONENT.
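To illustrate the "eseis" scheme, the sketch below builds the relative path of one hourly file from a station ID, a start time and a component. eseis_file_path() is a hypothetical helper, and the two-digit zero padding of hour, minute and second is an assumption; the exact names written by aux_organisecubefiles may differ.

```r
## Hypothetical sketch of the "eseis" scheme: YEAR/JULIANDAY/file, with
## file = STATION.YEAR.JULIANDAY.HOUR.MINUTE.SECOND.COMPONENT.
## Assumption: hour, minute and second are zero-padded to two digits.
eseis_file_path <- function(station, start, component) {
  start <- as.POSIXct(start, tz = "UTC")
  year  <- format(start, "%Y")   # four digit year
  jday  <- format(start, "%j")   # three digit Julian day
  file  <- paste(station, year, jday,
                 format(start, "%H"), format(start, "%M"),
                 format(start, "%S"), component, sep = ".")
  file.path(year, jday, file)
}

eseis_file_path("ST1", "2023-02-01 14:00:00", "BHZ")
## [1] "2023/032/ST1.2023.032.14.00.00.BHZ"
```

Cutting a daily file into hourly files then corresponds to one such path per hourly start time, e.g. seq(as.POSIXct("2023-02-01", tz = "UTC"), by = "hour", length.out = 24).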

The "seiscomp" case will yield daily files, which are organised by four digit year, seismic network, seismic station, and seismic component, each building a separate directory. In the deepest subdirectory, files are named by: NETWORK.STATION.LOCATION.COMPONENT.TYPE.YEAR.JULIANDAY.
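A comparable sketch for the "seiscomp" scheme follows the description above literally: YEAR/NETWORK/STATION/COMPONENT/file. seiscomp_file_path() is hypothetical, and "XX" and "D" are placeholder network and type codes. Note that SeisComP's own SDS convention names the channel directory COMPONENT.TYPE (e.g. "BHZ.D"), so the directories actually produced by the function may differ from this sketch.

```r
## Hypothetical sketch of the "seiscomp" scheme as described above.
## Assumptions: 'type' is a data type code such as "D", and an empty
## location code yields two consecutive dots in the file name.
seiscomp_file_path <- function(network, station, location, component,
                               type, date) {
  date <- as.Date(date)
  year <- format(date, "%Y")   # four digit year
  jday <- format(date, "%j")   # three digit Julian day
  file <- paste(network, station, location, component, type,
                year, jday, sep = ".")
  file.path(year, network, station, component, file)
}

seiscomp_file_path("XX", "ST1", "", "BHZ", "D", "2023-02-01")
## [1] "2023/XX/ST1/BHZ/XX.ST1..BHZ.D.2023.032"
```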

The component naming scheme defines the codes for the sensor's band code (first letter) and instrument code (second letter). The third letter, defining the spatial component, will be added automatically. For definitions of channel codes see https://migg-ntu.github.io/SeisTomo_Tutorials/seismology/seismic-data/seismic-time-series-data.html.
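For example, the default component "BH" splits into band code "B" and instrument code "H", and appending the spatial components yields the three channel codes. This is a plain R illustration of the naming rule, not code taken from 'eseis'.

```r
## Two-letter component code: band code + instrument code
component <- "BH"
stopifnot(nchar(component) == 2)
band       <- substr(component, 1, 1)   # "B", broad band
instrument <- substr(component, 2, 2)   # "H", high gain seismometer

## the spatial component is appended as third letter
channels <- paste0(component, c("E", "N", "Z"))
channels
## [1] "BHE" "BHN" "BHZ"
```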

The function requires that the software gipptools (http://www.gfz-potsdam.de/en/section/geophysical-deep-sounding/infrastructure/geophysical-instrument-pool-potsdam-gipp/software/gipptools/) is installed. Note that gipptools is updated at regular intervals, and each release includes an up-to-date GPS leap second table, which is essential to convert recently recorded files.

The Cube files are read in place, but a series of temporary files is created in a temporary directory within the specified output directory. Hence, if the routine stops due to a processing issue, the temporary data must be deleted manually. The path to the temporary directory is printed to screen when verbose = TRUE.
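Removing such leftovers can be done with base R. In the sketch below, remove_leftover_tmp() is a hypothetical helper and the directory name is a stand-in created only for demonstration; in practice, pass the path printed by the function. Double-check that path first, since a recursive unlink() deletes permanently.

```r
## Hypothetical helper (not part of 'eseis'): recursively delete a
## leftover temporary directory and report whether it is gone.
remove_leftover_tmp <- function(tmp_dir) {
  if (dir.exists(tmp_dir)) {
    unlink(tmp_dir, recursive = TRUE)
  }
  !dir.exists(tmp_dir)
}

## Stand-in for the temporary directory reported when verbose = TRUE;
## replace with the path actually printed to screen.
tmp_dir <- file.path(tempdir(), "cube_conversion_tmp")
dir.create(tmp_dir, showWarnings = FALSE)
file.create(file.path(tmp_dir, "partial_file.tmp"))
remove_leftover_tmp(tmp_dir)
## [1] TRUE
```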

The Cube files can be converted in two modes: "file-wise" and "dir-wise". In "file-wise" mode, each Cube file is converted individually. The advantage is that if one file in a month-long sequence of records is corrupt, the conversion does not stop; only the part from the corrupted section to the end of that file is discarded. The disadvantage, however, is that the data before the first and after the last GPS tag of each file are not converted unless the option fringe = "constant" (the default) is used.

In "dir-wise" mode, the fringe sample issue is reduced to the margins of the total sequence of daily files, but a single corrupt file becomes a more serious threat to the success of the conversion, especially when converting a large number of files.
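A sketch of a "file-wise" conversion with the default fringe handling is given below. All paths and IDs are placeholders, and the call itself is wrapped in if (FALSE) because it needs 'eseis' and an installed gipptools version. Collecting the arguments in a list makes it easy to rerun the same job in the other mode.

```r
## Arguments for a file-wise conversion; paths and IDs are placeholders.
args <- list(
  station   = data.frame(logger  = c("A1A", "A1B"),
                         station = c("ST1", "ST2")),
  input     = "input",
  output    = "output",
  gipptools = "software/gipptools-2023.352",
  mode      = "file-wise",   # convert each Cube file individually
  fringe    = "constant"     # also convert data outside the GPS-tagged span
)

## requires 'eseis' and an installed gipptools version, hence not run here
if (FALSE) {
  do.call(eseis::aux_organisecubefiles, args)
}
```

Switching to the other mode then only requires args$mode <- "dir-wise" before the call.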

Specifying an input directory (input) is mandatory. This directory should contain only the directories with the Cube files to process. Files downloaded from a Cube are usually spread over one or more further directories; these should be merged into a single directory before running this function.

The Cube files from each logger should be located in a separate directory, and each directory should be named after the respective logger ID (logger_ID). An appropriate structure for files from two loggers, A1A and A1B, would be:

input
1. A1A
  1. file1.A1A
  2. file2.A1A
2. A1B
  1. file1.A1B
  2. file2.A1B
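Before starting a long conversion, it may help to verify this layout. The sketch below builds the example tree in a temporary directory and reports logger directories that are missing, unexpected subdirectories, and loose files at the top level. check_input_dirs() is a hypothetical helper, not part of 'eseis', and "A1C" is a deliberately absent logger ID.

```r
## Hypothetical helper (not part of 'eseis'): compare the subdirectories
## of 'input' with the expected logger IDs.
check_input_dirs <- function(input, logger_ids) {
  dirs  <- list.dirs(input, full.names = FALSE, recursive = FALSE)
  files <- setdiff(list.files(input), dirs)  # entries that are not dirs
  list(missing     = setdiff(logger_ids, dirs),
       extra       = setdiff(dirs, logger_ids),
       loose_files = files)
}

## build the example structure from above in a temporary directory
input <- file.path(tempdir(), "cube_input")
for (id in c("A1A", "A1B")) {
  dir.create(file.path(input, id), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(input, id, paste0(c("file1.", "file2."), id)))
}

check_input_dirs(input, c("A1A", "A1B", "A1C"))
```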

The component definition can follow the typical keywords and key letters defined in seismology (https://migg-ntu.github.io/SeisTomo_Tutorials/seismology/seismic-data/seismic-time-series-data.html), i.e., the first letter indicates the instrument's band type and the second letter the instrument code or instrument type.

Band code	Band type
E	Extremely short period
S	Short period
H	High broad band
B	Broad band
M	Mid period
L	Long period
V	Very long period

Instrument code	Explanation
H	High gain seismometer
L	Low gain seismometer
G	Gravimeter
M	Mass position seismometer
N	Accelerometer
P	Geophone
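The two tables can be turned into lookup vectors to decode a component code. This is an illustrative sketch: describe_component() is hypothetical, the descriptions follow the SEED channel naming convention, and codes not listed in the tables return NA.

```r
## Lookup vectors for the band and instrument codes listed above
band_codes <- c(E = "Extremely short period", S = "Short period",
                H = "High broad band", B = "Broad band",
                M = "Mid period", L = "Long period",
                V = "Very long period")
instrument_codes <- c(H = "High gain seismometer",
                      L = "Low gain seismometer", G = "Gravimeter",
                      M = "Mass position seismometer",
                      N = "Accelerometer", P = "Geophone")

## Hypothetical helper: decode a two-letter component code
describe_component <- function(component) {
  stopifnot(is.character(component), nchar(component) == 2)
  c(band       = unname(band_codes[substr(component, 1, 1)]),
    instrument = unname(instrument_codes[substr(component, 2, 2)]))
}

describe_component("BH")
```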

Examples

if (FALSE) {

## basic example with minimum effort
aux_organisecubefiles(station = data.frame(logger = c("A1A", "A1B"),
                                           station = c("ST1", "ST2")),
                      input = "input", 
                      gipptools = "software/gipptools-2023.352")

}
