
enviPick (version 1.5)

mzpart: Divisive partitioning of raw LC-HRMS measurements

Description

Divisive, recursive partitioning of LC-HRMS measurements. Preparatory step for mzclust and mzpick; alternative to mzagglom. Requires as input an MSlist initialized by readMSdata.

Usage

mzpart(MSlist, dmzgap = 10, drtgap = 500, ppm = TRUE, minpeak = 4, peaklimit = 2500, cutfrac = 0.1, drtsmall = 50, progbar = FALSE, stoppoints = 2e+05)

Arguments

MSlist
MSlist generated by readMSdata
dmzgap
m/z gap width for partitioning
drtgap
RT gap width for partitioning
ppm
dmzgap given in ppm (TRUE) or as absolute value (FALSE)?
minpeak
Minimum number of measurements in a partition
peaklimit
Maximum number of measurements in a partition
cutfrac
Fraction of low density measurements to be discarded
drtsmall
RT tolerance used to estimate density
progbar
For debugging, ignore
stoppoints
For debugging, ignore
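The ppm argument decides how dmzgap is read. A minimal sketch, assuming that with ppm = TRUE the gap is taken relative to the m/z of each measurement: the hypothetical helper mz_gap_abs below (not part of enviPick) converts a dmzgap setting into an absolute m/z gap.

```r
# Hypothetical helper, not part of enviPick: convert dmzgap into an absolute
# m/z gap. Assumption: with ppm = TRUE the gap scales with the m/z value.
mz_gap_abs <- function(mz, dmzgap = 10, ppm = TRUE) {
  if (ppm) {
    dmzgap * mz / 1e6          # ppm: gap grows linearly with m/z
  } else {
    rep(dmzgap, length(mz))    # absolute: same gap at every m/z
  }
}

mz_gap_abs(c(200, 500, 1000))                       # 0.002 0.005 0.010
mz_gap_abs(c(200, 500, 1000), dmzgap = 0.005, ppm = FALSE)
```

With the default dmzgap = 10 ppm, the gap at m/z 1000 is therefore five times wider than at m/z 200, whereas ppm = FALSE applies one fixed width across the whole mass range.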

Value

Returns the input MSlist with the following entries filled in:
Parameters
MSlist[[2]]: saves the parameter settings.
Scans
MSlist[[4]]: matrix with raw measurements and tags resorted for partitions.
Partition_Index
MSlist[[5]]: index mapping each partition to its section of the resorted raw measurements in MSlist[[4]]; required for fast (random) access.
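Why resorting plus an index gives fast random access can be sketched with toy data. This is an illustration only: the column layout of the matrix, the start/end form of the index and the names scans, part_index and get_partition are all hypothetical and need not match what mzpart actually writes into MSlist[[4]] and MSlist[[5]].

```r
# Toy stand-in for MSlist[[4]]: m/z, intensity, RT and a partition ID.
# The real column layout in enviPick may differ.
scans <- cbind(
  mz        = c(150.10, 150.11, 300.20, 300.21, 300.22),
  intensity = c(1e4, 2e4, 5e3, 6e3, 7e3),
  RT        = c(60, 62, 400, 402, 404),
  partition = c(2, 2, 1, 1, 1)
)

# Resort so that each partition occupies one contiguous block of rows.
scans <- scans[order(scans[, "partition"]), ]

# Hypothetical index in the spirit of MSlist[[5]]: first and last row of
# each partition (partition IDs assumed to be 1, 2, ..., without gaps).
ends   <- as.integer(cumsum(table(scans[, "partition"])))
starts <- c(1L, head(ends, -1L) + 1L)
part_index <- cbind(start = starts, end = ends)

# Random access to partition k by row range, without scanning all rows.
get_partition <- function(k) {
  scans[part_index[k, "start"]:part_index[k, "end"], , drop = FALSE]
}
get_partition(2)
```

Once the rows are grouped, any partition is a single slice, so downstream steps such as mzclust can visit partitions in any order at constant lookup cost.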

Note

Do not set minpeak larger than the corresponding minpeak argument of mzclust or mzpick. If coordinating these settings is too cumbersome, use enviPickwrap instead, which lets you adjust all function arguments in one place.

Warning

Despite optimized code, this function can run for an intolerably long time or run out of memory if (a) the parameters are set inappropriately, (b) the .mzML/.mzXML file was not centroided, or (c) the underlying data are unsuitable for this peak picker. Regarding (a), do not assume gaps larger than those actually present in the data. Instead, inspect the data contained in MSlist with plotMSlist after loading it with readMSdata, and set progbar=TRUE to monitor where a function fails. Once the parameters are settled, set progbar=FALSE for faster execution. To avoid running out of memory, stoppoints sets the maximum number of measurements that the routines deleting the lowest-intensity measurements can handle (these routines are needed where peaklimit cannot be reached by partitioning along dmzgap and drtgap alone). If the number of measurements exceeds stoppoints, execution aborts.

Details

This function recursively searches for gaps in retention time (RT) and m/z among the LC-HRMS measurements and uses them to partition (and resort) the matrix contained in MSlist[[4]]. If neither partitioning by RT nor by m/z yields a partition small enough to hold at most peaklimit measurements, the fraction cutfrac of lowest-density measurements is discarded and partitioning resumes. The density of each measurement is a Gaussian kernel density estimate scaled to dmzgap and drtsmall, i.e., to the local neighbourhood of that measurement.
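The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not enviPick's implementation: dmzgap is treated as an absolute m/z width (as with ppm = FALSE), RT is tried before m/z, partitions with fewer than minpeak measurements are assumed to be dropped, and the helper names split_at_gaps, local_density and partition_ms are hypothetical.

```r
# Split values wherever two neighbours (after sorting) are more than gap apart.
# Returns a list of index vectors into v.
split_at_gaps <- function(v, gap) {
  o <- order(v)
  grp <- cumsum(c(1, diff(v[o]) > gap))
  split(o, grp)
}

# Gaussian kernel density of each measurement, scaled by dmz and drt.
# Uses an n x n distance matrix, so it is only suitable for small inputs.
local_density <- function(mz, rt, dmz, drt) {
  dm <- outer(mz, mz, "-") / dmz
  dr <- outer(rt, rt, "-") / drt
  rowSums(exp(-0.5 * (dm^2 + dr^2)))
}

# Recursive divisive partitioning; returns a list of row-index vectors.
partition_ms <- function(mz, rt, dmzgap, drtgap, drtsmall,
                         peaklimit, minpeak, cutfrac) {
  recurse <- function(idx) {
    if (length(idx) < minpeak) return(list())       # assumed: drop tiny partitions
    if (length(idx) <= peaklimit) return(list(idx)) # small enough: done
    # Try an RT split first, then an m/z split (order assumed).
    for (ax in list(list(v = rt, gap = drtgap), list(v = mz, gap = dmzgap))) {
      parts <- split_at_gaps(ax$v[idx], ax$gap)
      if (length(parts) > 1) {
        return(do.call(c, lapply(unname(parts), function(p) recurse(idx[p]))))
      }
    }
    # No gap found: discard the cutfrac lowest-density measurements, retry.
    dens <- local_density(mz[idx], rt[idx], dmzgap, drtsmall)
    n_cut <- max(1, floor(cutfrac * length(idx)))
    keep <- idx[order(dens, decreasing = TRUE)][seq_len(length(idx) - n_cut)]
    recurse(keep)
  }
  recurse(seq_along(mz))
}

# Example: three well-separated groups of 30 simulated measurements.
set.seed(1)
mz <- c(rnorm(30, 200, 0.001), rnorm(30, 200, 0.001), rnorm(30, 450, 0.001))
rt <- c(rnorm(30, 100, 5), rnorm(30, 900, 5), rnorm(30, 100, 5))
parts <- partition_ms(mz, rt, dmzgap = 0.01, drtgap = 500, drtsmall = 50,
                      peaklimit = 40, minpeak = 4, cutfrac = 0.1)
lengths(parts)
```

Each recursive call either splits a partition into strictly smaller pieces or removes at least one measurement, so the recursion always terminates. The all-pairs density in this sketch also illustrates why an upper bound such as stoppoints is needed before density-based trimming.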

Partitioning is necessary to speed up the clustering in mzclust. There is therefore a trade-off: large values of peaklimit lead to faster execution of mzpart but slower computation in mzclust, and vice versa.

See Also

mzclust