A LAScatalog object is a representation of a set of las/laz files. Because a computer cannot load
all the data at once, a LAScatalog is a simple way to manage the entire dataset by reading only
the file headers. A LAScatalog enables the user to process a large area, or to selectively clip
data from it, without loading the whole area into memory. A LAScatalog can be built with the
function catalog. It also contains extra information that enables users to control how the
catalog is processed (see details).
data
data.table. A table representing the header of each file.
crs
A CRS object.
cores
integer. Number of cores used for parallel computation in functions that support a LAScatalog as input. Default is 1.
buffer
numeric. When applying a function to an entire catalog by sequentially processing sub-areas (clusters), some algorithms (such as grid_terrain) require a buffer around the area to avoid edge effects. Default is 15 units.
progress
logical. Display an estimation of progress while processing. Default is TRUE.
by_file
logical. This option overrides the option tiling_size. Instead of processing the catalog by
arbitrary split areas, it forces processing by file. Buffering around each file is still
available. Default is FALSE.
tiling_size
numeric. To process an entire catalog, the algorithm splits the dataset into several square sub-areas (called clusters) and processes them sequentially. This is the side length of each square cluster. Default is 1000 units.
vrt
character. Path to a folder. In grid_* functions such as grid_metrics, grid_terrain and others,
the functions can write RasterLayers in this folder and return a lightweight virtual raster
mosaic (VRT). It is not used in functions where it is not relevant.
stop_early
logical. If TRUE, catalog processing stops if an error occurs during the computation. If FALSE,
the catalog is processed to the end anyway and clusters with errors are skipped.
opt_changed
Internal use only for compatibility with older deprecated code.
A LAScatalog contains a slot @data that holds the information about the point cloud that is
used internally, as well as several other slots that contain processing options. Each lidR
function that supports a LAScatalog as input respects these processing options when they are
relevant; when they are not relevant, they are ignored. Examples of situations where an option
is not relevant or is adjusted:
@vrt is not relevant in functions that do not rasterize the point cloud.
@tiling_size is always respected but can be slightly modified to align the clusters with the grid in grid_* functions.
@buffer is not relevant in grid_metrics because lidR aligns the clusters with the resolution to get a continuous output. However, it is relevant in grid_terrain, for example, to avoid edge artifacts.
@cores may not be respected if it is known internally that a single core performs better than several (no such case currently exists).
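The adjustment of @tiling_size mentioned above can be pictured with a small sketch. The source does not state the exact rule lidR applies, so rounding the cluster size up to the next multiple of the raster resolution is an assumption made here for illustration, and align_tiling_size is a hypothetical helper, not a lidR function.

```r
# Hypothetical helper (not part of lidR): make the cluster side length a
# multiple of the raster resolution so cluster edges fall on cell edges.
# Rounding up is an assumption; the real adjustment rule is not documented here.
align_tiling_size <- function(tiling_size, res) {
  ceiling(tiling_size / res) * res
}

align_tiling_size(1000, 30)  # 1020: 1000 is not a multiple of 30
align_tiling_size(1000, 20)  # 1000: already aligned, left unchanged
```

With clusters aligned this way, no raster cell straddles two clusters, which is consistent with the remark that grid_metrics needs no buffer to produce a continuous output.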
Internally, processing a catalog is almost always the same and relies on a few steps:
Create a set of clusters. A cluster is the representation of a region of interest that can be buffered or not.
Loop over each cluster (in parallel or not).
For each cluster, load the points inside the region of interest into R, run some R functions, and return the expected output.
Merge the outputs of the different clusters once they are all processed.
So basically, a LAScatalog is a built-in batch process, with the particularity that lidR does
not loop through files but loops seamlessly through clusters that do not necessarily match the
files. This is why indexing the point cloud with lax files may significantly speed up
processing.
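The four steps can be sketched in base R, without lidR, to show how clusters are independent of files. This is only an illustration under stated assumptions: the data frame pts stands in for points spread across several files, tiling_size is taken as the side length of a cluster, and process_cluster is a hypothetical function. It is not lidR's internal code.

```r
# Synthetic point cloud standing in for the contents of several files.
set.seed(42)
pts <- data.frame(X = runif(5000, 0, 2000),
                  Y = runif(5000, 0, 1000),
                  Z = runif(5000, 0, 30))

tiling_size <- 500  # assumed side length of each square cluster
buffer      <- 15   # extra margin loaded around each cluster

# First step: create the clusters (lower-left corners of a regular grid).
clusters <- expand.grid(xmin = seq(0, 2000 - tiling_size, by = tiling_size),
                        ymin = seq(0, 1000 - tiling_size, by = tiling_size))

# Hypothetical per-cluster worker: load the buffered region, compute on it,
# and report results for the unbuffered core only.
process_cluster <- function(i) {
  cl   <- clusters[i, ]
  xmax <- cl$xmin + tiling_size
  ymax <- cl$ymin + tiling_size
  # "Load" the region of interest plus its buffer.
  loaded <- pts[pts$X >= cl$xmin - buffer & pts$X < xmax + buffer &
                pts$Y >= cl$ymin - buffer & pts$Y < ymax + buffer, ]
  # Drop the buffer again so neighbouring outputs do not overlap.
  core <- loaded[loaded$X >= cl$xmin & loaded$X < xmax &
                 loaded$Y >= cl$ymin & loaded$Y < ymax, ]
  data.frame(xmin = cl$xmin, ymin = cl$ymin,
             n_loaded = nrow(loaded), n_core = nrow(core),
             mean_z = mean(core$Z))
}

# Second and third steps: loop over the clusters (sequentially here).
results <- lapply(seq_len(nrow(clusters)), process_cluster)

# Fourth step: merge the per-cluster outputs.
merged <- do.call(rbind, results)
```

Each cluster loads slightly more points than it reports on (n_loaded is at least n_core), which is the role of the buffer, and every point ends up in exactly one cluster core.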
It is important to note that buffered datasets (i.e. files that overlap each other) are not
natively supported by lidR. When encountering such datasets, the user should always filter out
the overlap if possible. This is possible if the overlapping points are flagged, for example in
the 'withheld' field. Otherwise lidR will not be able to process the dataset correctly.
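Filtering out flagged overlap amounts to dropping the points whose flag is set before any processing. The sketch below shows the idea on a plain data frame; the column name withheld is a hypothetical stand-in for the 'withheld' field, and this is not a lidR call.

```r
# Toy points; the logical column `withheld` is a hypothetical stand-in for
# the 'withheld' flag marking points that belong to the overlap.
pts <- data.frame(X = c(1, 2, 3, 4),
                  Y = c(1, 2, 3, 4),
                  Z = c(10, 12, 11, 13),
                  withheld = c(FALSE, FALSE, TRUE, TRUE))

# Keep only the points that are not flagged as overlap.
kept <- pts[!pts$withheld, ]
```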