The Hadoop Distributed File System (HDFS) is typically part of a Hadoop
cluster, but it can also be used as a stand-alone, general-purpose
distributed file system (DFS). Several high-level functions provide easy
access to distributed storage.
DFS_cat is useful for producing output in user-defined
functions. It reads files from the DFS and prints their contents,
typically to the standard output. Its behaviour is similar to the base
function cat.
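A short example (the file path is purely illustrative, and a
configured DFS connection is assumed):

    ## print the contents of a text file stored on the DFS
    DFS_cat("/tmp/myfile.txt")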
DFS_dir_create creates directories with the given path names if
they do not already exist. Its behaviour is similar to the base
function dir.create.
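For example (the directory name is illustrative):

    ## create a new directory on the DFS unless it already exists
    DFS_dir_create("/tmp/test")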
DFS_dir_exists and DFS_file_exists return a logical
vector indicating whether the directory or file, respectively, named by
its argument exists. See also the base function file.exists.
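For example:

    ## both return TRUE if the directory/file exists on the DFS
    DFS_dir_exists("/tmp/test")
    DFS_file_exists("/tmp/test/myfile.txt")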
DFS_dir_remove attempts to remove the directory named in its
argument and, if recursive is set to TRUE, also attempts
to remove its subdirectories and files in a recursive manner.
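For example:

    ## remove the directory along with its contents
    DFS_dir_remove("/tmp/test", recursive = TRUE)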
DFS_list produces a character vector of the names of files
in the directory named by its argument.
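For example:

    ## character vector of the file names in a DFS directory
    DFS_list("/tmp/test")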
DFS_read_lines is a reader for (plain text) files stored on the
DFS. It returns a vector of character strings representing the lines in
the (text) file. If n is given as an argument, it reads at most that
many lines from the given file. Its behaviour is similar to the base
function readLines.
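For example:

    ## read all lines of the file, then only its first two lines
    DFS_read_lines("/tmp/test/myfile.txt")
    DFS_read_lines("/tmp/test/myfile.txt", n = 2)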
DFS_put copies files named by its argument to a given path in
the DFS.
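For example (assuming a file myfile.txt in the local working
directory):

    ## copy a local file to the given DFS path
    DFS_put("myfile.txt", "/tmp/test")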
DFS_put_object serializes an R object to the DFS.
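A sketch, assuming the object to be serialized is given first,
followed by the destination file on the DFS:

    ## store the (built-in) iris data set as a serialized object
    DFS_put_object(iris, "/tmp/test/iris.obj")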
DFS_write_lines writes a given vector of character strings to a
file stored on the DFS. Its behaviour is similar to the base
function writeLines.
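For example (analogously to writeLines, the text is assumed to be
given first and the destination file second):

    ## write one line per character string to a file on the DFS
    DFS_write_lines(c("Hello", "DFS"), "/tmp/test/myfile.txt")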