splitPathFile: Analyze pathfile-strings

Description

splitPathFile splits a vector of pathfile-strings into path- and file-components without loss of information.

unsplitPathFile restores the original pathfile-string vector.

standardPathFile standardizes a vector of pathfile-strings: backslashes are replaced by slashes, except for the first two leading backslashes, which indicate a network share.

tempPathFile returns, similar to tempfile, a vector of filenames given path(s), file-prefix(es) and an optional extension.

fftempfile returns, similar to tempPathFile, a vector of filenames following a vector of pathfile patterns that are interpreted in an ff-specific way.

Usage

splitPathFile(x)
unsplitPathFile(splitted)
standardPathFile(x)
tempPathFile(splitted=NULL, path=splitted$path, prefix=splitted$file, extension=NULL)
fftempfile(x)

Arguments

x

a character vector of pathfile strings

splitted

a return value from splitPathFile

path

a character vector of path components

prefix

a character vector of file components

extension

optional extension like "csv" (or NULL)

Value

A list with components

path

a character vector of path components

fsep

a character vector of file separators or ""

file

a character vector of file components

Details

dirname and basename remove trailing file separators and therefore cannot distinguish a pathfile string that contains ONLY a path from a pathfile string that contains a path AND a file. As a consequence, file.path(dirname(pathfile), basename(pathfile)) cannot always restore the original pathfile string.
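The information loss described above can be reproduced with base R alone. The following is a small illustration; the variable names are chosen here for the example and are not part of the package.

```r
# Two different inputs: a path plus file, and a path with a trailing separator
pathfile <- c("a/b", "a/b/")

# dirname() and basename() both strip the trailing separator first
d <- dirname(pathfile)   # "a" "a"
b <- basename(pathfile)  # "b" "b"

# Recombining yields the same string for both inputs,
# so the trailing separator of the second input is lost
rebuilt <- file.path(d, b)
rebuilt == pathfile      # TRUE FALSE
```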

splitPathFile decomposes each pathfile string into three parts: the path BEFORE the last file separator, the file separator itself, and the filename component AFTER the last file separator. If there is no file separator in the string, splitPathFile tries to guess whether the string is a path or a file component: ".", ".." and "~" are recognized as path components. No tilde expansion is done, see path.expand. Backslashes are converted to the current .Platform$file.sep using standardPathFile, except for the first two leading backslashes, which indicate a network share.
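The three-part decomposition can be sketched in base R. This is not the package's implementation: split_sketch is a hypothetical helper written for illustration, it translates every backslash (ignoring the network-share exception), and its treatment of edge cases may differ from splitPathFile.

```r
# Hypothetical helper, for illustration only (not part of ff)
split_sketch <- function(x) {
  x <- gsub("\\\\", "/", x)                     # simplification: translate all backslashes
  pos <- as.integer(regexpr("/[^/]*$", x))      # position of the last "/", -1 if none
  has_sep <- pos > 0
  path <- ifelse(has_sep, substr(x, 1, pos - 1), "")
  fsep <- ifelse(has_sep, "/", "")
  file <- ifelse(has_sep, substr(x, pos + 1, nchar(x)), x)
  # without a separator, ".", ".." and "~" are taken to be path components
  is_path <- !has_sep & x %in% c(".", "..", "~")
  path[is_path] <- x[is_path]
  file[is_path] <- ""
  list(path = path, fsep = fsep, file = file)
}

x <- c("a/b", "a/", "a", ".", "/a")
s <- split_sketch(x)

# Pasting the three parts back together restores every input,
# including the trailing separator of "a/"
restored <- paste0(s$path, s$fsep, s$file)
```

Keeping the separator as its own component is what makes the split reversible: "a/" and "a" differ only in fsep, which dirname and basename would discard.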

unsplitPathFile restores the original pathfile-string vector up to translated backslashes.

tempPathFile internally uses tempfile to create its filenames. If an extension is given, it repeats filename creation until none of the filenames corresponds to an existing file.
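The retry behaviour can be sketched with base R. temp_with_ext is a hypothetical function invented for this example; it only assumes the strategy stated above (call tempfile, append the extension, repeat while any resulting name already exists) and does not reproduce the actual tempPathFile code.

```r
# Hypothetical helper, for illustration only (not part of ff)
temp_with_ext <- function(prefix = "file", dir = tempdir(), ext = "csv") {
  repeat {
    # tempfile() guarantees a free name only WITHOUT the extension,
    # so the name with the extension has to be checked separately
    candidate <- paste0(tempfile(pattern = prefix, tmpdir = dir), ".", ext)
    if (!any(file.exists(candidate))) return(candidate)
  }
}

# Vectorized over the prefix, like tempfile()
f <- temp_with_ext(c("a", "b"))
```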

fftempfile takes a path-prefix pattern as input, splits it, replaces an empty path with getOption("fftempdir"), and uses getOption("ffextension") as the extension.

Examples

# NOT RUN {
  pathfile <- c("", ".", "/.", "./", "./.", "/"
  , "a", "a/", "/a", "a/a", "./a", "a/.", "c:/a/b/c", "c:/a/b/c/"
  , "..", "../", "/..", "../..", "//", "\\\\a\\", "\\\\a/"
  , "\\\\a/b", "\\\\a/b/", "~", "~/", "~/a", "~/a/")
  splitted <- splitPathFile(pathfile)
  restored <- unsplitPathFile(splitted)
  stopifnot(all(gsub("\\\\","/",restored)==gsub("\\\\","/",pathfile)))

  dirnam <- dirname(pathfile)
  basnam <- basename(pathfile)

  db  <- file.path(dirnam,basnam)
  ident <- gsub("\\\\","/",db) == gsub("\\\\","/",pathfile)
  sum(!ident)

  do.call("data.frame", c(list(ident=ident, pathfile=pathfile
   , dirnam=dirnam, basnam=basnam), splitted))

  
# }
# NOT RUN {
    message("show the difference between tempPathFile and fftempfile")
    do.call("data.frame", c(list(ident=ident, pathfile=pathfile, dirnam=dirnam, basnam=basnam)
, splitted, list(filename=tempPathFile(splitted), fftempfile=fftempfile(pathfile))))

    message("for a single string splitPathFile is slower, 
for vectors of strings it scales much better than dirname+basename")

    system.time(for (i in 1:1000){
      d <- dirname(pathfile)
      b <- basename(pathfile)
    })
    system.time(for (i in 1:1000){
      s <- splitPathFile(pathfile)
    })

    len <- c(1,10,100,1000)
    timings <- matrix(0, 2, length(len), dimnames=list(c("dir.base.name", "splitPathFile"), len))
    for (j in seq_along(len)){
      l <- len[j]
      r <- 10000 / l
      x <- rep("\\\\a/b/", l)
      timings[1,j] <- system.time(for (i in 1:r){
          d <- dirname(x)
          b <- basename(x)
        })[3]
      timings[2,j] <- system.time(for (i in 1:r){
          s <- splitPathFile(x)
        })[3]
    }
    timings
  
# }
