file(description = "", open = "", blocking = TRUE, encoding = getOption("encoding"), raw = FALSE, method = getOption("url.method", "default"))
url(description, open = "", blocking = TRUE, encoding = getOption("encoding"), method = getOption("url.method", "default"))
gzfile(description, open = "", encoding = getOption("encoding"), compression = 6)
bzfile(description, open = "", encoding = getOption("encoding"), compression = 9)
xzfile(description, open = "", encoding = getOption("encoding"), compression = 6)
unz(description, filename, open = "", encoding = getOption("encoding"))
pipe(description, open = "", encoding = getOption("encoding"))
fifo(description, open = "", blocking = FALSE, encoding = getOption("encoding"))
socketConnection(host = "localhost", port, server = FALSE, blocking = FALSE, open = "a+", encoding = getOption("encoding"), timeout = getOption("timeout"))
open(con, ...)
## S3 method for class 'connection'
open(con, open = "r", blocking = TRUE, ...)
close(con, ...)
## S3 method for class 'connection'
close(con, type = "rw", ...)
flush(con)
isOpen(con, rw = "")
isIncomplete(con)
method: one of c("default", "internal", "wininet", "libcurl"): see Details.
compression: for xzfile this can also be negative: see the Compression section.
rw: "read" or "write", partial matches allowed.
file, pipe, fifo, url, gzfile, bzfile, xzfile, unz and socketConnection return a connection object which inherits from class "connection" and has a first more specific class.
open and flush return NULL, invisibly.
close returns either NULL or an integer status, invisibly. The status is from when the connection was last closed and is available only for some types of connections (e.g., pipes, files and fifos): typically zero values indicate success.
isOpen returns a logical value, whether the connection is currently open.
isIncomplete returns a logical value, whether the last read attempt was blocked, or for an output text connection whether there is unflushed output.
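The return values described above can be checked with a small sketch. The scratch file from tempfile() and all variable names are assumptions made for illustration only.

```r
# Create a file connection without opening it (open = "" is the default).
path <- tempfile(fileext = ".txt")   # hypothetical scratch file
con <- file(path)

con_class   <- class(con)            # specific class first, then "connection"
open_before <- isOpen(con)           # opening is deferred, so not yet open

open(con, "w")                       # returns NULL, invisibly
open_for_write <- isOpen(con, "write")
open_for_read  <- isOpen(con, "read")

writeLines("hello", con)
status <- close(con)                 # NULL or an integer status, invisibly
unlink(path)
```

Storing the results in variables rather than printing them keeps the sketch independent of how a particular front end displays invisible values.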
url and file support URL schemes file://, http://, https:// and ftp://.
method = "libcurl" allows more schemes: exactly which schemes is platform-dependent (see libcurlVersion), but all Unix-alike platforms will support https:// and most platforms will support ftps://.
Most methods do not percent-encode special characters such as spaces in http:// URLs (see URLencode), but it seems the "wininet" method does.
A note on file:// URLs. The most general form (from RFC1738) is file://host/path/to/file, but R only accepts the form with an empty host field referring to the local machine.
On a Unix-alike, this is then file:///path/to/file, where path/to/file is relative to /. So although the third slash is strictly part of the specification and not part of the path, this can be regarded as a way to specify the file /path/to/file. It is not possible to specify a relative path using a file URL.
On Windows, the path in this form is relative to the root of the filesystem, which is not a Windows concept. The standard form on Windows is file:///d:/R/repos: for compatibility with earlier versions of R and Unix versions, any other form is parsed by R as file:// plus path_to_file. Also, backslashes are accepted within the path even though RFC1738 does not allow them.
No attempt is made to decode a percent-encoded file: URL: call URLdecode if necessary.
The "internal" method does not follow re-directed HTTP URLs: both methods "wininet" (the default on Windows) and "libcurl" do (including for HTTPS URLs). Server-side cached data is always accepted.
Function download.file and contributed package RCurl provide more comprehensive facilities to download
from URLs.
Possible values for the argument open are
"r" or "rt": open for reading in text mode.
"w" or "wt": open for writing in text mode.
"a" or "at": open for appending in text mode.
"rb": open for reading in binary mode.
"wb": open for writing in binary mode.
"ab": open for appending in binary mode.
"r+", "r+b": open for reading and writing.
"w+", "w+b": open for reading and writing, truncating the file initially.
"a+", "a+b": open for reading and appending.
If a file or fifo is created on a Unix-alike, its permissions will be the maximal allowed by the current setting of umask (see Sys.umask).
For many connections there is little or no difference between text and binary modes. For file-like connections on Windows, translation of line endings (between LF and CRLF) is done in text mode only (but text read operations on connections such as readLines, scan and source work for any form of line ending). Various R operations are possible in only one of the modes: for example pushBack is text-oriented and is only allowed on connections open for reading in text mode, and binary operations such as readBin, load and save can only be done on binary-mode connections.
The mode of a connection is determined when actually opened, which is deferred if open = "" is given (the default for all but socket connections). An explicit call to open can specify the mode, but otherwise the mode will be "r". (gzfile, bzfile and xzfile connections are exceptions, as the compressed file always has to be opened in binary mode and no conversion of line-endings is done even on Windows, so the default mode is interpreted as "rb".) Most operations that need write access or text-only or binary-only mode will override the default mode of a not-yet-open connection.
Append modes need to be considered carefully for compressed-file connections. They do not produce a single compressed stream on the file, but rather append a new compressed stream to the file. Readers may or may not read beyond the end of the first stream: currently R does so for gzfile, bzfile and xzfile
connections.
R supports gzip, bzip2 and xz compression (added in R 2.10.0: also read-only support for its precursor lzma compression).
For reading, the type of compression (if any) can be determined from the first few bytes of the file. Thus for file(raw = FALSE) connections, if open is "", "r" or "rt" the connection can read any of the compressed file types as well as uncompressed files. (Using "rb" will allow compressed files to be read byte-by-byte.) Similarly, gzfile connections can read any of the forms of compression and uncompressed files in any read mode.
(The type of compression is determined when the connection is created if open is unspecified and a file of that name exists. If the intention is to open the connection to write a file with a different form of compression under that name, specify open = "w" when the connection is created or unlink the file before creating the connection.)
For write-mode connections, compression specifies how hard the compressor works to minimize the file size, and higher values need more CPU time and more working memory (up to ca 800Mb for xzfile(compression = 9)). For xzfile negative values of compression correspond to adding the xz argument -e: this takes more time (double?) to compress but may achieve (slightly) better compression. The default (6) has good compression and modest (100Mb memory) usage: but if you are using xz compression you are probably looking for high compression.
Choosing the type of compression involves tradeoffs: gzip, bzip2 and xz are successively less widely supported, need more resources for both compression and decompression, and achieve more compression (although individual files may buck the general trend). Typical experience is that bzip2 compression is 15% better on text files than gzip compression, and xz with maximal compression 30% better. The experience with R save files is similar, but on some large .rda files xz compression is much better than the other two. With current computers decompression times even with compression = 9 are typically modest and reading compressed files is usually faster
than uncompressed ones because of the reduction in disc activity.
The encoding of the input/output stream of a connection can be specified by name in the same way as it would be given to iconv: see that help page for how to find out what encoding names are recognized on your platform. Additionally, "" and "native.enc" both mean the native encoding, that is the internal encoding of the current locale and hence no translation is done.
Re-encoding only works for connections in text mode: reading from a connection with re-encoding specified in binary mode will read the stream of bytes, but mixing text and binary mode reads (e.g., mixing calls to readLines and readChar) is likely to lead to incorrect results.
The encodings "UCS-2LE" and "UTF-16LE" are treated specially, as they are appropriate values for Windows Unicode text files. If the first two bytes are the Byte Order Mark 0xFEFF then these are removed as some implementations of iconv do not accept BOMs. Note that whereas most implementations will handle BOMs using encoding "UCS-2" and choose the appropriate byte order, some (including earlier versions of glibc) will not. There is a subtle distinction between "UTF-16" and "UCS-2" (see https://en.wikipedia.org/wiki/UTF-16): the use of characters in the Supplementary Planes which need surrogate pairs is very rare so "UCS-2LE" is an appropriate first choice (as it is more widely implemented).
One caveat: R's implementation of "UCS-2LE" and similar for output does not currently work on Windows, and on Unix it will default to Unix-style line endings. We recommend use of UTF-8 instead.
As from R 3.0.0 the encoding "UTF-8-BOM" is accepted for reading and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications). If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con).
Encoding names "utf8", "mac" and "macroman" are not portable, and not supported on all current R platforms. "UTF-8" is portable and "macintosh" is the official (and most widely supported) name for Mac Roman.
Requesting a conversion that is not supported is an error, reported when the connection is opened. Exactly what happens when the requested translation cannot be done for invalid input is in general undocumented. On output the result is likely to be that up to the error, with a warning. On input, it will most likely be all or some of the input up to the error.
It may be possible to deduce the current native encoding from Sys.getlocale("LC_CTYPE")
, but not all OSes record it.
readLines behaves differently in respect of incomplete last lines in the two modes (blocking and non-blocking): see its help page.
Even when a connection is in blocking mode, attempts are made to ensure that it does not block the event loop and hence the operation of GUI parts of R. These do not always succeed, and the whole R process will be blocked during a DNS lookup on Unix, for example.
Most blocking operations on HTTP/FTP URLs and on sockets are subject to the timeout set by options("timeout"). Note that this is a timeout for no response, not for the whole operation. The timeout is set at the time the connection is opened (more precisely, when the last connection of that type -- http:, ftp: or socket -- was opened).
file can be used with description = "clipboard": on Windows in modes "r" and "w" only, and on a Unix-alike in mode "r" only. On a Unix-alike this reads the X11 primary selection (see http://standards.freedesktop.org/clipboards-spec/clipboards-latest.txt), which can also be specified as "X11_primary" and the secondary selection as "X11_secondary". On most systems the clipboard selection (that used by Copy from an Edit menu) can be specified as "X11_clipboard".
When a clipboard is opened for reading, the contents are immediately copied to internal storage in the connection.
On Windows, when writing to the clipboard, the output is copied to the clipboard only when the connection is closed or flushed. There is a 32Kb limit on the text to be written to the clipboard. This can be raised by using e.g. file("clipboard-128") to give 128Kb. The clipboard works in Unicode wide characters, so encodings might not work as one might expect.
Unix users wishing to write to one of the X11 selections may be able to do so via xclip (http://sourceforge.net/projects/xclip/) or xsel (http://www.vergenet.net/~conrad/software/xsel/), for example by pipe("xclip -i", "w") for the primary selection.
OS X users can use pipe("pbpaste") and pipe("pbcopy", "w") to read from and write to that system's
clipboard.
By default a connection is not opened (except for socketConnection), but may be opened by setting a non-empty value of argument open.
For file the description is a path to the file to be opened or a complete URL (when it is the same as calling url), or "" (the default) or "clipboard" (see the Clipboard section). Use "stdin" to refer to the C-level standard input of the process (which need not be connected to anything in a console or embedded version of R, and is not in RGui on Windows). See also stdin() for the subtly different R-level concept of stdin.
For url the description is a complete URL including scheme (such as http://, https://, ftp:// or file://). Method "internal" is that available since connections were introduced, method "wininet" is only available on Windows (it uses the WinINet functions of that OS) and method "libcurl" (using the library of that name: http://curl.haxx.se/libcurl/) is required on a Unix-alike but optional on Windows. Method "default" uses method "internal" for file: URLs and "libcurl" for ftps: URLs. On a Unix-alike it uses "internal" for http: and ftp: URLs and "libcurl" for https: URLs; on Windows "wininet" for http:, ftp: and https: URLs. Proxies can be specified: see download.file.
For gzfile the description is the path to a file compressed by gzip: it can also open for reading uncompressed files and those compressed by bzip2, xz or lzma.
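As a rough illustration of the point that gzfile can read more than gzip files, the sketch below writes the same lines once uncompressed and once with bzfile, then reads both back through gzfile. The helper read_via_gzfile and the temporary file names are hypothetical.

```r
lines_in <- c("alpha", "beta", "gamma")
plain <- tempfile(fileext = ".txt")   # uncompressed copy
bz    <- tempfile(fileext = ".bz2")   # bzip2-compressed copy

writeLines(lines_in, plain)
con <- bzfile(bz, "w")
writeLines(lines_in, con)
close(con)

# Hypothetical helper: read any supported file through a gzfile connection.
read_via_gzfile <- function(path) {
  con <- gzfile(path)        # compression type is detected from the file
  on.exit(close(con))        # destroy the connection when done
  readLines(con)
}

from_plain <- read_via_gzfile(plain)
from_bz    <- read_via_gzfile(bz)
unlink(c(plain, bz))
```

Both reads should give back the original lines, which makes gzfile a convenient default reader when the compression of an input file is not known in advance.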
For bzfile the description is the path to a file compressed by bzip2.
For xzfile the description is the path to a file compressed by xz (https://en.wikipedia.org/wiki/Xz) or (for reading only) lzma (https://en.wikipedia.org/wiki/LZMA).
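To get a feel for the three compressed-file connections, the following sketch writes identical text through gzfile, bzfile and xzfile at their default compression levels and compares file sizes. The sample text and the helper write_size are assumptions for illustration. The sizes depend on the input and on the R build, so no particular ordering is claimed.

```r
# Highly repetitive text, so every compressor should shrink it.
txt <- rep("The quick brown fox jumps over the lazy dog.", 2000)

plain <- tempfile(fileext = ".txt")
writeLines(txt, plain)

# Hypothetical helper: write txt through a compressing connection
# constructor and return the resulting file size in bytes.
write_size <- function(make_con, ext) {
  path <- tempfile(fileext = ext)
  con <- make_con(path, "w")
  writeLines(txt, con)
  close(con)
  size <- file.size(path)
  unlink(path)
  size
}

sizes <- c(
  plain = file.size(plain),
  gzip  = write_size(gzfile, ".gz"),
  bzip2 = write_size(bzfile, ".bz2"),
  xz    = write_size(xzfile, ".xz")
)
unlink(plain)
print(sizes)
```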
unz reads (only) single files within zip files, in binary mode. The description is the full path to the zip file, with .zip extension if required.
For pipe the description is the command line to be piped to or from. This is run in a shell, on Windows that specified by the COMSPEC environment variable.
For fifo the description is the path of the fifo. (Support for fifo connections is optional but they are available on most Unix platforms and on Windows.)
The intention is that file and gzfile can be used generally for text input (from files, http:// and https:// URLs) and binary input respectively.
open, close and seek are generic functions: the following applies to the methods relevant to connections.
open opens a connection. In general functions using connections will open them if they are not open, but then close them again, so to leave a connection open call open explicitly.
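The difference between an unopened and an explicitly opened connection shows up when reading a file a piece at a time. This is a minimal sketch using a hypothetical three-line scratch file.

```r
path <- tempfile()
writeLines(c("first", "second", "third"), path)

# Unopened connection: each readLines() call opens it, reads, and closes it
# again, so every call starts from the top of the file.
con <- file(path)
a1 <- readLines(con, n = 1)
a2 <- readLines(con, n = 1)
close(con)

# Explicitly opened connection: it stays open, so the read position persists.
con <- file(path)
open(con, "r")
b1 <- readLines(con, n = 1)
b2 <- readLines(con, n = 1)
close(con)
unlink(path)
```

Here a1 and a2 both hold the first line, whereas b1 and b2 hold the first and second lines.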
close closes and destroys a connection. This will happen automatically in due course (with a warning) if there is no longer an R object referring to the connection.
A maximum of 128 connections can be allocated (not necessarily open) at any one time. Three of these are pre-allocated (see stdout). The OS will impose limits on the numbers of connections of various types, but these are usually larger than 125.
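Because allocated connections count against the limit even when they are not open, it can help to inspect what is currently allocated. The sketch below uses showConnections and summary (both listed in or reachable from the See Also entries); the variable names are illustrative.

```r
con <- file(tempfile())                      # allocated, but not opened

# All allocated connections, including the three pre-allocated ones.
allocated <- showConnections(all = TRUE)
n_allocated <- nrow(allocated)

info <- summary(con)                         # description, class, mode, opened, ...
close(con)                                   # release the slot again
```

Calling close as soon as a connection is no longer needed avoids relying on the automatic clean-up, which issues a warning.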
flush flushes the output stream of a connection open for write/append (where implemented, currently for file and clipboard connections, stdout and stderr).
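A small sketch of flush on a file connection: output written to the connection is pushed to the file without closing it, so that an independent reader can see it. The scratch file is hypothetical.

```r
path <- tempfile()
con <- file(path, "w")

cat("buffered line\n", file = con)
flush(con)                       # push any buffered output to the file

on_disk <- readLines(path)       # read through a second, independent connection
close(con)
unlink(path)
```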
If for a file or (on most platforms) a fifo connection the description is "", the file/fifo is immediately opened (in "w+" mode unless open = "w+b" is specified) and unlinked from the file system. This provides a temporary file/fifo to write to and then read from.
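The "w+b" variant mentioned above can be sketched as a binary round trip through an anonymous file. The explicit seek is only there to make the read position unambiguous and may not be needed.

```r
con <- file("", "w+b")                     # anonymous temporary file, binary mode
writeBin(1:5, con)                         # write five integers

invisible(seek(con, 0, rw = "read"))       # make the read position explicit
back <- readBin(con, what = integer(), n = 5)
close(con)                                 # the temporary file disappears with it
```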
Ripley, B. D. (2001) Connections. R News, 1/1, 16--7. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf
textConnection, seek, showConnections, pushBack.
Functions making direct use of connections are (text-mode) readLines, writeLines, cat, sink, scan, parse, read.dcf, dput, dump and (binary-mode) readBin, readChar, writeBin, writeChar, load and save.
capabilities to see if fifo connections are supported by this build of R.
gzcon to wrap gzip (de)compression around a connection.
options HTTPUserAgent, internet.info and timeout are used by some of the methods for URL connections.
memCompress for more ways to (de)compress and references on data compression.
On Windows: to flush output to the console, see flush.console.
zz <- file("ex.data", "w") # open an output file connection
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
cat("One more line\n", file = zz)
close(zz)
readLines("ex.data")
unlink("ex.data")
zz <- gzfile("ex.gz", "w") # compressed file
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
readLines(zz <- gzfile("ex.gz"))
close(zz)
unlink("ex.gz")
zz <- bzfile("ex.bz2", "w") # bzip2-ed file
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
close(zz)
print(readLines(zz <- bzfile("ex.bz2")))
close(zz)
unlink("ex.bz2")
## An example of a file open for reading and writing
Tfile <- file("test1", "w+")
c(isOpen(Tfile, "r"), isOpen(Tfile, "w")) # both TRUE
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
seek(Tfile, 0, rw = "r") # reset to beginning
readLines(Tfile)
cat("ghi\n", file = Tfile)
readLines(Tfile)
close(Tfile)
unlink("test1")
## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)
## Not run:
## fifo example -- may hang even with OS support for fifos
# if(capabilities("fifo")) {
# zz <- fifo("foo-fifo", "w+")
# writeLines("abc", zz)
# print(readLines(zz))
# close(zz)
# unlink("foo-fifo")
# }
## End(Not run)
## Unix examples of use of pipes
# read listing of current directory
readLines(pipe("ls -1"))
# remove trailing commas. Suppose
## Not run:
# % cat data2_
# 450, 390, 467, 654, 30, 542, 334, 432, 421,
# 357, 497, 493, 550, 549, 467, 575, 578, 342,
# 446, 547, 534, 495, 979, 479
## End(Not run)
# Then read this by
scan(pipe("sed -e s/,$// data2_"), sep = ",")
# convert decimal point to comma in output: see also write.table
# both R strings and (probably) the shell need \ doubled
zz <- pipe(paste("sed s/\\\\./,/ >", "outfile"), "w")
cat(format(round(stats::rnorm(48), 4)), fill = 70, file = zz)
close(zz)
file.show("outfile", delete.file = TRUE)
## Not run:
# ## example for a machine running a finger daemon
#
# con <- socketConnection(port = 79, blocking = TRUE)
# writeLines(paste0(system("whoami", intern = TRUE), "\r"), con)
# gsub(" *$", "", readLines(con))
# close(con)
# ## End(Not run)
## Not run:
# ## Two R processes communicating via non-blocking sockets
# # R process 1
# con1 <- socketConnection(port = 6011, server = TRUE)
# writeLines(LETTERS, con1)
# close(con1)
#
# # R process 2
# con2 <- socketConnection(Sys.info()["nodename"], port = 6011)
# # as non-blocking, may need to loop for input
# readLines(con2)
# while(isIncomplete(con2)) {
# Sys.sleep(1)
# z <- readLines(con2)
# if(length(z)) print(z)
# }
# close(con2)
#
# ## examples of use of encodings
# # write a file in UTF-8
# cat(x, file = (con <- file("foo", "w", encoding = "UTF-8"))); close(con)
# # read a 'Windows Unicode' file
# A <- read.table(con <- file("students", encoding = "UCS-2LE")); close(con)
# ## End(Not run)
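The "UTF-8-BOM" encoding described earlier can be tried without any external file. The sketch below builds a small file that starts with a Byte Order Mark and reads it back. The file contents and variable names are made up for illustration, and the content is kept to ASCII so the result should not depend on the locale.

```r
path <- tempfile(fileext = ".txt")

# A UTF-8 file starting with a BOM (EF BB BF), followed by "abc\n".
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("abc\n")), path)

con <- file(path, encoding = "UTF-8-BOM")
no_bom <- readLines(con)                  # the BOM is removed on reading
close(con)

raw_bytes <- readBin(path, "raw", n = 16) # the bytes on disk still include the BOM
unlink(path)
```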