RC.use
selects the keyspace (aka database) to use for all
subsequent operations. All functions described below require keyspace
to be set using this function.
RC.get
queries one key and a fixed list of columns
RC.get.range
queries one key and multiple columns
RC.mget.range
queries multiple keys and multiple columns
RC.get.range.slices
queries a range of keys (or tokens) and a
range of columns
RC.consistency
sets the desired consistency level for all query
operations
RC.use(conn, keyspace, cache.def = TRUE)
RC.get(conn, c.family, key, c.names, comparator = NULL, validator = NULL)
RC.get.range(conn, c.family, key, first = "", last = "", reverse = FALSE, limit = 1e+07, comparator = NULL, validator = NULL)
RC.mget.range(conn, c.family, keys, first = "", last = "", reverse = FALSE, limit = 1e+07, comparator = NULL, validator = NULL)
RC.get.range.slices(conn, c.family, k.start = "", k.end = "", first = "", last = "", reverse = FALSE, limit = 1e+07, k.limit = 1e+07, tokens = FALSE, fixed = FALSE, comparator = NULL, validator = NULL)
RC.consistency(conn, level = c("one", "quorum", "local.quorum", "each.quorum", "all", "any", "two", "three"))
conn
connection handle as returned by RC.connect
cache.def
if TRUE then in addition to setting the keyspace a query on the keyspace definition is sent and the result cached. This allows automatic detection of comparators and validators, see details section for more information.
reverse
if TRUE the result is returned in reverse order
tokens
if TRUE then keys are interpreted as tokens (i.e. values after hashing)
fixed
if TRUE then the result will be a single data frame with one row per key and all columns ever encountered, essentially assuming fixed column structure
level
the desired consistency level; "one" is the default if not explicitly set.
RC.use and RC.consistency return conn.
RC.get and RC.get.range return a data frame with columns key (column name), value (value in that column) and ts (timestamp).
RC.mget.range and RC.get.range.slices return a named list of data frames as described in RC.get.range with names being the row keys, except if fixed=TRUE, in which case the result is a data frame with row names as keys and values as elements (timestamps are not retrieved in that case).
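To make the two result shapes concrete, the sketch below builds a mock of the list form by hand and reshapes it into a rectangular form similar to the one described for fixed=TRUE. No Cassandra connection is involved, the row keys and values are made up, and the helper name to_fixed is hypothetical (it is not part of RCassandra and may differ in detail from what the package returns).

```r
# Mock of the list shape: one data frame per row key, with columns key, value, ts.
res <- list(
  "1" = data.frame(key = c("Sepal.Length", "Species"),
                   value = c("5.1", "setosa"),
                   ts = c(1, 1), stringsAsFactors = FALSE),
  "2" = data.frame(key = "Sepal.Length", value = "4.9",
                   ts = 2, stringsAsFactors = FALSE)
)

# Hypothetical helper: collect every column name seen and fill a rectangular
# table, leaving NA where a row has no such column. Timestamps are dropped.
to_fixed <- function(res) {
  cols <- unique(unlist(lapply(res, function(d) d$key)))
  m <- matrix(NA_character_, nrow = length(res), ncol = length(cols),
              dimnames = list(names(res), cols))
  for (k in names(res)) m[k, res[[k]]$key] <- res[[k]]$value
  as.data.frame(m, stringsAsFactors = FALSE)
}

fixed <- to_fixed(res)
print(fixed)
```

Row "2" has no Species column in the mock, so that cell stays NA, which is the price of assuming a fixed column structure over dynamic columns.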
A query first selects a key (RC.get, RC.get.range), keys (RC.mget.range) or key range (RC.get.range.slices), then selects the columns of interest. Empty string ("") can be used to denote an unspecified range (so the default is to fetch all columns).
comparator and validator specify the types of column keys and values respectively. Every key or value in Cassandra is simply a byte string, so it can deal with arbitrary values, but sometimes it is convenient to impose some structure on that content by declaring what is represented by that byte string. Unfortunately Cassandra does not include that information in the results, so the user has to define how column names and values are to be interpreted. The default interpretation is simply as a UTF-8 encoded string, but RCassandra also supports the following conversions: "UTF8Type", "AsciiType" (stored as character vectors), "BytesType" (opaque stream of bytes, stored as raw vector), "LongType" (8-byte integer, stored as real vector in R), "DateType" (8-byte integer, stored as POSIXct in R), "BooleanType" (one byte, logical vector in R), "FloatType" (4-byte float, real vector in R), "DoubleType" (8-byte float, real vector in R) and "UUIDType" (16 bytes, stored as UUID-formatted string). No other conversions are supported at this point. If comparator or validator is NULL then RCassandra attempts to guess the proper type by taking into account the schema definition obtained by RC.use(..., cache.def=TRUE), otherwise it falls back to "UTF8Type". You can always get the raw form using "BytesType" and decode the values in R.
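As an illustration of decoding a raw value in R, here is a minimal sketch. It assumes the value was fetched with "BytesType" and holds an 8-byte big-endian two's-complement integer (the layout is an assumption on our part, not something this page states). The helper name decode_long is hypothetical and not part of RCassandra.

```r
# Hypothetical helper: decode an 8-byte big-endian signed integer from a raw
# vector into an R double. Values beyond +/- 2^53 lose precision, which is
# inherent in storing 64-bit integers as doubles.
decode_long <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 8L)
  b <- as.numeric(bytes)            # unsigned byte values 0..255
  negative <- b[1] >= 128           # sign bit set
  if (negative) b <- 255 - b        # two's complement: flip the bits ...
  value <- sum(b * 256^(7:0))       # first byte is the most significant
  if (negative) -(value + 1) else value   # ... then add one and negate
}

x <- decode_long(as.raw(c(0, 0, 0, 0, 0, 0, 1, 0)))
y <- decode_long(as.raw(rep(255, 8)))
print(c(x, y))
```

Flipping the bits before summing (rather than subtracting 2^64 afterwards) keeps small negative numbers exact in double arithmetic.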
The comparator also determines how the values of first and last will be interpreted. Regardless of the comparator, it is always possible to pass either NULL, "" (both denoting 0-length value) or a raw vector. Other supported types must match the comparator.
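Since a raw vector is always accepted for first and last, a range bound can be built by hand. The sketch below encodes a whole number as 8 big-endian two's-complement bytes, which is what we assume a "LongType" column name looks like on the wire. The helper name encode_long is hypothetical and not part of RCassandra.

```r
# Hypothetical helper: encode a whole number as an 8-byte big-endian signed
# integer in a raw vector. Restricted to |x| < 2^53 so the double is exact.
encode_long <- function(x) {
  stopifnot(length(x) == 1L, is.finite(x), x == round(x), abs(x) < 2^53)
  negative <- x < 0
  if (negative) x <- -(x + 1)       # two's complement: encode ~x, flip below
  b <- (x %/% 256^(7:0)) %% 256     # byte values, most significant first
  if (negative) b <- 255 - b
  as.raw(b)
}

lower <- encode_long(256)
print(lower)
```

Such a vector could then be passed as first or last in a range query on a column family whose comparator is "LongType".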
Most users will be happy with the default settings, but if you want to save every nanosecond you can, call RC.use(..., cache.def = FALSE) (which saves one extra RC.describe.keyspace request to the Cassandra instance) and always specify both comparator and validator (even if it is just "UTF8Type").
Cassandra collects results in memory, so key (k.limit) and column (limit) limits are mandatory. Future versions of RCassandra may abstract this limitation out (by using a limit and repeating queries with a new start key/column based on the last result row), but not at this point.
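Until then, the repeat-with-a-new-start idea can be sketched by hand. The helper below is hypothetical (not part of RCassandra) and pages through the columns of one row. It assumes that the range start is inclusive, so every page after the first repeats the last column of the previous page, and that results come back sorted by column name. The fetch function is a stand-in so the sketch runs without a server.

```r
# Hypothetical helper: fetch all columns of one row in pages.
# fetch(first, limit) should behave like a range query on a single key, e.g.
#   function(first, limit) RC.get.range(conn, "iris", "1", first = first, limit = limit)
fetch_all_columns <- function(fetch, page.size = 1000L) {
  stopifnot(page.size >= 2L)        # a page of 1 would only repeat the overlap
  pages <- list()
  first <- ""
  repeat {
    page <- fetch(first, page.size)
    n <- nrow(page)
    # Drop the overlapping first column on every page but the first.
    if (length(pages) > 0L && n > 0L) page <- page[-1L, , drop = FALSE]
    if (nrow(page) > 0L) {
      pages[[length(pages) + 1L]] <- page
      first <- page$key[nrow(page)]  # next page starts at the last column seen
    }
    if (n < page.size) break         # short page: nothing left to fetch
  }
  do.call(rbind, pages)
}

# Mock data source standing in for Cassandra: 25 sorted columns.
cols <- data.frame(key = sprintf("c%02d", 1:25), value = as.character(1:25),
                   ts = 0, stringsAsFactors = FALSE)
mock_fetch <- function(first, limit) head(cols[cols$key >= first, , drop = FALSE], limit)

all.cols <- fetch_all_columns(mock_fetch, page.size = 10L)
print(nrow(all.cols))
```

The same pattern would apply to paging over keys with k.limit, but key order follows the hash rather than the key value, so the restart point has to come from the last key returned, not from any assumed ordering.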
Note that in Cassandra keys are typically hashed, so key range may be counter-intuitive as it is based on the hash and not on the actual value. Columns are always sorted by their name (=key).
The result of queries may also be counter-intuitive, especially when querying fixed column tables, as it is not returned in the form that would be expected from a relational database. See RC.read.table and RC.write.table for retrieving and storing relational structures in rectangular tables (column families with fixed columns). But you have to keep in mind that Cassandra is essentially key/key/value storage (row key, column key, value) with partitioning on row keys and sorting of column keys, so designing the correct schema for a task needs some thought. Dynamic columns are what makes it so powerful.
RC.connect, RC.read.table, RC.write.table
## Not run:
# c <- RC.connect("cassandra-host")
# RC.use(c, "testdb")
# ## you will have to use cassandra-cli to create the schema for the "iris" CF
# RC.write.table(c, "iris", iris)
# RC.get(c, "iris", "1", c("Sepal.Length", "Species"))
# RC.get.range(c, "iris", "1")
# ## list of 150 data frames
# r <- RC.get.range.slices(c, "iris")
# ## use limit=0 to obtain all row keys without pulling any data
# rk <- RC.get.range.slices(c, "iris", limit=0)
# y <- RC.read.table(c, "iris")
# y <- y[order(as.integer(row.names(y))),]
# RC.close(c)
# ## End(Not run)