streamR (version 0.2.1)

filterStream: Connect to Twitter Streaming API and return public statuses that match one or more filter predicates.

Description

filterStream opens a connection to Twitter's Streaming API that will return public statuses that match one or more filter predicates. Tweets can be filtered by keywords, users, language, and location. The output can be saved as an object in memory or written to a text file.

Usage

filterStream(file.name = NULL, track = NULL,
    follow = NULL, locations = NULL, language = NULL,
    timeout = 0, tweets = NULL, oauth = NULL,
    verbose = TRUE)

Arguments

file.name

string, name of the file where tweets will be written. "" indicates output to the console, which can be redirected to an R object (see examples). If the file already exists, tweets will be appended (not overwritten).

track

string or string vector containing keywords to track. See the track parameter information in the Streaming API documentation for details: http://dev.twitter.com/docs/streaming-apis/parameters#track.

follow

string or numeric vector of Twitter user IDs indicating the users whose public statuses should be delivered on the stream. See the follow parameter information in the Streaming API documentation for details: http://dev.twitter.com/docs/streaming-apis/parameters#follow.

locations

numeric, a vector of longitude, latitude pairs (with the southwest corner coming first) specifying sets of bounding boxes to filter public statuses by. See the locations parameter information in the Streaming API documentation for details: http://dev.twitter.com/docs/streaming-apis/parameters#locations
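
As a minimal sketch of the layout described above, the helper below flattens one or more bounding boxes into a single numeric vector. The name make_locations is hypothetical and not part of streamR. The ordering check assumes boxes do not cross the 180th meridian. The New York City coordinates are the ones used in the examples.

```r
# Hypothetical helper (not part of streamR): build the numeric vector
# expected by the `locations` argument from one or more bounding boxes.
# Each box is c(sw_lon, sw_lat, ne_lon, ne_lat): southwest corner first,
# longitude before latitude.
make_locations <- function(...) {
  boxes <- list(...)
  for (b in boxes) {
    stopifnot(is.numeric(b), length(b) == 4)
    # Assumption: the southwest corner lies south-west of the northeast corner.
    stopifnot(b[1] < b[3], b[2] < b[4])
    # Longitudes must be within [-180, 180], latitudes within [-90, 90].
    stopifnot(abs(b[c(1, 3)]) <= 180, abs(b[c(2, 4)]) <= 90)
  }
  unlist(boxes, use.names = FALSE)
}

# New York City box from the examples.
nyc <- c(-74, 40, -73, 41)
locations <- make_locations(nyc)
```

The resulting vector could then be passed as locations = locations. Passing several boxes to the helper concatenates them in the order given.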

language

string or string vector containing a list of BCP 47 language identifiers. If not NULL (NULL is the default), the function will only return tweets that have been detected as being written in the specified languages. Note that this parameter can only be used in combination with at least one of the other filter parameters. See documentation for details: https://dev.twitter.com/docs/streaming-apis/parameters#language

timeout

numeric, maximum length of time (in seconds) of connection to stream. The connection will be automatically closed after this period. For example, setting timeout to 10800 will keep the connection open for 3 hours. The default is 0, which will keep the connection open permanently.
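
Because timeout is expressed in seconds, it can be convenient to compute it from a more readable unit. A small sketch of the arithmetic behind the 3-hour figure above (the variable names are illustrative only):

```r
# Convert a duration in hours to the seconds expected by `timeout`.
hours <- 3
timeout <- hours * 60 * 60
timeout  # 10800
```

The value could then be passed as timeout = timeout.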

tweets

numeric, maximum number of tweets to be collected when the function is called. Once that number of tweets has been captured, the function will stop. If set to NULL (default), the connection will stay open for the number of seconds specified in the timeout parameter.

oauth

an object of class oauth that contains the access tokens to the user's Twitter session. This is currently the only method for authentication. See examples for more details.

verbose

logical, default is TRUE, which generates some output to the R console with information about the capturing process.

Details

filterStream provides access to the statuses/filter Twitter stream.

It will return public statuses that match the keywords given in the track argument, published by the users specified in the follow argument, written in the language specified in the language argument, and sent within the location bounding boxes declared in the locations argument.

Note that location bounding boxes do not act as filters for other filter parameters. In the last example below, we capture all tweets containing the term rstats (even non-geolocated tweets) OR coming from the New York City area. For more information on how the Streaming API request parameters work, check the documentation at: http://dev.twitter.com/docs/streaming-apis/parameters.

Also note that the language parameter needs to be used in combination with another filter option (either keywords or location).

If any of these arguments is left empty (e.g. no user filter is specified), the function will return all public statuses that match the other filters. At least one predicate parameter must be specified.
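
The two constraints above (at least one predicate, and language only alongside another filter) can be checked before opening a connection. The sketch below is a hypothetical pre-flight check, not part of streamR. It assumes that track, follow, and locations all count as predicate parameters, and it does not reproduce how filterStream itself reports these conditions.

```r
# Hypothetical pre-flight check (not part of streamR) for the rules
# described above. Returns "ok" or a short description of the problem.
check_filters <- function(track = NULL, follow = NULL,
                          locations = NULL, language = NULL) {
  # Assumption: track, follow, and locations are the predicate parameters.
  has_predicate <- !is.null(track) || !is.null(follow) || !is.null(locations)
  if (!has_predicate && !is.null(language)) {
    return("language must be combined with track, follow, or locations")
  }
  if (!has_predicate) {
    return("at least one of track, follow, or locations is required")
  }
  "ok"
}

check_filters(track = "rstats")
check_filters(language = "es")
check_filters(language = "es", locations = c(-74, 40, -73, 41))
```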

Note that when no file name is provided, tweets are written to a temporary file, which is loaded in memory as a string vector when the connection to the stream is closed.

The total number of actual tweets that are captured might be lower than the number of tweets requested because blank lines, deletion notices, and incomplete tweets are included in the count of tweets downloaded.
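
To illustrate why the two counts can differ, the sketch below applies a crude filter to a character vector of downloaded lines. The sample lines are made up for illustration, and the heuristics (a deletion notice starts with {"delete", and a complete tweet ends with a closing brace) are assumptions, not the logic used by parseTweets.

```r
# Illustrative lines only; real streamed JSON is much longer.
raw_lines <- c(
  '{"id_str":"1","text":"hello #rstats"}',  # a complete tweet
  '',                                        # a blank line
  '{"delete":{"status":{"id_str":"2"}}}',    # a deletion notice
  '{"id_str":"3","text":"trunc'              # an incomplete tweet
)

trimmed     <- trimws(raw_lines)
is_blank    <- !nzchar(trimmed)
is_deletion <- startsWith(trimmed, '{"delete"')
is_complete <- endsWith(trimmed, "}")

# Keep only lines that look like complete tweets.
usable <- raw_lines[!is_blank & !is_deletion & is_complete]

length(raw_lines)  # lines counted as downloaded
length(usable)     # lines that look like actual tweets
```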

See Also

sampleStream, userStream, parseTweets

Examples

# NOT RUN {
## An example of an authenticated request using the ROAuth package,
## where consumerkey and consumer secret are fictitious.
## You can obtain your own at dev.twitter.com
  library(ROAuth)
  requestURL <- "https://api.twitter.com/oauth/request_token"
  accessURL <- "https://api.twitter.com/oauth/access_token"
  authURL <- "https://api.twitter.com/oauth/authorize"
  consumerKey <- "xxxxxyyyyyzzzzzz"
  consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
  my_oauth <- OAuthFactory$new(consumerKey=consumerKey,
    consumerSecret=consumerSecret, requestURL=requestURL,
    accessURL=accessURL, authURL=authURL)
  my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  filterStream( file.name="tweets_rstats.json",
     track="rstats", timeout=3600, oauth=my_oauth )

## capture 10 tweets mentioning the "Rstats" hashtag
  filterStream( file.name="tweets_rstats.json",
     track="rstats", tweets=10, oauth=my_oauth )

## capture tweets published by Twitter's official account
  filterStream( file.name="tweets_twitter.json",
     follow="783214", timeout=600, oauth=my_oauth )

## capture tweets in Spanish sent from New York City and save them as an object in memory
  tweets <- filterStream( file.name="", language="es",
      locations=c(-74,40,-73,41), timeout=600, oauth=my_oauth )

## capture tweets mentioning the "rstats" hashtag or sent from New York City
  filterStream( file.name="tweets_rstats.json", track="rstats",
      locations=c(-74,40,-73,41), timeout=600, oauth=my_oauth )

# }