
monkeylearn (version 0.2.0)

monkey_extract: Monkeylearn extract from a dataframe column or vector of texts

Description

Runs independent extractions for each row of a dataframe (or each element of a vector of texts) using the Monkeylearn extractor modules.

Usage

monkey_extract(input, col = NULL, key = monkeylearn_key(quiet = TRUE),
  extractor_id = "ex_isnnZRbS", params = NULL, texts_per_req = NULL,
  unnest = TRUE, .keep_all = TRUE, verbose = TRUE, ...)

Arguments

input

A dataframe or vector of texts (each text smaller than 50kB)

col

If input is a dataframe, the unquoted name of the character column containing text to extract from

key

The API key

extractor_id

The ID of the extractor

params

Parameters for the module as a named list.

texts_per_req

Number of texts to be processed per request. It should not exceed the number of texts in input, and the maximum is 200, as per [Monkeylearn documentation](docs.monkeylearn.com/article/api-reference/). If NULL, we default to 200, or, if there are fewer than 200 texts, the length of the input.

unnest

Should the output column be unnested?

.keep_all

If input is a dataframe, should columns other than col be retained in the output?

verbose

Whether to output messages about batch requests and progress of processing.

...

Other arguments
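The default rule described for texts_per_req can be sketched in base R as follows. `resolve_texts_per_req()` is a hypothetical helper written for illustration, not a function exported by monkeylearn, and the error it raises for values above 200 is an assumption about how one might validate the argument.

```r
# Hypothetical helper, not exported by monkeylearn: resolves texts_per_req
# following the rule documented for that argument.
resolve_texts_per_req <- function(texts_per_req, n_texts, api_max = 200) {
  if (is.null(texts_per_req)) {
    # Default: 200, or the number of texts when there are fewer than 200
    return(min(api_max, n_texts))
  }
  # Reject values above the documented API maximum (this validation step is
  # an illustrative assumption, not necessarily what the package does)
  if (texts_per_req > api_max) {
    stop("texts_per_req cannot exceed ", api_max)
  }
  texts_per_req
}

resolve_texts_per_req(NULL, 50)    # 50
resolve_texts_per_req(NULL, 6000)  # 200
```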

Value

A data.frame (tibble) with the cleaned input (empty strings removed) and a new column containing the extraction for that particular row; this column is unnested when unnest = TRUE (the default) and left nested otherwise. The output also carries an attribute "headers", a data.frame (tibble) that includes the number of remaining queries as "x.query.limit.remaining".

Details

Find IDs of extractors using https://app.monkeylearn.com/main/explore.

This function relates the rows in your original dataframe or elements in your vector to an extraction particular to that row. This allows you to know which row of your original dataframe is associated with which extraction. Each row of the dataframe is extracted separately from all of the others, but the number of extractions a particular input row is assigned may vary (unless you specify a fixed number of outputs in params).
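To make the row-to-extraction relationship concrete, here is a base-R sketch on mock data. The texts, the keywords, and the `extraction` column name are all invented for illustration; they are not real API output and not necessarily the column names the package uses. Unnesting repeats each input row once per extraction it received, which is roughly what unnest = TRUE does for you.

```r
# Mock nested output: one row per input text plus a list-column holding a
# data.frame of extractions for that row. All values are invented.
nested <- data.frame(id = 1:3, text = c("text A", "text B", "text C"),
                     stringsAsFactors = FALSE)
nested$extraction <- list(
  data.frame(keyword = c("kw_1a", "kw_1b"), stringsAsFactors = FALSE),
  data.frame(keyword = "kw_2a", stringsAsFactors = FALSE),
  data.frame(keyword = c("kw_3a", "kw_3b", "kw_3c"), stringsAsFactors = FALSE)
)

# Base-R "unnest": repeat each input row once per extraction it received,
# so every extraction stays attached to the row it came from.
n_per_row <- vapply(nested$extraction, nrow, integer(1))
unnested <- data.frame(
  id = rep(nested$id, times = n_per_row),
  text = rep(nested$text, times = n_per_row),
  do.call(rbind, nested$extraction),
  stringsAsFactors = FALSE
)
n_per_row        # 2 1 3: the number of extractions varies by row
unnested$id      # 1 1 2 3 3 3
```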

The texts_per_req parameter simply specifies the number of rows to feed the API at a time; it does not lump these together for extraction as a group. Varying this parameter does not affect the final output, but does affect speed: one batched request of x texts is faster than x single-text requests: http://help.monkeylearn.com/frequently-asked-questions/queries/can-i-classify-or-extract-more-than-one-text-with-one-api-request. Even if batched, each text still counts as one query, so batching does not save you on hits to the API. See the [Monkeylearn API docs](docs.monkeylearn.com/article/api-reference/) for more details.
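The batching arithmetic can be illustrated with a small base-R sketch. `make_batches()` is a hypothetical helper showing how texts might be chunked into requests; it is not the package's internal code. Note that the total number of queries equals the number of texts no matter how they are batched.

```r
# Hypothetical helper: split the indices of n_texts into consecutive
# batches of (at most) texts_per_req texts, one batch per API request.
make_batches <- function(n_texts, texts_per_req) {
  idx <- seq_len(n_texts)
  split(idx, ceiling(idx / texts_per_req))
}

batches <- make_batches(n_texts = 450, texts_per_req = 200)
length(batches)                # 3 requests
unname(lengths(batches))       # 200 200 50 texts per request
sum(lengths(batches))          # 450 queries counted either way
```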

You can check the number of calls you can still make in the API using attr(output, "headers")$x.query.limit.remaining and attr(output, "headers")$x.query.limit.limit.
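As a sketch of how the headers attribute might be used, the block below builds a mock object in place of a real monkey_extract() result (the header values are invented) and defines a hypothetical `has_quota()` helper that checks whether enough queries remain. The `as.numeric()` conversion is a defensive assumption in case the header values are stored as text.

```r
# Mock result standing in for a monkey_extract() output; the header
# values below are invented for illustration.
output <- data.frame(text = "some text", stringsAsFactors = FALSE)
attr(output, "headers") <- data.frame(
  x.query.limit.limit = "10000",
  x.query.limit.remaining = "9990",
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the remaining quota covers n_needed queries.
# as.numeric() is applied in case the header values are stored as text.
has_quota <- function(output, n_needed) {
  headers <- attr(output, "headers")
  remaining <- as.numeric(headers$x.query.limit.remaining)
  remaining >= n_needed
}

has_quota(output, 500)     # TRUE
has_quota(output, 20000)   # FALSE
```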

Within the free plan, you can make up to 20 requests per minute.

Batching lets you send up to 200 texts to the API (for classification or extraction) in a single request. For example, to analyze 6000 tweets, instead of making 6000 requests you can send 30 batched requests of 200 tweets each. The function makes these batch calls automatically and waits if it hits a throttle-limit error, but you may want to control the process yourself by calling the function several times.
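If you prefer to control the process yourself, a driver along these lines is one option. `extract_in_chunks()` is a hypothetical helper, not part of monkeylearn; its default pause of 3 seconds assumes the free plan's limit of 20 requests per minute, and the extractor is passed in as a function so the sketch can be exercised with a stub instead of the live API.

```r
# Hypothetical driver (not part of monkeylearn): send texts in chunks of up to
# chunk_size, pausing between calls. The default pause of 60 / 20 = 3 seconds
# assumes the free plan's 20-requests-per-minute limit.
extract_in_chunks <- function(texts, extract_fun, chunk_size = 200,
                              pause_sec = 60 / 20) {
  chunks <- split(texts, ceiling(seq_along(texts) / chunk_size))
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    results[[i]] <- extract_fun(chunks[[i]])
    # Wait before the next request; no pause is needed after the last one
    if (i < length(chunks)) Sys.sleep(pause_sec)
  }
  do.call(rbind, results)
}

# With a valid key, extract_fun could wrap the package function, e.g.:
# out <- extract_in_chunks(tweets, function(x) monkey_extract(x, verbose = FALSE))
# For 6000 tweets this makes ceiling(6000 / 200) = 30 calls.
```

Passing the extractor as an argument keeps the loop testable and leaves room to save partial results between calls if a long job is interrupted.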


Examples

# NOT RUN {
text <- "In the 19th century, the major European powers had gone to great lengths
to maintain a balance of power throughout Europe, resulting in the existence of
a complex network of political and military alliances throughout the continent by 1900.[7]
These had started in 1815, with the Holy Alliance between Prussia, Russia, and Austria.
Then, in October 1873, German Chancellor Otto von Bismarck negotiated the League of
the Three Emperors (German: Dreikaiserbund) between the monarchs of Austria-Hungary,
Russia and Germany."
output <- monkey_extract(text)
output


# Example with parameters
text <- "A panel of Goldman Sachs employees spent a recent Tuesday night at the
Columbia University faculty club trying to convince a packed room of potential
recruits that Wall Street, not Silicon Valley, was the place to be for computer
scientists.\n\n The Goldman employees knew they had an uphill battle. They were
fighting against perceptions of Wall Street as boring and regulation-bound and
Silicon Valley as the promised land of flip-flops, beanbag chairs and million-dollar
stock options.\n\n Their argument to the room of technologically inclined students
was that Wall Street was where they could find far more challenging, diverse and,
yes, lucrative jobs working on some of the world's most difficult technical problems."

output <- monkey_extract(text,
                         extractor_id = "ex_y7BPYzNG",
                         params = list(max_keywords = 3,
                                       use_company_names = 1))
attr(output, "headers")
# }
