
promptr

We developed the promptr package so that researchers could easily format and submit LLM prompts using the R programming language. It provides a handful of convenient functions to query the OpenAI API and return the output as a tidy R dataframe. The package is intended to be particularly useful for social scientists using LLMs for text classification and scaling tasks.

Installation

You can install the release version of promptr from CRAN with:

install.packages('promptr')

You can install the development version of promptr from GitHub with:

# install.packages("devtools")
devtools::install_github("joeornstein/promptr")

You will also need a developer account with OpenAI and an API key. For best performance, you may also want to provide credit card information (this significantly boosts your API rate limit, even if you’re not spending money).

Once your account is created, copy-paste your API key into the following line of R code.

library(promptr)

openai_api_key('YOUR API KEY GOES HERE', install = TRUE)

Now you’re all set up!

Completing Prompts

The workhorse function of the promptr package is complete_prompt(). This function submits a prompt to the OpenAI API and returns a dataframe with the five most likely next-word predictions and their associated probabilities.

library(promptr)

complete_prompt('I feel like a')
#>    token probability
#> 1    lot  0.20985606
#> 2 little  0.02118042
#> 3    kid  0.01374532
#> 4    new  0.01208388
#> 5    big  0.01204145

If you prefer the model to autoregressively generate text instead of outputting next-word probabilities, you can set the max_tokens input to a value greater than 1. The function will then return a character object with the most likely completion.

complete_prompt('I feel like a', max_tokens = 18)
#> [1] " lot of people are gonna be like, \"Oh, I'm gonna be a doctor.\"\n\n"

Note that by default, the temperature input is set to 0, which means the model will always return the most likely completion for your prompt. Increasing temperature allows the model to randomly select words from its estimated probability distribution (see the API reference for more on these parameters).
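For instance, assuming the temperature input described above, you could sample a completion from the model's probability distribution rather than always taking the most likely one. Because the completion is sampled, repeated calls can return different text:

```r
library(promptr)

# temperature = 1 samples tokens from the model's estimated probability
# distribution instead of always picking the most likely token, so
# running this line twice may produce two different completions
complete_prompt('I feel like a', max_tokens = 18, temperature = 1)
```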

You can also change which model variant the function calls using the model input. By default, it is set to “gpt-3.5-turbo-instruct”, the RLHF variant of GPT-3.5. For the base GPT-3 variants, try “davinci-002” (175 billion parameters) or “babbage-002” (1.3 billion parameters).
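For example, to query one of the base models instead of the default instruction-tuned variant, pass its name to the model input:

```r
library(promptr)

# next-word probabilities from the base GPT-3 model rather than
# the default 'gpt-3.5-turbo-instruct'
complete_prompt('I feel like a', model = 'davinci-002')
```

Because the base models are not instruction-tuned, their next-word distributions for the same prompt can differ noticeably from the default model's.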

Formatting Prompts

Manually typing prompts can be tedious and error-prone, particularly if you want to include context-specific instructions or few-shot examples. We include the format_prompt() function to aid in that process.

The function is designed with classification problems in mind. If you input the text you would like to classify along with a set of instructions, the default prompt template looks like this:

prompt <- format_prompt(text = 'I feel positively morose today.', 
                        instructions = 'Decide whether this statement is happy or sad.')
prompt
#> Decide whether this statement is happy or sad.
#> 
#> Text: I feel positively morose today.
#> Classification:

You can customize the template using glue syntax, with placeholders for {text} and {label}.

format_prompt(text = 'I feel positively morose today.',
              instructions = 'Decide whether this statement is happy or sad.',
              template = 'Statement: {text}\nSentiment: {label}')
#> Decide whether this statement is happy or sad.
#> 
#> Statement: I feel positively morose today.
#> Sentiment:

This function is particularly useful when including few-shot examples in the prompt. If you input these examples as a tidy dataframe, the format_prompt() function will paste them into the prompt according to the template. The examples dataframe must have at least two columns, one called “text” and the other called “label”.

examples <- data.frame(
  text = c('What a pleasant day!', 
           'Oh bother.',
           'Merry Christmas!',
           ':-('),
  label = c('happy', 'sad', 'happy', 'sad')
)

examples
#>                   text label
#> 1 What a pleasant day! happy
#> 2           Oh bother.   sad
#> 3     Merry Christmas! happy
#> 4                  :-(   sad

prompt <- format_prompt(text = 'I feel positively morose today.',
                        instructions = 'Decide whether this statement is happy or sad.',
                        examples = examples,
                        template = 'Statement: {text}\nSentiment: {label}')

prompt
#> Decide whether this statement is happy or sad.
#> 
#> Statement: What a pleasant day!
#> Sentiment: happy
#> 
#> Statement: Oh bother.
#> Sentiment: sad
#> 
#> Statement: Merry Christmas!
#> Sentiment: happy
#> 
#> Statement: :-(
#> Sentiment: sad
#> 
#> Statement: I feel positively morose today.
#> Sentiment:

Once you’re satisfied with the format of the prompt, you can submit it with complete_prompt():

complete_prompt(prompt)
#>     token  probability
#> 1     sad 9.990284e-01
#> 2     sad 6.382159e-04
#> 3     Sad 1.961563e-04
#> 4   happy 3.677703e-05
#> 5 sadness 2.776648e-05

The full pipeline—first formatting the text into a prompt, then submitting the prompt for completion—looks like this:

'What a joyous day for our adversaries.' |> 
  format_prompt(instructions = 'Classify this text as happy or sad.',
                examples = examples) |> 
  complete_prompt()
#>     token  probability
#> 1     sad 0.9931754130
#> 2   happy 0.0023576333
#> 3     sad 0.0021634900
#> 4     Sad 0.0007275062
#> 5 unhappy 0.0006792638

The biggest advantage of using text prompts like these is efficiency. One can request up to 2,048 next-word probability distributions in a single API call, whereas ChatGPT prompts (see next section) can only be submitted one at a time. Both the format_prompt() function and the complete_prompt() function are vectorized so that users can submit multiple texts to be classified simultaneously.

texts <- c('What a wonderful world??? As if!', 'Things are looking up.', 'Me gusta mi vida.')

texts |> 
  format_prompt(instructions = 'Classify these texts as happy or sad.',
                examples = examples) |> 
  complete_prompt()
#> [[1]]
#>     token  probability
#> 1     sad 0.9845923503
#> 2   happy 0.0101702041
#> 3     sad 0.0022756506
#> 4 unhappy 0.0005526699
#> 5         0.0005016985
#> 
#> [[2]]
#>   token  probability
#> 1 happy 9.989103e-01
#> 2 happy 8.046505e-04
#> 3       7.620519e-05
#> 4       5.893237e-05
#> 5 Happy 2.052843e-05
#> 
#> [[3]]
#>    token  probability
#> 1  happy 0.9957006846
#> 2  happy 0.0012367921
#> 3        0.0009202636
#> 4 unsure 0.0002593114
#> 5        0.0001682163

Example: Supreme Court Tweets

To illustrate the entire workflow, let’s classify the sentiment of social media posts from the Supreme Court Tweets dataset included in the package.

data(scotus_tweets) # the full dataset
data(scotus_tweets_examples) # a dataframe with few-shot examples

Let’s focus on tweets posted following the Masterpiece Cakeshop v Colorado (2018) decision, formatting the prompts with a set of instructions and few-shot examples tailored to that context.

library(tidyverse)

masterpiece_tweets <- scotus_tweets |> 
  filter(case == 'masterpiece')

instructions <- 'Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.'

masterpiece_examples <- scotus_tweets_examples |> 
  filter(case == 'masterpiece')

masterpiece_tweets$prompt <- format_prompt(text = masterpiece_tweets$text,
                                           instructions = instructions,
                                           examples = masterpiece_examples)

masterpiece_tweets$prompt[3]
#> Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.
#> 
#> Text: Thank you Supreme Court I take pride in your decision!!!!✝️ #SCOTUS
#> Classification: Positive
#> 
#> Text: Supreme Court rules in favor of Colorado baker! This day is getting better by the minute!
#> Classification: Positive
#> 
#> Text: Can’t escape the awful irony of someone allowed to use religion to discriminate against people in love. 
#> Not my Jesus. 
#> #opentoall #SCOTUS #Hypocrisy #MasterpieceCakeshop
#> Classification: Negative
#> 
#> Text: I can’t believe this cake case went all the way to #SCOTUS . Can someone let me know what cake was ultimately served at the wedding? Are they married and living happily ever after?
#> Classification: Neutral
#> 
#> Text: Supreme Court rules in favor of baker who would not make wedding cake for gay couple
#> Classification: Neutral
#> 
#> Text: #SCOTUS set a dangerous precedent today. Although the Court limited the scope to which a business owner could deny services to patrons, the legal argument has been legitimized that one's subjective religious convictions trump (no pun intended) #humanrights. #LGBTQRights
#> Classification: Negative
#> 
#> Text: The @Scotus ruling was a 
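
From here, you can submit the full vector of prompts with complete_prompt(). One way to pull out the most likely label from each returned dataframe is sketched below; the which.max step is our own illustration, not a package function:

```r
# submit all formatted prompts in a single API call
masterpiece_tweets$output <- complete_prompt(masterpiece_tweets$prompt)

# each element of the output is a dataframe of tokens and probabilities;
# keep the highest-probability token as the predicted sentiment
masterpiece_tweets$classification <- sapply(
  masterpiece_tweets$output,
  function(df) df$token[which.max(df$probability)]
)
```

Note that returned tokens sometimes include leading whitespace or varying capitalization, so you may want to clean them (e.g. with trimws() and tolower()) before tabulating results.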


Version: 1.0.0

License: MIT + file LICENSE

Maintainer: Joe Ornstein

Last Published: August 23rd, 2024

Functions in promptr (1.0.0)

- openai_api_key(): Install an OpenAI API Key in Your .Renviron File for Repeated Use
- occupations_examples: Labelled Occupations
- format_prompt(): Format an LLM Prompt
- complete_prompt(): Complete an LLM Prompt
- complete_chat(): Complete an LLM Chat
- format_chat(): Format a Chat Prompt
- occupations: Occupations
- scotus_tweets: Tweets About the Supreme Court of the United States
- scotus_tweets_examples: Labelled Example Tweets About the Supreme Court of the United States