stringr

Overview

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the chapter on strings in R for Data Science.

stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use.
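
As a small illustration of falling back to stringi, the sketch below assumes both packages are installed. It uses stri_reverse() as an example of an operation that, as far as I know, has no direct stringr wrapper; the choice of function is only for illustration.

```r
library(stringr)
library(stringi)

x <- c("why", "video")

# Both packages take the vector of strings as the first argument
len_stringr <- str_length(x)
len_stringi <- stri_length(x)

# Reversing strings: assumed here to be available only via stringi
reversed <- stri_reverse(x)
reversed
#> [1] "yhw"   "oediv"
```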

Installation

# The easiest way to get stringr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just stringr:
install.packages("stringr")

Usage

All functions in stringr start with str_ and take a vector of strings as the first argument:

library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x) 
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"

Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4
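
Patterns can also say where a match must occur. The following is a minimal sketch using the standard regular expression anchors "^" (start of string) and "$" (end of string) on the same vector; the variable names are only illustrative.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# "^" anchors the match to the start of the string
starts_with_vowel <- str_subset(x, "^[aeiou]")
starts_with_vowel
#> [1] "extra"     "authority"

# "$" anchors the match to the end of the string
ends_with_vowel <- str_detect(x, "[aeiou]$")
ends_with_vowel
#> [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
```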

There are eight main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there’s any match to the pattern:

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
  • str_count(x, pattern) counts the number of matches:

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
  • str_subset(x, pattern) returns the elements of x that match:

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
  • str_locate(x, pattern) gives the position of the match:

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
  • str_extract(x, pattern) extracts the text of the match:

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
  • str_match(x, pattern) extracts parts of the match defined by parentheses:

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
  • str_replace(x, pattern, replacement) replaces the matches with new text:

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
  • str_split(x, pattern) splits up a string into multiple pieces:

    str_split(c("a,b", "c,d,e"), ",")
    #> [[1]]
    #> [1] "a" "b"
    #> 
    #> [[2]]
    #> [1] "c" "d" "e"
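
Several of the verbs above act only on the first match in each string. As a hedged sketch, the _all variants str_replace_all() and str_extract_all() operate on every match instead; the variable names are illustrative.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# str_replace() changes only the first vowel in each string
first_only <- str_replace(x, "[aeiou]", "?")

# str_replace_all() changes every vowel
every_match <- str_replace_all(x, "[aeiou]", "?")
every_match
#> [1] "why"       "v?d??"     "cr?ss"     "?xtr?"     "d??l"      "??th?r?ty"

# str_extract_all() returns a list with one character vector per input string
all_vowels <- str_extract_all(x, "[aeiou]")
all_vowels[[2]]
#> [1] "i" "e" "o"
```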

As well as regular expressions (the default), there are three other pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries
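
A short sketch of how the three engines differ from the default. The example strings are made up for illustration; the accented-letter comparison relies on "\u00e1" and "a\u0301" being two different encodings of the same visible letter.

```r
library(stringr)

# fixed(): "." is compared literally instead of meaning "any character"
n_regex <- str_count("a.b.c", ".")
n_fixed <- str_count("a.b.c", fixed("."))
c(n_regex, n_fixed)
#> [1] 5 2

# coll(): two encodings of the same letter compare equal under collation rules
a1 <- "\u00e1"   # single code point
a2 <- "a\u0301"  # "a" followed by a combining acute accent
byte_match <- str_detect(a1, fixed(a2))
coll_match <- str_detect(a1, coll(a2))
c(byte_match, coll_match)
#> [1] FALSE  TRUE

# boundary(): split on word boundaries rather than on a literal separator
words <- str_split("The quick brown fox", boundary("word"))[[1]]
words
#> [1] "The"   "quick" "brown" "fox"
```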

RStudio Addin

The RegExplain RStudio addin provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.

This addin can easily be installed with devtools:

# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")

Compared to base R

R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. In contrast, stringr:

  • Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>% 
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
  • Simplifies string operations by eliminating options that you don’t need 95% of the time.

  • Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero-length inputs result in zero-length outputs.

Learn more in vignette("from-base")
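
The last point in the list above can be checked directly. This is a minimal sketch comparing str_c() with base paste0() on a vector containing NA, followed by two zero-length inputs; the variable names are illustrative.

```r
library(stringr)

# Missing input gives missing output in stringr
na_length <- str_length(NA)
na_length
#> [1] NA
joined_stringr <- str_c(c("a", NA, "b"), "-d")
joined_stringr
#> [1] "a-d" NA    "b-d"

# Base paste0() turns the missing value into the text "NA"
joined_base <- paste0(c("a", NA, "b"), "-d")
joined_base
#> [1] "a-d"  "NA-d" "b-d"

# Zero-length input gives zero-length output
empty_length <- str_length(character())
empty_detect <- str_detect(character(), "a")
c(length(empty_length), length(empty_detect))
#> [1] 0 0
```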

Monthly Downloads

1,172,905

Version

1.5.1

License

MIT + file LICENSE

Last Published

November 14th, 2023

Functions in stringr (1.5.1)

  • str_conv: Specify the encoding of a string
  • str_pad: Pad a string to minimum width
  • str_like: Detect a pattern in the same way as SQL's LIKE operator
  • str_which: Find matching indices
  • str_order: Order, rank, or sort a character vector
  • str_replace: Replace matches with new text
  • str_match: Extract components (capturing groups) from a match
  • str_replace_na: Turn NA into "NA"
  • str_wrap: Wrap words into nicely formatted paragraphs
  • str_locate: Find location of match
  • str_split: Split up a string into pieces
  • str_starts: Detect the presence/absence of a match at the start/end
  • str_glue: Interpolation with glue
  • str_flatten: Flatten a string
  • str_sub: Get and set substrings using their positions
  • str_subset: Find matching elements
  • stringr-data: Sample character vectors for practicing string manipulations
  • str_remove: Remove matched patterns
  • stringr-package: stringr: Simple, Consistent Wrappers for Common String Operations
  • str_unique: Remove duplicated strings
  • str_interp: String interpolation
  • str_trim: Remove whitespace
  • str_length: Compute the length/width
  • str_trunc: Truncate a string to maximum width
  • word: Extract words from a sentence
  • str_view: View strings and matches
  • str_dup: Duplicate a string
  • invert_match: Switch location of matches to location of non-matches
  • modifiers: Control matching behaviour with modifier functions
  • str_c: Join multiple strings into one string
  • str_count: Count number of matches
  • %>%: Pipe operator
  • str_equal: Determine if two strings are equivalent
  • case: Convert string to upper case, lower case, title case, or sentence case
  • str_detect: Detect the presence/absence of a match
  • str_extract: Extract the complete match
  • str_escape: Escape regular expression metacharacters