selenium
selenium is a tool for the automation of web browsers. It is a low-level interface to the WebDriver specification, and an up-to-date alternative to RSelenium.
Installation
# Install selenium from CRAN
install.packages("selenium")
# Or the development version from GitHub
# install.packages("pak")
pak::pak("ashbythorpe/selenium-r")
However, you must also have a selenium server installed and running (see below).
Starting the server
A selenium instance consists of two parts: the client and the server. The selenium package only provides the client. This means that you have to start the server yourself.
To do this you must:
- Install a browser that you want to automate (e.g. Chrome, Firefox, Edge).
- Download Java (you need Java 11 or higher).
There are many different ways to download and start the server, one of which is provided by selenium:
library(selenium)
server <- selenium_server()
This will download the latest version of the server and start it.
By default, the server file will be stored in a temporary directory, meaning it will be deleted when the session is closed. If you want the server to persist, meaning that you don’t have to re-download the server each time, you can use the temp argument:
server <- selenium_server(temp = FALSE)
You can also do this manually if you want:
- Download the latest .jar file for Selenium Server. Do this by navigating to the latest GitHub release page (https://github.com/SeleniumHQ/selenium/releases/latest/), scrolling down to the Assets section, and downloading the file named selenium-server-standalone-<VERSION>.jar (with <VERSION> being the latest release version).
- Make sure you are in the same directory as the file you downloaded.
- In the terminal, run java -jar selenium-server-standalone-<VERSION>.jar standalone --selenium-manager true, replacing <VERSION> with the version number that you downloaded. This will download any drivers you need to communicate with the server and the browser, and start the server.
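If you would rather launch the manually downloaded server from R than from a terminal, the same command can be assembled with base R. This is only a sketch: java_available() and selenium_server_args() are hypothetical helpers (not part of the selenium package), and jar_path stands for wherever you saved the .jar file.

```r
# Hypothetical helper (not part of selenium): check that a `java`
# executable can be found on the PATH.
java_available <- function() {
  nzchar(Sys.which("java")[[1]])
}

# Hypothetical helper: build the arguments for the java command shown above.
# `jar_path` is the path to the .jar file you downloaded; it is quoted so
# that paths containing spaces survive the shell.
selenium_server_args <- function(jar_path) {
  c("-jar", shQuote(jar_path), "standalone", "--selenium-manager", "true")
}

# Usage (commented out because it starts a long-running process):
# if (java_available()) {
#   system2("java", selenium_server_args("path/to/server.jar"), wait = FALSE)
# }
```

Passing wait = FALSE to system2() keeps the R session free while the server runs in the background.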
There are a few other ways of starting Selenium Server:
- Using docker to start the server. See https://github.com/SeleniumHQ/docker-selenium. This is recommended in a non-interactive context (e.g. GitHub Actions).
- Using the wdman package to start the server from R, using wdman::selenium(). Note that at the time of writing, this package does not work with the latest version of Chrome.
Waiting for the server to be online
The Selenium server won’t be ready to be used immediately. If you used selenium_server() to create your server, you can pass it into wait_for_server():
wait_for_server(server)
You can also use server$read_output() and server$read_error() to inspect the server's output and error streams.
If you used a different method to create your server, use wait_for_selenium_available() instead.
wait_for_selenium_available()
If any point in this process produces an error or doesn’t work, please see the Debugging Selenium article for more information.
Starting the client
Client sessions can be started using SeleniumSession$new():
session <- SeleniumSession$new()
By default, this will connect to Firefox, but you can use the browser argument to specify a different browser if you like.
session <- SeleniumSession$new(browser = "chrome")
You can also use the capabilities argument to specify options for the browser. Here, the remote-debugging-port argument to Chrome is used to make sure the port that the browser uses does not conflict with any others (this may be necessary if Chrome is not working by default).
session <- SeleniumSession$new(
  browser = "chrome",
  capabilities = list(
    `goog:chromeOptions` = list(
      args = list("remote-debugging-port=9222")
    )
  )
)
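If you pass several Chrome arguments, it can help to build the nested capabilities list in one place. The sketch below assumes a hypothetical helper, chrome_capabilities() (not part of selenium), that produces the same structure as the call above.

```r
# Hypothetical helper (not part of selenium): wrap Chrome command-line
# arguments in the `goog:chromeOptions` structure used above.
chrome_capabilities <- function(args) {
  list(`goog:chromeOptions` = list(args = as.list(args)))
}

caps <- chrome_capabilities("remote-debugging-port=9222")
str(caps)

# Usage (requires a running server):
# session <- SeleniumSession$new(browser = "chrome", capabilities = caps)
```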
Usage
Once the session has been successfully started, you can use the session object to control the browser. Here, we dynamically navigate through the R project homepage. Remember to close the session and the server process when you are done.
session$navigate("https://www.r-project.org/")
session$
  find_element(using = "css selector", value = ".row")$
  find_element(using = "css selector", value = "ul")$
  find_element(using = "css selector", value = "a")$
  click()
session$
  find_element(using = "css selector", value = ".row")$
  find_elements(using = "css selector", value = "div")[[2]]$
  find_element(using = "css selector", value = "p")$
  get_text()
#> [1] ""
session$close()
server$kill()
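Because an error partway through a script would skip the two cleanup calls above, one option is to register them with on.exit(). The following is a minimal sketch: with_cleanup() is a hypothetical helper, not part of selenium, and it only assumes that the session has a close() method and the server has a kill() method, as in the example above.

```r
# Hypothetical helper (not part of selenium): run `action(session)` and
# always close the session and kill the server afterwards.
with_cleanup <- function(session, server, action) {
  # Expressions registered with on.exit() run when the function exits,
  # whether it returns normally or stops with an error.
  on.exit({
    try(session$close(), silent = TRUE)
    try(server$kill(), silent = TRUE)
  }, add = TRUE)
  action(session)
}

# Usage (requires a running server and an open session):
# with_cleanup(session, server, function(s) {
#   s$navigate("https://www.r-project.org/")
# })
```

Wrapping each cleanup call in try() means a failure to close the session does not prevent the server from being killed.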
For a more detailed introduction to using selenium, see the Getting Started article.
Note that selenium is low-level and mainly aimed at developers. If you want to use browser automation for web scraping or testing, you may want to take a look at selenider instead.