chrome_print: Print a web page to PDF or capture a screenshot using headless Chrome

Description

Print an HTML page to PDF or capture a PNG/JPEG screenshot through the Chrome DevTools Protocol. Google Chrome or Microsoft Edge (or Chromium on Linux) must be installed prior to using this function.

Usage

chrome_print(
  input,
  output = xfun::with_ext(input, format),
  wait = 2,
  browser = "google-chrome",
  format = c("pdf", "png", "jpeg"),
  options = list(),
  selector = "body",
  box_model = c("border", "content", "margin", "padding"),
  scale = 1,
  work_dir = tempfile(),
  timeout = 30,
  extra_args = c("--disable-gpu"),
  verbose = 0,
  async = FALSE,
  outline = gs_available(),
  encoding
)
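The default output = xfun::with_ext(input, format) replaces the extension of input with the chosen format. The Arguments section adds that remote URLs produce a file in the current working directory. The sketch below reproduces only those documented cases with a hypothetical base-R helper, default_output(). It is not part of pagedown and is not its actual implementation. Edge cases, such as URLs ending in a slash or carrying a query string, are not handled and may behave differently in chrome_print().

```r
# Hypothetical helper, NOT part of pagedown: mimics the documented default
# naming rule for `output`, using base R only (no xfun dependency).
default_output <- function(input, format = c("pdf", "png", "jpeg")) {
  format <- match.arg(format)
  if (grepl("^https?://", input)) {
    # Remote URL: keep only the last path component, so the file is
    # written to the current working directory
    input <- basename(input)
  }
  # Replace the extension, similar to what xfun::with_ext() does
  paste0(tools::file_path_sans_ext(input), ".", format)
}

default_output("foo/bar.html")                          # "foo/bar.pdf"
default_output("https://www.example.org/foo/bar.html")  # "bar.pdf"
default_output("foo/bar.html", "png")                   # "foo/bar.png"
```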

Value

Path of the output file (invisibly). If async is TRUE, this is a promise value.

Arguments

input: A URL or local file path to an HTML page, or a path to a local file that can be rendered to HTML via rmarkdown::render() (e.g., an R Markdown document or an R script). If the input needs to be rendered with extra arguments to render() (e.g., params), pass the whole render() call to chrome_print() instead: pagedown::chrome_print(rmarkdown::render('input.Rmd', params = list(foo = 1:10))). This works because render() returns the path of the generated HTML file, which chrome_print() then prints.
output: The output filename. For a local web page foo/bar.html, the default PDF output is foo/bar.pdf; for a remote URL https://www.example.org/foo/bar.html, the default output is bar.pdf under the current working directory. The same rules apply to screenshots, with the extension taken from format.
wait: The number of seconds to wait for the page to load before printing. In certain cases the page may not be ready for printing immediately, especially when it contains JavaScript applications, so you may need to wait longer.
browser: Path to Google Chrome, Microsoft Edge or Chromium. This function will try to find it automatically via find_chrome() if the path is not explicitly provided and the environment variable PAGEDOWN_CHROME is not set.
format: The output format: "pdf" (the default), "png", or "jpeg".
options: A list of page options. See https://chromedevtools.github.io/devtools-protocol/tot/Page#method-printToPDF for the full list of options for PDF output, and https://chromedevtools.github.io/devtools-protocol/tot/Page#method-captureScreenshot for options for screenshots. Note that for PDF output, we have changed the defaults of printBackground (TRUE), preferCSSPageSize (TRUE) and, when available, transferMode (ReturnAsStream) in this function.
selector: A CSS selector identifying the element to capture in a screenshot.
box_model: The CSS box model of the selected element used to determine the screenshot area.
scale: The scale factor used for screenshots.
work_dir: Name of the headless Chrome working directory. If the default temporary directory does not work, you may try a subdirectory of your home directory.
timeout: The number of seconds before canceling the document generation. Use a larger value if the document takes longer to build.
extra_args: Extra command-line arguments to be passed to Chrome.
verbose: Level of verbosity: 0 means no messages; 1 means to print out some auxiliary messages (e.g., parameters for capturing screenshots); 2 (or TRUE) means all messages, including those from the Chrome processes and WebSocket connections.
async: Execute chrome_print() asynchronously? If TRUE, chrome_print() returns a promise value (the promises package has to be installed in this case).
outline: If not FALSE, chrome_print() adds bookmarks to the generated PDF file based on the table of contents. This feature is only available for output formats based on html_paged. It is enabled by default, as long as the Ghostscript executable can be detected by find_gs_cmd.
encoding: Not used. This argument is required by the RStudio IDE.
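The options argument takes the parameters of Page.printToPDF (or Page.captureScreenshot) as a named R list. The sketch below builds such a list for a landscape A4 page. It then shows, with utils::modifyList(), one plausible way the documented defaults (printBackground and preferCSSPageSize set to TRUE) could combine with user values. That merge step is an assumption used for illustration, not pagedown's internal code. The paperWidth and paperHeight values are in inches, as specified by the DevTools Protocol.

```r
# Defaults that chrome_print() documents for PDF output
pdf_defaults <- list(printBackground = TRUE, preferCSSPageSize = TRUE)

# Page.printToPDF parameters chosen by the user (A4 size, in inches)
user_opts <- list(landscape = TRUE, paperWidth = 8.27, paperHeight = 11.69,
                  printBackground = FALSE)

# Assumed merge semantics: user values win, untouched defaults survive
opts <- utils::modifyList(pdf_defaults, user_opts)
str(opts)
```

In practice the user list is passed directly, e.g., chrome_print("report.html", options = user_opts). Because preferCSSPageSize is TRUE by default here, a page size declared with the CSS @page rule takes precedence over paperWidth and paperHeight. Set preferCSSPageSize = FALSE to force the paper size.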

References

https://developer.chrome.com/blog/headless-chrome/