rvest (version 0.3.1)

html_table: Parse an html table into a data frame.

Description

Parse an html table into a data frame.

Usage

html_table(x, header = NA, trim = TRUE, fill = FALSE, dec = ".")

Arguments

x
A node, node set or document.
header
Use first row as header? If NA, will use first row if it consists of <th> tags.
trim
Remove leading and trailing whitespace within each cell?
fill
If TRUE, automatically fill rows with fewer than the maximum number of columns with NAs.
dec
The character used as decimal mark.
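
As a minimal sketch of how these arguments interact, the snippet below parses a small inline table rather than a live page. The HTML string and the names `page` and `scores` are made up for illustration. The first row uses <th> cells, so the default `header = NA` should treat it as the header; `trim = TRUE` strips the padding around cell text, and `dec = ","` lets comma decimals be read as numbers.

```r
library(rvest)

# Hypothetical inline table: <th> header row, padded cell, comma decimals
page <- read_html("
  <table>
    <tr><th>name</th><th>score</th></tr>
    <tr><td>  a  </td><td>1,5</td></tr>
    <tr><td>b</td><td>2,25</td></tr>
  </table>")

# header = NA (default) detects the <th> row; trim = TRUE is also the default
scores <- html_table(html_node(page, "table"), dec = ",")

names(scores)   # column names taken from the <th> row
scores$name     # whitespace around "a" removed
scores$score    # parsed as numeric because dec = ","
```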

Assumptions

html_table currently makes a few assumptions:

  • No cells span multiple rows
  • Headers are in the first row
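
When the first row holds the headers but is marked up with <td> rather than <th>, automatic detection does not apply and `header = TRUE` can be set explicitly. The following is a small sketch on a made-up inline table (the HTML and the names `page`, `auto` and `forced` are illustrative assumptions):

```r
library(rvest)

# Hypothetical table whose header row uses <td> cells instead of <th>
page <- read_html("
  <table>
    <tr><td>city</td><td>pop</td></tr>
    <tr><td>Oslo</td><td>700000</td></tr>
    <tr><td>Bergen</td><td>290000</td></tr>
  </table>")

tbl_node <- html_node(page, "table")

# Default: no <th> cells, so the first row is kept as data
auto <- html_table(tbl_node)

# Explicitly promote the first row to column names
forced <- html_table(tbl_node, header = TRUE)

nrow(auto)     # all three rows are data
names(forced)  # first row became the column names
```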

Examples

library(rvest)

tdist <- read_html("http://en.wikipedia.org/wiki/Student%27s_t-distribution")
tdist %>%
  html_node("table.infobox") %>%
  html_table(header = FALSE)

births <- read_html("https://www.ssa.gov/oact/babynames/numberUSbirths.html")
html_table(html_nodes(births, "table")[[2]])

# If the table is badly formed and has a different number of columns in
# each row, use fill = TRUE. Here it's due to an incorrect colspan
# specification.
skiing <- read_html("http://data.fis-ski.com/dynamic/results.html?sector=CC&raceid=22395")
skiing %>%
  html_table(fill = TRUE)
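
The pages used above may move or change, so here is a self-contained sketch of the ragged-table case. The inline HTML and the names `ragged` and `filled` are hypothetical: the second row has one cell fewer than the first, and `fill = TRUE` pads the gap with NA.

```r
library(rvest)

# Hypothetical ragged table: the second row is one cell short
ragged <- read_html("
  <table>
    <tr><td>a</td><td>b</td><td>c</td></tr>
    <tr><td>1</td><td>2</td></tr>
  </table>")

filled <- html_table(html_node(ragged, "table"), fill = TRUE)

dim(filled)      # 2 rows, 3 columns
filled[[3]][2]   # the missing cell is NA
```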