rvest (version 0.3.1)

encoding: Guess and repair faulty character encoding.

Description

These functions help you deal with web pages that declare an incorrect encoding. Use guess_encoding to work out what the real encoding is (and then supply it to the encoding argument of read_html), or use repair_encoding to fix character vectors after the fact.
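To make the problem concrete, the following sketch simulates a mislabelled encoding in base R, without rvest or a network connection. The city name and the variable names are illustrative assumptions, not values taken from the page used in the Examples. The sketch only shows what happens when UTF-8 bytes are read as ISO-8859-1 (latin1), and how reversing that step recovers the text.

```r
# Hypothetical example value; "\u00e9" is e-acute, which R stores as UTF-8.
correct <- "Montr\u00e9al"

# Simulate a page that is really UTF-8 but is read as latin1:
# iconv() ignores the declared encoding and treats each byte as one
# latin1 character, so the two-byte e-acute becomes two characters.
garbled <- iconv(correct, from = "latin1", to = "UTF-8")
garbled

# Undo the damage: convert back to the original bytes, then tell R
# that those bytes are UTF-8.
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(fixed) <- "UTF-8"
fixed == correct
```

The garbled string is one character longer than the original, which is the typical symptom of UTF-8 text decoded as a single-byte encoding.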

Usage

guess_encoding(x)

repair_encoding(x, from = NULL)

Arguments

x
A character vector.
from
The encoding that the string is actually in. If NULL, guess_encoding will be used.

stringi

These functions are wrappers around tools from the fantastic stringi package, so make sure you have it installed.
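As a rough illustration of the kind of stringi tools involved, the sketch below calls stri_enc_detect and stri_conv directly on raw bytes. It is not the rvest source, and the sample text is a made-up value. It only shows that detection returns ranked guesses, and that the same bytes decode differently depending on the source encoding you assume.

```r
library(stringi)  # install.packages("stringi") if it is missing

# Hypothetical sample: the raw UTF-8 bytes of a short French phrase.
bytes <- charToRaw("Montr\u00e9al, Qu\u00e9bec")

# Ask stringi for candidate encodings, ranked by confidence.
guesses <- stri_enc_detect(bytes)[[1]]
guesses

# Decode the same bytes with the wrong and the right source encoding.
wrong <- stri_conv(bytes, from = "ISO-8859-1", to = "UTF-8")
right <- stri_conv(bytes, from = "UTF-8", to = "UTF-8")
wrong
right
```

Detection on very short inputs can be unreliable, so treat the top guess as a hint and check the decoded text by eye.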

Examples

library(rvest)

# This page claims to be in iso-8859-1:
url <- 'http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list'
elections <- read_html(url)
x <- elections %>% html_nodes("table") %>% .[[2]] %>% html_table() %>% .$TO
# But something looks wrong:
x

# It's actually UTF-8!
guess_encoding(x)

# We can repair this vector:
repair_encoding(x)

# But it's better to start from scratch with a correctly encoded file
elections <- read_html(url, encoding = "UTF-8")
elections %>% html_nodes("table") %>% .[[2]] %>% html_table() %>% .$TO