Some characters cannot be entered directly into a LaTeX document. This function converts the input character vector to a form suitable for inclusion in a LaTeX document in text mode. It can be used together with \Sexpr in vignettes.
latexify(x, doublebackslash = TRUE, dashdash = TRUE,
quotes = c("straight", "curved"),
packages = c("fontenc", "textcomp"))
The function returns a character vector.
x: a character vector.
doublebackslash: a logical flag. If TRUE, backslashes in the output are doubled. It seems that Sweave needs TRUE and knitr needs FALSE.
dashdash: a logical flag. If TRUE (the default), consecutive dashes (“-”) in the input are rendered as separate characters. If FALSE, they are given no special treatment, which usually means that two dashes are rendered as an en dash and three dashes as an em dash.
quotes: a character string specifying how single and double quotes (ASCII codes 39 and 34) and stand-alone grave accents (ASCII code 96) in the input are treated. The default, "straight", uses straight quotes and the proper symbol for the grave accent. The other option, "curved", uses curved right-side (closing) quotes and lets LaTeX convert the grave accent to an opening curved quote. Straight double quotes are not available in the default OT1 font encoding of LaTeX. Straight single quotes and the grave accent symbol require the “textcomp” package. See the packages argument.
packages: a character vector specifying the LaTeX packages allowed. Some symbols in LaTeX require commands or characters made available by an add-on package. If a package required for a given character is not marked as available, a fallback solution is silently used. For example, curved quotes may be used instead of straight quotes. The supported packages are "eurosym" (not used by default), "fontenc" and "textcomp". Including "fontenc" in the vector means that a font encoding other than OT1 is going to be used. Note that straight quotes are an exception in the sense that a reasonable substitute (curved quotes) is available. In many other cases, "textcomp" and "fontenc" are silently assumed.
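The interplay between quotes and packages can be illustrated with a small base R sketch. The function quote_sketch() is hypothetical and is not the dplR implementation; the LaTeX commands \textquotedbl, \textquotesingle and \textasciigrave are standard, but whether latexify() emits exactly these, and uses exactly these fallbacks, is an assumption.

```r
# Hypothetical sketch, NOT dplR's code: how 'quotes' and 'packages' could
# jointly decide the LaTeX output for ", ' and ` in a single string.
quote_sketch <- function(x, quotes = c("straight", "curved"),
                         packages = c("fontenc", "textcomp")) {
    stopifnot(is.character(x), length(x) == 1L)
    quotes <- match.arg(quotes)
    # Curved style (also the fallback): closing quotes for " and ',
    # and LaTeX itself turns ` into an opening curved quote.
    dq <- "''"
    sq <- "'"
    gr <- "`"
    if (quotes == "straight") {
        # A straight double quote needs a non-OT1 font encoding (fontenc)
        if ("fontenc" %in% packages) dq <- "\\textquotedbl{}"
        # A straight single quote and the grave symbol need textcomp
        if ("textcomp" %in% packages) {
            sq <- "\\textquotesingle{}"
            gr <- "\\textasciigrave{}"
        }
    }
    # Replace element-wise so that inserted quotes are not replaced again
    chars <- strsplit(x, "", fixed = TRUE)[[1]]
    chars[chars == '"'] <- dq
    chars[chars == "'"] <- sq
    chars[chars == "`"] <- gr
    paste(chars, collapse = "")
}

quote_sketch('"hi"')                         # straight, fontenc available
quote_sketch('"hi"', packages = "textcomp")  # falls back to curved quotes
quote_sketch('"hi"', quotes = "curved")
```

Dropping "fontenc" from packages in the second call silently switches the double quote to its curved substitute, mirroring the fallback behaviour described above.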
Mikko Korpela
The function is intended for use with unformatted inline text. Newlines, tabs and other whitespace characters ("[:space:]" in regex) are converted to spaces. Control characters ("[:cntrl:]") that are not whitespace are removed. The other ASCII characters that need special treatment are ‘{’, ‘}’, ‘\’, ‘#’, ‘$’, ‘%’, ‘^’, ‘&’, ‘_’, ‘~’, double quote, ‘/’, single quote, ‘<’, ‘>’, ‘|’, grave accent and ‘-’. They are converted to the corresponding LaTeX commands. Some of the conversions are affected by user options, e.g. dashdash.
Before the substitutions described above are applied, input elements with Encoding set to "bytes" are printed, and the output is stored using capture.output. The result of this intermediate stage is ASCII text in which some characters are shown as their byte codes, i.e. as a hexadecimal pair prefixed with "\x". This set includes tabs, newlines and control characters. The substitutions are then applied to the intermediate result.
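The print-and-capture stage for "bytes" input can be sketched in base R. bytes_to_ascii_sketch() is a hypothetical helper; it assumes that print() shows a "bytes" string on one line as [1] "...", and the exact escape sequences it produces may vary between R versions.

```r
# Hypothetical sketch: turn a "bytes"-encoded string into printable ASCII
# by printing it and capturing the printed text.
bytes_to_ascii_sketch <- function(s) {
    stopifnot(is.character(s), length(s) == 1L)
    printed <- utils::capture.output(print(s))
    # Strip the "[1] " index prefix and the enclosing double quotes
    sub('"$', "", sub('^\\[1\\] "', "", printed))
}

x <- "clich\xe9\nma\xf1ana"
Encoding(x) <- "bytes"
out <- bytes_to_ascii_sketch(x)
out  # ASCII text; the byte 0xe9 appears as an escaped "xe9" code
```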
The quoting functions sQuote and dQuote may use non-ASCII quote characters, depending on the locale. These quotes are also converted to LaTeX commands, which means that the quoting functions are safe to use with any LaTeX input encoding. Similarly, some other non-ASCII characters, e.g. letters, currency symbols, punctuation marks and diacritics, are converted to commands.
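As a sketch of the quote conversion, the hypothetical function below maps the four Unicode directional quotes to the ASCII ligatures that LaTeX renders as curved quotes. The mapping is an assumption about a reasonable output, not a statement of what latexify() emits.

```r
# Hypothetical sketch: replace directional Unicode quotes (as possibly
# produced by sQuote()/dQuote()) with pure-ASCII LaTeX quote ligatures.
curly_to_latex_sketch <- function(x) {
    x <- enc2utf8(x)
    from <- c("\u201c", "\u201d", "\u2018", "\u2019")  # left/right double, left/right single
    to <- c("``", "''", "`", "'")
    for (i in seq_along(from)) {
        x <- gsub(from[i], to[i], x, fixed = TRUE)
    }
    x
}

curly_to_latex_sketch("\u2018single\u2019 and \u201cdouble\u201d")
```

In a UTF-8 locale with fancy quotes enabled, dQuote("text") typically yields directional quotes, which this sketch turns into ``text''; when dQuote() already returns ASCII quotes, the string passes through unchanged.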
Adding "eurosym" to packages enables the use of the euro sign as provided by the "eurosym" package (\euro).
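A minimal sketch of the package-dependent euro sign is shown below. euro_sketch() is hypothetical; the fallback to \texteuro (a textcomp command) when "eurosym" is absent is an assumption, not documented dplR behaviour.

```r
# Hypothetical sketch: choose the euro command based on the allowed packages.
euro_sketch <- function(x, packages = c("fontenc", "textcomp")) {
    # \euro comes from eurosym; \texteuro (textcomp) is an assumed fallback
    cmd <- if ("eurosym" %in% packages) "\\euro{}" else "\\texteuro{}"
    # fixed = TRUE copies the replacement literally
    gsub("\u20ac", cmd, enc2utf8(x), fixed = TRUE)
}

euro_sketch("5 \u20ac")
euro_sketch("5 \u20ac", packages = c("eurosym", "fontenc", "textcomp"))
```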
The result is converted to UTF-8 encoding, Normalization Form C (NFC).
Note that this function will not add any non-ASCII characters that were not already present in the input. On the contrary, some non-ASCII characters, e.g. all characters in the "latin1" (ISO-8859-1) encoding (character set), are removed from the output because they are converted to LaTeX commands. Any remaining non-ASCII character has a good chance of working when the document is processed with XeTeX or LuaTeX, but the Unicode support available with pdfTeX is limited.
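Before compiling with pdfTeX, it may help to check which converted strings still contain non-ASCII characters. The helper below is a hypothetical sketch using only base R.

```r
# Hypothetical helper: TRUE for strings that still contain non-ASCII
# characters (code points above 127), which pdfTeX may not handle.
has_non_ascii <- function(x) {
    vapply(enc2utf8(x), function(s) any(utf8ToInt(s) > 127L),
           logical(1), USE.NAMES = FALSE)
}

has_non_ascii(c("abc", "\u00e9t\u00e9", "\u2192"))
```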
Assuming that pdflatex is used for compilation, suggested package loading commands in the document preamble are:
\usepackage[T1]{fontenc} % no '"' in OT1 font encoding
\usepackage{textcomp} % some symbols e.g. straight single quote
\usepackage[utf8]{inputenx} % UTF-8 input encoding
\input{ix-utf8enc.dfu} % more supported characters
INRIA. Tralics: a LaTeX to XML translator, HTML documentation of all TeX commands. https://www-sop.inria.fr/marelle/tralics/.
Levitt, N., Persch, C., and Unicode, Inc. (2013) GNOME Character Map, application version 3.10.1.
Mittelbach, F., Goossens, M., Braams, J., Carlisle, D., and Rowley, C. (2004) The LaTeX Companion. Addison-Wesley, second edition. ISBN-13: 978-0-201-36299-2.
Pakin, S. (2009) The Comprehensive LaTeX Symbol List. https://www.ctan.org/tex-archive/info/symbols/comprehensive.
The Unicode Consortium. The Unicode Standard. https://home.unicode.org/.
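library(dplR)  # latexify() is provided by the dplR package
## A Latin-1 string with non-ASCII letters and a newline
x1 <- "clich\xe9\nma\xf1ana"
Encoding(x1) <- "latin1"
x1
## The same bytes, marked as "bytes" (printed with \x escapes)
x2 <- x1
Encoding(x2) <- "bytes"
x2
## The same text, re-encoded as UTF-8
x3 <- enc2utf8(x1)
testStrings <-
    c("different kinds\nof\tspace",
      "control\a characters \ftoo",
      "{braces} and \\backslash",
      '#various$ %other^ &characters_ ~escaped"/coded',
      x1,
      x2,
      x3)
## Backslashes are not doubled (the setting suggested for knitr)
latexStrings <- latexify(testStrings, doublebackslash = FALSE)
## All should be "unknown"
Encoding(latexStrings)
cat(latexStrings, sep = "\n")
## Input encoding does not matter
identical(latexStrings[5], latexStrings[7])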