
Note: a newer version (4.1.0) of this package is available.

About

An R package for managing and analyzing text, created by Kenneth Benoit in collaboration with a team of core contributors: Kohei Watanabe, Paul Nulty, Adam Obeng, Haiyan Wang, Ben Lauderdale, and Will Lowe. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see http://quanteda.io.

How to cite the package:

Benoit K (2017). _quanteda: Quantitative Analysis of Textual
Data_. doi: 10.5281/zenodo.1004683 (URL:
http://doi.org/10.5281/zenodo.1004683), R package version 0.99.22,
<URL: http://quanteda.io>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {quanteda: Quantitative Analysis of Textual Data},
    author = {Kenneth Benoit},
    year = {2017},
    doi = {10.5281/zenodo.1004683},
    url = {http://quanteda.io},
    note = {R package version 0.99.22},
  }
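
The citation text and BibTeX entry shown above follow the layout that R's built-in citation() function prints for an installed package. A minimal sketch, assuming quanteda is already installed (the guard simply skips the calls when it is not):

```r
# citation() and toBibtex() both come from the utils package that ships with R.
pkg <- "quanteda"

# Only query the citation when the package is actually installed.
if (requireNamespace(pkg, quietly = TRUE)) {
  cit <- citation(pkg)
  print(cit)            # plain-text citation plus the BibTeX entry
  print(toBibtex(cit))  # the BibTeX entry on its own
}
```

The version number and year in the output depend on the version installed locally, so they may differ from the 0.99.22 entry reproduced here.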

How to Install

  1. From CRAN: Use your GUI's R package installer, or execute:

    install.packages("quanteda")

  2. From GitHub, using:

    # the devtools package is required to install quanteda from GitHub
    devtools::install_github("kbenoit/quanteda")

    Because this compiles some C++ source code, you will need a compiler installed. On Windows, this means you will also need to install the Rtools software, available from CRAN. On macOS, you will need to install Xcode, available for free from the App Store, or, if you prefer a lighter footprint, just the Xcode command line tools, using the command xcode-select --install from the Terminal.

    Also, you might need to upgrade your compiler. @kbenoit found that his macOS build only worked reliably after upgrading the default Xcode compiler to clang4, following these instructions.

  3. Additional recommended packages:

    The following packages work well with or extend quanteda and we recommend that you also install them:

    • readtext: An easy way to read text data into R, from almost any input format.

    • spacyr: NLP using the spaCy library, including part-of-speech tagging, entity recognition, and dependency parsing.

    • quantedaData: Additional textual data for use with quanteda.

      devtools::install_github("kbenoit/quantedaData")

    • LIWCalike: An R implementation of the Linguistic Inquiry and Word Count approach to text analysis.

      devtools::install_github("kbenoit/LIWCalike")
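
The installation steps above can be scripted so that only missing packages are installed. The following is a rough sketch, not part of quanteda: missing_packages() is a hypothetical helper name, and it assumes that readtext and spacyr are available from CRAN, since no GitHub command is given for them above.

```r
# Hypothetical helper (not part of quanteda): return the subset of `pkgs`
# that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumption: these three are installable from CRAN.
cran_pkgs  <- c("quanteda", "readtext", "spacyr")
to_install <- missing_packages(cran_pkgs)

# Installing needs network access, so it is only attempted in an
# interactive session; otherwise the missing names are just reported.
if (length(to_install) > 0) {
  if (interactive()) {
    install.packages(to_install)
  } else {
    message("Not installed: ", paste(to_install, collapse = ", "))
  }
}
```

The GitHub-only packages (quantedaData and LIWCalike) would still need the devtools::install_github() calls listed above.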

Leaving feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome.

Version: 0.99.22
License: GPL-3
Last published: November 13th, 2017
Monthly downloads: 23,966

Functions in quanteda (0.99.22)

as.list.dist_selection

coerce a dist_selection object into a list
as.matrix.dfm

coerce a dfm to a matrix or data.frame
attributes<-

function extending base::attributes()
bootstrap_dfm

bootstrap a dfm
corpus_segment

segment texts on a pattern match
corpus_subset

extract a subset of a corpus
data-internal

internal data sets
data_char_sampletext

a paragraph of text for testing various text-based functions
dfm_sample

randomly sample documents or features from a dfm
dfm_select

select features from a dfm or fcm
dictionary

create a dictionary
docfreq

compute the (weighted) document frequency of a feature
keyness

compute keyness (internal functions)
kwic

locate keywords-in-context
nsyllable

count syllables in a text
ntoken

count the number of tokens or types
print.dfm

print a dfm object
print.dist_selection

print a dist_selection object
scrabble

deprecated name for nscrabble
settings

Get or set the corpus settings
syllables

deprecated name for nsyllable
textmodel-internal

internal functions for textmodel objects
textstat_collocations

identify and score multi-word expressions
textstat_frequency

tabulate feature frequencies
tokens_ngrams

create ngrams and skipgrams from tokens
tokens_recompile

recompile a serialized tokens object
as.matrix.dist_selection

coerce a dist_selection object to a matrix
as.matrix.simil

Coerce a simil object into a matrix
cbind.dfm

Combine dfm objects by Rows or Columns
char_tolower

convert the case of character objects
corpus_trim

remove sentences based on their token lengths or a pattern match
corpus_trimsentences

remove sentences based on their token lengths or a pattern match
dfm-internal

internal functions for dfm objects
dfm

create a document-feature matrix
dfm_sort

sort a dfm by frequency of one or more margins
dfm_subset

extract a subset of a dfm
docnames

get or set document names
docvars

get or set document-level variables
list2dictionary

internal function to convert a list to a dictionary
merge_dictionary_values

internal function to merge values of duplicated keys
metacorpus

get or set corpus metadata
View

View methods for quanteda
as.corpus

coerce a compressed corpus to a standard corpus
as.tokens

coercion, checking, and combining functions for tokens objects
as.yaml

convert quanteda dictionary objects to the YAML format
corpus-class

base method extensions for corpus objects
metadoc

get or set document-level meta-data
reexports

Objects exported from other packages
regex2fixed

convert regex and glob patterns to type IDs or fixed patterns
slots<-

function to assign multiple slots to an S4 object
spacyr-methods

extensions for and from spacy_parse objects
textmodel_fitted-class

the fitted textmodel classes
textmodel_nb

Naive Bayes classifier for texts
textplot_xray

plot the dispersion of key word(s)
texts

get or assign corpus texts
tokens

tokenize a set of texts
tokens_compound

convert token sequences into compound tokens
valuetype

pattern matching using valuetype
corpus

construct a corpus object
data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)
dfm-class

Virtual class "dfm" for a document-feature matrix
dfm_tolower

convert the case of the features of a dfm and combine
dfm_trim

trim a dfm using frequency threshold-based feature selection
dfm_weight

weight the feature frequencies in a dfm
dictionary2-class

print a dictionary object
groups

grouping variable(s) for various functions
ndoc

count the number of documents or features
nest_dictionary

utility function to generate a nested list
print.phrases

print a phrase object
quanteda-package

An R package for the quantitative analysis of textual data
featnames

get the feature labels from a dfm
sparsity

compute the sparsity of a document-feature matrix
stopwords

access built-in stopwords
textmodel

fit a text model
textmodel_ca

correspondence analysis of a document-feature matrix
textstat_readability

calculate readability
textstat_dist

Similarity and distance computation between documents or features
tokens_replace

replace types in tokens object
tokens_segment

segment tokens object by patterns
tokens_select

select or remove tokens from a tokens object
tokens_serialize

function to serialize list-of-character tokens
dfm_group

combine documents in a dfm by a grouping variable
dfm_lookup

apply a dictionary to a dfm
escape_regex

internal function for select_types() to escape regular expressions
fcm-class

Virtual class "fcm" for a feature co-occurrence matrix
is.dfm

coercion and checking functions for dfm objects
is_regex

internal function for select_types() to check if a string is a regular expression
phrase

declare a compound character to be a sequence of separate pattern matches
predict.textmodel_nb_fitted

prediction method for Naive Bayes classifier objects
summary.character

summary statistics on a character vector
summary.corpus

summarize a corpus
textmodel_wordshoal

wordshoal text model
textplot_keyness

plot word keyness
as.corpus.corpuszip

coerce a compressed corpus to a standard corpus
as.dist.dist

coerce a dist into a dist
as.list.dist

coerce a dist object into a list
data_char_ukimmig2010

immigration-related sections of 2010 UK party manifestos
data_corpus_inaugural

US presidential inaugural address texts
as.dictionary

coercion and checking functions for dictionary objects
coef.textmodel

extract text model coefficients
collocations

deprecated function names for textstat_collocations
create

utility function to create an object with a new set of attributes
data-deprecated

datasets with deprecated or defunct names
data_corpus_irishbudget2010

Irish budget speeches from 2010
data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
quanteda_options

get or set package options for quanteda
read_dict_liwc

Import a LIWC-formatted dictionary
fcm

create a feature co-occurrence matrix
fcm_sort

sort an fcm in alphabetical order of the features
head.corpus

return the first or last part of a corpus
head.dfm

return the first or last part of a dfm
nscrabble

count the Scrabble letter values of text
nsentence

count the number of sentences
remove_empty_keys

utility function to remove empty keys
replace_dictionary_values

internal function to replace dictionary values
textstat_keyness

calculate keyness statistics
textstat_lexdiv

calculate lexical diversity
tokens_group

recombine the tokens of documents by group
tokens_lookup

apply a dictionary to a tokens object
topfeatures

identify the most frequent features in a dfm
types

get types of tokens from a tokens object
textmodel_wordfish

wordfish text model
textmodel_wordscores

Wordscores text model
corpus_reshape

recast the document units of a corpus
corpus_sample

randomly sample documents from a corpus
convert-wrappers

convenience wrappers for dfm convert
convert

convert a dfm to a non-quanteda format
dfm2lsa

convert a dfm to an lsa "textmatrix"
dfm_compress

recombine a dfm or fcm by combining identical dimension elements
pattern

pattern for feature, token and keyword matching
pattern2id

convert various pattern inputs to a vector used in tokens_select, tokens_compound, and kwic
textplot_scale1d

plot a fitted scaling model
textplot_wordcloud

plot features as a wordcloud
tf

compute (weighted) term frequency from a dfm
tfidf

compute tf-idf weights from a dfm
tokens_tolower

convert the case of tokens
tokens_wordstem

stem the terms in an object
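
Several of the functions indexed above can be chained into a basic workflow: build a corpus, tokenize it, create a document-feature matrix, and inspect the result. The sketch below is illustrative only. It assumes quanteda is installed, the two example texts are made up, and argument names may differ between package versions.

```r
library(quanteda)

# Two made-up example texts; the vector names become the document names.
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "The lazy dog sleeps while the brown fox runs.")

corp  <- corpus(txt)                        # construct a corpus object
toks  <- tokens(corp, remove_punct = TRUE)  # tokenize, dropping punctuation
toks  <- tokens_tolower(toks)               # convert the case of tokens
dfmat <- dfm(toks)                          # create a document-feature matrix

ndoc(dfmat)                    # count the number of documents
topfeatures(dfmat, 5)          # most frequent features in the dfm
kwic(toks, "fox", window = 2)  # locate "fox" as a keyword-in-context
```

Tokenizing before calling dfm() keeps each step explicit, which makes it easier to insert further token-level steps such as tokens_select() or tokens_ngrams().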