quanteda package - RDocumentation


Note: a newer version (4.3.1) of this package is available; this page documents version 1.5.1.

About

An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see https://quanteda.io.

How to Install

The normal way is to install from CRAN, using your R GUI or:

install.packages("quanteda")

Or for the latest development version:

# the devtools package is required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda")

Because this compiles C++ and Fortran source code, you will need to have the appropriate compilers installed.

If you are using Windows, you will also need to install the Rtools software available from CRAN.

If you are using macOS, you should install the macOS tools, namely the Clang 6.x compiler and the GNU Fortran compiler (as quanteda requires gfortran to build). If you are still getting errors related to gfortran, follow the fixes here.
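Before installing the development version, it can help to confirm that R can find a working compiler toolchain. The sketch below assumes the pkgbuild package is available (devtools depends on it) and uses its `has_build_tools()` check. Note that this checks the general build toolchain and does not by itself confirm that gfortran is present.

```r
# Minimal sketch: assumes the pkgbuild package is installed
# (devtools depends on it, so it is usually present alongside devtools).
has_tools <- pkgbuild::has_build_tools(debug = FALSE)

if (isTRUE(has_tools)) {
  # A C/C++ toolchain was found; gfortran is not checked here
  message("Build tools found: devtools::install_github() can attempt to compile quanteda.")
} else {
  # On Windows this usually means Rtools is missing; on macOS, the compiler tools
  message("Build tools not found: install Rtools (Windows) or the macOS ",
          "compiler tools before installing from GitHub.")
}
```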

How to Use

See the quick start guide to learn how to use quanteda.

How to cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
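As a sketch of that step, base R's `citation()` returns a citation object, and `utils::toBibtex()` converts it into BibTeX lines. The output path below is only an illustrative temporary file.

```r
# Plain-text citation for the installed version of quanteda
print(citation(package = "quanteda"))

# The same citation as BibTeX (a character vector of lines)
bib <- toBibtex(citation(package = "quanteda"))
print(bib)

# Save it to a .bib file; the path is just an example in the session's temp directory
bib_file <- file.path(tempdir(), "quanteda.bib")
writeLines(bib, bib_file)
```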

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Fork the source code, modify it, and issue a pull request through the project GitHub page. See our Contributor Code of Conduct and the all-important quanteda Style Guide.
Issues, bug reports, and wish lists: File a GitHub issue.
Usage questions: Submit a question tagged quanteda on Stack Overflow.
Contact the maintainer by email.

Version: 1.5.1
License: GPL-3
Maintainer: Kenneth Benoit
Last Published: July 30th, 2019

Functions in quanteda (1.5.1)

Randomly sample documents from a corpus

data_corpus_irishbudget2010

Irish budget speeches from 2010

Internal functions for dfm objects

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

Segment texts on a pattern match

Create a document-feature matrix

Coercion and checking functions for dictionary objects

View methods for quanteda

as.list.dist_selection

Coerce a dist_selection object into a list

Construct a corpus object

Check if font is available on the system

coef.textmodel_ca

Extract model coefficients from a fitted textmodel_ca object

Recast the document units of a corpus

Coerce a dfm to a matrix or data.frame

Internal function to fit the likelihood scaling mixture model.

data_char_sampletext

A paragraph of text for testing various text-based functions

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

Coercion functions for fcm objects

Select features from a dfm or fcm

Randomly sample documents or features from a dfm

Compute the (weighted) document frequency of a feature

Coercion, checking, and combining functions for tokens objects

Match the feature set of a dfm to given feature names

Convert quanteda dictionary objects to the YAML format

Coerce a dist object into a list

Replace features in dfm

Simpler and faster version of expand.grid() in the base package

Get or set document names

head.textstat_proxy

Return the first or last part of a textstat_proxy object

as.statistics_textmodel

Coerce various objects to statistics_textmodel

Virtual class "fcm" for a feature co-occurrence matrix. The fcm class of object is a special type of sparse matrix object with additional slots.

as.matrix.dist_selection

Coerce a dist_selection object to a matrix

Count the number of tokens or types

predict.textmodel_wordfish

Prediction from a textmodel_wordfish method

predict.textmodel_wordscores

Predict textmodel_wordscores

Count syllables in a text

remove_empty_keys

Utility function to remove empty keys

Objects exported from other packages

Coerce a compressed corpus to a standard corpus

as.coefficients_textmodel

Coerce various objects to coefficients_textmodel. This is a helper function used in summary.textmodel_*.

Function extending base::attributes()

influence.predict.textmodel_affinity

Compute feature influence from a predicted textmodel_affinity object

data-deprecated

Datasets with deprecated or defunct names

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

Convert a dfm to a non-quanteda format

Weight the feature frequencies in a dfm

Base method extensions for corpus objects

Bootstrap a dfm

Internal data sets

Virtual class "dfm" for a document-feature matrix

Internal function for select_types to search the index using fastmatch.

as.matrix.simil

Coerce a simil object into a matrix

Convert the case of character objects

Combine dfm objects by Rows or Columns

Compute the Mean Segmental Type-Token Ratio (MSTTR)

convert-wrappers

Convenience wrappers for dfm convert

Get or set document-level variables

Return the first or last part of a corpus

Convert same-value pairs to NA in a textstat_proxy object

Internal function for select_types() to escape regular expressions

Check if patterns contain glob wildcards

data_corpus_dailnoconf1991

Confidence debate from 1991 Irish Parliament

as.summary.textmodel

Assign the summary.textmodel class to a list

Combine documents in a dfm by a grouping variable

Remove sentences based on their token lengths or a pattern match

Extract a subset of a corpus

Return the first or last part of a dfm

Recombine a dfm or fcm by combining identical dimension elements

corpus_trimsentences

Remove sentences based on their token lengths or a pattern match

Convert a dfm to an lsa "textmatrix"

Utility function to create an object with a new set of attributes

Apply a dictionary to a dfm

Extract a subset of a dfm

Check if a glob pattern is indexed by index_types

data_corpus_inaugural

US presidential inaugural address texts

Convert the case of the features of a dfm and combine

Sort a dfm by frequency of one or more margins

set_dfm_dimnames<-

Internal functions to set dimnames

tokens_compound

Convert token sequences into compound tokens

summary.character

summary.character method to override the network::summary.character()

textmodel_wordscores

Wordscores text model

textmodel_wordfish

Wordfish text model

Internal function for special handling of multi-word dictionary values

Segment tokens object by chunks of a given size

Locate keywords-in-context

list2dictionary

Internal function to convert a list to a dictionary

Count the Scrabble letter values of text

Weight a dfm by tf-idf

Count the number of sentences

friendly_class_undefined_message

Print friendly object class not defined message

format_sparsity

Format a sparsity value for printing

dfm_split_hyphenated_features

Split a dfm's hyphenated features into constituent parts

Create a dictionary

dictionary2-class

Coerce a dictionary object into a list

unlist_character

Unlist a list of character vectors safely

Unlist a list of integer vectors safely

Recombine documents tokens by groups

Apply a dictionary to a tokens object

print.coefficients_textmodel

Print methods for textmodel feature estimates. This is a helper function used in print.summary.textmodel.

Create a feature co-occurrence matrix

Trim a dfm using frequency threshold-based feature selection

Sort an fcm in alphabetical order of the features

flatten_dictionary

Flatten a hierarchical dictionary into a list of character vectors

Get the feature labels from a dfm

merge_dictionary_values

Internal function to merge values of duplicated keys

Convert a Matrix to an fcm

Print a dfm object

quanteda_options

Get or set package options for quanteda

read_dict_functions

Internal functions to import dictionary files

generate_groups

Generate a grouping vector from docvars

Grouping variable(s) for various functions

predict.textmodel_nb

Prediction from a fitted textmodel_nb object

Return an error message

predict.textmodel_affinity

Prediction for a fitted affinity textmodel

Get or set corpus metadata

Internal function for select_types() to check if a string is a regular expression

quanteda-package

An R package for the quantitative analysis of textual data

summary.textmodel_wordfish

summary method for textmodel_wordfish

print.textmodel_wordfish

print method for a wordfish model

Convert a Matrix to a dfm

Convert various pattern inputs to a vector used in tokens_select(), tokens_compound(), and kwic()

lowercase_dictionary_values

Internal function to lowercase dictionary values

Defunct form of nfeat

Compute keyness (internal functions)

nest_dictionary

Utility function to generate a nested list

Set values to a dfm's S4 slots

summary_character

Summary statistics on a character vector

print.statistics_textmodel

Implements print methods for textmodel_statistics

Declare a compound character to be a sequence of separate pattern matches

textmodel_wordshoal

Wordshoal text model (redirect)

Sample a vector by a group

replace_dictionary_values

Internal function to replace dictionary values

print.summary.textmodel

print method for summary.textmodel

Summarize a corpus

summary.textmodel_nb

summary method for textmodel_nb objects

textstat_frequency

Tabulate feature frequencies

textplot_influence

Influence plot for text scaling models

Latent Semantic Analysis

Extensions for and from spacy_parse objects

Set values to an fcm's S4 slots

Naive Bayes classifier for texts

textstat_collocations

Identify and score multi-word expressions

Replace tokens in a tokens object

textstat_keyness

Calculate keyness statistics

Randomly sample documents from a tokens object

Identify the most frequent features in a dfm

Compute the sparsity of a document-feature matrix

textstat_entropy

Compute entropy of documents or features

Count the number of documents or features

Get or set document-level meta-data

textplot_keyness

Plot word keyness

Pattern for feature, token and keyword matching

Convert regex and glob patterns to type IDs or fixed patterns

Get word types from a tokens object

textstat_select

Select rows of textstat objects by glob, regex or fixed patterns

Print a phrase object

print.dist_selection

Print a dist_selection object

Similarity and distance computation between documents or features

Deprecated name for nscrabble

Get or set the corpus settings

Function to assign multiple slots to an S4 object

Select types without performing slow regex search

textmodel_affinity-internal

Internal methods for textmodel_affinity

textmodel_affinity

Class affinity maximum likelihood text scaling model

textplot_network

Plot a network of feature co-occurrences

Plot the dispersion of key word(s)

[Experimental] Compute document/feature proximity

textstat_readability

Calculate readability

Create ngrams and skipgrams from tokens

tokens_recompile

Recompile a serialized tokens object

Get or assign corpus texts

tokens_serialize

Function to serialize list-of-character tokens

Raise warning of unused dots

Split tokens by a separator pattern

Pattern matching using valuetype

textstat_lexdiv

Calculate lexical diversity

wordcloud_comparison

Internal function for textplot_wordcloud

Internal function for textplot_wordcloud

textstat_proxy-class

textstat_simil/dist classes

textstat_dist_old

Similarity and distance computation between documents or features

Correspondence analysis of a document-feature matrix

Deprecated name for dfm_weight

Segment tokens object by patterns

textmodel_lsa-postestimation

Post-estimation methods for textmodel_lsa

textplot_scale1d

Plot a fitted scaling model

Deprecated form of dfm_tfidf

Tokenize a set of texts

textplot_wordcloud

Plot features as a wordcloud

tokens_wordstem

Stem the terms in an object

[Experimental] Change direction of words in tokens

Select or remove tokens from a tokens object

Extract a subset of a tokens object

Convert the case of tokens

as.corpus.corpuszip

Coerce a compressed corpus to a standard corpus

Coercion and checking functions for dfm objects

as.data.frame.dfm

Convert a dfm to a data.frame

Convert an fcm to an igraph object

compute_lexdiv_stats

Compute lexical diversity from a dfm or tokens

Compute the Moving-Average Type-Token Ratio (MATTR)

as.matrix,textstat_simil_sparse-method

as.matrix method for textstat_simil_sparse

Redefinition of network::as.network()

Coerce a dist into a dist
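Many of the functions indexed above chain together into a basic workflow: build a corpus, tokenize it, form a document-feature matrix, then select, trim, and weight features. The sketch below uses invented example texts and only functions from this index (`corpus()`, `tokens()`, `dfm()`, `dfm_select()`, `dfm_trim()`, `dfm_tfidf()`, `topfeatures()`, `ndoc()`, `nfeat()`). It builds the dfm from a tokens object rather than directly from text, since later quanteda versions deprecate calling `dfm()` on raw texts.

```r
library(quanteda)

# Invented example texts; the names become the document names
txt <- c(doc1 = "The economy is growing and the budget is balanced.",
         doc2 = "The budget cuts hurt the economy.",
         doc3 = "Growing debt worries the taxpayers.")
corp <- corpus(txt)

# Tokenize, dropping punctuation
toks <- tokens(corp, remove_punct = TRUE)

# Document-feature matrix (dfm() lowercases features by default),
# then remove common English stopwords
dfmat <- dfm(toks)
dfmat <- dfm_select(dfmat, pattern = stopwords("en"), selection = "remove")

# Keep only features that occur at least twice across all documents
dfmat_trim <- dfm_trim(dfmat, min_termfreq = 2)

# tf-idf weighting of the untrimmed counts
dfmat_tfidf <- dfm_tfidf(dfmat)

# Inspect the result
print(topfeatures(dfmat_trim, n = 5))
print(c(documents = ndoc(dfmat_trim), features = nfeat(dfmat_trim)))
```

With these texts, only "economy", "growing", and "budget" survive the frequency threshold.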
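Keyword-in-context search and dictionary lookup, listed above as "Locate keywords-in-context", "Create a dictionary", and "Apply a dictionary to a tokens object", both operate on tokens. In this sketch the texts and the two-key dictionary are invented for illustration; matching is case-insensitive by default, so "Spending" also matches.

```r
library(quanteda)

# Invented example texts
txt <- c(a = "Taxes rise while spending falls.",
         b = "Spending on schools and hospitals increases.")
toks <- tokens(txt, remove_punct = TRUE)

# Keywords-in-context: each match of "spending" with 2 tokens on either side
kw <- kwic(toks, pattern = "spending", window = 2)
print(kw)

# Hypothetical two-key dictionary; values are glob patterns
dict <- dictionary(list(fiscal   = c("tax*", "spending"),
                        services = c("school*", "hospital*")))

# Replace matched tokens by their dictionary keys, then count keys per document
toks_dict <- tokens_lookup(toks, dictionary = dict)
dfmat_dict <- dfm(toks_dict)
print(dfmat_dict)
```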
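Multi-word expressions, n-grams, and feature co-occurrence counts are all built from a tokens object. This sketch uses a single invented sentence: `phrase()` declares "new york" as a sequence of two patterns, `tokens_compound()` joins matches with its default `_` concatenator, `tokens_ngrams()` forms bigrams, and `fcm()` counts co-occurrences within a one-token window.

```r
library(quanteda)

# One invented sentence, lowercased so that features match across steps
toks <- tokens_tolower(tokens("New York is bigger than New Jersey"))

# Join the multi-word expression into a single token: "new_york"
toks_comp <- tokens_compound(toks, pattern = phrase("new york"))
print(toks_comp)

# Bigrams built from the original (uncompounded) tokens
toks_bi <- tokens_ngrams(toks, n = 2)
print(toks_bi)

# Feature co-occurrence matrix: counts of features appearing
# within one token of each other
fcmat <- fcm(toks, context = "window", window = 1)
print(fcmat)
```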
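Document-level variables drive subsetting, reshaping, and grouping. The sketch below attaches an invented `party` variable through the `docvars` argument of `corpus()`, then uses `corpus_subset()`, `corpus_reshape()`, and `dfm_group()`. The grouping factor is passed explicitly, one of the forms `dfm_group()` accepts; the texts and party labels are hypothetical.

```r
library(quanteda)

# Invented texts with a made-up "party" document variable
corp <- corpus(c(d1 = "We will cut taxes. We will build roads.",
                 d2 = "Schools need funding.",
                 d3 = "Taxes are too high."),
               docvars = data.frame(party = c("A", "B", "A")))
print(docvars(corp, "party"))

# Keep only documents from party A
corp_a <- corpus_subset(corp, party == "A")

# Recast the document units from whole texts to sentences
corp_sent <- corpus_reshape(corp, to = "sentences")

# Combine dfm rows by a grouping factor: one row per party
dfmat <- dfm(tokens(corp))
dfmat_party <- dfm_group(dfmat, groups = factor(docvars(corp, "party")))
print(dfmat_party)
```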