quanteda package - RDocumentation

About

An R package for managing and analyzing text, created by Kenneth Benoit in collaboration with a team of core contributors: Kohei Watanabe, Paul Nulty, Adam Obeng, Stefan Müller, Haiyan Wang, Ben Lauderdale, and Will Lowe.
Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

For more details, see http://docs.quanteda.io and the quanteda vignettes.

How to Install

Install the released version from CRAN in the usual way, using your R GUI or:

install.packages("quanteda")

Or for the latest development version:

# devtools package required to install quanteda from GitHub
devtools::install_github("quanteda/quanteda")

Because this compiles some C++ source code, you will need a compiler installed. If you are using a Windows platform, you will also need to install the Rtools software available from CRAN. If you are using macOS, you will need to install Xcode, available for free from the App Store, or, if you prefer a lighter-footprint set of tools, just the Xcode command line tools, using the command xcode-select --install from the Terminal.
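
Before installing the development version, it can help to check whether a C++ compiler is visible to R. The sketch below is a heuristic only: the list of executable names is an assumption, and on Windows Rtools may be usable by R even when its compiler is not on the PATH.

```r
# Candidate C++ compiler executables; this list is an assumption, not exhaustive.
candidates <- c("g++", "clang++", "c++")

# Sys.which() returns "" for every executable it cannot find on the PATH.
found <- Sys.which(candidates)
has_compiler <- any(nzchar(found))

if (has_compiler) {
  message("Found a C++ compiler: ", found[nzchar(found)][1])
} else {
  message("No C++ compiler found on the PATH; ",
          "install Rtools (Windows) or the Xcode command line tools (macOS).")
}
```

If no compiler is reported, installing the CRAN release with install.packages("quanteda") is usually the simpler route, since CRAN provides pre-built binaries for Windows and macOS.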

How to Use

See the quick start guide to learn how to use quanteda.
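
As a rough orientation, and not a substitute for the guide, a minimal workflow might look like the sketch below. It assumes quanteda is installed and is skipped otherwise. The two example texts are made up for illustration.

```r
# Run the example only when quanteda is available.
if (requireNamespace("quanteda", quietly = TRUE)) {
  library(quanteda)

  # Two short, made-up documents (illustrative data only).
  txt <- c(d1 = "The cat sat on the mat.", d2 = "The dog sat.")

  # Tokenize the texts, dropping punctuation.
  toks <- tokens(txt, remove_punct = TRUE)

  # Build a document-feature matrix; features are lowercased by default.
  dfmat <- dfm(toks)

  # Inspect the result: number of documents and the most frequent features.
  print(ndoc(dfmat))
  print(topfeatures(dfmat, 2))
}
```

Tokenizing first and then calling dfm() on the tokens keeps each preprocessing step explicit, which makes it easier to see where punctuation removal or case conversion happens.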

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute:

Fork the source code, modify, and issue a pull request through the project GitHub page. See our Contributor Code of Conduct and the all-important quanteda Style Guide.
Issues, bug reports, and wish lists: File a GitHub issue.
Usage questions: Submit a question on the quanteda channel on StackOverflow.
Contact the maintainer by email.

Monthly Downloads: 17,675

Version: 1.1.1

License: GPL-3

Maintainer: Kenneth Benoit

Last Published: March 7th, 2018

Functions in quanteda (1.1.1)

Coerce a dist object into a list

as.matrix.dist_selection

Coerce a dist_selection object to a matrix

coef.textmodel_ca

Extract model coefficients from a fitted textmodel_ca object

convert-wrappers

Convenience wrappers for dfm convert

Extract a subset of a corpus

Remove sentences based on their token lengths or a pattern match

Internal functions for dfm objects

Create a document-feature matrix

Convert the case of the features of a dfm and combine

Trim a dfm using frequency threshold-based feature selection

Create a dictionary

Compute the (weighted) document frequency of a feature

generate_groups

Generate a grouping vector from docvars

Grouping variable(s) for various functions

Count syllables in a text

Count the number of tokens or types

Declare a compound character to be a sequence of separate pattern matches

predict.textmodel_affinity

Prediction for a fitted affinity textmodel

Import a LIWC-formatted dictionary

View methods for quanteda

Objects exported from other packages

textmodel_affinity-internal

Internal methods for textmodel_affinity

Internal function to fit the likelihood scaling mixture model.

textmodel_affinity

Class affinity maximum likelihood text scaling model

as.coefficients_textmodel

Coerce various objects to coefficients_textmodel. This is a helper function used in summary.textmodel_*.

Convert quanteda dictionary objects to the YAML format

Coerce a compressed corpus to a standard corpus

Function extending base::attributes()

Latent Semantic Analysis

Coercion and checking functions for dictionary objects

Construct a corpus object

as.corpus.corpuszip

Coerce a compressed corpus to a standard corpus

Combine dfm objects by Rows or Columns

Bootstrap a dfm

Naive Bayes classifier for texts

Recast the document units of a corpus

data-deprecated

Datasets with deprecated or defunct names

Coercion and checking functions for dfm objects

Coerce a dist into a dist

textstat_collocations

Identify and score multi-word expressions

Randomly sample documents from a corpus

as.matrix.simil

Coerce a simil object into a matrix

Segment texts on a pattern match

textstat_frequency

Tabulate feature frequencies

Sort a dfm by frequency of one or more margins

data_char_sampletext

A paragraph of text for testing various text-based functions

data_char_ukimmig2010

Immigration-related sections of 2010 UK party manifestos

as.summary.textmodel

Assign the summary.textmodel class to a list

Get the feature labels from a dfm

Similarity and distance computation between documents or features

Select features from a dfm or fcm

Internal data sets

Convert the case of character objects

Virtual class "dfm" for a document-feature matrix

data_dictionary_LSD2015

Lexicoder Sentiment Dictionary (2015)

as.statistics_textmodel

Coerce various objects to statistics_textmodel. This is a helper function used in summary.textmodel_*.

Check if font is available on the system

Deprecated name for dfm_weight

Convert the case of tokens

Coercion, checking, and combining functions for tokens objects

tokens_wordstem

Stem the terms in an object

Convert a dfm to a non-quanteda format

Combine documents in a dfm by a grouping variable

data_corpus_irishbudget2010

Irish budget speeches from 2010

data_dfm_lbgexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

Extract a subset of a dfm

Utility function to create an object with a new set of attributes

corpus_trimsentences

Remove sentences based on their token lengths or a pattern match

Base method extensions for corpus objects

Apply a dictionary to a dfm

Replace features in dfm

Weight a dfm by tf-idf

Get or set document names

Get or set document-level meta-data

Get or set corpus metadata

Get or set document-level variables

Return the first or last part of a dfm

Return the first or last part of a corpus

friendly_class_undefined_message

Print friendly object class not defined message

data_corpus_dailnoconf1991

Confidence debate from 1991 Irish Parliament

Randomly sample documents or features from a dfm

Print a dfm object

print.dist_selection

Print a dist_selection object

list2dictionary

Internal function to convert a list to a dictionary

merge_dictionary_values

Internal function to merge values of duplicated keys

print.summary.textmodel

print method for summary.textmodel

print.textmodel_wordfish

print method for a wordfish model

Count the Scrabble letter values of text

Count the number of sentences

summary.character

Summary statistics on a character vector

textplot_keyness

Plot word keyness

predict.textmodel_nb

Prediction from a fitted textmodel_nb object

predict.textmodel_wordfish

Prediction from a textmodel_wordfish method

summary.textmodel_nb

summary method for textmodel_nb objects

Deprecated name for nscrabble

replace_dictionary_values

Internal function to replace dictionary values

summary.textmodel_wordfish

summary method for textmodel_wordfish

textplot_network

Plot a network of feature co-occurrences

textplot_scale1d

Plot a fitted scaling model

Summarize a corpus

Virtual class "fcm" for a feature co-occurrence matrix

Internal function for select_types() to escape regular expressions

textmodel_wordshoal

Wordshoal text model (redirect)

textplot_wordcloud

Plot features as a wordcloud

Compute keyness (internal functions)

Locate keywords-in-context

Count the number of documents or features

Tokenize a set of texts

quanteda-package

An R package for the quantitative analysis of textual data

Deprecated form of dfm_tfidf

predict.textmodel_wordscores

Predict textmodel_wordscores

nest_dictionary

Utility function to generate a nested list

print.coefficients_textmodel

Print methods for textmodel features estimates. This is a helper function used in print.summary.textmodel.

data_corpus_inaugural

US presidential inaugural address texts

Convert a dfm to an lsa "textmatrix"

textplot_influence

Influence plot for text scaling models

Recombine a dfm or fcm by combining identical dimension elements

Plot the dispersion of key word(s)

Get or assign corpus texts

Identify the most frequent features in a dfm

quanteda_options

Get or set package options for quanteda

Get or set the corpus settings

Get word types from a tokens object

textmodel_wordfish

Wordfish text model

Weight the feature frequencies in a dfm

Function to assign multiple slots to an S4 object

tokens_recompile

Recompile a serialized tokens object

Replace types in tokens object

dictionary2-class

Print a dictionary object

Create a feature co-occurrence matrix

textmodel_wordscores

Wordscores text model

textstat_keyness

Calculate keyness statistics

Segment tokens object by patterns

Sort an fcm in alphabetical order of the features

textstat_lexdiv

Calculate lexical diversity

influence.predict.textmodel_affinity

Compute feature influence from a predicted textmodel_affinity object

Select or remove tokens from a tokens object

Apply a dictionary to a tokens object

wordcloud_comparison

Internal function for textplot_wordcloud

Internal function for select_types() to check if a string is a regular expression

Create ngrams and skipgrams from tokens

Pattern matching using valuetype

Internal function for textplot_wordcloud

Pattern for feature, token and keyword matching

Convert various input as pattern to a vector used in tokens_select, tokens_compound and kwic.

Print a phrase object

print.statistics_textmodel

Implements print methods for textmodel_statistics

Convert regex and glob patterns to type IDs or fixed patterns

remove_empty_keys

Utility function to remove empty keys

Extensions for and from spacy_parse objects

Compute the sparsity of a document-feature matrix

Correspondence analysis of a document-feature matrix

textmodel_lsa-postestimation

Post-estimation methods for textmodel_lsa

textstat_readability

Calculate readability

textstat_select

Select rows of textstat objects by glob, regex or fixed patterns

tokens_compound

Convert token sequences into compound tokens

Recombine documents tokens by groups

tokens_serialize

Function to serialize list-of-character tokens

Extract a subset of a tokens object

as.list.dist_selection

Coerce a dist_selection object into a list

Coerce a dfm to a matrix or data.frame
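
To give a feel for how a few of the token-level functions listed above fit together, here is a small sketch. It assumes quanteda is installed and is skipped otherwise. The texts and the words chosen for removal are illustrative only.

```r
# Run the example only when quanteda is available.
if (requireNamespace("quanteda", quietly = TRUE)) {
  library(quanteda)

  # Made-up example texts (illustrative data only).
  txt <- c(d1 = "The cat sat on the mat.", d2 = "The dog sat.")
  toks <- tokens(txt, remove_punct = TRUE)

  # Locate a keyword in context, with two tokens on either side.
  print(kwic(toks, pattern = "sat", window = 2))

  # Remove selected tokens; matching is case-insensitive by default.
  toks_small <- tokens_select(toks, pattern = c("the", "on"),
                              selection = "remove")
  print(ntoken(toks_small))

  # Form bigrams from the remaining tokens.
  print(tokens_ngrams(toks_small, n = 2))
}
```

Working on a tokens object rather than on raw text preserves token order, which kwic() and tokens_ngrams() both depend on.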