Note: a newer version (0.7-14) of this package is available.

tm (version 0.5-10)

Text Mining Package

Description

A framework for text mining applications within R.

Install

install.packages('tm')

Monthly Downloads

53,395

Version

0.5-10

License

GPL-3

Maintainer

Last Published

January 13th, 2014

Functions in tm (0.5-10)

findAssocs

Find Associations in a Term-Document Matrix

XMLSource

XML Source

weightTf

Weight by Term Frequency

dissimilarity

Dissimilarity

PlainTextDocument

Plain Text Document

TextDocument

Access and Modify Text Documents

stemDocument

Stem Words

FunctionGenerator

Function Generator

plot

Visualize a Term-Document Matrix

removeSparseTerms

Remove Sparse Terms from a Term-Document Matrix

getReaders

List Available Readers

tm_filter

Filter and Index Functions on Corpora

Zipf_n_Heaps

Explore Corpus Term Frequency Characteristics

acq

50 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic acq

removeWords

Remove Words from a Text Document

WeightFunction

Weighting Function

tm_term_score

Compute Score for Matching Terms

URISource

Uniform Resource Identifier Source

tm_map

Transformations on Corpora

readPDF

Read In a PDF Document

DirSource

Directory Source

findFreqTerms

Find Frequent Terms

VectorSource

Vector Source

readReut21578XML

Read In a Reuters-21578 XML Document

weightTfIdf

Weight by Term Frequency - Inverse Document Frequency

readRCV1

Read In a Reuters Corpus Volume 1 Document

readXML

Read In an XML Document

removeNumbers

Remove Numbers from a Text Document

weightBin

Weight Binary

getSources

List Available Sources

readDOC

Read In a MS Word Document

inspect

Inspect Objects

readPlain

Read In a Text Document

VCorpus

Volatile Corpus

prescindMeta

Prescind Document Meta Data

as.PlainTextDocument

Create Objects of Class PlainTextDocument

removePunctuation

Remove Punctuation Marks from a Text Document

number

The Number of Rows/Columns/Dimensions/Documents/Terms of a Term-Document Matrix

tokenizer

Tokenizers

writeCorpus

Write a Corpus to Disk

weightSMART

SMART Weightings

termFreq

Term Frequency Vector

readTabular

Read In a Text Document

makeChunks

Split a Corpus into Chunks

foreign

Read Document-Term Matrices

DataframeSource

Data Frame Source

Source

Create and Access Sources

names

Row, Column, Dim Names, Document IDs, and Terms

tm_combine

Combine Corpora, Documents, Term-Document Matrices, and Term Frequency Vectors

PCorpus

Permanent Corpus Constructor

getTokenizers

List Available Tokenizers

stripWhitespace

Strip Whitespace from a Text Document

getTransformations

List Available Transformations

TermDocumentMatrix

Term-Document Matrix

meta

Meta Data Management

sFilter

Statement Filter

stopwords

Stopwords

crude

20 Exemplary News Articles from the Reuters-21578 XML Data Set of Topic crude

Reuters21578Document

Reuters-21578 Text Document

ReutersSource

Reuters-21578 XML Source

tm_reduce

Combine Transformations

RCV1Document

RCV1 Text Document

TextRepository

Text Repository

materialize

Materialize Lazy Mappings

stemCompletion

Complete Stems
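
Example

The index above only names the functions, so a minimal sketch of how several of them might be combined may help. It builds a small corpus, cleans it, and finds frequent terms using VCorpus, VectorSource, tm_map, removePunctuation, removeWords, stopwords, stripWhitespace, TermDocumentMatrix, and findFreqTerms. The three sentences in `docs` are made-up illustrative data, not part of the package, and the sketch assumes the default term-document matrix settings (such as lowercasing terms) apply. Argument details can differ between tm versions, so treat this as an outline rather than a reference.

```r
library(tm)

# Three tiny made-up documents (illustrative data, not shipped with tm).
docs <- c("Oil prices rise as crude supply falls.",
          "Crude oil supply falls again this week.",
          "The company will acquire a rival oil firm.")

# Build a volatile (in-memory) corpus from a character vector.
corpus <- VCorpus(VectorSource(docs))

# Apply a few of the transformations listed in the index.
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix: terms in rows, documents in columns.
tdm <- TermDocumentMatrix(corpus)

# Terms that occur at least 3 times across the whole corpus.
frequent <- findFreqTerms(tdm, lowfreq = 3)
print(frequent)
```

From here, inspect(tdm) prints the matrix itself, and removeSparseTerms or findAssocs can be applied to the same `tdm` object.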