tm.plugin.webmining (version 1.3)

Retrieve Structured, Textual Data from Various Web Sources

Description

Facilitates text retrieval from feed formats such as XML (RSS, Atom) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also retrieves and extracts the full text from the original source.
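
The two-step flow described above (read the short feed items, then follow each item's link to obtain the full text) can be sketched in base R without the package. Everything in this sketch is hypothetical and for illustration only: the `feed_items` data frame, the `fetch_page` stub (which stands in for a real HTTP request so the example runs offline), and `fill_from_links`. None of these names are part of the tm.plugin.webmining API.

```r
# Hypothetical feed items: a short snippet plus a link to the full article.
feed_items <- data.frame(
  title   = c("Item A", "Item B"),
  snippet = c("Short teaser A...", "Short teaser B..."),
  link    = c("http://example.com/a", "http://example.com/b"),
  stringsAsFactors = FALSE
)

# Stub standing in for an HTTP request (assumption: a real version
# would download the page behind the URL instead of using a lookup).
fetch_page <- function(url) {
  pages <- c(
    "http://example.com/a" = "<html><body><p>Full text of article A.</p></body></html>",
    "http://example.com/b" = "<html><body><p>Full text of article B.</p></body></html>"
  )
  unname(pages[url])
}

# Add a content column holding the page retrieved from each item's link.
fill_from_links <- function(items, fetch = fetch_page) {
  items$content <- vapply(items$link, fetch, character(1), USE.NAMES = FALSE)
  items
}

full_items <- fill_from_links(feed_items)
```

After this step each item carries the raw page behind its link rather than only the feed's teaser; extracting the main text from that page is a separate step.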

Install

install.packages('tm.plugin.webmining')

Monthly Downloads

70

Version

1.3

License

GPL-3

Last Published

May 10th, 2015

Functions in tm.plugin.webmining (1.3)

trimWhiteSpaces

Trim White Spaces from Text Document.
NYTimesSource

source.update

Update WebXMLSource/WebHTMLSource/WebJSONSource
YahooFinanceSource

Get feed data from Yahoo! Finance.
removeNonASCII

Remove non-ASCII characters from Text.
WebSource

Read web content and the respective link content from feed URLs.
ReutersNewsSource

Get feed data from Reuters News RSS feed channels. Reuters provides numerous feed
nytimes_appid

AppID for the NYtimes-API.
extractContentDOM

Extract Main HTML Content from DOM
YahooInplaySource

Get News from Yahoo Inplay.
readWeb

Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
corpus.update

Update/Extend WebCorpus with new feed items.
tm.plugin.webmining-package

Retrieve structured, textual data from various web sources
encloseHTML

Enclose Text Content in HTML tags
parse

Wrapper/convenience function to ensure the right encoding on different platforms.
feedquery

Build up the string for a feed query.
getEmpty

Retrieve Empty Corpus Elements through $postFUN.
extract

Extract main content from TextDocuments.
GoogleFinanceSource

Get feed Meta Data from Google Finance.
WebCorpus

WebCorpus constructor function.
extractHTMLStrip

Simply strip HTML Tags from Document
yahoonews

WebCorpus retrieved from Yahoo! News for the search term "Microsoft" through the YahooNewsSource. Length of retrieved corpus is 20.
getLinkContent

Get main content for corpus items, specified by links.
YahooNewsSource

GoogleNewsSource
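
Several helpers in the list above (extractHTMLStrip, trimWhiteSpaces, removeNonASCII) describe simple text clean-up steps. The base R sketch below illustrates the kind of transformation each description suggests. It is not the package's implementation: the function names `strip_tags`, `trim_ws`, and `drop_non_ascii` are hypothetical, and regex-based tag stripping is a simplification that will not handle all HTML.

```r
# Replace anything that looks like an HTML tag with a space.
strip_tags <- function(x) gsub("<[^>]+>", " ", x)

# Trim leading/trailing white space and collapse internal runs to one space.
trim_ws <- function(x) gsub("\\s+", " ", trimws(x))

# Drop characters that cannot be represented in ASCII.
drop_non_ascii <- function(x) iconv(x, from = "UTF-8", to = "ASCII", sub = "")

html  <- "<p>  Caf\u00e9   prices <b>rise</b> </p>"
clean <- trim_ws(drop_non_ascii(strip_tags(html)))
print(clean)
```

The steps are composed from the inside out: tags are stripped first, then non-ASCII characters are dropped, and white space is normalized last so that gaps left by the earlier steps are collapsed.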