
Note: a newer version (1.2.8) of this package is available.

RNewsflow: Tools for analyzing content homogeneity and news diffusion using computational text analysis

Given the sheer number of news sources in the digital age (e.g., newspapers, blogs, social media), it has become difficult to determine where news is first introduced and how it diffuses across sources. RNewsflow provides tools for analyzing content homogeneity and diffusion patterns using computational text analysis. The content of news messages is compared using techniques from the field of information retrieval, similar to plagiarism detection. By using a sliding window to compare only messages published within a given time distance of each other, many sources can be compared over long periods of time. Furthermore, the package introduces an approach for analyzing the news similarity data as a network, and includes various functions to analyze and visualize this network.
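The sliding-window idea can be illustrated with a minimal base R sketch. It assumes a dense document-term matrix with document ids as row names, POSIXct publication times, and plain cosine similarity. `compare_window()` and its arguments are hypothetical and are not RNewsflow's API; the package's own functions for this task (such as `newsflow.compare`) are listed in the function index.

```r
# Minimal sketch, NOT the RNewsflow implementation: compare_window() is a
# hypothetical helper that scores document pairs by cosine similarity, but
# only for pairs whose publication times fall within a sliding time window.
compare_window <- function(dtm, dates, hour_window = c(0, 36), min_similarity = 0.1) {
  # L2-normalise every document (row) so that a dot product is a cosine similarity
  norms <- sqrt(rowSums(dtm^2))
  norms[norms == 0] <- 1
  normed <- dtm / norms
  hours <- as.numeric(dates) / 3600  # POSIXct seconds -> hours
  edges <- list()
  for (i in seq_len(nrow(dtm))) {
    # hours between document i and every other document (positive = later)
    hdiff <- hours - hours[i]
    j <- which(hdiff >= hour_window[1] & hdiff <= hour_window[2])
    j <- j[j != i]
    if (length(j) == 0) next
    # similarities are only computed for documents inside the window
    sim <- as.vector(normed[j, , drop = FALSE] %*% normed[i, ])
    hit <- sim >= min_similarity
    if (any(hit)) {
      edges[[length(edges) + 1]] <- data.frame(
        from = rownames(dtm)[i], to = rownames(dtm)[j[hit]],
        hourdiff = hdiff[j[hit]], weight = sim[hit],
        stringsAsFactors = FALSE)
    }
  }
  if (length(edges) == 0) {
    return(data.frame(from = character(0), to = character(0),
                      hourdiff = numeric(0), weight = numeric(0)))
  }
  do.call(rbind, edges)
}

# Toy example: three documents with made-up term counts and timestamps
dtm <- rbind(a = c(1, 1, 0, 0), b = c(1, 1, 0, 0), c = c(0, 0, 1, 1))
colnames(dtm) <- c("minister", "resigns", "storm", "coast")
dates <- as.POSIXct(c("2021-04-01 08:00", "2021-04-01 12:00",
                      "2021-04-01 13:00"), tz = "UTC")
edges <- compare_window(dtm, dates, hour_window = c(0, 36), min_similarity = 0.5)
print(edges)
```

Documents `a` and `b` share all their terms and `b` appears four hours after `a`, so the only edge runs from `a` to `b` with an hour difference of 4 and a weight of 1; `c` shares no terms with either. The similarity itself is only computed for pairs inside the window, which is what keeps the comparison tractable for many sources over long periods.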

Installation

You can install the development version of RNewsflow directly from GitHub:

# install.packages("devtools")  # only needed if devtools is not yet installed
library(devtools)
install_github("kasperwelbers/RNewsflow")

Vignette

The vignette, which contains a step-by-step tutorial for using RNewsflow, can be opened from within R:

library(RNewsflow)
vignette('RNewsflow')

Install

install.packages('RNewsflow')

Monthly Downloads

702

Version

1.2.6

License

GPL-3

Maintainer

Last Published

April 7th, 2021

Functions in RNewsflow (1.2.6)

create_queries

Automatically infer queries from combinations of terms in a dtm
delete_duplicates

Delete duplicate (or similar) documents from a document term matrix
create_document_network

Create a document similarity network
delete.duplicates

Delete duplicate (or similar) documents from a document term matrix
compare_documents

Compare the documents in a dtm
as_document_network

Create a document similarity network
document_network_plot

Visualize (a subcomponent of) the document similarity network
document.network.plot

Visualize (a subcomponent of) the document similarity network
network.aggregate

Aggregate the edges of a network by vertex attributes
newsflow.compare

Compare the documents in a dtm with a sliding window over time
network_aggregate

Aggregate the edges of a network by vertex attributes
newsflow_compare

Create a network of document similarities over time
hourdiff_range_thresholds

Inspect effects of thresholds on matches over time
get_overlap_terms

View overlapping terms for a given pair of documents
docnet

Document similarity network for one news agency, and the print and online editions of two newspapers
document.network

Create a document similarity network
documents.compare

Compare the documents in two corpora/dtms
filter.window

Filter edges from the document similarity network based on hour difference
show.window

Show time window of document pairs
rnewsflow_dfm

quanteda dfm for RNewsflow vignette demo
directed.network.plot

A wrapper for plot.igraph for visualizing directed networks.
tcrossprod_sparse

tcrossprod with benefits, for people that like parameters
show_window

Show time window of document pairs
term_union

Combine terms in a dtm
term_intersect

Combine terms in a dtm
only.first.match

Transform document network so that each document only matches the earliest dated matching document
directed_network_plot

A wrapper for plot.igraph for visualizing directed networks.
only_first_match

Transform document network so that each document only matches the earliest dated matching document
get_doc_terms

View term scores for a given document
filter_window

Filter edges from the document similarity network based on hour difference
term_day_dist

Calculate statistics for term occurrence across days
term_innovation

Experimental: Convert dtm scores to a term innovation score, based on changes in term use over time
term_char_sim

Find terms with similar spelling
term.day.dist

Calculate statistics for term occurrence across days
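Several of the functions above (`network_aggregate`, `only_first_match`, `filter_window`) operate on the document similarity network. To illustrate what aggregating edges by a vertex attribute means, the following base R sketch counts edges and averages their similarity weights per pair of news sources. `aggregate_by_source()`, the `id`/`source` metadata columns, and the example data are hypothetical. This is not the implementation of `network_aggregate()`, which works on the package's igraph network rather than a plain edge list.

```r
# Hypothetical sketch, NOT network_aggregate() itself: aggregate an edge list
# of document pairs by a vertex attribute (here the news source).
aggregate_by_source <- function(edges, meta) {
  # look up the source of each document by its id
  src <- setNames(meta$source, meta$id)
  pairs <- data.frame(from_source = unname(src[edges$from]),
                      to_source = unname(src[edges$to]),
                      weight = edges$weight,
                      stringsAsFactors = FALSE)
  # number of edges and mean similarity per pair of sources
  counts <- aggregate(weight ~ from_source + to_source, data = pairs, FUN = length)
  names(counts)[3] <- "edges"
  means <- aggregate(weight ~ from_source + to_source, data = pairs, FUN = mean)
  names(means)[3] <- "mean_weight"
  merge(counts, means, by = c("from_source", "to_source"))
}

# Made-up edges: two agency documents matched by two newspaper documents
edges <- data.frame(from = c("d1", "d1", "d2"), to = c("d3", "d4", "d4"),
                    weight = c(0.8, 0.6, 0.4), stringsAsFactors = FALSE)
meta <- data.frame(id = c("d1", "d2", "d3", "d4"),
                   source = c("agency", "agency", "paper", "paper"),
                   stringsAsFactors = FALSE)
agg <- aggregate_by_source(edges, meta)
print(agg)
```

All three document-level edges collapse into a single agency-to-paper edge with a count of 3 and a mean weight of 0.6, which is the kind of source-level summary that makes diffusion patterns between outlets visible.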